Version 1 (modified by jamverlouw, 8 years ago)

--

Meta-exon annotation

To create the meta-exon annotation, the following steps were taken:

  1. The exon annotation was downloaded from Ensembl Biomart v.71. The file contained the following columns:
    chromosome, exon start, exon end, Ensembl exon id, Ensembl gene id, gene name, strand.
  2. All additional contigs (GL*, LRG*, etc.) were removed, so that only the ordinary chromosomes (1-22, X, Y, MT) remained. This was done with the custom script cutStrangeChr.py (see attachment).
  3. The Biomart file was converted to BED format (0-based start coordinate, strand recoded as +/-) and sorted by chromosome and start coordinate.
  4. Overlapping exons were merged using mergeBed from the BEDTools suite.
  5. The resulting file was converted to GTF format, retaining the strand information, with the custom script mergedBed_to_gtf.py (see attachment).
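
As an illustration of steps 2 and 4, the following is a minimal Python sketch of the filtering and merging logic. It is not the attached cutStrangeChr.py, nor BEDTools itself: the function names and the example exon ids are hypothetical, and the merge rule (intervals must overlap by at least 1 bp, book-ended intervals stay separate, names joined with ";") is an assumption about how mergeBed behaves with the -d -1 and -nms options.

```python
# Hypothetical sketch of steps 2 and 4; not the attached scripts or BEDTools.
ORDINARY_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def keep_ordinary_chromosomes(records):
    """Drop records whose chromosome is not 1-22, X, Y or MT (e.g. GL*, LRG*)."""
    return [rec for rec in records if rec[0] in ORDINARY_CHROMOSOMES]


def merge_overlapping(records):
    """Merge BED-style (chrom, start, end, name) records that overlap by >= 1 bp.

    Book-ended intervals (next start == current end) are kept separate, which
    is the behaviour assumed here for `mergeBed -d -1`. Names of merged
    records are joined with ';', as assumed for `-nms`.
    """
    merged = []
    # Sort by chromosome, then start, then end, so overlaps are adjacent.
    for chrom, start, end, name in sorted(records, key=lambda r: (r[0], r[1], r[2])):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + ";" + name)
        else:
            merged.append((chrom, start, end, name))
    return merged


# Made-up example records (0-based, half-open BED coordinates).
exons = [
    ("1", 99, 200, "EXON_A"),
    ("1", 150, 300, "EXON_B"),          # overlaps EXON_A, merged with it
    ("1", 300, 400, "EXON_C"),          # book-ended, stays separate
    ("GL000192.1", 10, 50, "EXON_D"),   # additional contig, removed
    ("X", 5, 20, "EXON_E"),
]
meta_exons = merge_overlapping(keep_ordinary_chromosomes(exons))
```

With these example records, meta_exons holds three meta-exons: 1:99-300 (EXON_A;EXON_B), 1:300-400 (EXON_C) and X:5-20 (EXON_E).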

The final commands to generate the meta-exon annotation were the following:

./cutStrangeChr.py biomart_export.txt | awk 'BEGIN {FS="\t"}; {OFS="\t"}; {if ($7 == "-1") $7 = "-"; else $7 = "+"}; {print $1, $2 - 1, $3, $4 ":" $5 ":" $6, ".", $7}' | sort -k1,1n -k2,2n | mergeBed -nms -d -1 -i stdin > biomart_export.merged.tmp

./mergedBed_to_gtf.py biomart_export.merged.tmp biomart_export.txt | sort -k1,1n -k4,4n > meta-exons_v71_cut_sorted_18-04-14.gtf
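
To make the coordinate handling in these commands easier to follow, here is a small Python sketch that mirrors the awk step of the first command for a single row. The function names and the placeholder ids are hypothetical. The second helper only shows the general difference between BED (0-based, half-open) and GTF (1-based, inclusive) coordinates; it is not taken from mergedBed_to_gtf.py, whose contents are not shown on this page.

```python
def biomart_row_to_bed(line):
    """Convert one tab-separated Biomart row to a BED6 line.

    Mirrors the awk step: start is shifted to 0-based, the name field is
    'exon id:gene id:gene name', score is '.', and strand '-1' becomes '-'
    while any other value becomes '+'.
    """
    chrom, start, end, exon_id, gene_id, gene_name, strand = line.rstrip("\n").split("\t")
    bed_strand = "-" if strand == "-1" else "+"
    name = ":".join((exon_id, gene_id, gene_name))
    return "\t".join((chrom, str(int(start) - 1), end, name, ".", bed_strand))


def bed_to_one_based(start, end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive coordinates."""
    return start + 1, end


# Placeholder ids, not real Ensembl identifiers.
bed_line = biomart_row_to_bed("1\t100\t200\tEXON_ID\tGENE_ID\tGENE_NAME\t-1")
print(bed_line)
print(bed_to_one_based(99, 200))
```

For the placeholder row above, the BED line carries start 99 and strand "-", and converting back gives the original 1-based interval (100, 200).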