Changes between Initial Version and Version 1 of BIOS_ReferenceFiles


Timestamp:
Sep 19, 2016 4:46:54 PM (8 years ago)
Author:
jamverlouw

  • BIOS_ReferenceFiles

    v1 v1  
     1= Reference and annotation =
     2
     3== File naming policy ==
     4Include md5sum files, created by running 'md5sum [filename] > [filename].md5sum'.
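
The checksum convention above can be sketched as a pair of small shell helpers. The function names `write_md5sum` and `verify_md5sum` are hypothetical and not part of the original page. The sketch assumes GNU coreutils `md5sum` is available.

```shell
#!/bin/sh
# Write a checksum file next to a reference file, following the
# '[filename].md5sum' naming used on this page.
write_md5sum() {
    md5sum "$1" > "$1.md5sum"
}

# Verify a file against its stored checksum; returns non-zero on mismatch.
# md5sum -c resolves the file name recorded in the checksum file relative
# to the current directory, so run this from the directory in which the
# checksum file was created.
verify_md5sum() {
    md5sum -c "$1.md5sum"
}
```

After downloading a reference file together with its checksum, `verify_md5sum Homo_sapiens.GRCh37.71.cut.sorted.gtf.gz` would confirm that the copy is intact.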
     5
     6== Annotation source: Ensembl v.71 ==
     7
     8We use Ensembl as our primary source of annotation and fix it at v.71 ([http://apr2013.archive.ensembl.org/biomart/martview link to archived v.71 BioMart]). To access this version of Ensembl from the R package biomaRt (after loading it with library(biomaRt)):\\
     9ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = "apr2013.archive.ensembl.org", path = "/biomart/martservice" , dataset = "hsapiens_gene_ensembl")
     10G=getBM(c("chromosome_name", "start_position","end_position","ensembl_gene_id"), mart=ensembl)
     11
     12== Overview of reference and annotation files ==
     13
     14||=Name=||=Location=||=Contact=||=Notes=||
     15||Transcript GTF||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/Homo_sapiens.GRCh37.71.cut.sorted.gtf.gz||!dasha.zhernakova@gmail.com||1||
     16||Meta Exon GTF||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/meta-exons_v71_cut_sorted_05-06-13.gtf.gz||!dasha.zhernakova@gmail.com||2||
     17||Masked genome||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/maskedGenome/||!dasha.zhernakova@gmail.com||3||
     18||STAR index||!srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/bbmri.nl/RP3/dzhernakova/maskedGenome/STARindex/||!dasha.zhernakova@gmail.com||4||
     19
     20== Description of reference and annotation files ==
     21
     22'''1. Transcript annotation.'''[[BR]]
     23
     24To create this transcript annotation, the human GTF annotation was downloaded from Ensembl v.71 (containing Gencode v.16: ftp://ftp.ensembl.org/pub/release-71/gtf/homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz). Only genes on chromosomes 1-22, X, Y, and MT were retained. The genes on each chromosome were then sorted by start position.
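
The filtering and sorting described above could look roughly like the following. This is a minimal sketch, not the command that produced the file. The function name `filter_and_sort_gtf` is hypothetical. The sketch sorts individual GTF records (not whole genes) by start position, and it assumes that plain lexical ordering of chromosome names is acceptable.

```shell
#!/bin/sh
# Read a GTF on stdin, keep only records on chromosomes 1-22, X, Y and MT,
# then sort by chromosome name and numeric start position (GTF column 4).
filter_and_sort_gtf() {
    awk -F '\t' '
        BEGIN {
            for (i = 1; i <= 22; i++) keep[i] = 1
            keep["X"] = keep["Y"] = keep["MT"] = 1
        }
        $1 in keep
    ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}
```

A possible invocation on the downloaded file is `gzip -dc Homo_sapiens.GRCh37.71.gtf.gz | filter_and_sort_gtf | gzip > Homo_sapiens.GRCh37.71.cut.sorted.gtf.gz`.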
     25
     26'''2. Meta-exon annotation.'''[[BR]]
     27
     28To create the meta-exon annotation we merged all overlapping exons from Ensembl version 71 (see the Transcript annotation section) using the mergeBed tool from the BEDTools suite. Overlapping exons belonging to different genes or different strands were also merged into one meta-exon. See [wiki:FgReferenceFiles/MetaExonAnnotation_documentation Meta-exon annotation documentation] for a detailed description of how the meta-exon annotation was created.
     29
     30See [wiki:BIOS_ReferenceFiles/MetaExonAnnotation-05-06-13 Meta-exon annotation 05-06-13] for issues with this file.
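
To illustrate what merging overlapping exons into meta-exons means, here is a small awk stand-in for the merge step. It is not mergeBed and not the procedure used to build the file. The function name `merge_overlapping` is hypothetical. The input is assumed to be tab-separated chromosome, start, and end columns, and the overlap rule (next start not greater than current end) is an assumption.

```shell
#!/bin/sh
# Merge overlapping intervals from a tab-separated chrom/start/end stream.
# Strand and gene are ignored, so exons from different genes or strands
# collapse into a single meta-exon.
merge_overlapping() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
    awk -F '\t' -v OFS='\t' '
        # Start a new interval on the first record, on a chromosome
        # change, or when there is a gap after the current interval.
        NR == 1 || ($1 "") != cur_chrom || $2 + 0 > cur_end {
            if (NR > 1) print cur_chrom, cur_start, cur_end
            cur_chrom = $1 ""; cur_start = $2 + 0; cur_end = $3 + 0
            next
        }
        # Otherwise extend the current interval if this record reaches further.
        $3 + 0 > cur_end { cur_end = $3 + 0 }
        END { if (NR > 0) print cur_chrom, cur_start, cur_end }
    '
}
```

For example, the intervals 100-200 and 150-300 on chromosome 1 would be reported as the single interval 100-300.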
     31
     32'''3. Masked genome.'''[[BR]]
     33
     34To mask the genome we took all SNPs called in the GoNL project that had a minor allele frequency (MAF) > 1% and replaced them with "N" in the genome fasta files using the maskFastaFromBed tool from the BEDTools suite.
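
A rough sketch of the masking step follows. The SNP input layout (tab-separated chromosome, 1-based position, MAF) and the function names `snps_to_bed` and `mask_genome` are assumptions made for illustration; the actual GoNL files may be organised differently. The `-fi`, `-bed`, and `-fo` options are the standard maskFastaFromBed arguments, and by default the tool writes N over the masked positions.

```shell
#!/bin/sh
# Convert a SNP table on stdin (chrom, 1-based position, MAF) into BED
# intervals for SNPs with MAF above the threshold (default 0.01, i.e. 1%).
# BED is 0-based and half-open, so a SNP at position p becomes [p-1, p).
snps_to_bed() {
    awk -F '\t' -v OFS='\t' -v min_maf="${1:-0.01}" '
        $3 + 0 > min_maf + 0 { print $1, $2 - 1, $2 }
    '
}

# Replace the bases covered by the BED intervals with N.
# Arguments: input fasta, BED file, output fasta. Requires BEDTools.
mask_genome() {
    maskFastaFromBed -fi "$1" -bed "$2" -fo "$3"
}
```

A possible sequence is `snps_to_bed < snps.tsv > snps.bed` followed by `mask_genome genome.fa snps.bed genome.masked.fa`, where all three file names are placeholders.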
     35
     36'''4. STAR genome index.'''[[BR]]
     37
     38To make the masked genome index we ran STAR in genomeGenerate mode on the masked genome fasta files, setting the --sjdbOverhang parameter to 100 and using the transcript annotation from Ensembl v.71.
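
The index generation described above corresponds roughly to the following STAR call. This is a sketch under stated assumptions: the function name `build_star_index` and the `STAR_BIN` override are hypothetical, the GTF is assumed to have been decompressed first, and options such as thread count or memory limits that the original run may have used are not known and are left out.

```shell
#!/bin/sh
# Build a STAR index from the masked genome and the transcript GTF.
# Arguments: output index directory, masked genome fasta, uncompressed GTF.
# STAR_BIN (hypothetical) overrides the executable, e.g. for a dry run.
build_star_index() {
    mkdir -p "$1"
    "${STAR_BIN:-STAR}" \
        --runMode genomeGenerate \
        --genomeDir "$1" \
        --genomeFastaFiles "$2" \
        --sjdbGTFfile "$3" \
        --sjdbOverhang 100
}
```

Running `STAR_BIN=echo build_star_index STARindex genome.masked.fa Homo_sapiens.GRCh37.71.cut.sorted.gtf` prints the arguments without starting STAR, which is a convenient way to check the command before committing to a long run.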
     39
     40''' GTF to BED conversion '''
     41
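     42The following one-liner uses paste and cut to convert the decompressed meta-exon GTF into the BED-like format required for downstream quantification, with the columns in the correct order: chromosome, start, end, the second attribute from GTF column 9, and the full column 9.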
     43
     44{{{
     45paste <(cut -f1,4,5 meta-exons_v71_cut_sorted_05-06-13.gtf)  <(cut -f9 meta-exons_v71_cut_sorted_05-06-13.gtf | cut -d';' -f2) | paste - <(cut -f9 meta-exons_v71_cut_sorted_05-06-13.gtf ) > meta-exons_v71_cut_sorted_05-06-13.bed
     46}}}
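
The same conversion can also be written as a single pass with awk, which avoids reading the GTF three times. This is an alternative sketch, not the command used to produce the BED file. The function name `gtf_to_bed` is hypothetical, and the sketch assumes every record has a ';'-separated attribute column, as in well-formed GTF.

```shell
#!/bin/sh
# Single-pass equivalent of the paste/cut pipeline: chromosome, start, end,
# the second ';'-separated attribute of column 9, then the full column 9.
gtf_to_bed() {
    awk -F '\t' -v OFS='\t' '{
        split($9, attr, ";")
        print $1, $4, $5, attr[2], $9
    }' "$1"
}
```

For example, `gtf_to_bed meta-exons_v71_cut_sorted_05-06-13.gtf > meta-exons_v71_cut_sorted_05-06-13.bed` would write the same five columns as the one-liner.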