Changes between Initial Version and Version 1 of PipelinePlan

Sep 5, 2010 9:04:12 PM (11 years ago)
Morris Swertz



= Analysis pipeline =

BGI will provide us with files in the standard SAM/BAM format containing the raw, uncleaned reads. These files are the main input for our pipeline.
We will construct a data processing and analysis pipeline based on recent experience from the 1000 Genomes Project, reusing existing tools where available (such as GATK, SAMTOOLS, PICARD). See the Genome Analysis Pipeline document drafted by Morris for the individual steps: alignment, quality score recalibration, and indel cleaning. The aim is to compare the outputs (sites called, genotypes called) of this pipeline with those from BGI.

== Action items for pilot data: (URGENT) ==
Input: QC'd read data (SAM/BAM format) with genotypes (in VCF format) from BGI
 * Genome-wide coverage (? 12X)
 * Variant sites called (all, novel, dbSNP, HapMap, 1KG pilot 2): number and Ti/Tv ratio (as a crude proxy for the false positive rate)
 * Concordance with immunochip data (have genotypes been deposited?). Special focus on low-frequency SNPs. We need to pay attention to the genotype calling of the immunochip data.
 * Check genotypes for Mendel errors (trio problems?)
 * Visual inspection of variant sites using IGV
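
Two of the checks above, the Ti/Tv ratio and the Mendel error check, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the planned pipeline: the function names `titv_ratio` and `is_mendel_consistent` are hypothetical, variants are assumed to be already parsed from the VCF into (ref, alt) pairs, and genotypes are assumed to be unphased allele pairs at autosomal sites.

```python
# Transitions are A<->G (purines) and C<->T (pyrimidines);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def titv_ratio(snps):
    """Return the Ti/Tv ratio for an iterable of (ref, alt) pairs.

    Records that are not single-base substitutions (indels, N bases,
    ref == alt) are skipped. Returns None if there are no transversions.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # hypothetical policy: ignore anything that is not a SNP
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


def is_mendel_consistent(child, father, mother):
    """Return True if the child genotype can be explained by the parents.

    Each genotype is an unphased pair of alleles, e.g. ("A", "G").
    The child must carry one allele from the father and the other
    from the mother (autosomal sites only; de novo mutations and
    missing calls are not modelled in this sketch).
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)
```

For example, `titv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion, and a trio with an A/A father, a G/G mother and a G/G child is flagged as a Mendel error because the child cannot have inherited G from the father.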

== Action items for main project: ==
Input: raw read data (SAM/BAM format)
 * This needs a fully working pipeline, as detailed in Morris' analysis document
 * More details to come