Version 3 (modified by Morris Swertz, 14 years ago)

Analysis pipeline

BGI will provide us with files in the standard SAM/BAM format containing the raw, uncleaned reads. These are the main input for our pipeline. We will construct a data processing and analysis pipeline based on recent experience from the 1000 Genomes Project, using existing tools where available (such as GATK, SAMtools and Picard).

See the Genome Analysis Pipeline document drafted by Morris for the individual steps: alignment, quality score recalibration and indel cleaning. The aim is to compare the outputs (sites called, genotypes called) of this pipeline with those of BGI.
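
The comparison of sites and genotypes between the two pipelines could be sketched roughly as follows. This is a minimal illustration only, not part of the pipeline document: the function name `compare_calls` and the input layout (a dict keyed by chromosome and position, holding the two called alleles) are assumptions made for the example, and parsing of the VCF files is left out.

```python
def normalise(genotype):
    """Treat genotypes as unordered allele pairs, so A/G equals G/A."""
    return tuple(sorted(genotype))


def compare_calls(ours, theirs):
    """Compare two call sets.

    Both arguments are dicts mapping (chrom, pos) -> (allele1, allele2).
    This layout is hypothetical; real input would be parsed from VCF.
    """
    shared = ours.keys() & theirs.keys()
    concordant = sum(
        1 for site in shared
        if normalise(ours[site]) == normalise(theirs[site])
    )
    return {
        "only_ours": len(ours.keys() - theirs.keys()),
        "only_theirs": len(theirs.keys() - ours.keys()),
        "shared": len(shared),
        # Fraction of shared sites with identical genotypes; None if no overlap.
        "genotype_concordance": concordant / len(shared) if shared else None,
    }


# Toy data, invented purely for illustration.
ours = {("1", 100): ("A", "G"), ("1", 200): ("C", "C"), ("2", 50): ("T", "T")}
theirs = {("1", 100): ("G", "A"), ("1", 200): ("C", "T"), ("3", 10): ("A", "A")}
summary = compare_calls(ours, theirs)
print(summary)
```

The same function could in principle be reused for the immunochip concordance check, restricted to the sites present on the chip.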

Action items for pilot data (URGENT):

Input: QC'd read data (SAM/BAM format) with genotypes (in VCF format) from BGI

Genome-wide coverage (? 12X)
Variant sites called (all, novel, dbSNP, HapMap, 1KG pilot 2): number and Ti/Tv ratio (as a crude proxy for the false-positive rate)
Concordance with immunochip data (have the genotypes been deposited?), with a special focus on low-frequency SNPs. The genotype calling of the immunochip data itself needs careful attention.
Check genotypes for Mendelian errors (trio problems?)
Visual inspection of variant sites using IGV
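
Two of the checks above lend themselves to a short sketch: the Ti/Tv ratio and the Mendelian consistency of a trio. The code below is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, variants are assumed to be biallelic SNPs given as (ref, alt) pairs, and genotypes are assumed to be autosomal and diploid. Of the 12 possible base substitutions, 4 are transitions and 8 are transversions, so purely random errors would push the ratio towards 0.5; whole-genome human call sets are commonly reported at roughly 2, which is why a low ratio hints at false positives.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref, alt}
    return pair == PURINES or pair == PYRIMIDINES


def ti_tv_ratio(snps):
    """Ti/Tv ratio over an iterable of (ref, alt) pairs.

    Pairs that are not a change between two distinct A/C/G/T bases are skipped.
    """
    bases = PURINES | PYRIMIDINES
    valid = [(r, a) for r, a in snps if r != a and r in bases and a in bases]
    ti = sum(1 for r, a in valid if is_transition(r, a))
    tv = len(valid) - ti
    return ti / tv if tv else float("inf")


def is_mendel_consistent(child, father, mother):
    """True if the child could inherit one allele from each parent.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


# Toy data, invented purely for illustration.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(snps))  # 3 transitions, 1 transversion

print(is_mendel_consistent(("A", "G"), ("A", "A"), ("G", "G")))
print(is_mendel_consistent(("G", "G"), ("A", "A"), ("A", "G")))
```

A site flagged as inconsistent is not necessarily a genotyping error; it can also point to a sample swap or a pedigree problem, which is why the trio structure itself should be checked.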

Action items for main project:

Input: raw read data (SAM/BAM format)

This needs a fully working pipeline as detailed in Morris' analysis document
more details to come

Attachments (1)

BBMRI-NL Genome Analysis Pipeline Manual 2010-08-23.doc (317.0 KB) - added by Morris Swertz 14 years ago.
