Changes between Initial Version and Version 1 of SnpCallingPipeline/AlignmentAndCleaning


Timestamp:
Oct 18, 2010 4:44:50 AM (14 years ago)
Author:
Morris Swertz
Comment:

--

  • SnpCallingPipeline/AlignmentAndCleaning

    v1 v1  
     1= Workflow 2: Alignment per Lane, per Chr =
     2[[TOC()]]
     3
     4This workflow aligns reads per lane and chromosome, including:
     5* re-alignment to prevent false SNP calls caused by indels (using known indels)
     6* duplicate marking (MarkDuplicates) to prevent inflated coverage caused by PCR duplicates (per library = lane)
     7* base quality recalibration to correct for false low scores caused by true variation
     8
     9Workflow Inputs:
     10* lane.1.fq.gz - raw reads for lane, pair end 1
     11* lane.2.fq.gz - raw reads for lane, pair end 2
     12* genome.chr.fasta - reference genome split on chromosome
     13* genome.chr.realign.intervals - targets for realignment per chromosome
     14* genome.chr.dbsnpXYZ.rod - known SNP variants, here from dbSNP
     15* genome.chr.indelsXYZ.vcf - known indels, here from 1KG
     16
     17Workflow outputs:
     18* lane.chr.1.sai - alignment index for first pair
     19* lane.chr.2.sai - alignment index for second pair
     20* lane.chr.sam - alignment map in text (SAM) format
     21* lane.chr.bam - alignment map in binary format
     22* lane.chr.sorted.bam - sorted alignment map
     23* lane.chr.sorted.bai - sorted alignment index
     24* lane.chr.dedup.bam - alignment map with PCR duplicates marked
     25* lane.chr.dedup.metrics - metrics describing deduplication
     26* lane.chr.realigned.bam - realigned based on known indels
     27* lane.chr.matefixed.bam - fixed the mate pair ends
     28* lane.chr.covariate_table.csv - table of countcovariates output for recalibration
     29* lane.chr.recal.bam - alignment map with recalibrated quality scores
     30
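
The naming convention in the two lists above can be captured in a small helper. This is a minimal Python sketch and not part of the original workflow: the function name `lane_chr_files` and the dictionary keys are hypothetical, only the file name patterns are taken from the lists.

```python
def lane_chr_files(lane: str, chrom: str) -> dict:
    """Return the per-lane, per-chromosome output names listed above.

    Hypothetical helper: only the file name patterns come from the workflow
    description; the dictionary keys are made up for illustration.
    """
    base = f"{lane}.{chrom}"
    return {
        "sai_1": f"{base}.1.sai",
        "sai_2": f"{base}.2.sai",
        "sam": f"{base}.sam",
        "bam": f"{base}.bam",
        "sorted_bam": f"{base}.sorted.bam",
        "sorted_bai": f"{base}.sorted.bai",
        "dedup_bam": f"{base}.dedup.bam",
        "dedup_metrics": f"{base}.dedup.metrics",
        "realigned_bam": f"{base}.realigned.bam",
        "matefixed_bam": f"{base}.matefixed.bam",
        "covariate_table": f"{base}.covariate_table.csv",
        "recal_bam": f"{base}.recal.bam",
    }
```

Calling it with a real lane identifier and chromosome name gives the concrete file names for one unit of work.
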
     31== align ==
     32Align each end of the paired-end reads separately.
     33 
     34||tool:    ||bwa aln ||
     35||inputs:  ||chr.fasta [[BR]] lane.1.fq.gz [[BR]] lane.2.fq.gz ||
     36||outputs: ||lane.chr.1.sai [[BR]] lane.chr.2.sai ||
     37||docs:   ||http://bio-bwa.sourceforge.net/bwa.shtml ||
     38
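
The align step could be driven from Python roughly as follows. This is a sketch under assumptions: the helper names `bwa_aln_jobs` and `run_job` are hypothetical, the reference is assumed to have been indexed with `bwa index` beforehand, and no tuning options are passed. `bwa aln` writes the `.sai` index to standard output by default, so each job records where stdout should go.

```python
import subprocess


def bwa_aln_jobs(reference, fq1, fq2, sai1, sai2):
    """Return (argv, stdout_path) pairs, one per read end.

    The reference is assumed to be indexed already (bwa index).
    """
    return [
        (["bwa", "aln", reference, fq1], sai1),
        (["bwa", "aln", reference, fq2], sai2),
    ]


def run_job(argv, stdout_path):
    """Run one command and redirect its standard output to stdout_path."""
    with open(stdout_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=True)
```

Each pair returned by `bwa_aln_jobs("chr.fasta", "lane.1.fq.gz", "lane.2.fq.gz", "lane.chr.1.sai", "lane.chr.2.sai")` can be passed to `run_job`; the two jobs are independent and may run in parallel.
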
     39== align-pe ==
     40Combine the two single-end alignments into one paired-end alignment.
     41
     42||tool:    ||bwa sampe ||
     43||inputs:  ||chr.fasta [[BR]] lane.1.fq.gz [[BR]] lane.2.fq.gz [[BR]] lane.chr.1.sai [[BR]] lane.chr.2.sai ||
     44||outputs: ||lane.chr.sam ||
     45||docs:    ||http://bio-bwa.sourceforge.net/bwa.shtml ||
     46
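
A possible way to build the align-pe command is sketched below. The function name `bwa_sampe_job` is hypothetical; the argument order (reference, the two `.sai` files, then the two read files) follows the `bwa sampe` synopsis, and the SAM output is assumed to be taken from standard output.

```python
def bwa_sampe_job(reference, sai1, sai2, fq1, fq2, sam):
    """Return (argv, stdout_path) for pairing the two single-end alignments.

    bwa sampe prints SAM to standard output, so the caller is expected to
    redirect stdout to the returned path.
    """
    argv = ["bwa", "sampe", reference, sai1, sai2, fq1, fq2]
    return argv, sam
```
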
     47== sam-to-bam ==
     48Convert the SAM file to BAM.
     49
     50||tool:    ||samtools view ||
     51||inputs:  ||lane.chr.sam ||
     52||outputs: ||lane.chr.bam ||
     53||docs:    ||http://samtools.sourceforge.net/samtools.shtml ||
     54
     55(Question: can this not index and sort?)
     56
     57== sam-sort ==
     58Sort the BAM file by coordinate.
     59
     60||tool:    ||samtools sort ||
     61||inputs:  ||lane.chr.bam ||
     62||outputs: ||lane.chr.sorted.bam ||
     63||docs:    ||http://samtools.sourceforge.net/samtools.shtml ||
     64
     65== sam-index ==
     66Index the sorted BAM file for faster random access.
     67
     68||tool:    ||samtools index ||
     69||inputs:  ||lane.chr.sorted.bam ||
     70||outputs: ||lane.chr.sorted.bai ||
     71||docs:    ||http://samtools.sourceforge.net/samtools.shtml ||
     72
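
The three samtools steps above (sam-to-bam, sam-sort, sam-index) can be sketched as one command list. This is an illustration, not the workflow's actual script: the function name is hypothetical, and `samtools sort -o` is the samtools 1.x syntax. The 0.1.x releases that were current when this page was written expect an output prefix as the second argument instead, so the sort line would need adapting there.

```python
def samtools_commands(sam, bam, sorted_bam, bai):
    """Return the argv lists for convert, sort and index, in run order."""
    return [
        # -b: write BAM; -S: input is SAM (needed by old releases, ignored by new ones)
        ["samtools", "view", "-b", "-S", "-o", bam, sam],
        # Coordinate sort is the default order (samtools 1.x syntax, see note above)
        ["samtools", "sort", "-o", sorted_bam, bam],
        # The second argument names the index file explicitly
        ["samtools", "index", sorted_bam, bai],
    ]
```

The commands must run in this order, because indexing requires a coordinate-sorted BAM file.
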
     73== !MarkDuplicates ==
     74Mark duplicate PCR fragments to be filtered in analysis
     75
     76||tool:    ||MarkDuplicates.jar ||
     77||inputs:  ||lane.chr.sorted.bam ||
     78||outputs: ||lane.chr.dedup.bam [[BR]] lane.chr.dedup.metrics ||
     79||docs:    ||http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates ||
     80
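
A minimal sketch of the MarkDuplicates call, assuming the standalone Picard jar named in the table and Picard's `KEY=value` argument style. The function name and the `jar` default are hypothetical; `INPUT`, `OUTPUT` and `METRICS_FILE` are the MarkDuplicates options for the three files in the table.

```python
def mark_duplicates_command(sorted_bam, dedup_bam, metrics, jar="MarkDuplicates.jar"):
    """Return the argv list for Picard MarkDuplicates.

    The jar location is an assumption; adjust it to the local Picard install.
    """
    return [
        "java", "-jar", jar,
        f"INPUT={sorted_bam}",
        f"OUTPUT={dedup_bam}",
        f"METRICS_FILE={metrics}",
    ]
```
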
     81== !IndelRealigner-!KnownsOnly ==
     82Improve the alignment using known indel information (will reduce false SNP calls)
     83
     84||tool:    ||GenomeAnalysisTK.jar -T IndelRealigner ||
     85||inputs:  ||lane.chr.dedup.bam [[BR]] genome.chr.realign.intervals [[BR]] genome.chr.dbsnpXYZ.rod [[BR]] genome.chr.indelsXYZ.vcf ||
     86||outputs: ||lane.chr.realigned.bam ||
     87||docs:    ||http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Running_the_Indel_Realigner_only_at_known_sites ||
     88
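
The core of the IndelRealigner call might look like the sketch below. It is deliberately incomplete: only the walker name, reference, input BAM, target intervals and output are set. How the known dbSNP and indel files are bound, and how realignment is restricted to known sites, differs between GATK releases, so those arguments are left to the caller as `extra_args` rather than guessed here. The function name and the `jar` default are hypothetical.

```python
def indel_realigner_command(reference, dedup_bam, intervals, realigned_bam,
                            extra_args=(), jar="GenomeAnalysisTK.jar"):
    """Return the argv list for the GATK IndelRealigner walker.

    extra_args should carry the version-specific options that bind the known
    SNP/indel files; they are intentionally not filled in by this sketch.
    """
    return [
        "java", "-jar", jar,
        "-T", "IndelRealigner",
        "-R", reference,
        "-I", dedup_bam,
        "-targetIntervals", intervals,
        "-o", realigned_bam,
        *extra_args,
    ]
```
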
     89== !FixMateInformation ==
     90Fix the mate-pair information, which can become inconsistent as a consequence of the realignment.
     91
     92||tool:    ||FixMateInformation.jar ||
     93||inputs:  ||lane.chr.realigned.bam ||
     94||outputs: ||lane.chr.matefixed.bam ||
     95||docs:    ||http://picard.sourceforge.net/command-line-overview.shtml#FixMateInformation [[BR]] http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Fixing_Mate_Pairs ||
     98
     99== !CountCovariates ==
     100Count covariates, such as machine cycle and base-pair position, to be used as the basis for quality recalibration.
     101Optionally: plot the results to PDF using AnalyzeCovariates.
     102
     103||tool:    ||GenomeAnalysisTK.jar -T CountCovariates, AnalyzeCovariates.jar ||
     104||inputs:  ||lane.chr.matefixed.bam [[BR]] genome.chr.dbsnpXYZ.rod ||
     105||outputs: ||lane.chr.covariate_table.csv ||
     106||docs:    ||http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#CountCovariates [[BR]] http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#AnalyzeCovariates.jar ||
     109
     110== !TableRecalibration ==
     111Recalibrate quality scores based on the covariate table.
     112||tool:    ||GenomeAnalysisTK.jar -T TableRecalibration ||
     113||inputs:  ||lane.chr.matefixed.bam [[BR]] lane.chr.covariate_table.csv [[BR]] chr.fasta ||
     114||outputs: ||lane.chr.recal.bam ||
     115||docs:    ||http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration#TableRecalibration ||
     116
     117== Repeat: sam-sort, sam-index, countcovariates ==
     118See steps above for commands and docs.
     119
     120||inputs:  ||lane.chr.recal.bam ||
     121||outputs: ||lane.chr.recal.sorted.bam, lane.chr.recal.sorted.bam.bai, lane.chr.recal.covariate_table.csv ||
     122
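
Because each step consumes files produced by earlier steps, the tables above can be checked mechanically for naming consistency. The sketch below is an illustration only: the step list is transcribed by hand from the tables, the reference is written as `genome.chr.fasta` throughout (the tables abbreviate it to `chr.fasta`), and the function name `missing_inputs` is hypothetical.

```python
WORKFLOW_INPUTS = {
    "lane.1.fq.gz", "lane.2.fq.gz", "genome.chr.fasta",
    "genome.chr.realign.intervals", "genome.chr.dbsnpXYZ.rod",
    "genome.chr.indelsXYZ.vcf",
}

# (step name, inputs, outputs), transcribed from the tables above.
STEPS = [
    ("align", ["genome.chr.fasta", "lane.1.fq.gz", "lane.2.fq.gz"],
     ["lane.chr.1.sai", "lane.chr.2.sai"]),
    ("align-pe", ["genome.chr.fasta", "lane.1.fq.gz", "lane.2.fq.gz",
                  "lane.chr.1.sai", "lane.chr.2.sai"], ["lane.chr.sam"]),
    ("sam-to-bam", ["lane.chr.sam"], ["lane.chr.bam"]),
    ("sam-sort", ["lane.chr.bam"], ["lane.chr.sorted.bam"]),
    ("sam-index", ["lane.chr.sorted.bam"], ["lane.chr.sorted.bai"]),
    ("MarkDuplicates", ["lane.chr.sorted.bam"],
     ["lane.chr.dedup.bam", "lane.chr.dedup.metrics"]),
    ("IndelRealigner", ["lane.chr.dedup.bam", "genome.chr.realign.intervals",
                        "genome.chr.dbsnpXYZ.rod", "genome.chr.indelsXYZ.vcf"],
     ["lane.chr.realigned.bam"]),
    ("FixMateInformation", ["lane.chr.realigned.bam"], ["lane.chr.matefixed.bam"]),
    ("CountCovariates", ["lane.chr.matefixed.bam", "genome.chr.dbsnpXYZ.rod"],
     ["lane.chr.covariate_table.csv"]),
    ("TableRecalibration", ["lane.chr.matefixed.bam",
                            "lane.chr.covariate_table.csv", "genome.chr.fasta"],
     ["lane.chr.recal.bam"]),
]


def missing_inputs(steps, workflow_inputs):
    """Return (step, file) pairs whose file is neither a workflow input
    nor an output of an earlier step."""
    available = set(workflow_inputs)
    missing = []
    for name, inputs, outputs in steps:
        missing.extend((name, f) for f in inputs if f not in available)
        available.update(outputs)
    return missing
```

An empty result from `missing_inputs(STEPS, WORKFLOW_INPUTS)` means every step input is accounted for; a misspelled file name in a table would show up as a `(step, file)` pair.
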
     123Discussion:
     124> Why do we need to sort and index after recalibration? Does it change the order of the reads?