Changes between Version 1 and Version 2 of Impute2Pipeline


Timestamp: Apr 19, 2013 5:16:28 PM
Author: freerkvandijk
Comment: --

 * Location: https://github.com/molgenis/molgenis-pipelines/blob/master/compute4/Imputation_imputationtool_studyQC/protocols/studyQC.ftl
[[br]]This protocol applies QC to the study data using imputationTool. The tool uses a binary format called [http://genenetwork.nl/wordpress/trityper/ TriTyper] to load large genotype datasets quickly. ImputationTool performs the following checks:
  1. Assesses the strand alignment of alleles and swaps SNPs if needed. For example, if a SNP that is in LD with multiple SNPs has a negative score, its alleles are swapped and LD is calculated again. If the score is still negative, the SNP is removed from the study data.
  1. Regular quality checks performed during routine processing of GWAS data: the Hardy-Weinberg equilibrium p-value should be higher than 10^-4, the minor allele frequency should be higher than 0.01 and the call rate should be higher than 0.95. If any of these criteria fails, the SNP is removed from the study panel, because such SNPs are likely to contain erroneous genotypes.
  1. Simple sanity checks, such as whether the SNP is present in the reference panel and whether it has null alleles. A SNP that is absent from the reference panel or has null alleles is removed from the study panel.
  1. Checks whether the allele frequencies differ significantly between the two panels. A difference larger than 25% indicates an important qualitative difference between the panels, so imputing such a SNP is likely to introduce invalid information. For this reason these SNPs are removed from the study panel.
  1. Assesses whether the haplotype structure is comparable between the reference and GWAS data. This is done by pairwise comparison of r-squared between SNPs in both the reference and the GWAS data. For SNPs in LD (r-squared > 0.1), the allele frequencies are compared. SNPs are removed from the GWAS data when the major allele differs more often than it is identical.
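The per-SNP filters in checks 2 to 4 can be sketched in Python as follows. This is not imputationTool's code: the `SnpStats` record, the `qc_filter` function, encoding a null allele as `"0"` or an empty string, and reading the 25% criterion as an absolute difference of 0.25 in allele frequency are all assumptions made for illustration. Only the thresholds come from the checks listed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Thresholds taken from the checks listed above.
HWE_P_MIN = 1e-4      # Hardy-Weinberg equilibrium p-value must be higher than this
MAF_MIN = 0.01        # minor allele frequency must be higher than this
CALL_RATE_MIN = 0.95  # call rate must be higher than this
MAX_FREQ_DIFF = 0.25  # ASSUMED: "25%" read as an absolute allele-frequency difference

NULL_ALLELES = {"", "0"}  # ASSUMED encodings of a null (missing) allele


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary of the study panel (field names are illustrative)."""
    snp_id: str
    hwe_p: float
    maf: float
    call_rate: float
    alleles: Tuple[str, str]
    study_freq: float  # frequency of alleles[0] in the study panel


def passes_routine_qc(s: SnpStats) -> bool:
    """Check 2: HWE p-value, MAF and call rate thresholds."""
    return s.hwe_p > HWE_P_MIN and s.maf > MAF_MIN and s.call_rate > CALL_RATE_MIN


def qc_filter(snps: List[SnpStats], reference_freqs: Dict[str, float]):
    """Apply checks 2 to 4; return (kept_ids, removed), removed mapping SNP id -> reason.

    reference_freqs maps a SNP id to the frequency of the same allele (alleles[0])
    in the reference panel; a missing key means the SNP is absent from the reference.
    """
    kept: List[str] = []
    removed: Dict[str, str] = {}
    for s in snps:
        if not passes_routine_qc(s):
            removed[s.snp_id] = "routine QC"
        elif s.snp_id not in reference_freqs:
            removed[s.snp_id] = "absent from reference"
        elif any(a in NULL_ALLELES for a in s.alleles):
            removed[s.snp_id] = "null allele"
        elif abs(s.study_freq - reference_freqs[s.snp_id]) > MAX_FREQ_DIFF:
            removed[s.snp_id] = "allele frequency difference"
        else:
            kept.append(s.snp_id)
    return kept, removed


if __name__ == "__main__":
    demo = [
        SnpStats("rs1", 0.5, 0.2, 0.99, ("A", "G"), 0.20),   # passes everything
        SnpStats("rs2", 1e-6, 0.2, 0.99, ("A", "G"), 0.20),  # fails HWE
        SnpStats("rs3", 0.5, 0.2, 0.99, ("A", "G"), 0.20),   # not in reference
        SnpStats("rs4", 0.5, 0.2, 0.99, ("A", "0"), 0.20),   # null allele
        SnpStats("rs5", 0.5, 0.2, 0.99, ("C", "T"), 0.60),   # frequency differs by 0.4
    ]
    print(qc_filter(demo, {"rs1": 0.25, "rs2": 0.20, "rs4": 0.20, "rs5": 0.20}))
```

Because the checks run in a fixed order, each removed SNP is reported with the first check it failed; the protocol text does not say in which order imputationTool itself applies them.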
[[br]]
[[br]]
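Checks 1 and 5 both compare LD between the study and reference panels. The protocol does not define the "score", so the sketch below is only one possible reading, under loudly stated assumptions: for each LD partner with r-squared > 0.1 in both panels, it adds +1 when the sign of the correlation agrees between panels and -1 when it disagrees, and it swaps the SNP's alleles (dosage d becomes 2 - d) when the total is negative. The function names (`pearson_r`, `ld_sign_score`, `align_snp`), the use of genotype-dosage correlation as the LD measure, and the +1/-1 vote are hypothetical; imputationTool's actual score may be defined differently. The sketch covers only the scoring and swap step, not the removal of SNPs whose score stays negative.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Signed Pearson correlation between two allele-dosage vectors (0/1/2 per sample).

    Returns 0.0 when either SNP is monomorphic, since r is undefined there.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ld_sign_score(snp: str, partners: List[str],
                  study: Dict[str, List[int]], reference: Dict[str, List[int]],
                  r2_min: float = 0.1) -> int:
    """HYPOTHETICAL score: vote over LD partners on whether the sign of r agrees.

    Only partners with r^2 > r2_min in both panels take part (0.1 is the LD
    threshold quoted in check 5). study and reference map a SNP id to a dosage
    vector counting copies of the same allele in both panels.
    """
    score = 0
    for other in partners:
        r_study = pearson_r(study[snp], study[other])
        r_ref = pearson_r(reference[snp], reference[other])
        if r_study ** 2 > r2_min and r_ref ** 2 > r2_min:
            score += 1 if (r_study > 0) == (r_ref > 0) else -1
    return score


def align_snp(snp: str, partners: List[str],
              study: Dict[str, List[int]], reference: Dict[str, List[int]]) -> List[int]:
    """Return the SNP's study dosages, with alleles swapped (d -> 2 - d) if the score is negative."""
    if ld_sign_score(snp, partners, study, reference) < 0:
        return [2 - d for d in study[snp]]
    return list(study[snp])


if __name__ == "__main__":
    reference = {"s1": [0, 1, 2, 0, 1, 2], "s2": [0, 1, 2, 0, 1, 2], "s3": [0, 0, 2, 0, 1, 2]}
    # s1 is coded on the opposite allele in the study panel.
    study = {"s1": [2, 1, 0, 2, 1, 0], "s2": [0, 1, 2, 0, 1, 2], "s3": [0, 0, 2, 0, 1, 2]}
    print(ld_sign_score("s1", ["s2", "s3"], study, reference))  # -2
    print(align_snp("s1", ["s2", "s3"], study, reference))      # [0, 1, 2, 0, 1, 2]
```

In this simplified model swapping a SNP negates every correlation it takes part in, so the recalculated score is simply the negated original; a score that stays negative after swapping, as described in check 1, can only arise if the real recalculation involves more than this.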