== Introduction ==

The purpose of this run is to test the efficiency of the existing imputation pipelines in the Grid.

== Datasets ==

=== Reference ===

The reference dataset was created from the raw VCF data of the 1000 Genomes Project.

 * Download the VCF files from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521
 * Export only the SNPs (filtering out the indels and SVs) from the VCF data using [http://vcftools.sourceforge.net/ vcftools] and convert them to impute2 format (hap and legend files).

{{{
vcftools \
  --gzvcf ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.vcf.gz \
  --keep-INFO LCSNP --keep-INFO EXSNP --keep-INFO SNP \
  --IMPUTE \
  --out ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.
}}}

 * Alternatively, we could have used the 1000 Genomes reference panel in impute2 format (legend and hap files) from the impute2 website: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data

=== Study panel ===

The study panel is an artificial genotype dataset. It contains the SNP set of the Illumina Hap550 platform. To generate it we followed these steps:

 * Download the genetic map of the b37 release of the human genome from impute2: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data
 * Download and install hapgen2: https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
 * Download the list of SNPs in the Hap550 platform:
   * Go to: http://genome.ucsc.edu/cgi-bin/hgTables?command=start
   * From the Table combo box select: snpArrayIllumina550
   * Click the "get output" button
 * For hapgen2 to run we need to specify at least one causal SNP.
   We selected the first SNP of the dataset:

{{{
hapgen2 \
  -h ALL.chr6.merged_beagle_mach.20101123.snps_indels_svs.genotypes.exported.impute.hap \
  -l ALL.chr6.merged_beagle_mach.20101123.snps_indels_svs.genotypes.exported.impute.legend \
  -m genetic_map_chr6_combined_b37.txt \
  -o chr6 \
  -dl 16539175 1 1.0 1.0 \
  -n 5000 0
}}}
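
The position passed to `-dl` above is that of the first SNP in the legend file. As a minimal sketch of how that value could be looked up rather than typed by hand, the snippet below reads the second column of the first data row of an impute2 legend file. It assumes the usual legend layout (one header line, then space-separated rows with the position in column 2). The helper name `first_snp_position` and the two sample rows are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of vcftools, impute2 or hapgen2):
# print the position of the first SNP listed in an impute2 legend file.
# Assumes line 1 is the header and column 2 holds the position.
first_snp_position() {
    awk 'NR == 2 { print $2; exit }' "$1"
}

# Tiny made-up legend file, used only to demonstrate the helper.
legend=$(mktemp)
printf 'ID pos allele0 allele1\nsnp1 16539175 A G\nsnp2 16540001 C T\n' > "$legend"

first_snp_position "$legend"   # prints 16539175

rm -f "$legend"

# With a real legend file, the value could be substituted into the
# hapgen2 call, e.g.:  -dl "$(first_snp_position chr6.legend)" 1 1.0 1.0
```

Reading the position from the legend keeps the causal SNP consistent with the reference panel if the panel is regenerated for another chromosome.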