
Version 10 (modified by a.kanterakis, 13 years ago)

--

Introduction

ImputationTool is a collection of methods for the pre- and post-processing steps of imputation-related tasks.

Implementation

ImputationTool has been created by:

  • Dr. Lude Franke (Lude@…): Format design, initial methods.
  • Harm-Jan Westra (harm-jan@…): Extensions, format converters, SNP checks.

It is written in Java and was developed with the NetBeans IDE.

Documentation

From the ImputationTool help screen:

ImputationTool v0.2


------------------------
PreProcessing
------------------------

# Create random batches of cases and controls from a TriTyper dataset. Creates a file called batches.txt in outdir.
--mode batch --in TriTyperdir --out outdir --size batchsize
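
To illustrate what the batch mode describes, here is a minimal sketch that shuffles sample identifiers and cuts them into batches of a fixed size. It is not the tool's implementation: the helper name `make_batches`, the fixed seed and the smaller final batch are assumptions, the sketch ignores case/control status, and the layout of batches.txt is not shown on this page.

```python
import random


def make_batches(sample_ids, batch_size, seed=0):
    """Shuffle sample IDs and split them into batches of at most batch_size.

    Hypothetical helper: ImputationTool's own batching logic and the
    layout of batches.txt are not documented on this page.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


# Made-up sample identifiers, for illustration only.
samples = [f"sample{i}" for i in range(10)]
batches = make_batches(samples, batch_size=4)
for index, batch in enumerate(batches):
    print(index, batch)
```

Every sample ends up in exactly one batch; only the last batch may be smaller than the requested size.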

------------------------
Imputation
------------------------

# Convert Impute Imputed data into TriTyper
--mode itt --in ImputeDir --out TriTyperDir

------------------------
Beagle
------------------------

# Convert Beagle files (one file/chromosome) to TriTyper. Filetemplate is a template for the batch filenames. The text CHROMOSOME will be replaced by the chromosome number.
--mode btt --in BeagleDir --tpl template --ext ext --out TriTyperDir [--fam famfile]

# Convert batches of Beagle files (multiple files/chromosome) to TriTyper files. Filetemplate is a template for the batch filenames. The text CHROMOSOME will be replaced by the chromosome number, and BATCH by the batch name.
--mode bttb --in BeagleDir --tpl template --out TriTyperDir --size numbatches
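
The placeholder substitution described for the file template can be sketched as follows. The function `expand_template` and the example template string are hypothetical; real file names depend on how the Beagle output was named.

```python
def expand_template(template, chromosome, batch=None):
    """Replace the CHROMOSOME (and optionally BATCH) placeholders."""
    name = template.replace("CHROMOSOME", str(chromosome))
    if batch is not None:
        name = name.replace("BATCH", str(batch))
    return name


# Hypothetical template, for illustration only.
template = "chrCHROMOSOME.batchBATCH.gprobs"
names = [expand_template(template, chrom, batch)
         for chrom in (1, 2) for batch in ("a", "b")]
print(names)
```

The placeholders are matched case-sensitively, so the lowercase word "batch" in the example template is left untouched.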

------------------------
Ped+Map (Plink files)
------------------------

# Converts Ped and Map files created by ttpmh to Beagle format
--mode pmbg --in indir --batch-file batches.txt

# Converts TriTyper file to Plink Dosage format. Filetemplate is a template for the batch filenames. The text CHROMOSOME will be replaced by the chromosome number, and BATCH by the batch name.
--mode ttpd --in indir --beagle beagledir --tpl template --batchdesc batchdescriptor --out outdir --fam famfile

# Converts PED and MAP files to TriTyper.
--mode pmtt --in Ped+MapDir --out TriTyperDir

# Converts TriTyper file to PED and MAP files. The FAM file is optional. --split splits the ped and map files per chromosome
--mode ttpm --in indir --out outdir [--fam famfile] [--split]
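
As a rough picture of what a per-chromosome split involves, the sketch below groups the lines of a PLINK MAP file by their first column, which holds the chromosome. Only the MAP side is shown; splitting the PED file additionally requires selecting the matching genotype columns. The helper name and the SNP identifiers are made up, and how ImputationTool names its per-chromosome output files is not stated here.

```python
from collections import defaultdict


def split_map_by_chromosome(map_lines):
    """Group PLINK MAP lines (chromosome, SNP id, cM, bp) by chromosome."""
    per_chromosome = defaultdict(list)
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        per_chromosome[fields[0]].append(line)
    return dict(per_chromosome)


# Made-up MAP records, for illustration only.
map_lines = [
    "1 snpA 0 1000",
    "1 snpB 0 2000",
    "2 snpC 0 1500",
]
per_chrom = split_map_by_chromosome(map_lines)
for chrom, lines in per_chrom.items():
    print(chrom, len(lines))
```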

# Converts TriTyper dataset to Ped+Map concordant to reference (hap) dataset. Supply a batchfile if you want to export in batches. Supply a chromosome if you want to export a certain chromosome.
--mode ttpmh --in TriTyperDir --hap TriTyperReferenceDir --out outdir [--fam famfile] [--batch-file batchfile] [--chr chromosome] [--exclude fileName]

---------------------
PostProcessing
---------------------

# Correlates genotypes of imputed vs non-imputed datasets. Saves a file called correlationOutput.txt in outdir, containing correlation per chromosome as well as correlation distribution.
--mode corr --in TriTyperDir --name datasetname --in2 TriTyperDir2 --name2 datasetname2 --out outdir [--snps snplist]
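
The help text does not say which correlation coefficient is used. As a minimal sketch, assuming a Pearson correlation over allele counts (0, 1 or 2) for a single SNP, the comparison between a genotyped and an imputed dataset could look like this. The function name and the genotype values are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a monomorphic SNP
    return cov / math.sqrt(var_x * var_y)


# Made-up allele counts for one SNP across six samples.
genotyped = [0, 1, 2, 1, 0, 2]
imputed = [0, 1, 2, 2, 0, 2]
r = pearson(genotyped, imputed)
print(round(r, 3))
```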

# Correlates genotypes of imputed vs non-imputed datasets. Also takes the Beagle imputation score (R2) into account. Saves a file called correlationOutput.txt in outdir, containing correlation per chromosome as well as correlation distribution.
--mode corrb --in TriTyperDir --name datasetname --in2 TriTyperDir2 --name2 datasetname2 --out outdir --beagle beagleDir --tpl template --size numBatches 

# Gets all the excluded SNPs from chrx.excludedsnps.txt with a certain call-rate threshold (0 < threshold < 1.0)
--mode ecra --in TriTyperDir --threshold threshold
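
For reference, a SNP's call rate is the fraction of samples with a non-missing genotype. The sketch below applies a threshold to that fraction. It only illustrates the definition: the helper names, the use of `None` for a missing call and the SNP identifiers are assumptions, and the layout of chrx.excludedsnps.txt is not shown on this page.

```python
def call_rate(genotypes):
    """Fraction of samples with a called (non-missing) genotype."""
    if not genotypes:
        raise ValueError("no genotypes given")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def snps_below_threshold(snp_genotypes, threshold):
    """Return the SNP ids whose call rate is below the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must satisfy 0 < threshold < 1")
    return [snp for snp, genotypes in snp_genotypes.items()
            if call_rate(genotypes) < threshold]


# Made-up data: None marks a missing call.
data = {
    "snpA": [0, 1, None, 2],     # call rate 0.75
    "snpB": [0, 1, 2, 2],        # call rate 1.0
    "snpC": [None, None, 1, 0],  # call rate 0.5
}
low = snps_below_threshold(data, 0.8)
print(low)
```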

# Generates R2 distribution (Beagle quality score) for each batch and chromosome, and tests each batch against the chromosome R2 distribution, using the Wilcoxon-Mann-Whitney test
--mode r2dist --in BeagleDir --template template --out outdir --size numbatches
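
The Wilcoxon-Mann-Whitney comparison of a batch's R2 values against a chromosome-wide set can be sketched as below. This is a textbook version, not the tool's code: it computes the U statistic with average ranks for ties and a two-sided p-value from the normal approximation without tie or continuity correction. The function names and the R2 values are made up.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x versus y, using average ranks for ties."""
    combined = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by the tie group
        for k in range(i, j + 1):
            ranks[combined[k][1]] = average_rank
        i = j + 1
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def normal_approx_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no corrections)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up R2 values for one batch and for the whole chromosome.
batch_r2 = [0.91, 0.85, 0.78]
chromosome_r2 = [0.95, 0.97, 0.88, 0.99]
u = mann_whitney_u(batch_r2, chromosome_r2)
p = normal_approx_p(u, len(batch_r2), len(chromosome_r2))
print(u, round(p, 4))
```

With samples this small the normal approximation is crude; an exact test would be preferable in practice.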

# Merge two TriTyper datasets
--mode merge --in TriTyper1Dir --in2 TriTyper2Dir --out outdir

The TriTyper Format

TriTyper is a binary format for storing genotype information, including insertion, deletion and expression data, with very efficient read, write and seek operations. Its implementation is written in Java and was developed with NetBeans.
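
To show why a fixed-width binary layout makes seeking cheap, the sketch below reads one SNP directly by computing its byte offset. The layout used here (one byte per sample, stored SNP by SNP) is purely hypothetical and is not a description of the actual TriTyper files; the function name is likewise made up.

```python
import io


def read_snp(handle, snp_index, n_samples):
    """Read one SNP's genotype bytes from a fixed-width, SNP-major layout.

    Hypothetical layout: one byte per sample, SNPs stored back to back,
    so SNP i starts at byte offset i * n_samples.
    """
    handle.seek(snp_index * n_samples)
    data = handle.read(n_samples)
    if len(data) != n_samples:
        raise IndexError("SNP index out of range")
    return list(data)


# Three made-up SNPs for three samples, held in memory instead of on disk.
n_samples = 3
matrix = bytes([0, 1, 2,
                2, 2, 1,
                0, 0, 1])
handle = io.BytesIO(matrix)
third = read_snp(handle, 2, n_samples)
print(third)
```

Because every record has the same width, any SNP can be reached with a single seek, without scanning the records before it.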