= Introduction =
This page describes a SNP annotation pipeline. The graph below shows how the input and intermediate files (grey boxes) flow through the processing steps (yellow ellipses).

{{{#!graphviz
digraph g {
  size="13,13"
  node [shape=box,style=filled,color=lightgrey]
  "GFF Files from BGI"
  "Prepared GFF Files for SeattleSeq Annotation"
  "SeattleSeq Annotated Files"
  "Polyphen Annotation"
  "GO From BioMart"
  "GO Annotation"
  "Intersection (Chr) Files"
  "VCF Files from 1000GP"
  "Allele Frequency Annotation"
  "Annotated Intersection Files"

  node [shape=ellipse,color=yellow]
  PrepareGFFFilesFromBGIForSeattleSeqAnnotation
  AnnotateVarianListFileViaSeattleSeqAnnotation
  AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs
  AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl
  "AnnotationTool (Patrick)"
  CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
  MergeHorizontallyFilesAccordingToCommonColumns

  "GFF Files from BGI" -> PrepareGFFFilesFromBGIForSeattleSeqAnnotation -> "Prepared GFF Files for SeattleSeq Annotation" -> AnnotateVarianListFileViaSeattleSeqAnnotation -> "SeattleSeq Annotated Files"
  "SeattleSeq Annotated Files" -> AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs -> "Polyphen Annotation"
  "GO From BioMart" -> AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl
  "GFF Files from BGI" -> "AnnotationTool (Patrick)" -> "Intersection (Chr) Files"
  "Intersection (Chr) Files" -> AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl
  AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl -> "GO Annotation"
  "VCF Files from 1000GP" -> CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
  "Intersection (Chr) Files" -> CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
  CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames -> "Allele Frequency Annotation"
  "Polyphen Annotation" -> MergeHorizontallyFilesAccordingToCommonColumns
  "GO Annotation" -> MergeHorizontallyFilesAccordingToCommonColumns
  "Allele Frequency Annotation" -> MergeHorizontallyFilesAccordingToCommonColumns
  "Intersection (Chr) Files" -> MergeHorizontallyFilesAccordingToCommonColumns
  MergeHorizontallyFilesAccordingToCommonColumns -> "Annotated Intersection Files"
}
}}}

= PrepareGFFFilesFromBGIForSeattleSeqAnnotation =
Preprocesses GFF files from BGI so that they can be submitted to SeattleSeq Annotation. It replaces `alleles` with `allele` and adds the line `# autoFile testAuto.txt` at the top of the file.

== Parameters ==
 * GFFFilename: Input filename
 * outputGFFFilename: Output filename

== Example ==
{{{#!div style="font-size: 80%"
{{{#!python
# Here the output filename is the same as the input filename; pass a different
# outputGFFFilename to keep the original file.
PrepareGFFFilesFromBGIForSeattleSeqAnnotation(
    "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff",
    "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/PrepareGFFFilesFromBGIForSeattleSeqAnnotation.py

= !AnnotateVarianListFileViaSeattleSeqAnnotation =
Annotates files of variants through [http://gvs.gs.washington.edu/SeattleSeqAnnotation/ SeattleSeq Annotation]. SeattleSeq Annotation provides the Java code that wraps its web forms: [http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java SubmitSeattleSeqAnnotationAutoJob.java]. This method wraps `wrapper(..)` and provides a Python implementation of it.
To run it, a directory named `jars` containing the following jar files must exist in the current directory:
 * httpunit.jar
 * js-1.6R5.jar
 * junit-3.8.1.jar
 * nekohtml-0.9.5.jar
 * xercesImpl-2.6.1.jar

== Parameters ==
For a complete list of parameters, please check the [http://gvs.gs.washington.edu/SeattleSeqAnnotation/ SeattleSeq Annotation website] and the example below.

== Example ==
{{{#!div style="font-size: 80%"
{{{#!python
AnnotateVarianListFileViaSeattleSeqAnnotation(
    inputFile="/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff",
    outputFile="/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt",
    eMail="alexandros.kanterakis@gmail.com",
    fileFormat="GFF",
    geneData="CCDS2008",
    allelesMaq="true",
    allelesDBSNP="true",
    scorePhastCons="true",
    consScoreGERP="true",
    chimpAllele="true",
    CNV="true",
    geneList="true",
    HapMapFreqType="HapMapFreqMinor",
    hasGenotypes="true",
    dbSNPValidation="true",
    repeats="true",
    proteinSequence="true",
    polyPhen="true",
    clinicalAssociation="true")
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation.py

= AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs =
This method takes a list of files generated by the [http://gvs.gs.washington.edu/SeattleSeqAnnotation/ SeattleSeq Annotation] tool and a list of tabular files that contain chromosome and position columns. It adds the [http://genetics.bwh.harvard.edu/pph/ polyphen] annotation contained in the former files to the latter.

== Parameters ==
 * listOfSeattleSeqAnnotationOutputs: List of SeattleSeq Annotation output files to take the polyphen annotation from
 * listOfFileToBeAnnotated: List of files with chromosome and position information
 * chromosomeColumn: The chromosome column of the files to be annotated
 * positionColumn: The position column of the files to be annotated
 * outputDir: The directory where the generated files will be stored
 * outputSuffix: The suffix of the output files
 * numberOfFirstLinesToIgnore: The number of first lines (e.g. header lines) to ignore

== Example ==
{{{#!div style="font-size: 80%"
{{{#!python
import os

dataDir = "/Users/alexandroskanterakis/Data/CD_china"

# Samples whose SeattleSeq Annotation outputs provide the polyphen annotation
sampleIDs = [
    "000057", "000074", "000159", "000363", "030042", "030101", "960313",
    "960318", "CD0316-04", "CD0316-05", "CD0322-07", "CD0322-08", "CD0326-03",
    "CD0326-07", "CD0360-02", "CD0360-05", "CD0360-06", "CD0376-02",
    "CD0376-05", "CD0398-011", "CD0398-012", "CD2018-03", "CD2018-06",
    "CD5000-001", "CD5059-001", "CD5063-001", "CD5065-001", "CD5066-001",
    "CD5067-005", "CD5084-007", "CD5096-001", "CD5116-001", "CD5166-005",
    "CD5174-001", "CD5176-001", "CD5217-001", "CD5252-001", "CD5257-005",
    "CD5258-002"]
listOfSeattleSeqAnnotationOutputs = [
    os.path.join(dataDir, sampleID + ".snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt")
    for sampleID in sampleIDs]

# One tabular file per chromosome: tab_1.txt ... tab_22.txt
intersectionDir = os.path.join(dataDir, "genomeWideExcluding", "genomeWideExcluding360-02")
filesToBeAnnotated = [os.path.join(intersectionDir, "tab_%d.txt" % chromosome)
                      for chromosome in range(1, 23)]

AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
    listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
    listOfFileToBeAnnotated=filesToBeAnnotated,
    chromosomeColumn=2,
    positionColumn=3,
    outputDir="/Users/alexandroskanterakis/Data/CD_china/Intersection",
    outputSuffix="_polyphenExample.txt",
    numberOfFirstLinesToIgnore=1)
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs.py

= AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl =
Creates [http://www.geneontology.org/ Gene Ontology] (GO) annotation for a list of files that contain at least a chromosome column and a position column.
== Parameters ==
 * listOfFilesToAnnotate: Python list of filenames to be annotated
 * numberOfFirstLinesToIgnoreInFileToAnnotate: The number of first lines (e.g. header lines) to ignore in each file to be annotated
 * chromosomeColumnOfFilesToAnnotate: The index of the chromosome column in the files to be annotated (starting from 0)
 * positionColumnOfFilesToAnnotate: The index of the position column in the files to be annotated (starting from 0)
 * resolveDuplicateValuesFunctionInFileToBeAnnotated: Determines what happens when two lines in a file to be annotated have the same chromosome and position. If it is not None, the function assigned to this parameter is called
 * fileWithGOAnnotation: The file downloaded from BioMart that contains the GO annotation
 * fileWithGOAnnotationChromosomeColumn: The column of fileWithGOAnnotation that contains the chromosome
 * fileWithGOAnnotationStartColumn: The column of fileWithGOAnnotation that contains the start of the transcript
 * fileWithGOAnnotationEndColumn: The column of fileWithGOAnnotation that contains the end of the transcript
 * columnsWithGOAnnotationComaSeparated: The comma-separated columns of fileWithGOAnnotation that contain the annotations to add, e.g. `"2,3,4"`
 * numberOfFirstLinesToIgnoreInGOAnnotationFile: The number of first lines (e.g. header lines) to ignore in fileWithGOAnnotation
 * outputDirectory: The directory where the output files are written
 * outputSuffix: The output file will be: outputDirectory/(basename of inputFile)+outputSuffix

== Example ==
{{{#!div style="font-size: 80%"
{{{#!python
import os

# One tabular file per chromosome: tab_1.txt ... tab_22.txt
intersectionDir = "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02"
fileList = [os.path.join(intersectionDir, "tab_%d.txt" % chromosome)
            for chromosome in range(1, 23)]

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl(
    listOfFilesToAnnotate=fileList,
    numberOfFirstLinesToIgnoreInFileToAnnotate=1,
    chromosomeColumnOfFilesToAnnotate=2,
    positionColumnOfFilesToAnnotate=3,
    fileWithGOAnnotation="/Users/alexandroskanterakis/Data/Ensembl/GENE_START_END_GO_FROM_ENSEMBL_36.txt",
    fileWithGOAnnotationChromosomeColumn=1,
    fileWithGOAnnotationStartColumn=2,
    fileWithGOAnnotationEndColumn=3,
    columnsWithGOAnnotationComaSeparated="4,5,6,7,8,9",
    numberOfFirstLinesToIgnoreInGOAnnotationFile=1,
    outputDirectory=intersectionDir,
    outputSuffix="_GO.txt")
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl.py

= CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames =
Creates allele frequency annotation from a list of VCF files for tabular files that contain at least a chromosome and a position column. It requires the [http://xapian.org xapian] Python package and [http://vcftools.sourceforge.net/ vcftools].

== Parameters ==
 * pathToVCFTools: Path to vcftools (the example below uses the path of the `vcftools` executable)
 * listOfVCFFiles: Python list of VCF files that the annotation is taken from
 * listOfFilenamesToBeAnnotated: Python list of tabular files to be annotated
 * outputPreffix: String added to the output filenames (e.g. `"_AlleleFrequencyExample.txt"`)
 * xapianIndexDirectory: The directory of the xapian index. If None, a temporary system directory is created
== Example ==
{{{#!div style="font-size: 80%"
{{{#!python
import os

from wikipl import CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

# 1000 Genomes pilot exon VCF files (CEU and YRI)
vcfDir = "/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps"
VCFFilenames_Example = [
    os.path.join(vcfDir, "CEU.exon.2010_03.genotypes.vcf"),
    os.path.join(vcfDir, "YRI.exon.2010_03.genotypes.vcf")]

# One tabular file per chromosome: tab_1.txt ... tab_22.txt
intersectionDir = "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02"
filesToBeAnnotated = [os.path.join(intersectionDir, "tab_%d.txt" % chromosome)
                      for chromosome in range(1, 23)]

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames(
    pathToVCFTools="/Users/alexandroskanterakis/Tools/vcftools/cpp/vcftools",
    listOfVCFFiles=VCFFilenames_Example,
    listOfFilenamesToBeAnnotated=filesToBeAnnotated,
    outputPreffix="_AlleleFrequencyExample.txt",
    xapianIndexDirectory="/Users/alexandroskanterakis/Data/CD_china/Intersection/xapianDB_Example")
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames.py

= MergeHorizontallyFilesAccordingToCommonColumns =
Merges files horizontally (column-wise) according to common columns.

== Parameters ==
 * listOfFilenamesToBeAnnotated: Python list of filenames to be annotated
* listOfColumnsFromFileToBeAnnotated: Python list of columns that we want to keep from the files to be annotated * listOfListsOfInputFilenames: Python list of python list of input filenames * listOfAnnotationFileColumns * listOfFirstLinesToIgnore: Python list of first lines to ignore from each annotation file * listOfOutputFilenames == Example == {{{ #!div style="font-size: 80%" Code highlighting: {{{#!python filesToBeAnnotated = [ "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt", 
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt" ] filesAnnotation1 = [ "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_polyphen.txt", 
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_polyphen.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_polyphen.txt" ] filesAnnotation2 = [ "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_GO.txt", 
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_GO.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_GO.txt" ] filesAnnotation3 = [ "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_AlleleFrequency.txt", 
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_AlleleFrequency.txt", 
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_AlleleFrequency.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_AlleleFrequency.txt" ] filesOutput123Annotated = [ "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_Annotated.txt", "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_Annotated.txt", 
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_Annotated.txt",
    "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_Annotated.txt"
]

MergeHorizontallyFilesAccordingToCommonColumns(
    listOfFilenamesToBeAnnotated=filesToBeAnnotated,
    # listOfColumnsFromFileToBeAnnotated=range(39),
    listOfColumnsFromFileToBeAnnotated=[2, 3],
    listOfListsOfInputFilenames=[filesAnnotation1, filesAnnotation2, filesAnnotation3],
    listOfAnnotationFileColumns=[[2], [2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]],
    listOfOutputFilenames=filesOutput123Annotated
)
}}}
}}}

== Source Code ==
http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/MergeHorizontallyFilesAccordingToCommonColumns.py

= ANNOVAR Annotation pipeline =
 * About ANNOVAR: http://www.openbioinformatics.org/annovar/
 * Download ANNOVAR: http://www.openbioinformatics.org/annovar/download/annovar.latest.tar.gz (more information: http://www.openbioinformatics.org/annovar/annovar_download.html)
 * ANNOVAR is already installed and configured on gbicdev: /data/home/data/alex/ANNOVAR/
 * Download the annotation files (most of them from UCSC) for both the hg18 and the hg19 release:
   * Gene based: http://www.openbioinformatics.org/annovar/annovar_gene.html
   * Region based: http://www.openbioinformatics.org/annovar/annovar_region.html
   * Filter based: http://www.openbioinformatics.org/annovar/annovar_filter.html
 * These files have already been downloaded on gbicdev: /data/home/data/alex/ANNOVAR/humandb_hg18/ and /data/home/data/alex/ANNOVAR/humandb_hg19/
 * To annotate a VCF (version 4.0) file, use the ANNOVAR wrapper script http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/ANNOVAR.py
   * Usage:
     * Example: `python ANNOVAR.py --pathToANNOVAR /data/home/akanterakis/tools/ANNOVAR/annovar --VCFFilename /data/home/data/pdeelen/Celiac40ExomsProject/SequenceData/sequence0605_41.index_hg18.snps.filtered.vcf --outputFilename /data/home/data/alex/ANNOVAR/annotated/output.txt --outputDirectory /data/home/data/alex/ANNOVAR/annotated/ --buildver hg18 --annotationDirectory /data/home/data/alex/ANNOVAR/humandb_hg18/ --geneBasedAnnotations refgene,knowngene,ensgene --regionBasedAnnotations band,segdup,dgv,gwascatalog --filterBasedAnnotations snp130 --customAnnotations kantale`
   * Options:
     * --pathToANNOVAR: The path to the installed ANNOVAR tool
     * --VCFFilename: The path to the VCF file to be annotated
     * --outputFilename: The annotated output file
     * --outputDirectory: The output directory
     * --buildver: Either hg18 or hg19
     * --annotationDirectory: The directory containing the annotation files
     * --geneBasedAnnotations: Gene-based annotations, as named by ANNOVAR (comma-separated, no spaces)
     * --regionBasedAnnotations: Region-based annotations, as named by ANNOVAR (comma-separated, no spaces)
     * --filterBasedAnnotations: Filter-based annotations, as named by ANNOVAR (comma-separated, no spaces)
     * --customAnnotations: Custom annotations. These should be GFF3 files (http://www.sequenceontology.org/gff3.shtml), given as a comma-separated list with no spaces
     * --dummy: If set to True, the script does not run anything; it only prints the commands it would execute. Default: False
     * --verbose: True/False. Default: True
 * Additional options for Gene Ontology annotation and for allele frequencies from VCF files are under development.
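
The option handling described above can be illustrated with a minimal sketch. This is not the repository's ANNOVAR.py: it assumes the wrapper first converts the VCF with ANNOVAR's `convert2annovar.pl -format vcf4 ... -outfile ...` and then calls `annotate_variation.pl` once per requested annotation (`-geneanno`, `-regionanno`, `-filter`, and `-regionanno -dbtype gff3 -gff3dbfile` for custom GFF3 files). The helper names, the intermediate file name `input.avinput`, and the exact ANNOVAR flags are assumptions and may differ between ANNOVAR versions; `--outputFilename` is parsed but not used here.

{{{#!python
import argparse
import os
import subprocess


def str2bool(value):
    """Interpret 'True'/'False' style option values (case-insensitive)."""
    return str(value).lower() in ("true", "1", "yes")


def split_list(value):
    """Split a comma-separated option value (no spaces) into a list."""
    return [item for item in (value or "").split(",") if item]


def parse_args(argv=None):
    """Parse the wrapper options listed above (hypothetical re-implementation)."""
    parser = argparse.ArgumentParser(description="Sketch of an ANNOVAR wrapper")
    for name in ("pathToANNOVAR", "VCFFilename", "outputFilename",
                 "outputDirectory", "annotationDirectory"):
        parser.add_argument("--" + name, required=True)
    parser.add_argument("--buildver", choices=["hg18", "hg19"], required=True)
    for name in ("geneBasedAnnotations", "regionBasedAnnotations",
                 "filterBasedAnnotations", "customAnnotations"):
        parser.add_argument("--" + name, default="")
    parser.add_argument("--dummy", type=str2bool, default=False)
    parser.add_argument("--verbose", type=str2bool, default=True)
    return parser.parse_args(argv)


def build_commands(args):
    """Return the ANNOVAR commands (as argument lists) implied by the options."""
    convert = os.path.join(args.pathToANNOVAR, "convert2annovar.pl")
    annotate = os.path.join(args.pathToANNOVAR, "annotate_variation.pl")
    # Assumption: hypothetical name for the intermediate ANNOVAR input file.
    avinput = os.path.join(args.outputDirectory, "input.avinput")

    # Step 1 (assumed): convert the VCF 4.0 file to ANNOVAR's input format.
    commands = [[convert, "-format", "vcf4", args.VCFFilename, "-outfile", avinput]]

    def annotate_cmd(mode, dbtype, extra=()):
        # Assumed ANNOVAR call shape: mode, database type, build, input, database dir.
        return ([annotate, mode, "-dbtype", dbtype, "-buildver", args.buildver]
                + list(extra) + [avinput, args.annotationDirectory])

    # Step 2 (assumed): one annotate_variation.pl call per requested annotation.
    for dbtype in split_list(args.geneBasedAnnotations):
        commands.append(annotate_cmd("-geneanno", dbtype))
    for dbtype in split_list(args.regionBasedAnnotations):
        commands.append(annotate_cmd("-regionanno", dbtype))
    for dbtype in split_list(args.filterBasedAnnotations):
        commands.append(annotate_cmd("-filter", dbtype))
    for gff3 in split_list(args.customAnnotations):
        commands.append(annotate_cmd("-regionanno", "gff3", ["-gff3dbfile", gff3]))
    return commands


def main(argv=None):
    """Print (if verbose or dummy) and, unless dummy, run each command."""
    args = parse_args(argv)
    for command in build_commands(args):
        if args.verbose or args.dummy:
            print(" ".join(command))
        if not args.dummy:
            subprocess.check_call(command)
}}}

In a script, `main()` would be called from an `if __name__ == "__main__":` guard. Returning argument lists instead of shell strings keeps `--dummy` output and real execution in sync and avoids shell quoting issues.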
= Related work =
 * http://www.svaproject.org/
 * http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001074

= Links and resources =
 * ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ : Genomic tracks from UCSC
 * http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001074