wiki:SnpAnnotationPipeline

Version 25 (modified by a.kanterakis, 13 years ago)

Introduction

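This page documents a SNP annotation pipeline: a set of Python functions that annotate variant files with SeattleSeq, PolyPhen, Gene Ontology and allele frequency information, and then merge the resulting annotation files.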

PrepareGFFFilesFromBGIForSeattleSeqAnnotation

Preprocesses GFF files coming from the BGI institute for the SeattleSeq Annotation tool. It replaces "alleles" with "allele" and adds the line "# autoFile testAuto.txt" at the top of the file.

Parameters

  • GFFFilename : Input filename
  • outputGFFFilename: Output filename
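
The preprocessing step can be pictured with a minimal sketch. This is not the code of the script linked under Source Code: the function name prepare_gff_sketch is hypothetical, and the sketch assumes that a plain substring replacement of "alleles" is enough and that the header line goes before the first original line.

```python
def prepare_gff_sketch(gff_filename, output_gff_filename):
    """Hypothetical stand-in for the preprocessing step, not the real script."""
    # Read everything first so that input and output may be the same file.
    with open(gff_filename) as source:
        lines = source.readlines()

    with open(output_gff_filename, "w") as target:
        # Header line named in the description of this step.
        target.write("# autoFile testAuto.txt\n")
        for line in lines:
            # Assumption: a plain substring replacement is sufficient.
            target.write(line.replace("alleles", "allele"))
```

Reading the whole input before opening the output keeps the sketch safe when both parameters point to the same path.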

Example

PrepareGFFFilesFromBGIForSeattleSeqAnnotation("/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff", "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/PrepareGFFFilesFromBGIForSeattleSeqAnnotation.py

AnnotateVarianListFileViaSeattleSeqAnnotation

Annotates files with variants through SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ . The Java code that wraps the web forms is provided by SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java . This method wraps that Java wrapper and provides a Python interface to it. To run it, there must be a directory named "jars" under the current path, containing the following jar files:

  • httpunit.jar
  • js-1.6R5.jar
  • junit-3.8.1.jar
  • nekohtml-0.9.5.jar
  • xercesImpl-2.6.1.jar
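
Before submitting a job it can help to verify that the "jars" directory is complete. The following is a minimal sketch of such a check; the helper names missing_jars and build_classpath are hypothetical, and whether the real script assembles a Java classpath in this way is an assumption.

```python
import os

# Jar files listed on this page as required.
REQUIRED_JARS = [
    "httpunit.jar",
    "js-1.6R5.jar",
    "junit-3.8.1.jar",
    "nekohtml-0.9.5.jar",
    "xercesImpl-2.6.1.jar",
]


def missing_jars(jars_dir="jars"):
    """Return the required jar files that are absent from jars_dir."""
    return [name for name in REQUIRED_JARS
            if not os.path.isfile(os.path.join(jars_dir, name))]


def build_classpath(jars_dir="jars"):
    """Join the required jars into one classpath string (hypothetical helper)."""
    return os.pathsep.join(os.path.join(jars_dir, name) for name in REQUIRED_JARS)
```

An empty list from missing_jars() means all five files are present in the directory.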

Parameters

For a complete list of parameters, please check the SeattleSeq Annotation website and the example below.

Example

AnnotateVarianListFileViaSeattleSeqAnnotation(
        # Assumption: all values are passed as strings, like the web form fields.
        inputFile="/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff",
        outputFile="/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt",
        eMail="alexandros.kanterakis@gmail.com",
        fileFormat="GFF",
        geneData="CCDS2008",
        allelesMaq="true",
        allelesDBSNP="true",
        scorePhastCons="true",
        consScoreGERP="true",
        chimpAllele="true",
        CNV="true",
        geneList="true",
        HapMapFreqType="HapMapFreqMinor",
        hasGenotypes="true",
        dbSNPValidation="true",
        repeats="true",
        proteinSequence="true",
        polyPhen="true",
        clinicalAssociation="true"
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation/AnnotateVarianListFileViaSeattleSeqAnnotation.py

AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs

This method takes a list of files generated by the SeattleSeq Annotation tool and a list of tabular files that contain chromosome and position columns. It adds the PolyPhen annotation contained in the former list of files to the latter.

Parameters

  • listOfSeattleSeqAnnotationOutputs: List of SeattleSeq Annotation files that we want to take the PolyPhen annotation from
  • listOfFileToBeAnnotated: List of files with chromosome and position information.
  • chromosomeColumn: The Chromosome column of the files to be annotated
  • positionColumn: The position column of the files to be annotated
  • outputDir: The directory where the generated files will be stored
  • outputSuffix: The suffix of the output files.
  • numberOfFirstLinesToIgnore: Number of leading lines to skip (set to 1 in the example below)
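
The annotation step amounts to a lookup keyed on chromosome and position. The sketch below illustrates that idea only; the function names, the tab-separated layout and the column indexes are assumptions, and the real SeattleSeq output format is not reproduced here.

```python
def load_annotation(annotation_files, chrom_col, pos_col, value_col, skip_lines=0):
    """Map (chromosome, position) to one annotation value from tab-separated files."""
    lookup = {}
    for filename in annotation_files:
        with open(filename) as handle:
            for line_number, line in enumerate(handle):
                if line_number < skip_lines:
                    continue  # skip header lines
                fields = line.rstrip("\n").split("\t")
                # Ignore lines that are too short to hold all three columns.
                if len(fields) > max(chrom_col, pos_col, value_col):
                    lookup[(fields[chrom_col], fields[pos_col])] = fields[value_col]
    return lookup


def annotate_rows(rows, lookup, chrom_col, pos_col, missing="NA"):
    """Return copies of rows with the looked-up value (or a placeholder) appended."""
    return [row + [lookup.get((row[chrom_col], row[pos_col]), missing)] for row in rows]
```

Rows without a matching chromosome and position receive the placeholder "NA" in this sketch; how the real script treats unmatched rows is not stated on this page.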

Example

listOfSeattleSeqAnnotationOutputs = [
"/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000074.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000159.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000363.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030042.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030101.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960313.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960318.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-04.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-08.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-011.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-012.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5000-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5059-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5063-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5065-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5066-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5067-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5084-007.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5096-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5116-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5166-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5174-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5176-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5217-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5252-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5257-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5258-002.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt"
]

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]


AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
        listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
        listOfFileToBeAnnotated=filesToBeAnnotated,
        chromosomeColumn=2,
        positionColumn=3,
        outputDir="/Users/alexandroskanterakis/Data/CD_china/Intersection",
        outputSuffix="_poluphenExample.txt",
        numberOfFirstLinesToIgnore=1
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs.py

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl

Creates Gene Ontology (http://www.geneontology.org/) annotation for a list of files that contain at least a position column and a chromosome column.

Parameters

  • listOfFilesToAnnotate: Python list of filenames to be annotated
  • numberOfFirstLinesToIgnoreInFileToAnnotate: Number of lines to skip at the top of each file to be annotated
  • chromosomeColumnOfFilesToAnnotate: The index of the chromosome column in the file to be annotated (starting from 0)
  • positionColumnOfFilesToAnnotate: The index of the position column in the file to be annotated (starting from 0)
  • resolveDuplicateValuesFunctionInFileToBeAnnotated: How to handle two lines in the file to be annotated that have the same chromosome and position. If not set to None, the function assigned to this parameter is called
  • fileWithGOAnnotation: The file that has been downloaded from BioMart and contains the GO annotation.
  • fileWithGOAnnotationChromosomeColumn: The column that contains the chromosome in the fileWithGOAnnotation
  • fileWithGOAnnotationStartColumn: The column that contains the start of the transcript in the fileWithGOAnnotation
  • fileWithGOAnnotationEndColumn: The column that contains the end of the transcript in the fileWithGOAnnotation
  • columnsWithGOAnnotationComaSeparated: The columns of the fileWithGOAnnotation that contain the annotations we want to add. Example: "2,3,4"
  • numberOfFirstLinesToIgnoreInGOAnnotationFile
  • outputDirectory
  • outputSuffix: The output file will be: outputDirectory/(basename of inputFile)+outputSuffix

Example

fileList= [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl(
        listOfFilesToAnnotate=fileList,
        numberOfFirstLinesToIgnoreInFileToAnnotate=1,
        chromosomeColumnOfFilesToAnnotate=2,
        positionColumnOfFilesToAnnotate=3,
        fileWithGOAnnotation="/Users/alexandroskanterakis/Data/Ensembl/GENE_START_END_GO_FROM_ENSEMBL_36.txt",
        fileWithGOAnnotationChromosomeColumn=1,
        fileWithGOAnnotationStartColumn=2,
        fileWithGOAnnotationEndColumn=3,
        columnsWithGOAnnotationComaSeparated="4,5,6,7,8,9",
        numberOfFirstLinesToIgnoreInGOAnnotationFile=1,
        outputDirectory="/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02",
        outputSuffix="_GO.txt"
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl.py

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

Creates allele frequency annotation from a list of VCF files for tabular files that contain at least a chromosome and a position column. It requires the xapian (http://xapian.org) Python package and vcftools (http://vcftools.sourceforge.net/).

Parameters

  • pathToVCFTools: Path where vcftools is installed
  • listOfVCFFiles: Python list of VCF files that the annotation will come from
  • listOfFilenamesToBeAnnotated
  • outputPreffix
  • xapianIndexDirectory: Directory of the xapian index. If None, a system temporary directory is created.
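
For orientation, vcftools can compute per-site allele frequencies with its --freq option, writing a table to the path given by --out plus the extension ".frq". The sketch below only builds and runs such a command; the function names are hypothetical, and whether the script calls vcftools with exactly these options is an assumption. The xapian indexing step is not shown.

```python
import subprocess


def build_vcftools_freq_command(path_to_vcftools, vcf_filename, output_prefix):
    """Build the argument list for a vcftools allele frequency run."""
    return [path_to_vcftools, "--vcf", vcf_filename, "--freq", "--out", output_prefix]


def run_vcftools_freq(path_to_vcftools, vcf_filename, output_prefix):
    """Run vcftools and return the path of the frequency table it writes."""
    command = build_vcftools_freq_command(path_to_vcftools, vcf_filename, output_prefix)
    # check_call raises CalledProcessError if vcftools exits with an error.
    subprocess.check_call(command)
    return output_prefix + ".frq"
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.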

Example

import wikipl
from wikipl import CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

VCFFilenames_Example = [
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/CEU.exon.2010_03.genotypes.vcf",
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/YRI.exon.2010_03.genotypes.vcf"
]

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames(
        pathToVCFTools = "/Users/alexandroskanterakis/Tools/vcftools/cpp/vcftools",
        listOfVCFFiles=VCFFilenames_Example,
        listOfFilenamesToBeAnnotated=filesToBeAnnotated,
        outputPreffix="_AlleleFrequencyExample.txt",
        xapianIndexDirectory="/Users/alexandroskanterakis/Data/CD_china/Intersection/xapianDB_Example"
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames.py

MergeHorizontallyFilesAccordingToCommonColumns

Merges files horizontally (column-wise) according to their common columns.

Parameters

  • listOfFilenamesToBeAnnotated: Python list of filenames to be annotated.
  • listOfColumnsFromFileToBeAnnotated: Python list of columns that we want to keep from the files to be annotated
  • listOfListsOfInputFilenames: Python list of Python lists of input filenames
  • listOfAnnotationFileColumns
  • listOfFirstLinesToIgnore: Python list of first lines to ignore from each annotation file
  • listOfOutputFilenames

Example

filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]

filesAnnotation1 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_polyphen.txt"
]

filesAnnotation2 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_GO.txt"
]

filesAnnotation3 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_AlleleFrequency.txt"
]

filesOutput123Annotated = [

"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_Annotated.txt"

]

MergeHorizontallyFilesAccordingToCommonColumns(
        listOfFilenamesToBeAnnotated=filesToBeAnnotated,
#       listOfColumnsFromFileToBeAnnotated=range(39),
        listOfColumnsFromFileToBeAnnotated = [2,3],
        listOfListsOfInputFilenames=[filesAnnotation1,filesAnnotation2,filesAnnotation3],
        listOfAnnotationFileColumns=[[2],[2,3,4,5,6,7],[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]],
        listOfOutputFilenames=filesOutput123Annotated
        )

Source Code

http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/MergeHorizontallyFilesAccordingToCommonColumns.py

ANNOVAR Annotation pipeline

Related work

Links and resources
