Changes between Version 13 and Version 14 of DataManagement


Timestamp:
Jul 15, 2011 8:47:26 AM (13 years ago)
Author:
Morris Swertz
  • DataManagement

    v13 v14  
    1616   * All resources should be put in a folder that specifies their version. Normally, the folder name should follow the pattern resource-version.
    1717     * Ex: Human Genome build 19 should be found in ''/data/gcc/resources/hg-19/''
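
The resource-version convention above can be sketched as a small path helper. This is only an illustration: the function name `resource_dir` and the constant `RESOURCES_ROOT` are hypothetical, and the root is inferred from the hg-19 example rather than stated as a rule.

```python
from pathlib import PurePosixPath

# Assumption: resources live under this root, as in the hg-19 example.
RESOURCES_ROOT = PurePosixPath("/data/gcc/resources")


def resource_dir(resource: str, version: str) -> PurePosixPath:
    """Return the folder for a resource, named resource-version."""
    for part in (resource, version):
        # Reject empty parts and path separators so the result
        # always stays directly under RESOURCES_ROOT.
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return RESOURCES_ROOT / f"{resource}-{version}"


print(resource_dir("hg", "19"))  # /data/gcc/resources/hg-19
```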
    18 
    19 === GoNL-level Directory structure ===
    20 The root for all subsequent directories is /data/gcc/projects/gonl/
    21 
    22  * /rawdata
    23    * Contains all the raw, unprocessed data, organized by batch
    24      * Ex: All raw data for the 1st batch is located in ''/data/gcc/projects/gonl/rawdata/first_batch/''
    25  * /results
    26    * Contains all the results after processing the data
    27  * /results/BGI
    28    * Contains all the results from the BGI pipeline (snps, indels, metrics, etc.)
    29    * The XXX.snp.sorted.vcf files are sorted according to position.
    30    * The XXX.annotated.txt files are the sorted files annotated with ANNOVAR. The options used for the annotation were:
    31        * --buildver hg18
    32        * --annotationDirectory /data/gcc/tools/annovar_2011Jan31/annovar/humandb/
    33        * --geneBasedAnnotations refgene,knowngene,ensgene
    34        * --regionBasedAnnotations band,segdup,dgv,gwascatalog
    35        * --filterBasedAnnotations snp130
    36        
    37        Please refer to the ANNOVAR website for a thorough description of the annotations: http://www.openbioinformatics.org/annovar/annovar_db.html
    38 
    39 
    40  * /results/immunochip
    41    * Contains all the results from the immunochip data (cleaned/QCed data, metrics, etc.)
    42  * /results/pipeline
    43    * Contains all the results from running the sequence data through the GoNL pipeline, organized by batch
    44      * Ex: Results on the first batch are in ''/data/gcc/projects/gonl/results/pipeline/first_batch''
    45    * The subdirectory structure for each of the batches should be the following:
    46      * All results related to a sample should go in /sample_name
    47        * Ex: All results related to sample A2a (first batch) should go in ''/data/gcc/projects/gonl/results/pipeline/first_batch/A2a''
    48      * All results related to a lane of a sample should go in /sample_name/lane_name
    49        * Ex: All results related to sample A2a (first batch), lane FC20005_L1 should go in ''/data/gcc/projects/gonl/results/pipeline/first_batch/A2a/FC20005_L1/''
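
As a rough sketch of the per-batch layout described above, the sample and lane directories could be derived as follows. The helper name `pipeline_result_dir` and the constant `GONL_ROOT` are hypothetical names introduced for illustration; only the directory layout itself comes from this page.

```python
from pathlib import PurePosixPath
from typing import Optional

# Root for all GoNL directories, as stated at the top of this section.
GONL_ROOT = PurePosixPath("/data/gcc/projects/gonl")


def pipeline_result_dir(
    batch: str, sample: str, lane: Optional[str] = None
) -> PurePosixPath:
    """Return the results directory for a sample, or for one of its lanes."""
    parts = [batch, sample] + ([lane] if lane is not None else [])
    for part in parts:
        # Reject empty parts and path separators so the result
        # always stays under results/pipeline.
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return GONL_ROOT.joinpath("results", "pipeline", *parts)


# Sample-level and lane-level directories for sample A2a, first batch.
print(pipeline_result_dir("first_batch", "A2a"))
# /data/gcc/projects/gonl/results/pipeline/first_batch/A2a
print(pipeline_result_dir("first_batch", "A2a", "FC20005_L1"))
# /data/gcc/projects/gonl/results/pipeline/first_batch/A2a/FC20005_L1
```

Building paths through one helper keeps the batch/sample/lane nesting consistent across scripts instead of relying on hand-typed paths.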
    5018
    5119=== Pipeline Result Files Naming Convention ===