Changes between Initial Version and Version 1 of BIOS_Pipeline/ReplicationTest


Timestamp:
Sep 19, 2016 5:00:20 PM
Author:
jamverlouw

  • BIOS_Pipeline/ReplicationTest

    v1 v1  
     1
     2= Replication test =
     3
     4The replication test was run on the Shark cluster in Leiden on 20 LLS samples. The pipeline is attached to this page. For each sample, four output files were compared by md5sum: the exon, gene and transcript counts, and the BAM file (sorted and marked for duplicates).
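The md5sum comparison described above can be sketched in Python as follows. This is a minimal illustration, not the script used for the test: the directory layout and the function names `md5_of_file` and `compare_runs` are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(original_dir, replicate_dir, filenames):
    """Map each filename to True if both runs produced byte-identical files.

    Assumes (hypothetically) that both runs wrote files with the same names
    into two separate directories.
    """
    return {
        name: md5_of_file(Path(original_dir) / name)
        == md5_of_file(Path(replicate_dir) / name)
        for name in filenames
    }
```

A mismatch only says that the bytes differ, not why; the sections below look at the cause per file type.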
     5
     6'''Exon and gene counts'''
     7
     8The initial comparison showed the exon counts to be identical. The gene counts were also identical once the header was removed.
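A comparison that ignores the header could look roughly like this. The sketch assumes the header is a single leading line; the `header_lines` parameter and the function name are illustrative and not taken from the pipeline.

```python
from itertools import islice, zip_longest


def identical_without_header(path_a, path_b, header_lines=1):
    """Return True if two text files match line by line after the header.

    header_lines is an assumption: adjust it to the real header length.
    """
    with open(path_a) as file_a, open(path_b) as file_b:
        body_a = islice(file_a, header_lines, None)
        body_b = islice(file_b, header_lines, None)
        # zip_longest pads the shorter file with None, so a length
        # difference is reported as a mismatch as well.
        return all(a == b for a, b in zip_longest(body_a, body_b))
```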
     9
     10'''BAM files'''
     11
     12For comparison, the BAM files were converted to SAM files, which are easier to read. These showed that reads reported at the same position were ordered differently between the two runs. The content was the same and the numerical sort on position was still correct, but Samtools appears to order reads arbitrarily when multiple reads are reported at a single position. Re-sorting the SAM files with Linux command-line tools confirmed this: the sorted output was identical for the randomly selected test sample BC1KBKACXX-4-22 LLS-346-130804.
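The same order-independent check can be sketched in Python by comparing the alignment records as a multiset instead of sorting the files. This is an illustration under assumptions: header lines (starting with `@`) are skipped here, which the original Linux sort may not have done, and the function names are hypothetical.

```python
from collections import Counter


def alignment_records(sam_path):
    """Count each alignment line of a SAM file, skipping '@' header lines."""
    counts = Counter()
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                counts[line.rstrip("\n")] += 1
    return counts


def same_alignments(sam_a, sam_b):
    """True if both SAM files hold the same records, regardless of order."""
    return alignment_records(sam_a) == alignment_records(sam_b)
```

This holds every record in memory, so for full-size files an external sort followed by a streaming comparison is likely the more practical choice.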
     13
     14'''Transcript counts'''
     15
     16The transcript count files differ in 227 out of 193865 lines. The differences are in the actual values, so they are not caused by headers or other static content in the files. An example of a difference:
     17
     18{{{
     19#!div style="font-size: 80%"
     20 1      protein_coding  transcript      44435680        44438393        .       +       .       transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 0.057163; length 1596; RPKM 0.000932
     21 1      protein_coding  transcript      44435680        44438393        .       +       .       transcript_id "ENST00000412950"; locus_id "1:44435672-44439041W"; gene_id "ENSG00000132768"; reads 2.282137; length 1596; RPKM 0.037210
     22}}}
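To list which transcripts carry different values, the attribute column of each line can be parsed and compared per `transcript_id`. The sketch below assumes the nine-column layout of the example above and that each transcript occurs once per file; the function names are hypothetical.

```python
def parse_attributes(line):
    """Parse the attribute column (9th field) of a transcript line into a dict."""
    fields = line.split(None, 8)  # split on any whitespace, at most 8 times
    attributes = {}
    if len(fields) < 9:
        return attributes  # not a data line (e.g. blank or header)
    for item in fields[8].split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attributes[key] = value.strip().strip('"')
    return attributes


def differing_transcripts(lines_a, lines_b, field="reads"):
    """Return (transcript_id, value_a, value_b) for transcripts whose field differs.

    Values are compared as text, so any change in the written number counts.
    """
    values_a = {}
    for line in lines_a:
        attrs = parse_attributes(line)
        if "transcript_id" in attrs:
            values_a[attrs["transcript_id"]] = attrs.get(field)
    differences = []
    for line in lines_b:
        attrs = parse_attributes(line)
        tid = attrs.get("transcript_id")
        if tid in values_a and values_a[tid] != attrs.get(field):
            differences.append((tid, values_a[tid], attrs.get(field)))
    return differences
```

Applied to the two example lines, this would report `ENST00000412950` with `reads` values `0.057163` and `2.282137`.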
     23
     24'''Replication outcome'''
     25
     26||= BAM file =||= 20x deviant =||
     27||= SAM from BAM =||= 20x deviant =||
     28||= Linux sorted SAM =||= '''Identical for tested sample''' =||
     29||  Exon count  ||  '''20x identical'''  ||
     30||= Transcript count =||= 20x deviant =||
     31||  Gene count  ||  '''20x identical''' (apart from header)  ||
     32
     33Samples used for test:
     34{{{
     35#!div style="font-size: 80%"
     36AD1NAMACXX-1-6 LLS-453-130804,
     37AD1NAMACXX-5-14 LLS-640-130804,
     38AD1NAMACXX-8-4 LLS-786-130804,
     39AD1NE2ACXX-2-27 LLS-815-130804,
     40AD1NE2ACXX-3-11 LLS-64-130804,
     41AD1NE2ACXX-3-8 LLS-81-130804,
     42AD1NE2ACXX-4-15 LLS-128-130804,
     43AD1NE2ACXX-5-1 LLS-368-130804,
     44AD1NE2ACXX-6-11 LLS-113-130804,
     45AD1NE2ACXX-6-7 LLS-187-130804,
     46AD1NFNACXX-4-10 LLS-90-130804,
     47AD1NFNACXX-6-1 LLS-15-130804,
     48AD1NFNACXX-7-12 LLS-731-130804,
     49BC1KBKACXX-2-19 LLS-499-130804,
     50BC1KBKACXX-4-22 LLS-346-130804,
     51BC1KBKACXX-5-5 LLS-345-130804,
     52BC1KBKACXX-8-16 LLS-375-130804,
     53BD1NW4ACXX-7-12 LLS-34-130804,
     54BD1NW4ACXX-8-25 LLS-79-130804,
     55BD1NYRACXX-1-5 LLS-435-130804
     56}}}