List of flagged samples run 1

Author: Peter-Bram 't Hoen
Date: 5-February-2014

There are three fields in the metadatabase to flag samples that should not be used in the final analysis: QC of the sequencing, genotype concordance and contamination. Here we report the final list of samples which are flagged in the metadatabase.
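As an illustration of how the three flags combine, the sketch below treats a sample as usable only when none of the flags is set. The field names (`qc_flag`, `genotype_flag`, `contamination_flag`) and the unflagged run id `EXAMPLE-0-0` are hypothetical; the actual column names in the metadatabase are not given on this page.

```python
from dataclasses import dataclass


@dataclass
class SampleFlags:
    # Hypothetical field names; the real metadatabase columns may differ.
    run_id: str
    qc_flag: bool = False
    genotype_flag: bool = False
    contamination_flag: bool = False

    def usable(self) -> bool:
        # A sample is kept only if none of the three flags is set.
        return not (self.qc_flag or self.genotype_flag or self.contamination_flag)


samples = [
    SampleFlags("AC1JV9ACXX-1-10", qc_flag=True),
    SampleFlags("AD10W1ACXX-5-18", genotype_flag=True),
    SampleFlags("BD1NYRACXX-2-27", genotype_flag=True, contamination_flag=True),
    SampleFlags("EXAMPLE-0-0"),  # made-up, unflagged run id for illustration
]

usable_ids = [s.run_id for s in samples if s.usable()]
print(usable_ids)
```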

QC flag

Three samples are flagged because they have too few reads:
AC1JV9ACXX-1-10
AD1NE2ACXX-5-22
BD1NW4ACXX-3-27

Three further samples are flagged for other reasons (see http://www.bbmriwiki.nl/wiki/FgQualityControl/FgQualityControlRun1):
BD1NYRACXX-6-10: percentage of mapped reads too low; outlier on principal components 1, 4, 5 and 6
AD2CJPACXX-8-9: low exon correlation; outlier on principal components 1, 11 and 14
BD1NR9ACXX-7-27: low percentage of mapped reads; outlier on principal component 4; likely degraded

Genotype concordance

The samples below, listed per cohort, have a mismatch between RNA and DNA genotypes that could not be corrected because there is no additional lab evidence of a sample swap. For more information, see http://www.bbmriwiki.nl/wiki/FgSampleBlacklist
CODAM:
AD10W1ACXX-5-18
AD10W1ACXX-8-11
LL:
AC1C40ACXX-4-4
AD1GWFACXX-4-15
RS:
AD1NNNACXX-4-18
AC1JV9ACXX-1-13
BC1JTJACXX-6-7
BC1KAVACXX-8-13
LLS:
BD1NW4ACXX-7-13
BD1NYRACXX-2-27
BC1KBKACXX-5-8
AD1NAMACXX-7-19
BC1KBKACXX-5-6
AD2DATACXX-8-1
BD2D5MACXX-4-15
BD2D5MACXX-3-7
AD2DATACXX-3-21
AD2DATACXX-4-5
AD1NFNACXX-8-25
BD24PGACXX-7-8
BD2CPRACXX-1-12
AD2DATACXX-3-22
BD24PGACXX-7-10
BD2CPRACXX-1-21
BC1KBKACXX-5-3
AD1NFNACXX-8-27
BD24PGACXX-8-25
BC1KBKACXX-5-7
BD24PGACXX-6-12
BD1NW4ACXX-8-5
BC1KBKACXX-3-12
BC1KBKACXX-5-1
AD2CJPACXX-6-9
BD1NYRACXX-2-16
AD2CJPACXX-5-1
AD1NE2ACXX-4-22
BC1KBKACXX-5-2
BC1KBKACXX-5-5
BD1NYRACXX-3-1
BD2D5MACXX-2-25
BD1NYRACXX-4-19
AD1NFNACXX-4-8
BD1NYRACXX-2-15
BC1KBKACXX-5-4
AD1NFNACXX-5-15

Contaminations

No contamination has been detected for CODAM, RS or LL. The only samples in these cohorts that are outliers in heterozygosity rate had already been identified as mix-ups. See: http://www.bbmriwiki.nl/wiki/FgSampleBlacklist
The following LLS samples have a high heterozygosity rate based on genotypes called from RNAseq data (all of them also have low genotype concordance; see the list above):
BD1NYRACXX-2-27
BC1KBKACXX-5-8
BC1KBKACXX-5-6
BD24PGACXX-7-10
BC1KBKACXX-5-3
BC1KBKACXX-5-7
BD1NW4ACXX-8-5
BC1KBKACXX-5-1
BD1NYRACXX-2-16
AD1NE2ACXX-4-22
BC1KBKACXX-5-2
BC1KBKACXX-5-5
BD1NYRACXX-3-1
BD1NYRACXX-4-19
AD1NFNACXX-4-8
BC1KBKACXX-5-4
AD1NFNACXX-5-15
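
The statement that every LLS sample with a high heterozygosity rate also appears in the LLS genotype concordance list can be checked with a simple set comparison. The sketch below copies both lists from this page; the variable names are illustrative only.

```python
# LLS samples flagged for genotype discordance (copied from the list above).
lls_discordant = set("""
BD1NW4ACXX-7-13 BD1NYRACXX-2-27 BC1KBKACXX-5-8 AD1NAMACXX-7-19
BC1KBKACXX-5-6 AD2DATACXX-8-1 BD2D5MACXX-4-15 BD2D5MACXX-3-7
AD2DATACXX-3-21 AD2DATACXX-4-5 AD1NFNACXX-8-25 BD24PGACXX-7-8
BD2CPRACXX-1-12 AD2DATACXX-3-22 BD24PGACXX-7-10 BD2CPRACXX-1-21
BC1KBKACXX-5-3 AD1NFNACXX-8-27 BD24PGACXX-8-25 BC1KBKACXX-5-7
BD24PGACXX-6-12 BD1NW4ACXX-8-5 BC1KBKACXX-3-12 BC1KBKACXX-5-1
AD2CJPACXX-6-9 BD1NYRACXX-2-16 AD2CJPACXX-5-1 AD1NE2ACXX-4-22
BC1KBKACXX-5-2 BC1KBKACXX-5-5 BD1NYRACXX-3-1 BD2D5MACXX-2-25
BD1NYRACXX-4-19 AD1NFNACXX-4-8 BD1NYRACXX-2-15 BC1KBKACXX-5-4
AD1NFNACXX-5-15
""".split())

# LLS samples with a high heterozygosity rate (copied from the list above).
lls_high_het = set("""
BD1NYRACXX-2-27 BC1KBKACXX-5-8 BC1KBKACXX-5-6 BD24PGACXX-7-10
BC1KBKACXX-5-3 BC1KBKACXX-5-7 BD1NW4ACXX-8-5 BC1KBKACXX-5-1
BD1NYRACXX-2-16 AD1NE2ACXX-4-22 BC1KBKACXX-5-2 BC1KBKACXX-5-5
BD1NYRACXX-3-1 BD1NYRACXX-4-19 AD1NFNACXX-4-8 BC1KBKACXX-5-4
AD1NFNACXX-5-15
""".split())

# Any sample listed here would contradict the statement in the text.
not_in_discordant = sorted(lls_high_het - lls_discordant)
print(len(lls_high_het), len(lls_discordant), not_in_discordant)
```

An empty `not_in_discordant` list means the heterozygosity list is a subset of the genotype concordance list, as stated.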

Overview RNAseq data April 14th 2014

Table I: Overview of the RNAseq data.

Biobank   Person Id   RNAseq run Id   Data part of RP3 [1]   Passed QC
CODAM     192         191             191                    188
LL        761         630             630                    628
LLS       821         720             697 (23) [2]           664
NTR       3300        -               -                      -
RS        768         658             658                    652
Total:    5842        2199            2176                   2132

[1] All these samples are also part of the data freeze 1.
[2] 23 samples are not part of RP3.


There are 29 merged runs in the RNAseq data that passed QC: 2 for CODAM, 17 for LL, 4 for LLS and 6 for RS.
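
The totals in Table I and the merged-run count can be recomputed from the per-biobank figures. The sketch below is only a consistency check on the numbers reported on this page; the NTR entries shown as "-" are treated as 0, which is an assumption.

```python
# Per-biobank counts from Table I:
# (Person Id, RNAseq run Id, Data part of RP3, Passed QC)
table_i = {
    "CODAM": (192, 191, 191, 188),
    "LL": (761, 630, 630, 628),
    "LLS": (821, 720, 697, 664),
    "NTR": (3300, 0, 0, 0),  # "-" in the table, treated as 0 here
    "RS": (768, 658, 658, 652),
}
reported_totals = (5842, 2199, 2176, 2132)

# Sum each column across biobanks.
computed_totals = tuple(sum(column) for column in zip(*table_i.values()))
print(computed_totals == reported_totals)

# Merged runs per biobank, as listed in the text.
merged_runs = {"CODAM": 2, "LL": 17, "LLS": 4, "RS": 6}
print(sum(merged_runs.values()))
```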


Table II: Detail of samples that did not pass quality control.

Biobank   Not Passed QC   Bad quality   Genotype Discordance   Contaminated
CODAM     3               1             2                      -
LL        2               -             2                      -
LLS       34 [1]          3             32 [2]                 17
RS        6               2             4                      -
Total:    45              6             40                     17

[1] One sample not part of RP3.
[2] Three LLS swaps identified by MixupMapper have been corrected. Additional evidence confirmed that these LLS samples were indeed swapped. These six samples are not included in the Genotype Discordance counts for LLS.


Table III: Overview genotype data (GoNL version 5 imputed)

Biobank   Person Id   Genotype [1]    Overlapping with Passed QC RNAseq data
CODAM     192         188 (4) [2]     184 (4) [3]
LL        761         756 (5)         626 (2) [4]
LLS       821         765 (56)        654 (10) [5]
NTR       3300        1886 (1414)     -
RS        768         768             652
Total:    5842        4363 (1479)     2116 (16)

[1] Genotype information was extracted from the biobank genotype sample information files (GoNL version 5 imputed data), directly extracted from the SRM.
[2] Numbers between brackets indicate the difference between the number of person ids and the genotype data available.
[3] Missing genotypes for: CODAM-2037, CODAM-2074, CODAM-2175 and CODAM-2555.
[4] Missing genotypes for: LL-LLDeep_0127 and LL-LLDeep_0519.
[5] These 10 samples are GoNL samples for which no genotype data was provided.
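
The bracketed numbers in the Genotype column of Table III are defined as the difference between the number of person ids and the number of available genotypes. The sketch below recomputes them from the figures on this page; RS has no bracketed number in the table and is entered as 0 here, which is an assumption.

```python
# (Person Id, genotypes available, bracketed difference) per biobank, Table III.
table_iii = {
    "CODAM": (192, 188, 4),
    "LL": (761, 756, 5),
    "LLS": (821, 765, 56),
    "NTR": (3300, 1886, 1414),
    "RS": (768, 768, 0),  # no bracketed number in the table, taken as 0
}

# Collect any biobank where persons - genotyped differs from the bracketed value.
mismatches = [
    biobank
    for biobank, (persons, genotyped, missing) in table_iii.items()
    if persons - genotyped != missing
]
print(mismatches)

total_genotyped = sum(row[1] for row in table_iii.values())
total_missing = sum(row[2] for row in table_iii.values())
print(total_genotyped, total_missing)
```

An empty `mismatches` list, together with totals of 4363 and 1479, agrees with the Total row of the table.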

LLS update

For LLS, the detected sample swaps were swapped back. The MixupMapper and genotype concordance checks performed using the final sample sheets show no mix-ups and no strong outliers.