Supp. Table S1. Compute Node Specifications.
Process    Alignment                                             Variant Calling
Node Type  Intel Xeon E5-2650 @ 2.00 GHz                         Intel Xeon E5-2670 @ 2.60 GHz
Threads    2 sockets x 8 cores x 2 threads = 32 hardware threads  2 sockets x 8 cores x 1 thread = 16 hardware threads
RAM        2 NUMA domains x 128 GB = 256 GB                      2 NUMA domains x 64 GB = 128 GB
Supp. Table S2. Commands used to run tools.
Tool and version, followed by example commands:
BWA-MEM 0.7.8
bwa-0.7.8 index -p $INDEX $REFERENCE
bwa-0.7.8 mem -t 32 -R "$RGHEADER" $INDEX ${file}_1.fastq.gz ${file}_2.fastq.gz
GEM3
gem-indexer-3.1.0 -i $REFERENCE -o $INDEX
gem-mapper-3.1.0 -I $INDEX -r "$RGHEADER" --i1 ${file}_1.fastq.gz --i2 ${file}_2.fastq.gz -o ${MAP_PATH}/${NAME}.sam
FreeBayes 0.9.20
freebayes-parallel NA12878_BWA_50xWGS.vcf
GATK 3.3 BaseRecalibrator / PrintReads
java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T BaseRecalibrator -nct 15 -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta --input_file /NA12878_BWA_50xWGS.bam --knownSites GATK/bundle_1.5/hg19/dbsnp_135.hg19.no.chr.vcf.gz --knownSites GATK/bundle_1.5/hg19/Mills_and_1000G_gold_standard.indels.hg19.no.chr.vcf -dt NONE -et NO_ET -o NA12878_BWA_50xWGS.grp
java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T PrintReads -nct 15 -R hsapiens.hs37d5.fasta --input_file NA12878_BWA_50xWGS.bam -dt NONE -et NO_ET -o NA12878_BWA_50xWGS.bqsr.bam
GATK 3.3 HaplotypeCaller
java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T HaplotypeCaller --num_cpu_threads_per_data_thread 16 -I NA12878_BWA_Exome.bqsr.bam -R hsapiens.hs37d5.fasta --min_base_quality_score 10 -ERC GVCF -variant_index_type LINEAR --variant_index_parameter 128000 -GQB 20 -GQB 25 -GQB 30 -GQB 35 -GQB 40 -GQB 45 -GQB 50 -GQB 70 -GQB 90 -GQB 99 -standard_call_conf 30 -standard_emit_conf 10 -o NA12878_BWA_Exome.g.vcf.gz
GATK 3.3 GenotypeGVCFs
java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T GenotypeGVCFs -R hsapiens.hs37d5.fasta -V NA12878_BWA_Exome.g.vcf.gz -o NA12878_BWA_Exome.vcf.gz -nt 16 -standard_call_conf 30 -standard_emit_conf 30
GATK 3.3 DepthOfCoverage
java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T DepthOfCoverage -nt 4 -R $FASTA -L $ROI_BED -o $OUTPUT_PREFIX --outputFormat table --omitLocusTable --omitDepthOutputAtEachBase -omitIntervalStatistics --nBins 9999 --start 1 --stop 10000 --countType COUNT_FRAGMENTS --includeRefNSites -minMappingQuality 20 -ct 8 -ct 10 -ct 20 -ct 30 -ct 50 -I $INPUT_BAM
SAMTOOLS/ BCFTOOLS 1.2 (normal)
samtools mpileup -ug -t DP,SP -f hsapiens.hs37d5.fasta -d 10000 -L 10000 /path/to/indel-realigned.bam | bcftools call -mv -f gq -O v -o NA12878_BWA_50xWGS.SAMTOOLS1.2_Bug_mv.vcf
SAMTOOLS/ BCFTOOLS 1.2 (fast)
samtools mpileup -Bug -t DP,SP -f hsapiens.hs37d5.fasta -d 10000 -L 10000 /path/to/indel-realigned.bam | bcftools call -mv -f gq -O v -o NA12878_BWA_50xWGS.SAMTOOLS1.2_Bug_mv.vcf
Control-FREEC v9.1
freec -conf FREEC_ExomeTumourNormalConfigFile.NA12878MedExome.txt
DELLY2 v0.7.3
delly call -t DEL -n -g hsapiens.hs37d5.fasta -o NA12878_BWA_50xWGS.delly2.DEL.bcf -x /delly/excludeTemplates/human.hg19.excl.tsv /path/to/indel-realigned.bam
Supp. Table S3. NIST Gold standard reference set source files.

NIST data set: BED file of Reliably Callable regions
Size of region / Number of variant positions: 2,195,078,292 nt
NIST/GIAB Source File: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv2.18/union13callableMQonlymerged_addcert_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs_v2.18_2mindatasets_5minYesNoRatio.bed

NIST data set: VCF file of confidently called variants
Size of region / Number of variant positions: 2,915,731 variant positions (2,915,728 unique)
NIST/GIAB Source File: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv2.18/NISTIntegratedCalls_14datasets_131103_allcall_UGHapMerge_HetHomVarPASS_VQSRv2.18_all_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs.vcf.gz
Supp. Table S4. Sequencing and mapping metrics (for BWA-MEM and GEM3) for the NA12878 Platinum Whole Genome and the Nimblegen MedExome. % Mismatch is the percentage of bases from the reads that mismatch the reference in reads with mapping quality >= 20. Strand Balance is the percentage of reads mapped to the positive strand of the reference.

Experiment               NA12878 Platinum Whole Genome  NA12878 Platinum Whole Genome  NA12878 MedExome  NA12878 MedExome
Mapper                   BWA-MEM                        GEM3                           BWA-MEM           GEM3
Read Length              101                            101                            101/126           101/126
Number of Reads          1708169546                     1708169546                     96588728          96588728
% Mapped reads           99.71                          99.04                          99.69             99.51
% Mapped reads >=Q20     94.28                          91.93                          92.80             92.26
Average Insert size      317.82                         319.26                         203.15            204.11
SD insert size           73.95                          73.87                          47.18             46.75
BAM size (Mb)            120867                         128036                         6643              6754
% Mismatch               0.3140                         0.2332                         0.2396            0.2283
% INDEL                  0.0181                         0.0242                         0.0105            0.0243
% Reads Mapped in pairs  99.81                          99.19                          99.88             99.69
Strand Balance           50.0023                        50.0656                        49.9950           50.0023
Supp. Table S5. Overlap between the Reliably Callable and Non-reliably Callable regions (defined by NIST v2.18), and the Medically Interpretable Genome (Patwardhan et al, 2015), and the mappable and non-mappable regions of the genome. ROI - region of interest. *Genome here refers to chromosomes 1-22, X, Y and mitochondrion of GRCh37.

ROI                                           ROI_size       %genome
Genome*                                       3,095,693,981  100.0000
NIST v2.18 Reliably Callable                  2,195,098,847  70.9081
NIST v2.18 Non-reliably Callable              900,593,687    29.0918
MedExome                                      46,584,178     1.5048
MedExome NIST v2.18 Reliably Callable         34,714,576     1.1214
MedExome NIST v2.18 Non-reliably Callable     11,869,602     0.3834
MedExome MIG                                  11,572,064     0.3738
MedExome Non-MIG                              35,012,114     1.1310
MIG                                           11,733,933     0.3790
MIG NIST v2.18 Reliably Callable MedExome     8,961,946      0.2895
MIG NIST v2.18 Non-reliably Callable MedExome 2,610,118      0.0843

MAPPABLE RL300 MM2
ROI                                           size           %ROI     %genome
Genome*                                       2,787,387,041  90.0408  90.0408
NIST v2.18 Reliably Callable                  2,193,210,211  99.9140  70.8471
NIST v2.18 Non-reliably Callable              594,179,925    65.9765  19.1938
MedExome                                      45,483,317     97.6368  1.4692
MedExome NIST v2.18 Reliably Callable         34,701,515     99.9624  1.1210
MedExome NIST v2.18 Non-reliably Callable     10,781,802     90.8354  0.3483
MedExome MIG                                  11,456,976     99.0055  0.3701
MedExome Non-MIG                              34,026,341     97.1845  1.0992
MIG                                           11,605,917     98.9090  0.3749
MIG NIST v2.18 Reliably Callable MedExome     8,957,718      99.9528  0.2894
MIG NIST v2.18 Non-reliably Callable MedExome 2,499,258      95.7527  0.0807

NON_MAPPABLE RL300 MM2
ROI                                           size           %ROI     %genome
Genome*                                       308,306,940    9.9592   9.9592
NIST v2.18 Reliably Callable                  1,888,658      0.0860   0.0610
NIST v2.18 Non-reliably Callable              306,413,784    34.0235  9.8981
MedExome                                      1,100,861      2.3632   0.0356
MedExome NIST v2.18 Reliably Callable         13,061         0.0376   0.0004
MedExome NIST v2.18 Non-reliably Callable     1,087,800      9.1646   0.0351
MedExome MIG                                  115,088        0.9945   0.0037
MedExome Non-MIG                              985,773        2.8155   0.0318
MIG                                           128,016        1.0910   0.0041
MIG NIST v2.18 Reliably Callable MedExome     4,228          0.0472   0.0001
MIG NIST v2.18 Non-reliably Callable MedExome 110,860        4.2473   0.0036
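The %ROI and %genome columns of Supp. Table S5 are ratios of region sizes. The following is a minimal Python sketch of that arithmetic, using sizes copied from the table. The function name `percent_of` and the constant names are illustrative and are not part of the original analysis.

```python
def percent_of(part_nt: int, whole_nt: int) -> float:
    """Return part_nt as a percentage of whole_nt (both in nucleotides)."""
    if whole_nt <= 0:
        raise ValueError("whole_nt must be positive")
    return 100.0 * part_nt / whole_nt


# Region sizes copied from Supp. Table S5.
GENOME_NT = 3_095_693_981            # chromosomes 1-22, X, Y and MT of GRCh37
MEDEXOME_NT = 46_584_178             # full MedExome ROI
MEDEXOME_MAPPABLE_NT = 45_483_317    # mappable part of the MedExome ROI

# %genome of the MedExome ROI (the table reports 1.5048).
print(f"{percent_of(MEDEXOME_NT, GENOME_NT):.4f}")

# %ROI of the mappable MedExome (the table reports 97.6368).
print(f"{percent_of(MEDEXOME_MAPPABLE_NT, MEDEXOME_NT):.4f}")

# %genome of the mappable MedExome (the table reports 1.4692).
print(f"{percent_of(MEDEXOME_MAPPABLE_NT, GENOME_NT):.4f}")
```

The mappable and non-mappable sizes of each ROI sum to the ROI size (for the MedExome, 45,483,317 + 1,100,861 = 46,584,178), which gives a quick consistency check on the rows.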
Supp. Table S6. Overlap between the Reliably Callable and Non-reliably Callable regions (defined by NIST v2.18) and the mappable and nonmappable regions of the Nimblegen MedExome. ROI - region of interest.

Region of Interest               MedExome    MedExome NIST v2.18 Reliably Callable  MedExome NIST v2.18 Non-reliably Callable
Length                           46,584,178  34,714,576                             11,869,602
% Genome / Exome                 100         74.52                                  25.48
Mappable      Length             45,483,317  34,701,515                             10,781,802
Mappable      %ROI               97.64       99.96                                  90.84
Mappable      % Genome           97.64       74.49                                  23.14
Non-mappable  Length             1,100,861   13,061                                 1,087,800
Non-mappable  %ROI               2.36        0.04                                   9.16
Non-mappable  % Genome           2.36        0.03                                   2.34
Supp. Table S7. Coverage metrics obtained for BWA-MEM and GEM3 with the WGS data on all combinations of Reliably Callable (defined by NIST v2.18), mappable, and the Medically Interpretable Genome (Patwardhan et al, 2015) regions. ROI - region of interest; C8, C10, C20, C30 and C50 - percentage of ROI covered by at least 8, 10, 20, 30 or 50 reads respectively.

Mappability   Mapper   ROI size       Mean coverage  Median coverage  C8      C10     C20     C30     C50

ROI: Genome
All           GEM3     3,095,693,981  49.22          52               90.32   90.24   89.69   88.32   55.81
All           BWA-MEM  3,095,693,981  49.94          53               90.27   90.19   89.73   88.79   58.55
Mappable      GEM3     2,787,387,041  54.19          53               99.20   99.17   98.81   97.51   61.76
Mappable      BWA-MEM  2,787,387,041  55.03          54               99.23   99.20   98.95   98.10   64.83
Non-mappable  GEM3     308,306,940    4.30           0                10.04   9.52    7.21    5.25    2.03
Non-mappable  BWA-MEM  308,306,940    3.90           0                9.28    8.74    6.41    4.57    1.78

ROI: Genome NIST v2.18 Reliably Callable
All           GEM3     2,195,098,847  54.94          54               99.99   99.99   99.91   99.22   63.72
All           BWA-MEM  2,195,098,847  55.66          54               99.99   99.99   99.96   99.56   66.57
Mappable      GEM3     2,193,209,437  54.94          54               99.99   99.99   99.91   99.23   63.75
Mappable      BWA-MEM  2,193,209,437  55.66          54               99.99   99.99   99.96   99.58   66.60
Non-mappable  GEM3     1,889,410      53.19          44               99.63   99.42   96.03   85.00   29.82
Non-mappable  BWA-MEM  1,889,410      51.54          42               99.41   99.09   94.15   80.33   25.80

ROI: Genome NIST v2.18 Non-reliably Callable
All           GEM3     900,595,134    35.28          44               66.76   66.48   64.78   61.74   36.54
All           BWA-MEM  900,595,134    35.98          45               66.58   66.30   64.81   62.52   39.02
Mappable      GEM3     594,177,604    51.42          52               96.29   96.13   94.75   91.12   54.43
Mappable      BWA-MEM  594,177,604    52.67          53               96.41   96.27   95.20   92.65   58.29
Non-mappable  GEM3     306,417,530    4.00           0                9.49    8.97    6.66    4.76    1.86
Non-mappable  BWA-MEM  306,417,530    3.61           0                8.73    8.18    5.86    4.10    1.63

ROI: MedExome
All           GEM3     46,584,178     51.24          51               98.37   98.30   97.84   96.23   50.99
All           BWA-MEM  46,584,178     51.88          51               98.36   98.29   97.89   96.63   53.55
Mappable      GEM3     45,483,317     52.08          51               99.74   99.72   99.50   98.05   52.05
Mappable      BWA-MEM  45,483,317     52.75          52               99.75   99.74   99.58   98.50   54.69
Non-mappable  GEM3     1,100,861      16.64          2                41.90   39.63   29.24   20.87   7.30
Non-mappable  BWA-MEM  1,100,861      15.93          2                40.72   38.35   27.99   19.61   6.68

ROI: MedExome MIG
All           GEM3     11,572,064     51.60          51               99.51   99.48   99.21   97.79   50.56
All           BWA-MEM  11,572,064     52.23          51               99.51   99.48   99.26   98.18   53.11
Mappable      GEM3     11,456,976     51.90          51               99.93   99.92   99.78   98.45   50.96
Mappable      BWA-MEM  11,456,976     52.54          51               99.94   99.94   99.84   98.85   53.54
Non-mappable  GEM3     115,088        21.44          15               57.26   55.12   43.02   32.19   10.02
Non-mappable  BWA-MEM  115,088        20.94          14               56.44   54.01   42.33   31.21   9.58

ROI: MedExome non-MIG
All           GEM3     35,012,114     51.12          51               97.99   97.91   97.38   95.71   51.14
All           BWA-MEM  35,012,114     51.77          51               97.98   97.89   97.43   96.12   53.70
Mappable      GEM3     34,026,341     52.14          51               99.67   99.65   99.40   97.92   52.41
Mappable      BWA-MEM  34,026,341     52.83          52               99.69   99.67   99.49   98.38   55.07
Non-mappable  GEM3     985,773        16.08          1                40.11   37.82   27.63   19.55   6.98
Non-mappable  BWA-MEM  985,773        15.35          1                38.89   36.52   26.32   18.26   6.34

ROI: MIG
All           GEM3     11,733,933     51.56          51               99.44   99.41   99.11   97.63   50.41
All           BWA-MEM  11,733,933     52.19          51               99.45   99.41   99.16   98.04   52.96
Mappable      GEM3     11,605,917     51.90          51               99.92   99.91   99.74   98.37   50.86
Mappable      BWA-MEM  11,605,917     52.54          51               99.93   99.92   99.80   98.79   53.45
Non-mappable  GEM3     128,016        20.99          14               56.50   54.17   41.67   30.68   9.63
Non-mappable  BWA-MEM  128,016        20.49          13               55.64   53.06   41.04   29.75   9.18

ROI: MIG NIST v2.18 Reliably Callable MedExome
All           GEM3     8,961,946      52.23          51               100.00  100.00  99.96   99.03   52.16
All           BWA-MEM  8,961,946      52.84          52               100.00  100.00  99.98   99.31   54.70
Mappable      GEM3     8,957,718      52.24          51               100.00  100.00  99.96   99.03   52.17
Mappable      BWA-MEM  8,957,718      52.84          52               100.00  100.00  99.98   99.32   54.71
Non-mappable  GEM3     4,228          42.52          42               100.00  99.76   97.02   80.82   20.93
Non-mappable  BWA-MEM  4,228          42.35          41               99.98   99.65   96.74   80.44   20.44

ROI: MIG NIST v2.18 Non-reliably Callable MedExome
All           GEM3     2,610,118      49.42          49               97.83   97.70   96.65   93.55   45.06
All           BWA-MEM  2,610,118      50.14          50               97.85   97.71   96.81   94.32   47.65
Mappable      GEM3     2,499,258      50.70          50               99.70   99.66   99.12   96.36   46.64
Mappable      BWA-MEM  2,499,258      51.47          50               99.76   99.72   99.32   97.20   49.35
Non-mappable  GEM3     110,860        20.63          13               55.63   53.42   40.96   30.34   9.61
Non-mappable  BWA-MEM  110,860        20.13          12               54.78   52.27   40.26   29.34   9.16
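The C8 to C50 columns are defined as the percentage of the ROI covered by at least 8, 10, 20, 30 or 50 reads. The values in the table were produced with GATK DepthOfCoverage (see Supp. Table S2), so the following Python sketch only illustrates the definition. It assumes a hypothetical list of per-base depths, and the function name `coverage_summary` is an illustration, not part of the original pipeline.

```python
from statistics import mean, median
from typing import Dict, Iterable, Sequence


def coverage_summary(
    depths: Sequence[int],
    thresholds: Iterable[int] = (8, 10, 20, 30, 50),
) -> Dict[str, float]:
    """Summarise per-base depths over one ROI.

    Returns the mean and median depth plus, for each threshold t,
    the percentage of positions with depth >= t under the key "C<t>".
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    n_positions = len(depths)
    summary: Dict[str, float] = {
        "mean": float(mean(depths)),
        "median": float(median(depths)),
    }
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"C{t}"] = 100.0 * covered / n_positions
    return summary


# Hypothetical depths for an 8-base ROI (not data from the study).
toy_depths = [0, 5, 8, 10, 20, 30, 50, 60]
for name, value in coverage_summary(toy_depths).items():
    print(f"{name}: {value:.2f}")
```

Because every position in the ROI is counted in the denominator, uncovered bases (depth 0) lower each C value, which is why the non-mappable rows show low percentages.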
Supp. Table S8. Coverage metrics obtained by BWA-MEM and GEM3 with the WES data on all combinations of Reliably Callable (defined by NIST v2.18), mappable, and the Medically Interpretable Genome (Patwardhan et al, 2015) regions for the Nimblegen MedExome. ROI - region of interest; C8, C10, C20, C30 and C50 - percentage of ROI covered by at least 8, 10, 20, 30 or 50 reads respectively.

Mappability   Mapper   ROI size       Mean coverage  Median coverage  C8      C10     C20     C30     C50

ROI: MedExome
All           GEM3     46,584,178     91.52          79               98.87   98.65   96.96   93.62   79.63
All           BWA-MEM  46,584,178     88.37          79               97.71   97.47   95.75   92.43   78.58
Mappable      GEM3     45,483,317     89.70          79               99.21   99.00   97.34   94.00   79.88
Mappable      BWA-MEM  45,483,317     89.59          79               99.14   98.93   97.27   93.95   79.91
Non-mappable  GEM3     1,100,861      166.85         93               84.75   84.15   81.23   77.87   69.49
Non-mappable  BWA-MEM  1,100,861      38.04          0                38.55   37.51   33.14   29.57   23.33

ROI: MedExome NIST v2.18 Reliably Callable
All           GEM3     34,714,576     88.45          79               99.59   99.43   98.03   94.91   80.77
All           BWA-MEM  34,714,576     88.79          80               99.59   99.43   98.05   94.97   80.96
Mappable      GEM3     34,701,515     88.43          79               99.59   99.43   98.03   94.91   80.77
Mappable      BWA-MEM  34,701,515     88.78          80               99.59   99.44   98.05   94.97   80.96
Non-mappable  GEM3     13,061         127.66         119              99.53   99.32   98.24   97.00   90.76
Non-mappable  BWA-MEM  13,061         113.08         100              98.35   97.79   95.92   93.57   85.89

ROI: MedExome NIST v2.18 Non-reliably Callable
All           GEM3     11,869,602     100.53         80               96.76   96.36   93.81   89.85   76.29
All           BWA-MEM  11,869,602     87.16          75               92.19   91.74   89.01   85.01   71.61
Mappable      GEM3     10,781,802     93.79          79               97.98   97.61   95.10   91.08   77.00
Mappable      BWA-MEM  10,781,802     92.20          78               97.68   97.29   94.73   90.68   76.55
Non-mappable  GEM3     1,087,800      167.32         92               84.57   83.97   81.02   77.64   69.24
Non-mappable  BWA-MEM  1,087,800      37.14          0                37.84   36.78   32.39   28.80   22.58

ROI: MedExome MIG
All           GEM3     11,572,064     102.95         90               99.77   99.72   99.31   98.26   89.91
All           BWA-MEM  11,572,064     102.29         90               99.33   99.26   98.81   97.76   89.50
Mappable      GEM3     11,456,976     102.64         90               99.82   99.77   99.37   98.34   90.01
Mappable      BWA-MEM  11,456,976     102.79         90               99.79   99.74   99.34   98.31   90.05
Non-mappable  GEM3     115,088        133.48         98               95.12   94.80   93.51   90.41   80.00
Non-mappable  BWA-MEM  115,088        52.24          13               52.72   51.57   46.86   43.04   34.72

ROI: MedExome non-MIG
All           GEM3     35,012,114     87.75          76               98.57   98.29   96.18   92.09   76.23
All           BWA-MEM  35,012,114     83.77          75               97.17   96.88   94.74   90.67   74.96
Mappable      GEM3     34,026,341     85.34          75               99.00   98.74   96.65   92.54   76.46
Mappable      BWA-MEM  34,026,341     85.14          76               98.92   98.65   96.57   92.49   76.50
Non-mappable  GEM3     985,773        170.75         92               83.54   82.91   79.79   76.40   68.26
Non-mappable  BWA-MEM  985,773        36.39          0                36.90   35.86   31.54   28.00   22.00

ROI: MIG
All           GEM3     11,733,933     103.04         90               99.74   99.68   99.24   98.17   89.76
All           BWA-MEM  11,733,933     102.26         90               99.24   99.17   98.68   97.60   89.29
Mappable      GEM3     11,605,917     102.70         90               99.79   99.73   99.30   98.26   89.88
Mappable      BWA-MEM  11,605,917     102.82         90               99.75   99.70   99.26   98.20   89.90
Non-mappable  GEM3     128,016        133.81         98               95.15   94.83   93.28   89.97   79.22
Non-mappable  BWA-MEM  128,016        51.31          13               52.48   51.27   46.43   42.36   34.15

ROI: MIG NIST v2.18 Reliably Callable MedExome
All           GEM3     8,961,946      101.82         90               99.92   99.88   99.60   98.72   90.54
All           BWA-MEM  8,961,946      102.23         90               99.92   99.88   99.60   98.73   90.67
Mappable      GEM3     8,957,718      101.79         90               99.92   99.88   99.60   98.72   90.54
Mappable      BWA-MEM  8,957,718      102.21         90               99.92   99.89   99.60   98.73   90.66
Non-mappable  GEM3     4,228          155.81         150              100.00  100.00  100.00  99.93   97.99
Non-mappable  BWA-MEM  4,228          140.12         139              99.01   98.72   98.37   98.11   94.42

ROI: MIG NIST v2.18 Non-reliably Callable MedExome
All           GEM3     2,610,118      106.82         91               99.28   99.16   98.30   96.71   87.77
All           BWA-MEM  2,610,118      102.49         89               97.30   97.13   96.10   94.40   85.51
Mappable      GEM3     2,499,258      105.67         91               99.47   99.36   98.53   97.01   88.15
Mappable      BWA-MEM  2,499,258      104.87         91               99.35   99.23   98.37   96.77   87.87
Non-mappable  GEM3     110,860        132.63         96               94.93   94.60   93.27   90.04   79.32
Non-mappable  BWA-MEM  110,860        48.89          10               50.96   49.77   44.90   40.94   32.44
Supp. Table S9. Summary of Variant calling for 8 pipelines for the WES sample. TP - true positives; FP - false positives; FN - false negatives; specificity - number of TP calls as a proportion of Total Calls; sensitivity - number of TP calls as a proportion of the number of NIST reference set calls; F1-score - measure of overall accuracy calculated as (2 x TP) / ((2 x TP) + FP + FN).

Dataset                   Total Calls  TP     FP   FN   Specificity  Sensitivity  F1 score

MedExome SNVs
NIST v2.18 Gold Standard  24343
BWA + FreeBayes           24310        24274  36   69   0.9985       0.9972       0.9978
BWA + HaplotypeCaller     24346        24285  61   58   0.9975       0.9976       0.9976
BWA + SAMtools fast       24347        24292  55   51   0.9977       0.9979       0.9978
BWA + SAMtools normal     24306        24271  35   72   0.9986       0.997        0.9978
GEM3 + FreeBayes          24347        24248  99   95   0.9959       0.9961       0.996
GEM3 + HaplotypeCaller    24635        24264  371  79   0.9849       0.9968       0.9908
GEM3 + SAMtools fast      24396        24269  127  74   0.9948       0.997        0.9959
GEM3 + SAMtools normal    24362        24256  106  87   0.9956       0.9964       0.996

MedExome Deletions
NIST v2.18 Gold Standard  292
BWA + FreeBayes           288          256    32   36   0.8889       0.8767       0.8828
BWA + HaplotypeCaller     389          279    110  13   0.7172       0.9555       0.8194
BWA + SAMtools fast       278          251    27   41   0.9029       0.8596       0.8807
BWA + SAMtools normal     281          252    29   40   0.8968       0.863        0.8796
GEM3 + FreeBayes          284          254    30   38   0.8944       0.8699       0.8819
GEM3 + HaplotypeCaller    391          275    116  17   0.7033       0.9418       0.8053
GEM3 + SAMtools fast      286          250    36   42   0.8741       0.8562       0.8651
GEM3 + SAMtools normal    287          251    36   41   0.8746       0.8596       0.867

MedExome Insertions
NIST v2.18 Gold Standard  355
BWA + FreeBayes           330          311    19   44   0.9424       0.8761       0.908
BWA + HaplotypeCaller     392          343    49   12   0.875        0.9662       0.9183
BWA + SAMtools fast       370          310    60   45   0.8378       0.8732       0.8552
BWA + SAMtools normal     367          310    57   45   0.8447       0.8732       0.8587
GEM3 + FreeBayes          322          304    18   51   0.9441       0.8563       0.8981
GEM3 + HaplotypeCaller    396          345    51   10   0.8712       0.9718       0.9188
GEM3 + SAMtools fast      423          305    118  50   0.721        0.8592       0.7841
GEM3 + SAMtools normal    422          305    117  50   0.7227       0.8592       0.7851
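The definitions in the caption of Supp. Table S9 translate directly into arithmetic on TP, FP and FN. The following Python sketch applies them to the BWA + FreeBayes SNV row (TP 24274, FP 36, FN 69). The names `CallMetrics` and `call_metrics` are illustrative assumptions. Note that "specificity" as defined in the caption, TP as a proportion of Total Calls, is the quantity more commonly called precision.

```python
from typing import NamedTuple


class CallMetrics(NamedTuple):
    specificity: float  # TP / Total Calls, as defined in the caption
    sensitivity: float  # TP / number of NIST reference set calls
    f1: float           # (2 x TP) / ((2 x TP) + FP + FN)


def call_metrics(tp: int, fp: int, fn: int) -> CallMetrics:
    """Compute the three summary metrics from TP, FP and FN counts."""
    total_calls = tp + fp        # calls made by the pipeline
    reference_calls = tp + fn    # calls in the reference set
    if total_calls == 0 or reference_calls == 0:
        raise ValueError("need at least one call and one reference call")
    return CallMetrics(
        specificity=tp / total_calls,
        sensitivity=tp / reference_calls,
        f1=(2 * tp) / ((2 * tp) + fp + fn),
    )


# BWA + FreeBayes, SNV counts from Supp. Table S9.
m = call_metrics(tp=24274, fp=36, fn=69)
print(f"specificity={m.specificity:.4f}")
print(f"sensitivity={m.sensitivity:.4f}")
print(f"F1={m.f1:.4f}")
```

Rounded to four decimals, these values agree with the table row (0.9985, 0.9972, 0.9978). TP + FP also reproduces the row's Total Calls (24310), and TP + FN the reference total (24343).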
Supp. Table S10. Summary of Variant calling for 8 analysis pipelines for the WGS sample, without the requirement of genotype called to be equivalent to the NIST call. TP - true positives; FP - false positives; FN - false negatives; specificity - number of TP calls as a proportion of the Total Calls; sensitivity - number of TP calls as a proportion of the number of NIST reference set calls; F1-score - measure of overall accuracy calculated as (2 x TP) / ((2 x TP) + FP + FN); % reduction FP/FN indicates the reduction in the number of FP or FN with respect to the requirement of GT equivalence (compare with Main Table 1).

Dataset                   Total Calls  TP       FP     FN     Specificity  Sensitivity  F1 score  % Reduction FP  % Reduction FN

WGS SNVs
NIST v2.18 Gold Standard  2740732
BWA + FreeBayes           2744545      2738639  5906   2093   0.99785      0.99924      0.99854   6.9             17.3
BWA + HaplotypeCaller     2748582      2738661  9921   2071   0.99639      0.99924      0.99782   2.3             10.2
BWA + SAMTOOLS fast       2748866      2739295  9571   1437   0.99652      0.99948      0.99799   7.8             35.9
BWA + SAMTOOLS normal     2736410      2733400  3010   7332   0.99890      0.99732      0.99811   14.7            6.6
GEM3 + FreeBayes          2742937      2736360  6577   4372   0.99760      0.99840      0.99800   10.6            15.1
GEM3 + HaplotypeCaller    2745423      2738808  6615   1924   0.99759      0.99930      0.99844   5.6             17.0
GEM3 + SAMTOOLS fast      2749554      2738181  11373  2551   0.99586      0.99907      0.99746   11.4            36.4
GEM3 + SAMTOOLS normal    2736871      2732989  3882   7743   0.99858      0.99717      0.99788   14.8            8.0

WGS Deletions
NIST v2.18 Gold Standard  85958
BWA + FreeBayes           82263        79354    2909   6604   0.96464      0.92317      0.94345   55.9            35.8
BWA + HaplotypeCaller     86323        85070    1253   888    0.98548      0.98967      0.98757   18.3            24.0
BWA + SAMTOOLS fast       77671        69444    8227   16514  0.89408      0.80788      0.84880   9.4             4.9
BWA + SAMTOOLS normal     77712        69469    8243   16489  0.89393      0.80817      0.84889   9.4             4.9
GEM3 + FreeBayes          81602        78172    3430   7786   0.95797      0.90942      0.93306   38.8            21.8
GEM3 + HaplotypeCaller    86132        84973    1159   985    0.98654      0.98854      0.98754   14.1            16.2
GEM3 + SAMTOOLS fast      80905        69861    11044  16097  0.86349      0.81273      0.83735   6.5             4.5
GEM3 + SAMTOOLS normal    80955        69891    11064  16067  0.86333      0.81308      0.83745   6.5             4.6

WGS Insertions
NIST v2.18 Gold Standard  84583
BWA + FreeBayes           78592        76350    2242   8233   0.97147      0.90266      0.93581   52.3            23.0
BWA + HaplotypeCaller     84521        83600    921    983    0.98910      0.98838      0.98874   12.1            11.4
BWA + SAMTOOLS fast       79762        69835    9927   14748  0.87554      0.82564      0.84986   4.3             2.9
BWA + SAMTOOLS normal     79762        69842    9920   14741  0.87563      0.82572      0.84994   4.3             2.9
GEM3 + FreeBayes          78417        74771    3646   9812   0.95350      0.88400      0.91744   30.7            14.1
GEM3 + HaplotypeCaller    83973        83314    659    1269   0.99215      0.98500      0.98856   15.9            9.0
GEM3 + SAMTOOLS fast      92775        72602    20173  11981  0.78256      0.85835      0.81871   3.2             5.3
GEM3 + SAMTOOLS normal    92795        72613    20182  11970  0.78251      0.85848      0.81874   3.2             5.3
Supp. Table S11. Summary of Copy Number Variant (CNV) event detection using Control-FREEC on the WGS alignments individually and run as a pseudo tumour-normal pair. Significant events are those for which the p-value < (0.05/Total Observed Events), i.e. Bonferroni correction. Three identical significant events were identified for each alignment set individually, and no significant events were identified when run as a pseudo tumour-normal pair.

Sample                             BWA-MEM BAM  GEM3 BAM  BWA-MEM v. GEM3 BAM
Total Observed Events              31           33        7
Total Shared Events                27           27        NA
Total Gains                        10           12        7
Total Losses                       21           21        0
Total Significant Observed Events  3            3         0
Total Shared Significant Events    3            3         NA
Total Significant Observed Gains   1            1         0
Total Significant Observed Losses  2            2         0
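The significance rule in the caption of Supp. Table S11 (p-value < 0.05 / Total Observed Events) can be sketched in Python as follows. The function names and the example p-values are hypothetical, since the per-event p-values are not listed in the table.

```python
from typing import List, Sequence


def bonferroni_threshold(n_events: int, alpha: float = 0.05) -> float:
    """Per-event significance threshold: alpha divided by the number of events."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return alpha / n_events


def significant_events(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the indices of events whose p-value is below the Bonferroni threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Threshold for the 31 events observed with the BWA-MEM BAM (Supp. Table S11).
print(bonferroni_threshold(31))

# Hypothetical p-values for four events; the threshold is 0.05 / 4 = 0.0125.
print(significant_events([0.001, 0.02, 0.0004, 0.5]))
```

With 31 observed events the threshold is roughly 0.0016, so an event with an uncorrected p-value of 0.01 would not count as significant.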
Supp. Table S12. Summary of structural variant (SV) event detection using DELLY2 on the WGS alignments. “PASS” and “PRECISE” are provided in the output VCF produced by DELLY2, where PASS indicates that DELLY2 believes the event is bona fide, while PRECISE indicates that the break-points could be exactly identified. The intersect column is the number of events described identically as PASS and PRECISE in both the GEM3 and BWA-MEM alignments for a particular event type. The concordance column shows the intersect value as a proportion of the PASS & PRECISE events for each alignment and event type.

Alignment    Event Type   Total  “PASS”  PASS & PRECISE  Intersect of PASS & PRECISE  Concordance between samples
BWA-MEM BAM  Deletion     9508   2755    1477            1217                         0.82
GEM3 BAM     Deletion     7455   2490    1407            1217                         0.86
BWA-MEM BAM  Duplication  2041   780     186             69                           0.37
GEM3 BAM     Duplication  738    390     113             69                           0.61
BWA-MEM BAM  Inversion    6194   897     217             112                          0.52
GEM3 BAM     Inversion    1907   442     143             112                          0.78
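The concordance values in Supp. Table S12 are reproduced by dividing the intersect count by the PASS & PRECISE count of each alignment. The following Python sketch recomputes them from the counts in the table. The function name `concordance` and the layout of `PASS_PRECISE` and `INTERSECT` are illustrative assumptions.

```python
from typing import Dict, Tuple


def concordance(intersect: int, pass_precise: int) -> float:
    """Intersect count as a proportion of an alignment's PASS & PRECISE events."""
    if pass_precise <= 0:
        raise ValueError("pass_precise must be positive")
    return intersect / pass_precise


# (alignment, event type) -> PASS & PRECISE count, from Supp. Table S12.
PASS_PRECISE: Dict[Tuple[str, str], int] = {
    ("BWA-MEM", "Deletion"): 1477,
    ("GEM3", "Deletion"): 1407,
    ("BWA-MEM", "Duplication"): 186,
    ("GEM3", "Duplication"): 113,
    ("BWA-MEM", "Inversion"): 217,
    ("GEM3", "Inversion"): 143,
}

# Event type -> events called PASS & PRECISE identically in both alignments.
INTERSECT: Dict[str, int] = {"Deletion": 1217, "Duplication": 69, "Inversion": 112}

for (aligner, event_type), n_pass_precise in PASS_PRECISE.items():
    value = concordance(INTERSECT[event_type], n_pass_precise)
    print(f"{aligner:8s} {event_type:12s} {value:.2f}")
```

Because the intersect is shared while the denominators differ, the alignment with fewer PASS & PRECISE calls (GEM3 for every event type here) shows the higher concordance.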