Supp. Table S1. Compute Node Specifications.


| Node Type | Intel Xeon E5-2650 @ 2.00 GHz | Intel Xeon E5-2670 @ 2.60 GHz |
|---|---|---|
| Threads | 2 sockets x 8 cores x 2 threads = 32 hardware threads | 2 sockets x 8 cores x 1 thread = 16 hardware threads |
| RAM | 2 NUMA domains x 128 GB = 256 GB | 2 NUMA domains x 64 GB = 128 GB |


Supp. Table S2. Commands used to run tools.

Process: Alignment

Tool and version: BWA-MEM 0.7.8
Example commands:
    bwa-0.7.8 index -p $INDEX $REFERENCE
    bwa-0.7.8 mem -t 32 -R "$RGHEADER" $INDEX ${file}_1.fastq.gz ${file}_2.fastq.gz

Tool and version: GEM3
Example commands:
    gem-indexer-3.1.0 -i $REFERENCE -o $INDEX
    gem-mapper-3.1.0 -I $INDEX -r "$RGHEADER" --i1 ${file}_1.fastq.gz --i2 ${file}_2.fastq.gz -o ${MAP_PATH}/${NAME}.sam

Process: Variant Calling

Tool and version: FreeBayes 0.9.20
Example commands:
    freebayes-parallel NA12878_BWA_50xWGS.vcf

Tool and version: GATK 3.3 BaseRecalibrator / PrintReads
Example commands:
    java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T BaseRecalibrator -nct 15 -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta --input_file /NA12878_BWA_50xWGS.bam --knownSites GATK/bundle_1.5/hg19/dbsnp_135.hg19.no.chr.vcf.gz --knownSites GATK/bundle_1.5/hg19/Mills_and_1000G_gold_standard.indels.hg19.no.chr.vcf -dt NONE -et NO_ET -o NA12878_BWA_50xWGS.grp
    java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T PrintReads -nct 15 -R hsapiens.hs37d5.fasta --input_file NA12878_BWA_50xWGS.bam -dt NONE -et NO_ET -o NA12878_BWA_50xWGS.bqsr.bam

Tool and version: GATK 3.3 HaplotypeCaller
Example commands:
    java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T HaplotypeCaller --num_cpu_threads_per_data_thread 16 -I NA12878_BWA_Exome.bqsr.bam -R hsapiens.hs37d5.fasta --min_base_quality_score 10 -ERC GVCF -variant_index_type LINEAR --variant_index_parameter 128000 -GQB 20 -GQB 25 -GQB 30 -GQB 35 -GQB 40 -GQB 45 -GQB 50 -GQB 70 -GQB 90 -GQB 99 -standard_call_conf 30 -standard_emit_conf 10 -o NA12878_BWA_Exome.g.vcf.gz

Tool and version: GATK 3.3 GenotypeGVCFs
Example commands:
    java -Xmx120g -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T GenotypeGVCFs -R hsapiens.hs37d5.fasta -V NA12878_BWA_Exome.g.vcf.gz -o NA12878_BWA_Exome.vcf.gz -nt 16 -standard_call_conf 30 -standard_emit_conf 30

Tool and version: GATK 3.3 DepthOfCoverage
Example commands:
    java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar -T DepthOfCoverage -nt 4 -R $FASTA -L $ROI_BED -o $OUTPUT_PREFIX --outputFormat table --omitLocusTable --omitDepthOutputAtEachBase -omitIntervalStatistics --nBins 9999 --start 1 --stop 10000 --countType COUNT_FRAGMENTS --includeRefNSites -minMappingQuality 20 -ct 8 -ct 10 -ct 20 -ct 30 -ct 50 -I $INPUT_BAM

Tool and version: SAMTOOLS/BCFTOOLS 1.2 (normal)
Example commands:
    samtools mpileup -ug -t DP,SP -f hsapiens.hs37d5.fasta -d 10000 -L 10000 /path/to/indel-realigned.bam | bcftools call -mv -f gq -O v -o NA12878_BWA_50xWGS.SAMTOOLS1.2_Bug_mv.vcf

Tool and version: SAMTOOLS/BCFTOOLS 1.2 (fast)
Example commands:
    samtools mpileup -Bug -t DP,SP -f hsapiens.hs37d5.fasta -d 10000 -L 10000 /path/to/indel-realigned.bam | bcftools call -mv -f gq -O v -o NA12878_BWA_50xWGS.SAMTOOLS1.2_Bug_mv.vcf

Tool and version: Control-FREEC v9.1
Example commands:
    freec -conf FREEC_ExomeTumourNormalConfigFile.NA12878MedExome.txt

Tool and version: DELLY2 v0.7.3
Example commands:
    delly call -t DEL -n -g hsapiens.hs37d5.fasta -o NA12878_BWA_50xWGS.delly2.DEL.bcf -x /delly/excludeTemplates/human.hg19.excl.tsv /path/to/indel-realigned.bam


Supp. Table S3. NIST Gold standard reference set source files.

NIST data set: BED file of Reliably Callable regions
Size of region: 2,195,078,292 nt
NIST/GIAB source file: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv2.18/union13callableMQonlymerged_addcert_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs_v2.18_2mindatasets_5minYesNoRatio.bed

NIST data set: VCF file of confidently called variants
Number of variant positions: 2,915,731 variant positions (2,915,728 unique)
NIST/GIAB source file: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv2.18/NISTIntegratedCalls_14datasets_131103_allcall_UGHapMerge_HetHomVarPASS_VQSRv2.18_all_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs.vcf.gz


Supp. Table S4. Sequencing and mapping metrics (for BWA-MEM and GEM3) for the NA12878 Platinum Whole Genome and the Nimblegen MedExome. % Mismatch is the percentage of bases from the reads that mismatch the reference in reads with mapping quality >= 20. Strand Balance is the percentage of reads mapped to the positive strand of the reference.

| Experiment | Mapper | Read Length | Number of Reads | % Mapped reads | % Mapped reads >=Q20 | Average Insert size | SD insert size | BAM size (Mb) | % Mismatch | % INDEL | % Reads Mapped in pairs | Strand Balance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA12878 Platinum Whole Genome | BWA-MEM | 101 | 1708169546 | 99.71 | 94.28 | 317.82 | 73.95 | 120867 | 0.3140 | 0.0181 | 99.81 | 50.0023 |
| NA12878 Platinum Whole Genome | GEM3 | 101 | 1708169546 | 99.04 | 91.93 | 319.26 | 73.87 | 128036 | 0.2332 | 0.0242 | 99.19 | 50.0656 |
| NA12878 MedExome | BWA-MEM | 101/126 | 96588728 | 99.69 | 92.80 | 203.15 | 47.18 | 6643 | 0.2396 | 0.0105 | 99.88 | 49.9950 |
| NA12878 MedExome | GEM3 | 101/126 | 96588728 | 99.51 | 92.26 | 204.11 | 46.75 | 6754 | 0.2283 | 0.0243 | 99.69 | 50.0023 |

Supp. Table S5. Overlap between the Reliably Callable and Non-reliably Callable regions (defined by NIST v2.18), and the Medically Interpretable Genome (Patwardhan et al, 2015), and the mappable and non-mappable regions of the genome. ROI - region of interest. *Genome here refers to chromosomes 1-22, X, Y and mitochondrion of GRCh37.

| ROI | ROI_size | %genome | MAPPABLE RL300 MM2 size | MAPPABLE %ROI | MAPPABLE %genome | NON_MAPPABLE RL300 MM2 size | NON_MAPPABLE %ROI | NON_MAPPABLE %genome |
|---|---|---|---|---|---|---|---|---|
| Genome | 3,095,693,981 | 100.0000 | 2,787,387,041 | 90.0408 | 90.0408 | 308,306,940 | 9.9592 | 9.9592 |
| Nistv2.18 Reliably Callable | 2,195,098,847 | 70.9081 | 2,193,210,211 | 99.9140 | 70.8471 | 1,888,658 | 0.0860 | 0.0610 |
| Nistv2.18 Non-reliably callable | 900,593,687 | 29.0918 | 594,179,925 | 65.9765 | 19.1938 | 306,413,784 | 34.0235 | 9.8981 |
| MedExome | 46,584,178 | 1.5048 | 45,483,317 | 97.6368 | 1.4692 | 1,100,861 | 2.3632 | 0.0356 |
| MedExome Nistv2.18 Reliably Callable | 34,714,576 | 1.1214 | 34,701,515 | 99.9624 | 1.1210 | 13,061 | 0.0376 | 0.0004 |
| MedExome Nistv2.18 Non-reliably callable | 11,869,602 | 0.3834 | 10,781,802 | 90.8354 | 0.3483 | 1,087,800 | 9.1646 | 0.0351 |
| MedExome MIG | 11,572,064 | 0.3738 | 11,456,976 | 99.0055 | 0.3701 | 115,088 | 0.9945 | 0.0037 |
| MedExome Non-MIG | 35,012,114 | 1.1310 | 34,026,341 | 97.1845 | 1.0992 | 985,773 | 2.8155 | 0.0318 |
| MIG | 11,733,933 | 0.3790 | 11,605,917 | 98.9090 | 0.3749 | 128,016 | 1.0910 | 0.0041 |
| MIG Nistv2.18 Reliably Callable MedExome | 8,961,946 | 0.2895 | 8,957,718 | 99.9528 | 0.2894 | 4,228 | 0.0472 | 0.0001 |
| MIG Nistv2.18 Non-reliably callable MedExome | 2,610,118 | 0.0843 | 2,499,258 | 95.7527 | 0.0807 | 110,860 | 4.2473 | 0.0036 |
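The percentage columns of Table S5 can be recomputed from the sizes. The sketch below assumes, as an inference from the reported numbers rather than something the table states, that %ROI divides a subset size by the ROI_size of the same row and that %genome divides it by the ROI_size of the Genome row. The helper name `pct` is illustrative only.

```python
def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to 4 decimals as in Table S5."""
    return round(100.0 * part / whole, 4)


# Sizes copied from Table S5 (Genome and MedExome rows).
GENOME_SIZE = 3_095_693_981
GENOME_MAPPABLE = 2_787_387_041
MEDEXOME_SIZE = 46_584_178
MEDEXOME_MAPPABLE = 45_483_317

# %ROI: mappable bases relative to the row's own ROI size (assumption).
genome_mappable_pct_roi = pct(GENOME_MAPPABLE, GENOME_SIZE)
medexome_mappable_pct_roi = pct(MEDEXOME_MAPPABLE, MEDEXOME_SIZE)

# %genome: mappable bases relative to the whole genome size (assumption).
medexome_mappable_pct_genome = pct(MEDEXOME_MAPPABLE, GENOME_SIZE)

print(genome_mappable_pct_roi, medexome_mappable_pct_roi, medexome_mappable_pct_genome)
```

Under these assumptions the three values agree with the table entries 90.0408, 97.6368 and 1.4692.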

Supp. Table S6. Overlap between the Reliably Callable and Non-reliably Callable regions (defined by NIST v2.18) and the mappable and non-mappable regions of the Nimblegen MedExome. ROI - region of interest.

| Region of Interest | Length | % Genome / Exome | Mappable Length | Mappable %ROI | Mappable % Genome | Non-mappable Length | Non-mappable %ROI | Non-mappable % Genome |
|---|---|---|---|---|---|---|---|---|
| MedExome | 46,584,178 | 100 | 45,483,317 | 97.64 | 97.64 | 1,100,861 | 2.36 | 2.36 |
| MedExome NIST v2.18 Reliably Callable | 34,714,576 | 74.52 | 34,701,515 | 99.96 | 74.49 | 13,061 | 0.04 | 0.03 |
| MedExome NIST v2.18 Non-reliably Callable | 11,869,602 | 25.48 | 10,781,802 | 90.84 | 23.14 | 1,087,800 | 9.16 | 2.34 |

Supp. Table S7. Coverage metrics obtained for BWA-MEM and GEM3 with the WGS data on all combinations of Reliably Callable (defined by NIST v2.18), mappable, and the Medically Interpretable Genome (Patwardhan et al, 2015) regions. ROI - region of interest; C8, C10, C20, C30 and C50 - percentage of ROI covered by at least 8, 10, 20, 30 or 50 reads respectively.

| Mapper | ROI | Mappability | ROI size | Mean coverage | Median coverage | C8 | C10 | C20 | C30 | C50 |
|---|---|---|---|---|---|---|---|---|---|---|
| GEM3 | Genome | All | 3,095,693,981 | 49.22 | 52 | 90.32 | 90.24 | 89.69 | 88.32 | 55.81 |
| BWA-MEM | Genome | All | 3,095,693,981 | 49.94 | 53 | 90.27 | 90.19 | 89.73 | 88.79 | 58.55 |
| GEM3 | Genome | Mappable | 2,787,387,041 | 54.19 | 53 | 99.20 | 99.17 | 98.81 | 97.51 | 61.76 |
| BWA-MEM | Genome | Mappable | 2,787,387,041 | 55.03 | 54 | 99.23 | 99.20 | 98.95 | 98.10 | 64.83 |
| GEM3 | Genome | Non-mappable | 308,306,940 | 4.30 | 0 | 10.04 | 9.52 | 7.21 | 5.25 | 2.03 |
| BWA-MEM | Genome | Non-mappable | 308,306,940 | 3.90 | 0 | 9.28 | 8.74 | 6.41 | 4.57 | 1.78 |
| GEM3 | Genome NIST v2.18 Reliably Callable | All | 2,195,098,847 | 54.94 | 54 | 99.99 | 99.99 | 99.91 | 99.22 | 63.72 |
| BWA-MEM | Genome NIST v2.18 Reliably Callable | All | 2,195,098,847 | 55.66 | 54 | 99.99 | 99.99 | 99.96 | 99.56 | 66.57 |
| GEM3 | Genome NIST v2.18 Reliably Callable | Mappable | 2,193,209,437 | 54.94 | 54 | 99.99 | 99.99 | 99.91 | 99.23 | 63.75 |
| BWA-MEM | Genome NIST v2.18 Reliably Callable | Mappable | 2,193,209,437 | 55.66 | 54 | 99.99 | 99.99 | 99.96 | 99.58 | 66.60 |
| GEM3 | Genome NIST v2.18 Reliably Callable | Non-mappable | 1,889,410 | 53.19 | 44 | 99.63 | 99.42 | 96.03 | 85.00 | 29.82 |
| BWA-MEM | Genome NIST v2.18 Reliably Callable | Non-mappable | 1,889,410 | 51.54 | 42 | 99.41 | 99.09 | 94.15 | 80.33 | 25.80 |
| GEM3 | Genome NIST v2.18 Non-reliably Callable | All | 900,595,134 | 35.28 | 44 | 66.76 | 66.48 | 64.78 | 61.74 | 36.54 |
| BWA-MEM | Genome NIST v2.18 Non-reliably Callable | All | 900,595,134 | 35.98 | 45 | 66.58 | 66.30 | 64.81 | 62.52 | 39.02 |
| GEM3 | Genome NIST v2.18 Non-reliably Callable | Mappable | 594,177,604 | 51.42 | 52 | 96.29 | 96.13 | 94.75 | 91.12 | 54.43 |
| BWA-MEM | Genome NIST v2.18 Non-reliably Callable | Mappable | 594,177,604 | 52.67 | 53 | 96.41 | 96.27 | 95.20 | 92.65 | 58.29 |
| GEM3 | Genome NIST v2.18 Non-reliably Callable | Non-mappable | 306,417,530 | 4.00 | 0 | 9.49 | 8.97 | 6.66 | 4.76 | 1.86 |
| BWA-MEM | Genome NIST v2.18 Non-reliably Callable | Non-mappable | 306,417,530 | 3.61 | 0 | 8.73 | 8.18 | 5.86 | 4.10 | 1.63 |
| GEM3 | MedExome | All | 46,584,178 | 51.24 | 51 | 98.37 | 98.30 | 97.84 | 96.23 | 50.99 |
| BWA-MEM | MedExome | All | 46,584,178 | 51.88 | 51 | 98.36 | 98.29 | 97.89 | 96.63 | 53.55 |
| GEM3 | MedExome | Mappable | 45,483,317 | 52.08 | 51 | 99.74 | 99.72 | 99.50 | 98.05 | 52.05 |
| BWA-MEM | MedExome | Mappable | 45,483,317 | 52.75 | 52 | 99.75 | 99.74 | 99.58 | 98.50 | 54.69 |
| GEM3 | MedExome | Non-mappable | 1,100,861 | 16.64 | 2 | 41.90 | 39.63 | 29.24 | 20.87 | 7.30 |
| BWA-MEM | MedExome | Non-mappable | 1,100,861 | 15.93 | 2 | 40.72 | 38.35 | 27.99 | 19.61 | 6.68 |
| GEM3 | MedExome MIG | All | 11,572,064 | 51.60 | 51 | 99.51 | 99.48 | 99.21 | 97.79 | 50.56 |
| BWA-MEM | MedExome MIG | All | 11,572,064 | 52.23 | 51 | 99.51 | 99.48 | 99.26 | 98.18 | 53.11 |
| GEM3 | MedExome MIG | Mappable | 11,456,976 | 51.90 | 51 | 99.93 | 99.92 | 99.78 | 98.45 | 50.96 |
| BWA-MEM | MedExome MIG | Mappable | 11,456,976 | 52.54 | 51 | 99.94 | 99.94 | 99.84 | 98.85 | 53.54 |
| GEM3 | MedExome MIG | Non-mappable | 115,088 | 21.44 | 15 | 57.26 | 55.12 | 43.02 | 32.19 | 10.02 |
| BWA-MEM | MedExome MIG | Non-mappable | 115,088 | 20.94 | 14 | 56.44 | 54.01 | 42.33 | 31.21 | 9.58 |
| GEM3 | MedExome non-MIG | All | 35,012,114 | 51.12 | 51 | 97.99 | 97.91 | 97.38 | 95.71 | 51.14 |
| BWA-MEM | MedExome non-MIG | All | 35,012,114 | 51.77 | 51 | 97.98 | 97.89 | 97.43 | 96.12 | 53.70 |
| GEM3 | MedExome non-MIG | Mappable | 34,026,341 | 52.14 | 51 | 99.67 | 99.65 | 99.40 | 97.92 | 52.41 |
| BWA-MEM | MedExome non-MIG | Mappable | 34,026,341 | 52.83 | 52 | 99.69 | 99.67 | 99.49 | 98.38 | 55.07 |
| GEM3 | MedExome non-MIG | Non-mappable | 985,773 | 16.08 | 1 | 40.11 | 37.82 | 27.63 | 19.55 | 6.98 |
| BWA-MEM | MedExome non-MIG | Non-mappable | 985,773 | 15.35 | 1 | 38.89 | 36.52 | 26.32 | 18.26 | 6.34 |
| GEM3 | MIG | All | 11,733,933 | 51.56 | 51 | 99.44 | 99.41 | 99.11 | 97.63 | 50.41 |
| BWA-MEM | MIG | All | 11,733,933 | 52.19 | 51 | 99.45 | 99.41 | 99.16 | 98.04 | 52.96 |
| GEM3 | MIG | Mappable | 11,605,917 | 51.90 | 51 | 99.92 | 99.91 | 99.74 | 98.37 | 50.86 |
| BWA-MEM | MIG | Mappable | 11,605,917 | 52.54 | 51 | 99.93 | 99.92 | 99.80 | 98.79 | 53.45 |
| GEM3 | MIG | Non-mappable | 128,016 | 20.99 | 14 | 56.50 | 54.17 | 41.67 | 30.68 | 9.63 |
| BWA-MEM | MIG | Non-mappable | 128,016 | 20.49 | 13 | 55.64 | 53.06 | 41.04 | 29.75 | 9.18 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | All | 8,961,946 | 52.23 | 51 | 100.00 | 100.00 | 99.96 | 99.03 | 52.16 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | All | 8,961,946 | 52.84 | 52 | 100.00 | 100.00 | 99.98 | 99.31 | 54.70 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | Mappable | 8,957,718 | 52.24 | 51 | 100.00 | 100.00 | 99.96 | 99.03 | 52.17 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | Mappable | 8,957,718 | 52.84 | 52 | 100.00 | 100.00 | 99.98 | 99.32 | 54.71 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | Non-mappable | 4,228 | 42.52 | 42 | 100.00 | 99.76 | 97.02 | 80.82 | 20.93 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | Non-mappable | 4,228 | 42.35 | 41 | 99.98 | 99.65 | 96.74 | 80.44 | 20.44 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | All | 2,610,118 | 49.42 | 49 | 97.83 | 97.70 | 96.65 | 93.55 | 45.06 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | All | 2,610,118 | 50.14 | 50 | 97.85 | 97.71 | 96.81 | 94.32 | 47.65 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | Mappable | 2,499,258 | 50.70 | 50 | 99.70 | 99.66 | 99.12 | 96.36 | 46.64 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | Mappable | 2,499,258 | 51.47 | 50 | 99.76 | 99.72 | 99.32 | 97.20 | 49.35 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | Non-mappable | 110,860 | 20.63 | 13 | 55.63 | 53.42 | 40.96 | 30.34 | 9.61 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | Non-mappable | 110,860 | 20.13 | 12 | 54.78 | 52.27 | 40.26 | 29.34 | 9.16 |
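The C8 to C50 columns are defined in the caption as the percentage of the ROI covered by at least that many reads. A minimal sketch of that definition follows, assuming a per-base depth list is already available for the ROI. The function name `coverage_summary` and the toy depths are hypothetical; the tables themselves were produced with GATK DepthOfCoverage (see Table S2), not with this code.

```python
from statistics import mean, median


def coverage_summary(depths, thresholds=(8, 10, 20, 30, 50)):
    """Summarise per-base depths: mean, median, and % of positions with depth >= t."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    summary = {"mean": mean(depths), "median": median(depths)}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)  # positions meeting the threshold
        summary[f"C{t}"] = 100.0 * covered / len(depths)
    return summary


# Hypothetical depths for an 8-base ROI, for illustration only.
toy_depths = [0, 5, 9, 12, 25, 31, 55, 60]
toy_summary = coverage_summary(toy_depths)
print(toy_summary)
```

Because every position of the ROI enters the denominator, uncovered bases (depth 0) lower all C values, which is consistent with the low percentages reported for non-mappable regions.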

Supp. Table S8. Coverage metrics obtained by BWA-MEM and GEM3 with the WES data on all combinations of Reliably Callable (defined by NIST v2.18), mappable, and the Medically Interpretable Genome (Patwardhan et al, 2015) regions for the Nimblegen MedExome. ROI - region of interest; C8, C10, C20, C30 and C50 - percentage of ROI covered by at least 8, 10, 20, 30 or 50 reads respectively.

| Mapper | ROI | Mappability | ROI size | Mean coverage | Median coverage | C8 | C10 | C20 | C30 | C50 |
|---|---|---|---|---|---|---|---|---|---|---|
| GEM3 | MedExome | All | 46,584,178 | 91.52 | 79 | 98.87 | 98.65 | 96.96 | 93.62 | 79.63 |
| BWA-MEM | MedExome | All | 46,584,178 | 88.37 | 79 | 97.71 | 97.47 | 95.75 | 92.43 | 78.58 |
| GEM3 | MedExome | Mappable | 45,483,317 | 89.70 | 79 | 99.21 | 99.00 | 97.34 | 94.00 | 79.88 |
| BWA-MEM | MedExome | Mappable | 45,483,317 | 89.59 | 79 | 99.14 | 98.93 | 97.27 | 93.95 | 79.91 |
| GEM3 | MedExome | Non-mappable | 1,100,861 | 166.85 | 93 | 84.75 | 84.15 | 81.23 | 77.87 | 69.49 |
| BWA-MEM | MedExome | Non-mappable | 1,100,861 | 38.04 | 0 | 38.55 | 37.51 | 33.14 | 29.57 | 23.33 |
| GEM3 | MedExome NIST v2.18 Reliably Callable | All | 34,714,576 | 88.45 | 79 | 99.59 | 99.43 | 98.03 | 94.91 | 80.77 |
| BWA-MEM | MedExome NIST v2.18 Reliably Callable | All | 34,714,576 | 88.79 | 80 | 99.59 | 99.43 | 98.05 | 94.97 | 80.96 |
| GEM3 | MedExome NIST v2.18 Reliably Callable | Mappable | 34,701,515 | 88.43 | 79 | 99.59 | 99.43 | 98.03 | 94.91 | 80.77 |
| BWA-MEM | MedExome NIST v2.18 Reliably Callable | Mappable | 34,701,515 | 88.78 | 80 | 99.59 | 99.44 | 98.05 | 94.97 | 80.96 |
| GEM3 | MedExome NIST v2.18 Reliably Callable | Non-mappable | 13,061 | 127.66 | 119 | 99.53 | 99.32 | 98.24 | 97.00 | 90.76 |
| BWA-MEM | MedExome NIST v2.18 Reliably Callable | Non-mappable | 13,061 | 113.08 | 100 | 98.35 | 97.79 | 95.92 | 93.57 | 85.89 |
| GEM3 | MedExome NIST v2.18 Non-reliably Callable | All | 11,869,602 | 100.53 | 80 | 96.76 | 96.36 | 93.81 | 89.85 | 76.29 |
| BWA-MEM | MedExome NIST v2.18 Non-reliably Callable | All | 11,869,602 | 87.16 | 75 | 92.19 | 91.74 | 89.01 | 85.01 | 71.61 |
| GEM3 | MedExome NIST v2.18 Non-reliably Callable | Mappable | 10,781,802 | 93.79 | 79 | 97.98 | 97.61 | 95.10 | 91.08 | 77.00 |
| BWA-MEM | MedExome NIST v2.18 Non-reliably Callable | Mappable | 10,781,802 | 92.20 | 78 | 97.68 | 97.29 | 94.73 | 90.68 | 76.55 |
| GEM3 | MedExome NIST v2.18 Non-reliably Callable | Non-mappable | 1,087,800 | 167.32 | 92 | 84.57 | 83.97 | 81.02 | 77.64 | 69.24 |
| BWA-MEM | MedExome NIST v2.18 Non-reliably Callable | Non-mappable | 1,087,800 | 37.14 | 0 | 37.84 | 36.78 | 32.39 | 28.80 | 22.58 |
| GEM3 | MedExome MIG | All | 11,572,064 | 102.95 | 90 | 99.77 | 99.72 | 99.31 | 98.26 | 89.91 |
| BWA-MEM | MedExome MIG | All | 11,572,064 | 102.29 | 90 | 99.33 | 99.26 | 98.81 | 97.76 | 89.50 |
| GEM3 | MedExome MIG | Mappable | 11,456,976 | 102.64 | 90 | 99.82 | 99.77 | 99.37 | 98.34 | 90.01 |
| BWA-MEM | MedExome MIG | Mappable | 11,456,976 | 102.79 | 90 | 99.79 | 99.74 | 99.34 | 98.31 | 90.05 |
| GEM3 | MedExome MIG | Non-mappable | 115,088 | 133.48 | 98 | 95.12 | 94.80 | 93.51 | 90.41 | 80.00 |
| BWA-MEM | MedExome MIG | Non-mappable | 115,088 | 52.24 | 13 | 52.72 | 51.57 | 46.86 | 43.04 | 34.72 |
| GEM3 | MedExome non-MIG | All | 35,012,114 | 87.75 | 76 | 98.57 | 98.29 | 96.18 | 92.09 | 76.23 |
| BWA-MEM | MedExome non-MIG | All | 35,012,114 | 83.77 | 75 | 97.17 | 96.88 | 94.74 | 90.67 | 74.96 |
| GEM3 | MedExome non-MIG | Mappable | 34,026,341 | 85.34 | 75 | 99.00 | 98.74 | 96.65 | 92.54 | 76.46 |
| BWA-MEM | MedExome non-MIG | Mappable | 34,026,341 | 85.14 | 76 | 98.92 | 98.65 | 96.57 | 92.49 | 76.50 |
| GEM3 | MedExome non-MIG | Non-mappable | 985,773 | 170.75 | 92 | 83.54 | 82.91 | 79.79 | 76.40 | 68.26 |
| BWA-MEM | MedExome non-MIG | Non-mappable | 985,773 | 36.39 | 0 | 36.90 | 35.86 | 31.54 | 28.00 | 22.00 |
| GEM3 | MIG | All | 11,733,933 | 103.04 | 90 | 99.74 | 99.68 | 99.24 | 98.17 | 89.76 |
| BWA-MEM | MIG | All | 11,733,933 | 102.26 | 90 | 99.24 | 99.17 | 98.68 | 97.60 | 89.29 |
| GEM3 | MIG | Mappable | 11,605,917 | 102.70 | 90 | 99.79 | 99.73 | 99.30 | 98.26 | 89.88 |
| BWA-MEM | MIG | Mappable | 11,605,917 | 102.82 | 90 | 99.75 | 99.70 | 99.26 | 98.20 | 89.90 |
| GEM3 | MIG | Non-mappable | 128,016 | 133.81 | 98 | 95.15 | 94.83 | 93.28 | 89.97 | 79.22 |
| BWA-MEM | MIG | Non-mappable | 128,016 | 51.31 | 13 | 52.48 | 51.27 | 46.43 | 42.36 | 34.15 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | All | 8,961,946 | 101.82 | 90 | 99.92 | 99.88 | 99.60 | 98.72 | 90.54 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | All | 8,961,946 | 102.23 | 90 | 99.92 | 99.88 | 99.60 | 98.73 | 90.67 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | Mappable | 8,957,718 | 101.79 | 90 | 99.92 | 99.88 | 99.60 | 98.72 | 90.54 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | Mappable | 8,957,718 | 102.21 | 90 | 99.92 | 99.89 | 99.60 | 98.73 | 90.66 |
| GEM3 | MIG NIST v2.18 Reliably Callable MedExome | Non-mappable | 4,228 | 155.81 | 150 | 100.00 | 100.00 | 100.00 | 99.93 | 97.99 |
| BWA-MEM | MIG NIST v2.18 Reliably Callable MedExome | Non-mappable | 4,228 | 140.12 | 139 | 99.01 | 98.72 | 98.37 | 98.11 | 94.42 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | All | 2,610,118 | 106.82 | 91 | 99.28 | 99.16 | 98.30 | 96.71 | 87.77 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | All | 2,610,118 | 102.49 | 89 | 97.30 | 97.13 | 96.10 | 94.40 | 85.51 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | Mappable | 2,499,258 | 105.67 | 91 | 99.47 | 99.36 | 98.53 | 97.01 | 88.15 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | Mappable | 2,499,258 | 104.87 | 91 | 99.35 | 99.23 | 98.37 | 96.77 | 87.87 |
| GEM3 | MIG NIST v2.18 Non-reliably Callable MedExome | Non-mappable | 110,860 | 132.63 | 96 | 94.93 | 94.60 | 93.27 | 90.04 | 79.32 |
| BWA-MEM | MIG NIST v2.18 Non-reliably Callable MedExome | Non-mappable | 110,860 | 48.89 | 10 | 50.96 | 49.77 | 44.90 | 40.94 | 32.44 |

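Supp. Table S9. Summary of variant calling for 8 pipelines for the WES sample. TP – true positives; FP – false positives; FN – false negatives; specificity – number of TP calls as a proportion of Total Calls; sensitivity – number of TP calls as a proportion of the number of NIST reference set calls; F1-score – measure of overall accuracy calculated as (2 x TP) / ((2 x TP) + FP + FN).

| Variant type | Dataset | Total Calls | TP | FP | FN | Specificity | Sensitivity | F1 score |
|---|---|---|---|---|---|---|---|---|
| MedExome SNVs | NIST v2.18 Gold Standard | 24343 | | | | | | |
| MedExome SNVs | BWA + FreeBayes | 24310 | 24274 | 36 | 69 | 0.9985 | 0.9972 | 0.9978 |
| MedExome SNVs | BWA + HaplotypeCaller | 24346 | 24285 | 61 | 58 | 0.9975 | 0.9976 | 0.9976 |
| MedExome SNVs | BWA + SAMtools fast | 24347 | 24292 | 55 | 51 | 0.9977 | 0.9979 | 0.9978 |
| MedExome SNVs | BWA + SAMtools normal | 24306 | 24271 | 35 | 72 | 0.9986 | 0.997 | 0.9978 |
| MedExome SNVs | GEM3 + FreeBayes | 24347 | 24248 | 99 | 95 | 0.9959 | 0.9961 | 0.996 |
| MedExome SNVs | GEM3 + HaplotypeCaller | 24635 | 24264 | 371 | 79 | 0.9849 | 0.9968 | 0.9908 |
| MedExome SNVs | GEM3 + SAMtools fast | 24396 | 24269 | 127 | 74 | 0.9948 | 0.997 | 0.9959 |
| MedExome SNVs | GEM3 + SAMtools normal | 24362 | 24256 | 106 | 87 | 0.9956 | 0.9964 | 0.996 |
| MedExome Deletions | NIST v2.18 Gold Standard | 292 | | | | | | |
| MedExome Deletions | BWA + FreeBayes | 288 | 256 | 32 | 36 | 0.8889 | 0.8767 | 0.8828 |
| MedExome Deletions | BWA + HaplotypeCaller | 389 | 279 | 110 | 13 | 0.7172 | 0.9555 | 0.8194 |
| MedExome Deletions | BWA + SAMtools fast | 278 | 251 | 27 | 41 | 0.9029 | 0.8596 | 0.8807 |
| MedExome Deletions | BWA + SAMtools normal | 281 | 252 | 29 | 40 | 0.8968 | 0.863 | 0.8796 |
| MedExome Deletions | GEM3 + FreeBayes | 284 | 254 | 30 | 38 | 0.8944 | 0.8699 | 0.8819 |
| MedExome Deletions | GEM3 + HaplotypeCaller | 391 | 275 | 116 | 17 | 0.7033 | 0.9418 | 0.8053 |
| MedExome Deletions | GEM3 + SAMtools fast | 286 | 250 | 36 | 42 | 0.8741 | 0.8562 | 0.8651 |
| MedExome Deletions | GEM3 + SAMtools normal | 287 | 251 | 36 | 41 | 0.8746 | 0.8596 | 0.867 |
| MedExome Insertions | NIST v2.18 Gold Standard | 355 | | | | | | |
| MedExome Insertions | BWA + FreeBayes | 330 | 311 | 19 | 44 | 0.9424 | 0.8761 | 0.908 |
| MedExome Insertions | BWA + HaplotypeCaller | 392 | 343 | 49 | 12 | 0.875 | 0.9662 | 0.9183 |
| MedExome Insertions | BWA + SAMtools fast | 370 | 310 | 60 | 45 | 0.8378 | 0.8732 | 0.8552 |
| MedExome Insertions | BWA + SAMtools normal | 367 | 310 | 57 | 45 | 0.8447 | 0.8732 | 0.8587 |
| MedExome Insertions | GEM3 + FreeBayes | 322 | 304 | 18 | 51 | 0.9441 | 0.8563 | 0.8981 |
| MedExome Insertions | GEM3 + HaplotypeCaller | 396 | 345 | 51 | 10 | 0.8712 | 0.9718 | 0.9188 |
| MedExome Insertions | GEM3 + SAMtools fast | 423 | 305 | 118 | 50 | 0.721 | 0.8592 | 0.7841 |
| MedExome Insertions | GEM3 + SAMtools normal | 422 | 305 | 117 | 50 | 0.7227 | 0.8592 | 0.7851 |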
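The three summary columns of Table S9 follow directly from TP, FP and FN using the definitions in the caption. The sketch below is illustrative only; the function name `call_metrics` is hypothetical, and it relies on Total Calls = TP + FP and NIST reference set calls = TP + FN, which holds for the rows of the table. Note that "specificity" as defined in the caption (TP / Total Calls) is the quantity often called precision.

```python
def call_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute the Table S9 summary metrics from TP, FP and FN counts."""
    total_calls = tp + fp        # every call made by the pipeline
    reference_calls = tp + fn    # every call in the NIST reference set
    return {
        "specificity": tp / total_calls,        # as defined in the caption
        "sensitivity": tp / reference_calls,
        "f1": (2 * tp) / ((2 * tp) + fp + fn),
    }


# BWA + FreeBayes, MedExome SNVs row of Table S9.
bwa_freebayes_snv = call_metrics(tp=24274, fp=36, fn=69)
print({name: round(value, 4) for name, value in bwa_freebayes_snv.items()})
```

Rounded to four decimals, this reproduces the reported 0.9985, 0.9972 and 0.9978 for that row.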

Supp. Table S10. Summary of variant calling for 8 analysis pipelines for the WGS sample, without the requirement that the called genotype be equivalent to the NIST call. TP – true positives; FP – false positives; FN – false negatives; specificity – number of TP calls as a proportion of the Total Calls; sensitivity – number of TP calls as a proportion of the number of NIST reference set calls; F1-score – measure of overall accuracy calculated as (2 x TP) / ((2 x TP) + FP + FN); % reduction FP/FN indicates the reduction in the number of FP or FN with respect to the requirement of GT equivalence (compare with Main Table 1).

| Variant type | Dataset | Total Calls | TP | FP | FN | Specificity | Sensitivity | F1 score | % Reduction FP | % Reduction FN |
|---|---|---|---|---|---|---|---|---|---|---|
| WGS SNVs | NIST v2.18 Gold Standard | 2740732 | | | | | | | | |
| WGS SNVs | BWA + FreeBayes | 2744545 | 2738639 | 5906 | 2093 | 0.99785 | 0.99924 | 0.99854 | 6.9 | 17.3 |
| WGS SNVs | BWA + HaplotypeCaller | 2748582 | 2738661 | 9921 | 2071 | 0.99639 | 0.99924 | 0.99782 | 2.3 | 10.2 |
| WGS SNVs | BWA + SAMTOOLS fast | 2748866 | 2739295 | 9571 | 1437 | 0.99652 | 0.99948 | 0.99799 | 7.8 | 35.9 |
| WGS SNVs | BWA + SAMTOOLS normal | 2736410 | 2733400 | 3010 | 7332 | 0.99890 | 0.99732 | 0.99811 | 14.7 | 6.6 |
| WGS SNVs | GEM3 + FreeBayes | 2742937 | 2736360 | 6577 | 4372 | 0.99760 | 0.99840 | 0.99800 | 10.6 | 15.1 |
| WGS SNVs | GEM3 + HaplotypeCaller | 2745423 | 2738808 | 6615 | 1924 | 0.99759 | 0.99930 | 0.99844 | 5.6 | 17.0 |
| WGS SNVs | GEM3 + SAMTOOLS fast | 2749554 | 2738181 | 11373 | 2551 | 0.99586 | 0.99907 | 0.99746 | 11.4 | 36.4 |
| WGS SNVs | GEM3 + SAMTOOLS normal | 2736871 | 2732989 | 3882 | 7743 | 0.99858 | 0.99717 | 0.99788 | 14.8 | 8.0 |
| WGS Deletions | NIST v2.18 Gold Standard | 85958 | | | | | | | | |
| WGS Deletions | BWA + FreeBayes | 82263 | 79354 | 2909 | 6604 | 0.96464 | 0.92317 | 0.94345 | 55.9 | 35.8 |
| WGS Deletions | BWA + HaplotypeCaller | 86323 | 85070 | 1253 | 888 | 0.98548 | 0.98967 | 0.98757 | 18.3 | 24.0 |
| WGS Deletions | BWA + SAMTOOLS fast | 77671 | 69444 | 8227 | 16514 | 0.89408 | 0.80788 | 0.84880 | 9.4 | 4.9 |
| WGS Deletions | BWA + SAMTOOLS normal | 77712 | 69469 | 8243 | 16489 | 0.89393 | 0.80817 | 0.84889 | 9.4 | 4.9 |
| WGS Deletions | GEM3 + FreeBayes | 81602 | 78172 | 3430 | 7786 | 0.95797 | 0.90942 | 0.93306 | 38.8 | 21.8 |
| WGS Deletions | GEM3 + HaplotypeCaller | 86132 | 84973 | 1159 | 985 | 0.98654 | 0.98854 | 0.98754 | 14.1 | 16.2 |
| WGS Deletions | GEM3 + SAMTOOLS fast | 80905 | 69861 | 11044 | 16097 | 0.86349 | 0.81273 | 0.83735 | 6.5 | 4.5 |
| WGS Deletions | GEM3 + SAMTOOLS normal | 80955 | 69891 | 11064 | 16067 | 0.86333 | 0.81308 | 0.83745 | 6.5 | 4.6 |
| WGS Insertions | NIST v2.18 Gold Standard | 84583 | | | | | | | | |
| WGS Insertions | BWA + FreeBayes | 78592 | 76350 | 2242 | 8233 | 0.97147 | 0.90266 | 0.93581 | 52.3 | 23.0 |
| WGS Insertions | BWA + HaplotypeCaller | 84521 | 83600 | 921 | 983 | 0.98910 | 0.98838 | 0.98874 | 12.1 | 11.4 |
| WGS Insertions | BWA + SAMTOOLS fast | 79762 | 69835 | 9927 | 14748 | 0.87554 | 0.82564 | 0.84986 | 4.3 | 2.9 |
| WGS Insertions | BWA + SAMTOOLS normal | 79762 | 69842 | 9920 | 14741 | 0.87563 | 0.82572 | 0.84994 | 4.3 | 2.9 |
| WGS Insertions | GEM3 + FreeBayes | 78417 | 74771 | 3646 | 9812 | 0.95350 | 0.88400 | 0.91744 | 30.7 | 14.1 |
| WGS Insertions | GEM3 + HaplotypeCaller | 83973 | 83314 | 659 | 1269 | 0.99215 | 0.98500 | 0.98856 | 15.9 | 9.0 |
| WGS Insertions | GEM3 + SAMTOOLS fast | 92775 | 72602 | 20173 | 11981 | 0.78256 | 0.85835 | 0.81871 | 3.2 | 5.3 |
| WGS Insertions | GEM3 + SAMTOOLS normal | 92795 | 72613 | 20182 | 11970 | 0.78251 | 0.85848 | 0.81874 | 3.2 | 5.3 |

Supp. Table S11. Summary of Copy Number Variant (CNV) event detection using Control-FREEC on the WGS alignments individually and run as a pseudo tumour-normal pair. Significant events are those for which the p-value < (0.05/Total Observed Events), i.e. Bonferroni correction. Three identical significant events were identified for each alignment set individually, and no significant events were identified when run as a pseudo tumour-normal pair.

| Sample | Total Observed Events | Total Shared Events | Total Gains | Total Losses | Total Significant Observed Events | Total Shared Significant Events | Total Significant Observed Gains | Total Significant Observed Losses |
|---|---|---|---|---|---|---|---|---|
| BWA-MEM BAM | 31 | 27 | 10 | 21 | 3 | 3 | 1 | 2 |
| GEM3 BAM | 33 | 27 | 12 | 21 | 3 | 3 | 1 | 2 |
| BWA-MEM v. GEM3 BAM | 7 | NA | 7 | 0 | 0 | NA | 0 | 0 |
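The significance rule in the Table S11 caption, p-value < 0.05 / Total Observed Events, can be sketched as follows. The function names and the p-values are hypothetical and are not Control-FREEC output; only the event counts 31 and 33 come from the table.

```python
def bonferroni_threshold(n_events: int, alpha: float = 0.05) -> float:
    """Per-event p-value cutoff after Bonferroni correction."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return alpha / n_events


def significant_events(p_values, alpha: float = 0.05):
    """Return the p-values that fall below the Bonferroni-corrected cutoff."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return [p for p in p_values if p < cutoff]


# Cutoffs implied by the Total Observed Events column of Table S11.
bwa_cutoff = bonferroni_threshold(31)
gem3_cutoff = bonferroni_threshold(33)

# Hypothetical p-values for four events, for illustration only.
toy_hits = significant_events([0.0001, 0.004, 0.02, 0.3])
print(bwa_cutoff, gem3_cutoff, toy_hits)
```

With four toy events the cutoff is 0.05 / 4 = 0.0125, so only the first two p-values pass.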

Supp. Table S12. Summary of structural variant (SV) event detection using DELLY2 on the WGS alignments. “PASS” and “PRECISE” are provided in the output VCF produced by DELLY2, where PASS indicates that DELLY2 believes the event is bona fide, while PRECISE indicates that the break-points could be exactly identified. The intersect column is the number of events described identically as PASS and PRECISE in both the GEM3 and BWA-MEM alignments for a particular event type. The concordance column shows the intersect value as a proportion of the PASS & PRECISE events for each alignment and event type.

| Alignment | Event Type | Total | “PASS” | PASS & PRECISE | Intersect of PASS & PRECISE | Concordance between samples |
|---|---|---|---|---|---|---|
| BWA-MEM BAM | Deletion | 9508 | 2755 | 1477 | 1217 | 0.82 |
| GEM3 BAM | Deletion | 7455 | 2490 | 1407 | 1217 | 0.86 |
| BWA-MEM BAM | Duplication | 2041 | 780 | 186 | 69 | 0.37 |
| GEM3 BAM | Duplication | 738 | 390 | 113 | 69 | 0.61 |
| BWA-MEM BAM | Inversion | 6194 | 897 | 217 | 112 | 0.52 |
| GEM3 BAM | Inversion | 1907 | 442 | 143 | 112 | 0.78 |
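The concordance values in Table S12 are consistent with dividing the intersect count by the PASS & PRECISE count of each alignment. The sketch below checks that reading against the six rows; the data structure and names are illustrative, and the formula is inferred from the reported numbers.

```python
# (alignment, event type) -> (PASS & PRECISE count, intersect count), from Table S12.
SV_COUNTS = {
    ("BWA-MEM BAM", "Deletion"): (1477, 1217),
    ("GEM3 BAM", "Deletion"): (1407, 1217),
    ("BWA-MEM BAM", "Duplication"): (186, 69),
    ("GEM3 BAM", "Duplication"): (113, 69),
    ("BWA-MEM BAM", "Inversion"): (217, 112),
    ("GEM3 BAM", "Inversion"): (143, 112),
}


def concordance(pass_precise: int, intersect: int) -> float:
    """Share of an alignment's PASS & PRECISE events also found in the other alignment."""
    return intersect / pass_precise


concordances = {
    key: round(concordance(pass_precise, intersect), 2)
    for key, (pass_precise, intersect) in SV_COUNTS.items()
}
for key, value in concordances.items():
    print(key, value)
```

Because the intersect is shared while the denominators differ, the alignment with fewer PASS & PRECISE events (GEM3 in every event type here) shows the higher concordance.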