Metagenomic sequencing reveals microbiota and its functional potential associated with periodontal disease

Jinfeng Wang1§, Ji Qi2§, Hui Zhao1, Shu He3, Yifei Zhang3, Shicheng Wei3*, Fangqing Zhao1*

1 Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
2 Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
3 Laboratory of Interdisciplinary Studies, School and Hospital of Stomatology, Peking University, Beijing 100081, China

§ These authors contributed equally to this work.
* To whom correspondence should be addressed.
Fangqing Zhao: Tel: 8610-64869325, Fax: 8610-64874346, Email: [email protected]
Shicheng Wei: Tel & Fax: 8610-82195780, Email: [email protected]
Supplementary Figures

Figure S1. A flowchart of the taxonomic and functional analysis pipeline used in this study. (1) Quality control. High-quality paired-end reads were obtained after filtering out human and low-quality reads. (2) Taxonomic and functional analyses. High-quality reads were merged into ~180 bp sequences according to the overlap of the 2 × 100 bp PE reads. The 16S rRNA gene sequences were extracted from the datasets of each sample group (Z, PZ, H-1, and H-2) and were classified using the RDP classifier. Non-rDNA sequences were BLASTed against the NCBI non-redundant (NR) sequence database, and the BLASTX files were imported into MEGAN to statistically analyze the taxonomic distributions in each sample. (3) De novo assembly of metagenomic reads. All high-quality reads (unmerged) were pooled and then used to perform de novo assembly and gene prediction. After filtering out the short ORFs (< 80 aa), the predicted genes were BLASTed against the NR database to quantify the sequence identities of functional genes. (4) Comparative metagenomic analysis.

Figure S2. Human DNA contamination in 9 plaque samples. Human DNA in samples PY2, PY4, PY5, PZ2, PZ3, PZ8, H4, H6, and H7 was assessed before metagenomic sequencing by quantifying the absolute level of the β-actin gene via real-time quantitative polymerase chain reaction. The proportion of human DNA in each sample was calculated by dividing the concentration of human DNA by the overall concentration of the template DNA. This proportion (x-axis) was then compared with that calculated from the sequencing datasets (y-axis).

Figure S3. Relative frequency of the most abundant genera in 4 different periodontal states. A comparison between groups Z (green bar, including Z11, Z14, and Z15), PZ (red bar, including PY2, PY4, PY5, PZ2, PZ3, and PZ8), H-1 (orange bar, including H9-1 and H14-1), and H-2 (blue bar, including H4, H6, H7, H9-2, and H14-2) was performed using the mean relative abundance of each genus. The number of reads was normalized to 100,000 for each sample. (A) NR-BLASTX-based classification. (B) 16S rRNA-based classification. Both approaches showed similar community structures and inter-group divergence.

Figure S4. Intraspecific diversity of Porphyromonas gingivalis and Treponema denticola. Unassembled paired-end reads of all swab and plaque samples of periodontal disease were aligned to currently available reference genomic sequences to identify single-nucleotide variants (SNVs). (A) P. gingivalis W83 (NC_002950.2) and (B) T. denticola ATCC 35405 (NC_002967.9) were used as genomic references.
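The per-sample normalization to 100,000 reads and the group means described for Figure S3 can be illustrated with a minimal sketch. It assumes that normalization means proportional scaling of each sample's classified read counts (the caption does not say whether scaling or subsampling was used); the function names and the toy counts are hypothetical and are not data from this study.

```python
from statistics import mean


def normalize_counts(counts, target=100_000):
    """Scale raw per-genus read counts of one sample so they sum to `target`.

    Assumption: normalization is proportional scaling, not random subsampling.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {genus: n * target / total for genus, n in counts.items()}


def group_mean(samples, target=100_000):
    """Mean normalized abundance of each genus across the samples of one group."""
    normalized = [normalize_counts(c, target) for c in samples]
    genera = sorted({g for c in normalized for g in c})
    # A genus absent from a sample counts as 0 in that sample.
    return {g: mean(c.get(g, 0.0) for c in normalized) for g in genera}


# Hypothetical raw counts for two samples of one group (illustration only).
toy_group = [
    {"Prevotella": 300, "Streptococcus": 100, "Treponema": 100},
    {"Prevotella": 100, "Streptococcus": 300},
]
means = group_mean(toy_group)
for genus, value in means.items():
    print(f"{genus:15s} {value:10.1f}")
```

Because every sample is scaled to the same total before averaging, a deeply sequenced sample does not dominate the group mean.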

Figure S5. Functional divergence between microbiomes of periodontal health and disease. Functional annotation was performed according to the KEGG assignments. The number of reads was normalized to 100,000 for each sample. (A) Comparison of KEGG categories at the first 3 levels in the microbial communities of groups PZ (plaque of periodontal health, y-axis) and H-2 (plaque of periodontitis, x-axis). The 3 dots highlighted by a red circle (cell motility, bacterial chemotaxis, and flagellar assembly) deviate from the diagonal. (B) A box-plot of KEGG category abundance in groups PZ (red box) and H-2 (blue box). The KEGG categories of the second level are shown. The boxes represent the interquartile range between the first and third quartiles. The whiskers denote the lowest and highest values within the interquartile ranges of the first and third quartiles. The dots inside the boxes represent the median. The upward arrows in red and downward arrows in blue represent over- and under-represented pathways, respectively, in the microbiomes of plaque of periodontal disease (group H-2).

Figure S6. Major bacterial clades contributing to functional variations. (A-B) For bacterial chemotaxis and flagellar assembly, Treponema, Selenomonas, and Campylobacter were the top 3 contributors to the increase of flagellar genes in plaque of periodontal disease (H-2 group). (C) Over-representation of LPS biosynthesis genes in the H-2 group was due to the vigorous propagation of the genera Prevotella, Porphyromonas, Fusobacterium, Campylobacter, Selenomonas, and Aggregatibacter. (D) The over-representation of the PrtC gene in the H-2 group was mainly contributed by the genera Prevotella, Treponema, Selenomonas, Porphyromonas, and Fusobacterium.

Figure S7. The sequence similarities between the assigned taxa and their reference sequences. (A) Amino acid sequence identity between metagenomic reads and their reference sequences in the NR database. Only the reads of the top 30 genera from sample H4 were used in the calculation. (B) The colored curves in the inset graph highlight 5 relatively less abundant genera (Clostridium, Parabacteroides, Eubacterium, Ruminococcus, and Lactobacillus), which shared a low sequence similarity with known protein sequences in the NR database.

Figure S8. The percentage of novel functions in the periodontal microbiota. In total, 498,886 predicted ORFs > 80 aa were obtained from the metagenomic assembly of all samples. Of these, 35.1% were highly similar to known proteins (≥ 90% sequence identity), whereas 8.9% had no hits in the NR database.
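
The binning behind Figure S8 (ORFs highly similar to known proteins versus ORFs with no NR hit) can be sketched as below. This is an illustration under stated assumptions, not the pipeline used in the study: the input is a hypothetical list of (ORF length in aa, best-hit percent identity or None) pairs, and the length cut-off is taken as "keep ORFs of at least 80 aa", although the captions leave the exact boundary open ("< 80 aa" filtered in Figure S1, "> 80 aa" kept in Figure S8).

```python
def classify_orfs(orfs, min_length_aa=80, high_identity=90.0):
    """Return the fraction of retained ORFs in each similarity bin.

    orfs: iterable of (length_aa, identity) pairs, where identity is the
    percent identity of the best database hit, or None if there was no hit.
    Assumption: ORFs shorter than `min_length_aa` are discarded first.
    """
    kept = [(length, ident) for length, ident in orfs if length >= min_length_aa]
    counts = {"high_identity": 0, "lower_identity": 0, "no_hit": 0}
    for _, ident in kept:
        if ident is None:
            counts["no_hit"] += 1
        elif ident >= high_identity:
            counts["high_identity"] += 1
        else:
            counts["lower_identity"] += 1
    total = len(kept)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


# Hypothetical ORFs (illustration only): the 60-aa ORF is filtered out.
toy_orfs = [(60, 95.0), (120, 98.2), (200, 45.0), (90, None), (150, 91.0)]
fractions = classify_orfs(toy_orfs)
print(fractions)
```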
Figure S1
Figure S2
Figure S3
Figure S4
Figure S5
Figure S6
Figure S7
Figure S8
Supplementary Tables

Table S1. Statistics of metagenomic sequencing of the swab and plaque samples

Sample  Yield (bp)      # human PE-reads  # microbial PE-reads  # merged reads  Average read length (bp)
Z11     3,070,921,400   26,871,620        3,837,596             367,194         175
Z14     2,195,593,600   14,492,544        7,463,390             533,892         179
Z15     2,165,983,000   18,615,524        3,044,306             322,067         178
PY2     1,551,294,600   5,855,774         9,657,172             904,087         178
PY4     1,506,676,400   4,585,620         10,481,144            485,904         182
PY5     1,601,036,000   7,263,276         8,747,084             1,051,782       177
PZ2     1,860,468,000   10,683,714        7,920,966             429,990         181
PZ3     1,898,875,200   7,200,372         11,788,380            1,146,658       178
PZ8     2,115,805,000   6,135,584         15,022,466            1,693,870       176
H9-1    2,361,258,000   11,481,304        12,131,278            1,113,711       178
H14-1   2,510,421,200   14,041,956        11,062,254            605,105         175
H4      2,315,682,000   4,743,384         18,413,436            2,786,241       178
H6      1,976,678,600   12,426,666        7,340,120             947,659         177
H7      2,986,001,000   22,235,928        7,624,082             907,492         175
H9-2    2,529,206,400   20,348,020        4,944,042             921,190         174
H14-2   2,205,713,600   10,911,256        11,145,880            361,898         173
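
As a worked example of the sequencing-based human DNA proportion mentioned for Figure S2, the read counts of Table S1 can be turned into a per-sample human fraction. The sketch assumes that this proportion is simply human PE-reads divided by the sum of human and microbial PE-reads; the source does not state the exact formula, and reads removed as low quality are ignored here.

```python
# (sample, # human PE-reads, # microbial PE-reads), copied from Table S1.
TABLE_S1 = [
    ("Z11", 26_871_620, 3_837_596),
    ("Z14", 14_492_544, 7_463_390),
    ("Z15", 18_615_524, 3_044_306),
    ("PY2", 5_855_774, 9_657_172),
    ("PY4", 4_585_620, 10_481_144),
    ("PY5", 7_263_276, 8_747_084),
    ("PZ2", 10_683_714, 7_920_966),
    ("PZ3", 7_200_372, 11_788_380),
    ("PZ8", 6_135_584, 15_022_466),
    ("H9-1", 11_481_304, 12_131_278),
    ("H14-1", 14_041_956, 11_062_254),
    ("H4", 4_743_384, 18_413_436),
    ("H6", 12_426_666, 7_340_120),
    ("H7", 22_235_928, 7_624_082),
    ("H9-2", 20_348_020, 4_944_042),
    ("H14-2", 10_911_256, 11_145_880),
]


def human_fraction(human_reads, microbial_reads):
    """Fraction of classified PE-reads that are human (assumed definition)."""
    return human_reads / (human_reads + microbial_reads)


fractions = {s: human_fraction(h, m) for s, h, m in TABLE_S1}
for sample, frac in fractions.items():
    print(f"{sample:6s} {frac:6.1%}")
```

Under this definition the human fraction differs widely between samples, so human read removal affects the usable sequencing depth of each sample very differently.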
Table S2. The relative abundance of the major genera in the microbiomes of periodontal health and disease. Genus
H4
H6
H7
Prevotella
32459
16797
10861
34150
10138
19695
21346
12300
479
9304
6833
4103
8310
281
3941
2706
7987
9356
3726
14203
11009
11614
3255
16701
14671
10174
10989
18532
3340
11030
2528
395
1375
2107
1963
3179
5097
3497
485
16835
5081
7133
3217
5696
9435
23759
33320
Capnocytophaga Streptococcus
H9-2
H14-2
H9-1
H14-1
PY2
PY4
PY5
PZ2
PZ3
PZ8
Z11
Z14
Z15 1620
20
1633
4098
1216
5143
4123
3651
14415
1476
4035
15641
18705
9153
1991
6157
892
Actinomyces
135
963
1287
476
910
1262
1531
18423
8404
3102
13438
10839
6221
3271
3357
658
Veillonella
528
666
48
6263
10586
3873
6936
1161
761
9086
1111
1259
1525
186
5684
3876
3466
1728
247
4299
6521
3466
8274
4599
1523
1427
8631
Corynebacterium
Neisseria Leptotrichia
117
5684
5008
2094
2909
1064
1784
1589
3529
2210
7381
5627
1524
87
1291
333
2539
1994
465
2358
1556
3958
4055
2416
45
2286
844
702
2382
556
3493
3377 49
Fusobacterium
4554
2847
2970
3836
1507
Selenomonas
6128
4131
330
2962
1214
1821
369
7652
92
2231
742
2746
4218
10
1073
8
422
184
70
20
1848
548
1272
1926
184
1911
1369
1191
11526
11453
704
13
506
294
94
581
743
348
72
2569
540
3880
260
780
17747
1664
2961
Porphyromonas
8167
7409
3248
2742
1859
541
301
714
30
1203
98
238
178
164
82
532
Campylobacter
2418
2122
1560
1415
1701
1644
1070
2016
206
2032
1109
2298
1790
52
1471
264
Treponema
7235
2309
7039
3691
1619
914
313
1054
40
745
149
276
640
109
163
97
86
270
722
2587
1354
2048
2049
1918
57
175
38
Rothia Haemophilus
Cardiobacterium
15
876
1755
66
222
Aggregatibacter
16
2316
1115
247
1068
337
838
169
229
791
522
306
1252
439
414
1048
1215
749
409
25
677
24
54
109
69
1921
710
8
46
73
27
48
12
145
302
7287
Eubacterium
133
235
245
1266
2005
Gemella
137
30
84
193
1577
445
1879
Granulicatella Bacteroides Kingella Coprobacillus Dialister Abiotrophia Eikenella Filifactor Megasphaera
12
27
202
70
516
153
626
16
1480
145
101
178
55
276
428
4457
1115
864
758
777
483
624
363
508
242
339
227
253
379
45
181
202
2
256
141
46
32
78
21
38
3507
412
246
358
347
205
38
34
38
309
537
197
288
156
778
136
83
121
344
117
65
2677
350
904
1323
1102
51
306
299
78
272
611
8
41
266
95
12
0
21
9
418
333
194
431
800
397
216
355
21
262
30
130
316
19
65
90
28
213
249
37
103
85
56
74
396
778
129
115
446
47
30
21
404
63
68
25
4
37
5
7
7
26
7
32
689
384
608
616
1020
86
17
606
57
195
50
374
4
27
27
64
8
0
16
5
247
112
185
69
123
71
105
110
16
114
64
116
166
163
42
82
87
53
293
14
57
18
Clostridium
231
190
211
200
126
Oribacterium
590
152
50
150
156
Table S3. Candidates of unreported human oral bacteria identified by NR-BLAST and 16S rDNA approaches

NR-BLAST a: Alicycliphilus, Alkaliphilus, Basfia, Blautia, Bryantella, Citreicella, Chlorobium, Collinsella, Coprobacillus, Curvibacter, Dichelobacter, Histophilus, Holdemania, Kytococcus, Methylobacillus, Methylococcus, Parabacteroides, Psychrobacter, Riemerella, Roseburia, Sebaldella, Spirochaeta, Streptobacillus, Thauera, Verminephrobacter

16S rDNA b: Allisonella, Aquitalea, Bergeriella, Cloacibacterium, Paraprevotella, Planobacterium, Propionivibrio, Tessaracoccus, Xylanibacter

a NR-BLAST, genera identified by blasting shotgun sequencing reads against the NCBI non-redundant (NR) database.
b 16S rDNA, genera identified by using 16S rDNA sequences extracted from the shotgun sequencing reads.