Metagenomic sequencing reveals microbiota and its functional potential associated with periodontal disease

Jinfeng Wang1§, Ji Qi2§, Hui Zhao1, Shu He3, Yifei Zhang3, Shicheng Wei3*, Fangqing Zhao1*

1 Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
2 Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
3 Laboratory of Interdisciplinary Studies, School and Hospital of Stomatology, Peking University, Beijing 100081, China

§ These authors contributed equally to this work.
* To whom correspondence should be addressed.

Fangqing Zhao Tel: 8610-64869325, Fax: 8610-64874346, Email: [email protected]
Shicheng Wei Tel & Fax: 8610-82195780, Email: [email protected]

Supplementary Figures

Figure S1. A flowchart of the taxonomic and functional analysis pipeline used in this study.
(1) Quality control. High-quality paired-end reads were obtained after filtering out human and low-quality reads.
(2) Taxonomic and functional analyses. High-quality reads were merged into ~180 bp sequences according to the overlap of the 2 × 100 bp PE reads. The 16S rRNA gene sequences were extracted from the datasets of each sample group (Z, PZ, H-1, and H-2) and were classified using the RDP classifier. Non-rDNA sequences were searched with BLASTX against the NCBI non-redundant (NR) sequence database, and the BLASTX output files were imported into MEGAN to statistically analyze the taxonomic distributions in each sample.
(3) De novo assembly of metagenomic reads. All high-quality reads (unmerged) were pooled and then used to perform de novo assembly and gene prediction. After filtering out the short ORFs (< 80 aa), the predicted genes were searched with BLAST against the NR database to quantify the sequence identities of functional genes.
(4) Comparative metagenomic analysis.

Figure S2. Human DNA contamination in 9 plaque samples. Human DNA in samples PY2, PY4, PY5, PZ2, PZ3, PZ8, H4, H6, and H7 was assessed before metagenomic sequencing by quantifying the absolute level of the β-actin gene via real-time quantitative polymerase chain reaction. The proportion of human DNA in the samples was calculated by dividing the concentration of human DNA by the overall concentration of the template DNA. This proportion (x-axis) was then compared with that calculated from the sequencing datasets (y-axis).

Figure S3. Relative frequency of the most abundant genera in 4 different periodontal states. Groups Z (green bar, including Z11, Z14, and Z15), PZ (red bar, including PY2, PY4, PY5, PZ2, PZ3, and PZ8), H-1 (orange bar, including H9-1 and H14-1), and H-2 (blue bar, including H4, H6, H7, H9-2, and H14-2) were compared using the mean relative abundance of each genus. The number of reads was normalized to 100,000 for each sample. (A) NR-BLASTX-based classification. (B) 16S rRNA-based classification. Both approaches showed similar community structures and inter-group divergence.

Figure S4. Intraspecific diversity of Porphyromonas gingivalis and Treponema denticola. Unassembled paired-end reads of all swab and plaque samples of periodontal disease were aligned to currently available reference genomic sequences to identify SNVs. (A) P. gingivalis W83 (NC_002950.2) and (B) T. denticola ATCC 35405 (NC_002967.9) were used as genomic references.
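
The per-sample normalization to 100,000 reads and the group means described for Figure S3 can be sketched as follows. This is a minimal illustration and not the authors' script: the function names `normalize_counts` and `group_mean` and the read counts are hypothetical, and simple proportional rescaling is assumed (the source does not say whether rescaling or subsampling was used).

```python
from typing import Dict, List


def normalize_counts(counts: Dict[str, int], depth: int = 100_000) -> Dict[str, float]:
    """Rescale raw per-genus read counts so that they sum to `depth`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {genus: n * depth / total for genus, n in counts.items()}


def group_mean(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean normalized abundance per genus; a genus absent from a sample counts as 0."""
    genera = set().union(*samples)
    return {g: sum(s.get(g, 0.0) for s in samples) / len(samples) for g in sorted(genera)}


# Hypothetical raw counts for two samples of one group (illustrative values only).
sample_a = {"Prevotella": 600, "Treponema": 300, "Streptococcus": 100}
sample_b = {"Prevotella": 100, "Streptococcus": 400}

norm_a = normalize_counts(sample_a)
norm_b = normalize_counts(sample_b)
means = group_mean([norm_a, norm_b])
```

Normalizing each sample before averaging keeps a deeply sequenced sample from dominating the group mean.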

Figure S5. Functional divergence between microbiomes of periodontal health and disease. Functional annotation was performed according to the KEGG assignments. The number of reads was normalized to 100,000 for each sample. (A) Comparison of KEGG categories at the first 3 levels in the microbial communities of groups PZ (plaque of periodontal health, y-axis) and H-2 (plaque of periodontitis, x-axis). The 3 dots (cell motility, bacterial chemotaxis, and flagellar assembly) highlighted by a red circle deviate from the diagonal. (B) A box-plot of KEGG category abundance in groups PZ (red box) and H-2 (blue box). The KEGG categories of the second level are shown. The boxes represent the interquartile range between the first and third quartiles. The whiskers denote the lowest and highest values within the interquartile ranges of the first and third quartiles. The dots inside the boxes represent the median. The upward arrows in red and downward arrows in blue represent over- and under-represented pathways, respectively, in the microbiomes of plaque of periodontal disease (group H-2).

Figure S6. Major bacterial clades that contributed to functional variations. (A-B) For bacterial chemotaxis and flagellar assembly, Treponema, Selenomonas, and Campylobacter were the top 3 donors contributing to the increase of flagellar genes in plaque of periodontal disease (H-2 group). (C) Over-representation of LPS biosynthesis genes in the H-2 group was due to the vigorous propagation of the genera Prevotella, Porphyromonas, Fusobacterium, Campylobacter, Selenomonas, and Aggregatibacter. (D) The over-representation of the PrtC gene in the H-2 group was mainly contributed by the genera Prevotella, Treponema, Selenomonas, Porphyromonas, and Fusobacterium.

Figure S7. The sequence similarities between the assigned taxa and their reference sequences. (A) Amino acid sequence identity between metagenomic reads and their reference sequences in the NR database. Only the reads of the top 30 genera from sample H4 were used in the calculation. (B) The colored curves in the inset graph highlight 5 relatively less abundant genera (Clostridium, Parabacteroides, Eubacterium, Ruminococcus, and Lactobacillus), which shared a low sequence similarity with known protein sequences in the NR database.

Figure S8. The percentage of novel functions in the periodontal microbiota. In total, 498,886 predicted ORFs > 80 aa were obtained from the metagenomic assembly of all samples. Of these, 35.1% were highly similar to known proteins (≥ 90% sequence identity), whereas 8.9% had no hits in the NR database.
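
The ORF summary reported for Figure S8 (length filter, then binning by best-hit identity) could be computed along these lines. This is a hedged sketch: the `Orf` record, the `summarize_orfs` function, and the example ORFs are hypothetical, and keeping ORFs of exactly 80 aa is an assumption, since the captions give the cutoff both as "< 80 aa" filtered and "> 80 aa" retained.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Orf:
    name: str
    length_aa: int
    # Percent identity of the best NR hit; None means no hit was found.
    best_identity: Optional[float]


def summarize_orfs(orfs: Iterable[Orf], min_length: int = 80,
                   high_identity: float = 90.0) -> Dict[str, float]:
    """Drop ORFs shorter than `min_length`, then report the percentage per bin."""
    kept = [o for o in orfs if o.length_aa >= min_length]  # assumption: 80 aa is kept
    bins = {"high_identity": 0, "lower_identity": 0, "no_hit": 0}
    for o in kept:
        if o.best_identity is None:
            bins["no_hit"] += 1
        elif o.best_identity >= high_identity:
            bins["high_identity"] += 1
        else:
            bins["lower_identity"] += 1
    n = len(kept)
    return {k: (100.0 * v / n if n else 0.0) for k, v in bins.items()}


# Hypothetical ORFs (illustrative values only).
example_orfs = [
    Orf("orf1", 250, 98.5),
    Orf("orf2", 120, 91.0),
    Orf("orf3", 95, 62.3),
    Orf("orf4", 300, None),
    Orf("orf5", 40, 99.0),  # shorter than the cutoff, so it is dropped
]
summary = summarize_orfs(example_orfs)
```

With the paper's data, the `high_identity` and `no_hit` bins would correspond to the 35.1% and 8.9% figures quoted in the caption.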


Supplementary Tables

Table S1. Statistics of metagenomic sequencing of the swab and plaque samples

Sample     Yield (bp)  # human PE-reads  # microbial PE-reads  # merged reads  Average read length (bp)
Z11     3,070,921,400        26,871,620             3,837,596         367,194                       175
Z14     2,195,593,600        14,492,544             7,463,390         533,892                       179
Z15     2,165,983,000        18,615,524             3,044,306         322,067                       178
PY2     1,551,294,600         5,855,774             9,657,172         904,087                       178
PY4     1,506,676,400         4,585,620            10,481,144         485,904                       182
PY5     1,601,036,000         7,263,276             8,747,084       1,051,782                       177
PZ2     1,860,468,000        10,683,714             7,920,966         429,990                       181
PZ3     1,898,875,200         7,200,372            11,788,380       1,146,658                       178
PZ8     2,115,805,000         6,135,584            15,022,466       1,693,870                       176
H9-1    2,361,258,000        11,481,304            12,131,278       1,113,711                       178
H14-1   2,510,421,200        14,041,956            11,062,254         605,105                       175
H4      2,315,682,000         4,743,384            18,413,436       2,786,241                       178
H6      1,976,678,600        12,426,666             7,340,120         947,659                       177
H7      2,986,001,000        22,235,928             7,624,082         907,492                       175
H9-2    2,529,206,400        20,348,020             4,944,042         921,190                       174
H14-2   2,205,713,600        10,911,256            11,145,880         361,898                       173
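
As a worked example of how the read counts in Table S1 relate to human DNA contamination, the share of human reads per sample can be estimated from the two PE-read columns. This is only a sketch under a stated assumption: that the sequencing-based proportion is the number of human PE-reads divided by the sum of human and microbial PE-reads. The source does not spell out the exact formula, and the function name `human_fraction` is hypothetical.

```python
# (human PE-reads, microbial PE-reads) copied from four rows of Table S1.
TABLE_S1_READS = {
    "Z11": (26_871_620, 3_837_596),
    "PY4": (4_585_620, 10_481_144),
    "H4": (4_743_384, 18_413_436),
    "H9-2": (20_348_020, 4_944_042),
}


def human_fraction(human_reads: int, microbial_reads: int) -> float:
    """Fraction of PE-reads assigned to human (assumed definition, see lead-in)."""
    total = human_reads + microbial_reads
    if total == 0:
        raise ValueError("no reads in sample")
    return human_reads / total


fractions = {s: human_fraction(h, m) for s, (h, m) in TABLE_S1_READS.items()}
for sample, frac in fractions.items():
    print(f"{sample}: {frac:.1%} human reads")
```

Under this definition the human share varies widely across samples, for example roughly 88% in the swab sample Z11 versus roughly 20% in the plaque sample H4.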

Table S2. The relative abundance of the major genera in the microbiomes of periodontal health and disease.

Genus

H4

H6

H7

Prevotella

32459

16797

10861

34150

10138

19695

21346

12300

479

9304

6833

4103

8310

281

3941

2706

7987

9356

3726

14203

11009

11614

3255

16701

14671

10174

10989

18532

3340

11030

2528

395

1375

2107

1963

3179

5097

3497

485

16835

5081

7133

3217

5696

9435

23759

33320

Capnocytophaga Streptococcus

H9-2

H14-2

H9-1

H14-1

PY2

PY4

PY5

PZ2

PZ3

PZ8

Z11

Z14

Z15 1620

20

1633

4098

1216

5143

4123

3651

14415

1476

4035

15641

18705

9153

1991

6157

892

Actinomyces

135

963

1287

476

910

1262

1531

18423

8404

3102

13438

10839

6221

3271

3357

658

Veillonella

528

666

48

6263

10586

3873

6936

1161

761

9086

1111

1259

1525

186

5684

3876

3466

1728

247

4299

6521

3466

8274

4599

1523

1427

8631

Corynebacterium

Neisseria Leptotrichia

117

5684

5008

2094

2909

1064

1784

1589

3529

2210

7381

5627

1524

87

1291

333

2539

1994

465

2358

1556

3958

4055

2416

45

2286

844

702

2382

556

3493

3377 49

Fusobacterium

4554

2847

2970

3836

1507

Selenomonas

6128

4131

330

2962

1214

1821

369

7652

92

2231

742

2746

4218

10

1073

8

422

184

70

20

1848

548

1272

1926

184

1911

1369

1191

11526

11453

704

13

506

294

94

581

743

348

72

2569

540

3880

260

780

17747

1664

2961

Porphyromonas

8167

7409

3248

2742

1859

541

301

714

30

1203

98

238

178

164

82

532

Campylobacter

2418

2122

1560

1415

1701

1644

1070

2016

206

2032

1109

2298

1790

52

1471

264

Treponema

7235

2309

7039

3691

1619

914

313

1054

40

745

149

276

640

109

163

97

86

270

722

2587

1354

2048

2049

1918

57

175

38

Rothia Haemophilus

Cardiobacterium

15

876

1755

66

222

Aggregatibacter

16

2316

1115

247

1068

337

838

169

229

791

522

306

1252

439

414

1048

1215

749

409

25

677

24

54

109

69

1921

710

8

46

73

27

48

12

145

302

7287

Eubacterium

133

235

245

1266

2005

Gemella

137

30

84

193

1577

445

1879

Granulicatella Bacteroides Kingella Coprobacillus Dialister Abiotrophia Eikenella Filifactor Megasphaera

12

27

202

70

516

153

626

16

1480

145

101

178

55

276

428

4457

1115

864

758

777

483

624

363

508

242

339

227

253

379

45

181

202

2

256

141

46

32

78

21

38

3507

412

246

358

347

205

38

34

38

309

537

197

288

156

778

136

83

121

344

117

65

2677

350

904

1323

1102

51

306

299

78

272

611

8

41

266

95

12

0

21

9

418

333

194

431

800

397

216

355

21

262

30

130

316

19

65

90

28

213

249

37

103

85

56

74

396

778

129

115

446

47

30

21

404

63

68

25

4

37

5

7

7

26

7

32

689

384

608

616

1020

86

17

606

57

195

50

374

4

27

27

64

8

0

16

5

247

112

185

69

123

71

105

110

16

114

64

116

166

163

42

82

87

53

293

14

57

18

Clostridium

231

190

211

200

126

Oribacterium

590

152

50

150

156

Table S3. Candidates of unreported human oral bacteria identified by NR-BLAST and 16S rDNA approaches

NR-BLAST [a]: Alicycliphilus, Alkaliphilus, Basfia, Blautia, Bryantella, Citreicella, Chlorobium, Collinsella, Coprobacillus, Curvibacter, Dichelobacter, Histophilus, Holdemania, Kytococcus, Methylobacillus, Methylococcus, Parabacteroides, Psychrobacter, Riemerella, Roseburia, Sebaldella, Spirochaeta, Streptobacillus, Thauera, Verminephrobacter

16S rDNA [b]: Allisonella, Aquitalea, Bergeriella, Cloacibacterium, Paraprevotella, Planobacterium, Propionivibrio, Tessaracoccus, Xylanibacter

[a] NR-BLAST, genera identified by blasting shotgun sequencing reads against the NCBI non-redundant (nr) database.
[b] 16S rDNA, genera identified by using 16S rDNA sequences extracted from the shotgun sequencing reads.