Electronic Supplementary Material

Statistical recipe for quantifying microbial functional diversity from EcoPlate metabolic profiling

Takeshi Miki, Taichi Yokokawa, Po-Ju Ke, I Fang Hsieh, Chih-hao Hsieh, Tomonori Kume, Kinuyo Yoneya, Kazuaki Matsui

Additional methods A: Detailed methods for soil samples

In December 2012, we selected three 1 m × 1 m subplots in a 400-m² plot and used a shovel to dig trenches (depth: 50–70 cm, width: 40 cm) around each of the three subplots. The trenches prevented live roots and rhizomes from outside from extending into the trenched subplots. Understory vegetation in the trenched subplots was pulled out before every measurement to minimize the effects of its root activity. More information is available in Lin et al. (submitted to Ecological Research). We took a 5-g subsample from each sieved soil sample and prepared a 1:1000 dilution with 50 μM NaHPO3 solution. We added 100 μL of the diluted sample to each well of the EcoPlate and incubated the plates at the same temperature as at sampling.

Additional methods B: Summary of R script (ecopl_comparison_20170614.R) for main analyses

Preparation of environment

First, R (https://www.r-project.org/), RStudio (https://www.rstudio.com/products/RStudio/), and Java (https://java.com/en/) (Java SE Runtime Environment) are installed. The Chemistry Development Kit (CDK) (ref: http://pubs.acs.org/doi/abs/10.1021/ci025584y) is then downloaded from https://sourceforge.net/projects/cdk/files/cdk/ and installed according to the instructions at http://cdk.sourceforge.net/old_web/download.html. The Java environment and CDK are required to use the chemoinformatic tool (rcdk library) in R. Because installing Java and CDK is more complicated on Mac OS X (9.5 or later) than on Windows and Linux, Mac users are directed to guidance_OSX.pdf for more information. If installation of the CDK library fails on any operating system (OS), the corresponding parts of the R script can be skipped and the results of the chemoinformatic tool can be loaded directly (from line 69 of the R script) without running the analysis on the sdf files. When executing the R script in RStudio, set the working directory from the toolbar as follows: Session > Set Working Directory > To Source File Location. All functions in this script (ecopl_comparison_20170614.R) access other files (e.g., EcoPlate data and chemical information) through paths relative to the location of the script itself (i.e., the source file). Each part of the script is described in the following subsections (PART0 to PART4).

PART0: Setting up the environment

The libraries used for the analyses can be installed with the scripts in this part. Alternatively, every library except pforeach can be installed from the RStudio toolbar. Readers who are unfamiliar with multivariate analysis in ecology should read Additional methods C first and then continue to the following parts.

PART1: Analysis of chemical similarity

Executing the script in this part produces the chemical dissimilarity matrix (D_C) and the similarity trees (T_C) for the 31 EcoPlate substrates shown in Fig. 4. This part relies entirely on the library {rcdk}. Error messages raised when loading this library (library("rcdk")) are most likely caused by incompatible versions of the Java environment and/or CDK on the OS; in that case, update these programs and restart the system. Within R, rcdk depends on a specific version of the library rJava and installs that version as part of its own installation, so rJava should not be installed or updated independently. See also guidance_OSX.pdf.
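To illustrate what PART1 computes, the following minimal Python sketch builds a chemical dissimilarity matrix from binary fingerprints. It assumes Tanimoto similarity, with dissimilarity defined as 1 − similarity, and represents each fingerprint as a set of on-bit positions. The toy fingerprints are hypothetical and are not CDK output, and the sketch is not a port of the R script.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints stored as sets of on-bit positions."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical
    return len(a & b) / union if union else 1.0


def chemical_dissimilarity(fps: dict[str, set[int]]) -> tuple[list[str], list[list[float]]]:
    """Return substrate names and the symmetric matrix d_ij = 1 - Tanimoto(i, j)."""
    names = list(fps)
    matrix = [[1.0 - tanimoto(fps[i], fps[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical bit positions, chosen only to make the arithmetic easy to follow
toy = {
    "D-Xylose": {1, 4, 7, 9},
    "D-Cellobiose": {1, 4, 7, 12},
    "Putrescine": {2, 5, 11},
}
names, dc = chemical_dissimilarity(toy)
for name, row in zip(names, dc):
    print(name, [round(v, 3) for v in row])
```

D-Xylose and D-Cellobiose share 3 of 5 distinct bits (dissimilarity 0.4), whereas Putrescine shares none with either (dissimilarity 1.0).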

PART2: EcoPlate data loading for EcoPlate analysis

This part defines the functions that load EcoPlate data and then runs them to load all datasets. Because the way data are accumulated (e.g., file naming) varies greatly among projects, the loading functions should be adjusted for each project. More detailed notes are given at the beginning of PART2 in the script.
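Because the loading functions must be adapted to each project, the following hypothetical Python sketch shows only the reshaping step that any loader needs. It splits an EcoPlate reading, stored as an 8 × 12 grid of optical densities, into its three replicate blocks (plate columns 1 to 4, 5 to 8, and 9 to 12). Each block holds 31 substrates plus a water well at its top-left corner. Subtracting the water well and truncating negative values at zero is a common convention assumed here. The function names and the correction rule are not taken from the R script.

```python
def split_replicates(plate: list[list[float]]) -> list[list[list[float]]]:
    """Split an 8 x 12 EcoPlate grid (rows A-H, columns 1-12) into three 8 x 4 replicate blocks."""
    if len(plate) != 8 or any(len(row) != 12 for row in plate):
        raise ValueError("expected 8 rows x 12 columns")
    # Replicate r occupies 0-based columns 4r to 4r + 3 (plate columns 1-4, 5-8, 9-12)
    return [[row[4 * r:4 * r + 4] for row in plate] for r in range(3)]


def blank_corrected_substrates(block: list[list[float]]) -> list[float]:
    """Return the 31 substrate readings of one block minus its water well.

    Assumed convention (hypothetical, not from the R script): subtract the
    water reading and truncate negative values at zero.
    """
    water = block[0][0]  # top-left well of each block (A1, A5, A9) contains water
    values = [v for row in block for v in row]  # row-major order within the block
    return [max(v - water, 0.0) for v in values[1:]]  # skip the water well itself


# Hypothetical reading: every well gets a distinct value so the reshaping is easy to follow
plate = [[float(12 * r + c) for c in range(12)] for r in range(8)]
blocks = split_replicates(plate)
replicates = [blank_corrected_substrates(b) for b in blocks]
print(len(replicates), len(replicates[0]))  # 3 31
```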

PART3: EcoPlate analysis combined with chemical similarity

This part of the script includes i) function definitions, ii) analyses of the microcosm experiment, and iii) analyses of the Xitou data.

PART4: Additional analysis for SI

This part of the script consists of i) checking the distribution of statistical values obtained from randomly generated similarity trees (Fig. S6), ii) comparing the chemical dissimilarity matrix with the dissimilarity matrix derived from in situ color development, and generating the trees shown in Fig. S5, iii) a subanalysis of how the integration period affects the explanatory power of the statistical model for specific datasets (the best datasets) (Fig. S4), and iv) scripts for comparing PERMANOVA and distance-based RDA.

Additional methods C: Basis for data prearrangement

The processes in Fig. 3 are standard procedures for multivariate analysis in various fields of ecology and environmental science. Readers who are unfamiliar with these processes should work through the example in ecopl_pre_confirmation.R. The conversion of the functional matrix E_C into E_BC, MF, and DF (DF_BUW, DF_UW) can be checked with this R script, and identical results are provided in sample_pre_data.xlsx.
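The binarization step can be illustrated with a minimal Python sketch. It rests on three assumptions: the threshold is the T-quantile of all values pooled from E_C (using linear interpolation, as in R's default quantile type 7), a substrate scores 1 when its value is at or above the threshold, and MF is the number of substrates scoring 1 in each sample. These definitions are hypothetical. ecopl_pre_confirmation.R and sample_pre_data.xlsx remain the reference for the actual procedure.

```python
def quantile(values: list[float], q: float) -> float:
    """Linear-interpolation quantile (the rule of R's default, type 7)."""
    s = sorted(values)
    pos = q * (len(s) - 1)  # 0-based fractional position
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def binarize(ec: list[list[float]], t: float) -> list[list[int]]:
    """Score 1 where a value reaches the pooled T-quantile threshold, else 0 (assumed rule)."""
    threshold = quantile([v for row in ec for v in row], t)
    return [[1 if v >= threshold else 0 for v in row] for row in ec]


def multifunctionality(ebc: list[list[int]]) -> list[int]:
    """Number of substrates scored 1 in each sample (hypothetical MF definition)."""
    return [sum(row) for row in ebc]


# Two hypothetical samples (rows) x two substrates (columns)
ec = [[0.1, 0.5],
      [0.9, 0.3]]
ebc = binarize(ec, 0.5)
print(ebc, multifunctionality(ebc))  # [[0, 1], [1, 0]] [1, 1]
```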

Additional methods D: Supplementary methods

Chemoinformatic analyses

We compiled the two-dimensional structures of the 31 carbon substrates used in EcoPlate into a single sdf file (list_ecoplate2.sdf). The item with an “FDB***” identifier was downloaded from FooDB (http://foodb.ca/); all others were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov/) (see ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf for the definition of the fingerprint format). This file was used to generate the chemical similarity trees tree-a and tree-b with the R package rcdk ({rcdk::get.fingerprint}). ChemMine Tools (http://chemminetools.ucr.edu/tools/view_job/118888/) was used i) to generate tree-c by hierarchical clustering with average linkage and ii) to obtain the dissimilarity matrix (via the online tool's “Show data” option) as the file online.dissimilarity. All further processing was conducted in the R environment. Details on the chemoinformatic tools are available in Guha (2007) and Guha and Charlop-Powers (2016).
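Hierarchical clustering with average linkage (UPGMA) turns a dissimilarity matrix into a tree such as tree-c. The minimal Python sketch below implements average linkage directly so that each step is visible. The toy matrix is hypothetical. In practice, ChemMine Tools or R's hclust(method = "average") would do this.

```python
def upgma(names: list[str], dist: list[list[float]]):
    """Average-linkage (UPGMA) clustering of a symmetric dissimilarity matrix.

    Returns the tree as nested tuples of names and the list of merge heights.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        (a, b), h = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        # Distance to the merged cluster = size-weighted mean of the two old distances,
        # which equals the mean over all pairs of original items (average linkage)
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        heights.append(h)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree, heights


tree, heights = upgma(["A", "B", "C"], [[0.0, 0.2, 0.8],
                                        [0.2, 0.0, 0.6],
                                        [0.8, 0.6, 0.0]])
print(tree, heights)  # ('C', ('A', 'B')) with merges near 0.2 and 0.7
```

A and B merge first at 0.2. The merged cluster is then at distance (0.8 + 0.6)/2 = 0.7 from C.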

Fuzzy set and fuzzy weighting of color development patterns

Following {SYNCSA::belonging}, the multivariate matrix $E = \{e_{ki}\}$ (e.g., the functional matrix $E_C$ in this study, or a community matrix in general) can be weighted with a distance matrix $D = \{d_{ij}\}$. First, each element of $D$ is normalized by the maximum element:

$$\delta_{ij} = \frac{d_{ij}}{\max_{u,v} d_{uv}}.$$

Then, the normalized distance matrix is converted to the similarity matrix $S = \{s_{ij}\}$, where

$$s_{ij} = 1 - \delta_{ij}.$$

The “belonging” $f_i(j)$ of item $i$ to item $j$ is calculated as the relative similarity of $i$ to $j$,

$$f_i(j) = \frac{s_{ij}}{\sum_{l} s_{il}},$$

so that $f_i(\cdot)$ is a fuzzy set whose memberships sum to 1. Finally, the element $e_{ki}$ of the multivariate matrix, i.e., the value of item (e.g., substrate) $i$ in sample $k$, is replaced by its weighted counterpart

$$e'_{ki} = \sum_{j} f_i(j)\, e_{kj}.$$
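The weighting steps above can be written in a few lines of Python. This minimal sketch follows the equations in this section and is not a copy of {SYNCSA::belonging}. It assumes that D is symmetric with zeros on the diagonal, so every row of S contains s_ii = 1 and no row sum is zero. It also assumes that E has samples as rows and items as columns.

```python
def belonging(d: list[list[float]]) -> list[list[float]]:
    """F[i][j] = f_i(j) = s_ij / sum_l s_il, with s_ij = 1 - d_ij / max(D)."""
    d_max = max(max(row) for row in d)
    if d_max <= 0:
        raise ValueError("D must contain at least one positive distance")
    s = [[1.0 - v / d_max for v in row] for row in d]  # normalize, then convert to similarity
    return [[v / sum(row) for v in row] for row in s]  # each row becomes a fuzzy set


def fuzzy_weight(e: list[list[float]], f: list[list[float]]) -> list[list[float]]:
    """e'_ki = sum_j f_i(j) * e_kj for every sample k (row) and item i (column)."""
    n = len(f)
    return [[sum(f[i][j] * row[j] for j in range(n)) for i in range(n)] for row in e]


d = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
e = [[3.0, 0.0, 1.0]]  # one hypothetical sample, three items
f = belonging(d)
print(fuzzy_weight(e, f))  # approximately [[2.0, 1.0, 1.0]]
```

Items 1 and 2 are similar, so part of the signal of item 1 is shared with item 2. Item 3 has zero similarity to both and keeps its own value.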

Permutation P value for detecting nonrandom results of chemical weighting

Permutation P values were calculated from the following quantities:

B: number of permutations satisfying a given condition
b: number of permutations such that $R^2_{\mathrm{perm}} \ge R^2_{\mathrm{obs}}$
b′: number of permutations such that $R^2_{\mathrm{perm}} \le R^2_{\mathrm{obs}}$
m: total number of permutations

The permutation P value is the probability that B is at least as extreme as b or b′:

$$P_U = P(B \le b) = \frac{b + 1}{m + 1}, \qquad P_L = P(B \le b') = \frac{b' + 1}{m + 1}.$$

When $P_U \le 0.025$ or $P_L \le 0.025$, we interpret the chemically weighted results as statistically different from the randomly weighted results (two-tailed test with 0.025 in each tail).
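The two permutation P values can be computed directly from the observed R² and the m permuted R² values, as in this minimal Python sketch. The function and variable names are illustrative.

```python
def permutation_p_values(r2_obs: float, r2_perm: list[float]) -> tuple[float, float]:
    """Return (P_U, P_L) = ((b + 1)/(m + 1), (b' + 1)/(m + 1))."""
    m = len(r2_perm)
    b = sum(1 for r in r2_perm if r >= r2_obs)        # permutations at least as large as observed
    b_prime = sum(1 for r in r2_perm if r <= r2_obs)  # permutations at most as large as observed
    return (b + 1) / (m + 1), (b_prime + 1) / (m + 1)


def is_nonrandom(p_u: float, p_l: float, alpha_tail: float = 0.025) -> bool:
    """Two-tailed decision rule: either tail probability at or below 0.025."""
    return p_u <= alpha_tail or p_l <= alpha_tail


p_u, p_l = permutation_p_values(0.35, [0.10, 0.20, 0.30, 0.40])
print(p_u, p_l)  # 0.4 0.8
```

The "+1" terms count the observed value as one of the possible outcomes. With m = 999 permutations, the smallest attainable value of either P is therefore 1/1000 = 0.001.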

References in Additional methods D

SYNCSA package reference manual: ftp://cran.r-project.org/pub/R/web/packages/SYNCSA/SYNCSA.pdf


Additional Figures 

[Figure: Fig. S1, panels (a), (b), and (c): chemical similarity trees of the 31 EcoPlate substrates]

Fig. S1 Chemical similarity trees generated by different methods: (a) the standard fingerprint in the R function {rcdk::get.fingerprint}, (b) the extended fingerprint in {rcdk::get.fingerprint, type="extended"}, and (c) the online tool (ChemMine Tools).

[Figure: Fig. S2, panels (a) Avg, (b) Max, and (c) Min of triplicates, final endpoint; R² plotted against threshold T = 0.1 to 0.9 for non-weighted, tree-a, tree-b, and tree-c]

Fig. S2 Results of linear models linking multifunctionality with gene diversity in microcosms. R² values from the linear model (MF ~ reduction of gene diversity) are compared among different metrics: (a) average, (b) maximum, and (c) minimum of triplicates. * denotes P_perm,U < 0.025 or P_perm,L < 0.025 for tree x (x = a, b, c in Fig. S1).

[Figure: Fig. S3, constrained fraction of variance for UW, CW-a, CW-b, and CW-c at threshold T = 0.9, 0.7, and 0.5; groups: temporal integration, temporal maximum, and final endpoint, each with Avg, Max, and Min of triplicates]

Fig. S3 Results of db-RDA linking functional composition with month and treatment effects in forest soils, based on binarized data, for different calculation methods. The statistical power (constrained fraction of variance) of the distance-based RDA model (functional dissimilarity ~ treatment × month) varies with the calculation method. * P_perm,U < 0.025 or P_perm,L < 0.025 for tree x (x = a, b, c in Fig. S1). The vertical and horizontal axes cross at the average statistical power obtained with the default calculation method (i.e., “final endpoint and average of triplicates”). T values (= 0.9, 0.7, and 0.5) represent quantile-based thresholds for binarization. The abbreviations are the same as in Fig. 5.

[Figure: Fig. S4, R² plotted against integration period (days, 0 to 30) for binary_uw, binary_cw, cont_uw, cont_cwF, cont_cwGuniFrac, and cont_cwGuniFrac0.5]

Fig. S4 Sensitivity of db-RDA results to the integration period. The statistical power (R²) of the distance-based RDA model is shown for integration periods of 0 to 30 days. Maximum values among triplicates of the forest soil data were used. For chemically weighted results, tree-c in Fig. S1 was used. A threshold of T = 0.8 was used for binary results.

[Figure: Fig. S5, panels (a), (b), and (c): similarity trees of the 31 EcoPlate substrates based on color development patterns]

Fig. S5 Chemical similarity trees based on pairwise correlations of color development patterns (color depths) between samples in EcoPlate incubations. Pairwise dissimilarity was calculated as 1 − max(r, 0), where r is the correlation coefficient. Similarity calculated in this way is interpreted as reflecting the similarity of microbial responses to different substrates rather than similarity defined by chemical structure (see Fig. S1). Results were calculated using (a) temporally integrated data and maximum values of triplicates from forest soils, (b) final endpoint data and maximum values of triplicates from microcosm experiments, and (c) combined data from (a) and (b).
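The response-based dissimilarity 1 − max(r, 0) can be illustrated with the minimal Python sketch below. It assumes that each substrate is represented by its vector of color-development values across samples and that r is the Pearson correlation. Truncating negative correlations at zero makes anticorrelated substrates maximally dissimilar (d = 1). The substrate names and values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def response_dissimilarity(profiles: dict[str, list[float]]) -> tuple[list[str], list[list[float]]]:
    """d_ij = 1 - max(r_ij, 0); negative correlations count as maximally dissimilar."""
    names = list(profiles)
    d = [[1.0 - max(pearson(profiles[i], profiles[j]), 0.0) for j in names] for i in names]
    return names, d


# Hypothetical color depths of three substrates across three samples
profiles = {"S1": [1.0, 2.0, 3.0], "S2": [2.0, 4.0, 6.0], "S3": [3.0, 2.0, 1.0]}
names, d = response_dissimilarity(profiles)
print(names, d)  # S1 and S2 at distance 0; S3 at distance 1 from both
```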

[Figure: Fig. S6, panels (a), (b), and (c): density of R² values from permuted trees; observed R² of CW-a (0.1669), CW-b (0.1629), CW-c (0.0731), and UW (0.0963)]

Fig. S6 Results of linear models linking multifunctionality with month and treatment effects in forest soils using permuted chemical dissimilarity matrices. To evaluate the effectiveness of chemical weighting, the statistical power (R²) of the linear model for chemically weighted multifunctionality (CW-a, CW-b, CW-c) is compared with that for multifunctionality weighted by randomly permuted chemical dissimilarity trees. Data from Fig. 6a were used (temporal maximum with minimum of triplicates and threshold T = 0.7; highlighted by a blue bar). Probability density distributions were obtained by permuting tree-a (a), tree-b (b), and tree-c (c) shown in Fig. S1. P_perm,U = 0.009, P_perm,U = 0.007, and P_perm,L = 0.107 for (a), (b), and (c), respectively. R² of UW represents the statistical power from chemically unweighted indices.
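One way to obtain randomly permuted chemical dissimilarity matrices is to shuffle the substrate labels, applying the same permutation to rows and columns. This keeps the matrix symmetric with a zero diagonal and preserves the set of pairwise dissimilarities while breaking their link to chemical structure. The minimal Python sketch below assumes this label-shuffling scheme. The exact randomization used for Fig. S6 is defined in PART4 of the R script.

```python
import random


def permute_labels(d: list[list[float]], rng: random.Random) -> list[list[float]]:
    """Shuffle item labels: the same random permutation is applied to rows and columns."""
    n = len(d)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[d[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def off_diagonal(d: list[list[float]]) -> list[float]:
    """Sorted upper-triangle values, used to check that the distances are preserved."""
    return sorted(d[i][j] for i in range(len(d)) for j in range(i + 1, len(d)))


rng = random.Random(42)  # fixed seed so the example is reproducible
dc = [[0.0, 0.2, 0.8],
      [0.2, 0.0, 0.6],
      [0.8, 0.6, 0.0]]
permuted = [permute_labels(dc, rng) for _ in range(999)]  # m = 999 permuted matrices
```

Each permuted matrix would then go through the same weighting and model fitting as the observed one. The resulting R² values supply the permuted distribution used in the permutation P value calculation.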
