Electronic Supplementary Material
Statistical recipe for quantifying microbial functional diversity from EcoPlate metabolic profiling
Takeshi Miki, Taichi Yokokawa, Po-Ju Ke, I Fang Hsieh, Chih-hao Hsieh, Tomonori Kume, Kinuyo Yoneya, Kazuaki Matsui
kmatsui@civileng.kindai.ac.jp
Additional methods A: Detailed methods in soil samples
In December 2012, we selected three 1 m × 1 m subplots along a 400-m² plot and dug trenches (depth: 50–70 cm, width: 40 cm) around the three subplots using a shovel. The trenches prevented other live roots and rhizomes from extending into the trenching plots. Understory vegetation in the trenching plots was pulled out before every measurement to minimize the effects of its root activity. More information is available in Lin et al. (submitted to Ecological Research). We took 5-g subsamples from each sieved soil sample and prepared a 1:1000 dilution with 50 μM NaHPO3 solution. We added 100 μL of the diluted sample to each well of the EcoPlate and incubated the plates at the same temperature as that of the sampling condition.
Additional methods B: Summary of R script (ecopl_comparison_20170614.R) for main analyses
Preparation of environment
First, R (https://www.r-project.org/), RStudio (https://www.rstudio.com/products/RStudio/), and Java (https://java.com/en/) (Java SE Runtime Environment) are installed. The Chemistry Development Kit (CDK) (ref: http://pubs.acs.org/doi/abs/10.1021/ci025584y) is then downloaded from https://sourceforge.net/projects/cdk/files/cdk/ and installed according to the instructions at http://cdk.sourceforge.net/old_web/download.html. The Java environment and CDK are installed in order to use the chemoinformatic tool (rcdk library) in R. As the installation steps for Java and CDK are more complicated in Mac OSX (9.5 or later) than in Windows and Linux, readers are directed to guidance_OSX.pdf for more information. If installation of the CDK library fails on any operating system (OS), some parts of the R script can be skipped and the results from the chemoinformatic tool can be loaded directly (from line 69 in this R script) without conducting the analysis from the sdf files. When executing the R script in the RStudio environment, the Working Directory should be set from the toolbar as follows: Session > Set Working Directory > To Source File Location. All of the functions in this script (ecopl_comparison_20170614.R) rely on paths relative to the location of this R script (i.e., Source) to access other files (e.g., EcoPlate data and chemical information). Each part of the script is described in the following subsections (PART0 to PART4).
PART0: Setting environment
Libraries used for the analyses can be installed by the scripts in this part. Alternatively, each library except for pforeach can be installed from the toolbar of RStudio. Readers who are unfamiliar with multivariate analysis in ecology should read Additional methods C first and then continue to the following parts.
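As a rough illustration of the check that precedes installation, the following base-R sketch lists which libraries are not yet available. The vector `required_pkgs` contains only the libraries named in this supplement and is an assumption; the full list used by ecopl_comparison_20170614.R may differ. The helper `missing_packages` is a hypothetical name and is not part of the original script.

```r
# Libraries named in this supplement; the list used by the actual
# script may be longer (assumption).
required_pkgs <- c("rcdk", "SYNCSA")

# Hypothetical helper: return the subset of `pkgs` that cannot be
# loaded in the current R session.
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

pforeach is left out of `required_pkgs` because, as noted above, it cannot be installed in the same way as the other libraries.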
PART1: Analysis of chemical similarity
By executing the script in this part, the chemical dissimilarity matrix (DC) and similarity trees (TC) for the 31 substrates in EcoPlate shown in Fig. 4 are obtained. This part relies entirely on the library {rcdk}. Any error messages that result from loading this library (library("rcdk")) are likely due to incompatible versions of the Java environment and/or CDK in the OS. These programs should be updated and the environments rebooted. In the R environment, because the library rcdk depends on a specific version of another library, rJava, and because the installation process of rcdk includes installation of that version of rJava, rJava should not be independently installed or updated. One can also refer to guidance_OSX.pdf.
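To make the step from fingerprints to a dissimilarity matrix and a similarity tree concrete, here is a minimal base-R sketch that does not need rcdk or Java. It rests on three assumptions that are not confirmed by this supplement: the fingerprints are invented toy values, the similarity index is the Tanimoto coefficient, and the tree is built with average linkage. The names `fp`, `tanimoto_dissimilarity`, `D_C`, and `T_C` are hypothetical.

```r
# Toy binary fingerprints: rows = substrates, columns = fingerprint bits.
# These values are invented for illustration only.
fp <- rbind(
  substrate_A = c(1, 1, 0, 0, 1, 0),
  substrate_B = c(1, 1, 0, 1, 1, 0),
  substrate_C = c(0, 0, 1, 1, 0, 1),
  substrate_D = c(0, 1, 1, 1, 0, 1)
)

# Tanimoto similarity = bits set in both / bits set in either;
# the dissimilarity is 1 minus that similarity (assumed index).
tanimoto_dissimilarity <- function(fp) {
  shared <- fp %*% t(fp)                     # bits set in both fingerprints
  n_on   <- rowSums(fp)                      # bits set in each fingerprint
  either <- outer(n_on, n_on, "+") - shared  # bits set in at least one
  1 - shared / either
}

D_C <- tanimoto_dissimilarity(fp)

# Average-linkage hierarchical clustering of the dissimilarities (assumed method).
T_C <- hclust(as.dist(D_C), method = "average")
# plot(T_C)  # draws the similarity tree
```

In the actual script the fingerprints come from {rcdk::get.fingerprint} applied to the sdf file; the sketch only shows what happens after a binary fingerprint is available. A fingerprint with no bits set would give an undefined value and is not handled here.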
PART2: EcoPlate data loading for EcoPlate analysis
This part includes the functions that load EcoPlate data and the steps that load all of the data. As the way data are accumulated (e.g., file names) varies greatly among projects, the loading functions should be adjusted for each project. More detailed notes are available at the beginning of PART2 in this script.
PART3: EcoPlate analysis combined with chemical similarity
This part of the script includes i) the definitions of functions, ii) analyses for the microcosm experiment, and iii) analyses for the Xitou data.
PART4: Additional analysis for SI
This part of the script is composed of i) checking the distribution of statistical values from randomly generated similarity trees (for Fig. S6), ii) comparing the chemical dissimilarity matrix with the in situ color development dissimilarity matrix and generating the trees shown in Fig. S5, iii) a subanalysis to investigate the effect of integration period on the explanatory power of the statistical model for specific datasets (the best datasets) (for Fig. S4), and iv) scripts for comparing PERMANOVA and distance-based RDA.
Additional methods C: Basis for data prearrangement
The processes in Fig. 3 are standard procedures for multivariate analysis in various fields of ecology and environmental science. Readers who are unfamiliar with these processes should review the example in ecopl_pre_confirmation.R. Conversion from the functional matrix EC into EBC, MF, and DF (DFBUW, DFUW) can be confirmed by this R script, and identical results are available in sample_pre_data.xlsx.
Additional methods D: Supplementary methods
Chemoinformatic analyses
We compiled two-dimensional structures of the 31 carbon substrates used in EcoPlate into a single sdf file (list_ecoplate2.sdf). The item with “FDB***” was downloaded from FooDB (http://foodb.ca/); all others were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov/) (see ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf for the definition of fingerprint format). This file was used to generate chemical similarity trees tree-a and tree-b through the R package {rcdk::get.fingerprint}. ChemMine Tool (http://chemminetools.ucr.edu/tools/view_job/118888/) was used i) to generate tree-c through hierarchical clustering with averaging and ii) to obtain the dissimilarity matrix (by the online tool: Show data) as the file online.dissimilarity. Further processes were all conducted in the R environment. Details on the chemoinformatic tools are available in Guha (2007) and Guha and Charlop-Powers (2016).
Fuzzy set and fuzzy-weighting of color development patterns
Based on {SYNCSA::belonging}, the multivariate matrix $E = \{e_{ki}\}$ (e.g., functional matrix EC in this study or community matrix in general) can be weighted with the distance matrix $D = \{d_{ij}\}$. First, each element of D is normalized by the maximum element: $\delta_{ij} = d_{ij} / \max(d_{ij})$. Then, the distance matrix is converted to the similarity matrix $S = \{s_{ij}\}$, where $s_{ij} = 1 - \delta_{ij}$. The “belonging” of item i to j, $f_i(j)$, is calculated as the relative similarity of i to j:
$$f_i(j) = \frac{s_{i,j}}{\sum_{j'} s_{i,j'}};$$
$f_i(j)$ is a fuzzy set. Finally, the multivariate matrix $\{e_{ki}\}$ is weighted: the item (e.g., substrate) i of the sample k is weighted by the items j as
$$e'_{k,i} = \sum_{j} f_i(j)\, e_{k,j}.$$
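The weighting steps above can be sketched in base R as follows. This is an illustrative re-implementation of the formulas as written in this section, not the source code of SYNCSA; the function name `fuzzy_weight` and the toy matrices are assumptions introduced for the example.

```r
# Hypothetical helper: weight a sample-by-item matrix E with an
# item-by-item distance matrix D, following the steps in the text.
fuzzy_weight <- function(E, D) {
  D <- as.matrix(D)
  delta <- D / max(D)                  # normalize by the maximum distance
  S <- 1 - delta                       # convert distance to similarity
  Fz <- sweep(S, 1, rowSums(S), "/")   # Fz[i, j] = f_i(j); each row sums to 1
  E %*% t(Fz)                          # e'_{k,i} = sum_j f_i(j) * e_{k,j}
}

# Toy example: one sample, three items (invented values).
D_toy <- matrix(c(0, 1, 2,
                  1, 0, 2,
                  2, 2, 0), nrow = 3, byrow = TRUE)
E_toy <- matrix(c(3, 0, 1), nrow = 1)

E_weighted <- fuzzy_weight(E_toy, D_toy)
# E_weighted is 2, 1, 1: item 2 receives part of the signal of the
# chemically similar item 1, and item 3 (maximally distant) is unchanged.
```

In the toy case, items 1 and 2 are close to each other, so the color development of item 1 is partly shared with item 2, whereas item 3 keeps its own value because its similarity to the other items is zero.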
Permutation P value for detecting nonrandom results by chemical weighting
Permutation P values were calculated as follows:
B: Number of permutations that satisfy a certain condition
b: Number of permutations such that R²perm ≥ R²obs
b’: Number of permutations such that R²perm ≤ R²obs
m: Number of permutations
The permutation P value is the probability that B is a more extreme case than b or b’.
PU = P(B ≤ b) = (b + 1)/(m + 1), and
PL = P(B ≤ b’) = (b’ + 1)/(m + 1).
When PU ≤ 0.025 or PL ≤ 0.025, we interpret the chemically weighted results as being statistically different from the randomly weighted results (according to the two-tailed test).
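A minimal base-R sketch of the two P values defined above is given below. The function name `permutation_p` and the simulated R² values are assumptions for illustration; in the actual analysis the permuted R² values come from models fitted with randomly permuted chemical-dissimilarity trees.

```r
# Hypothetical helper: upper- and lower-tail permutation P values.
#   r2_obs  : R2 from the chemically weighted analysis
#   r2_perm : vector of R2 values from m permutations
permutation_p <- function(r2_obs, r2_perm) {
  m <- length(r2_perm)
  b <- sum(r2_perm >= r2_obs)        # permutations at least as large as observed
  b_prime <- sum(r2_perm <= r2_obs)  # permutations at least as small as observed
  c(P_U = (b + 1) / (m + 1), P_L = (b_prime + 1) / (m + 1))
}

# Invented example: 999 permuted R2 values, all below the observed value.
set.seed(1)
r2_perm <- runif(999, min = 0, max = 0.15)
p <- permutation_p(r2_obs = 0.16, r2_perm = r2_perm)
# p is P_U = 0.001 and P_L = 1, so P_U <= 0.025 and the weighted result
# would be read as larger than expected under random weighting.
```

Adding 1 to both numerator and denominator keeps the P value from being exactly zero, so the smallest attainable value is 1/(m + 1).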
References in Additional methods D
ftp://cran.r-project.org/pub/R/web/packages/SYNCSA/SYNCSA.pdf
Additional Figures
Fig. S1 Miki et al. Panels (a)–(c): similarity trees whose tips are the 31 EcoPlate substrates: D-Galactonic-Acid-gamma-Lactone, Tween-80, beta-Methyl-D-Glucoside, alpha-Cyclodextrin, alpha-D-Lactose, D-Cellobiose, Glycogen, D-Xylose, D-Galacturonic-Acid, Glucose-1-Phosphate, N-Acetyl-D-Glucosamine, Tween-40, D-Malic-Acid, alpha-Ketobutyric-Acid, D-Mannitol, i-Erythritol, Itaconic-Acid, L-Threonine, L-Serine, L-Asparagine, D-Glucosaminic-Acid, L-Arginine, Glycyl-L-Glutamic-Acid, alpha-Glycerol-Phosphate, Putrescine, gamma-Hydroxybutyric-Acid, Pyruvic-Acid-Methyl-Ester, Phenylethyl-amine, L-Phenylalanine, 4-Hydroxy-Benzoic-Acid, 2-Hydroxy-Benzoic-Acid.
Fig. S1 Chemical similarity trees generated from different methods: (a) standard method in R function {rcdk::get.fingerprint}, (b) extended method in R function {rcdk::get.fingerprint} with type="extended", and (c) online tool.
Fig. S2 Miki et al. Panels (a) Avg, (b) Max, and (c) Min, all for the final endpoint; axes: R² and threshold T (T = 0.1 to 0.9); series: non-weighted, tree-a, tree-b, tree-c.
Fig. S2 Results of linear models linking multifunctionality with gene diversity in microcosms. R² values from the linear model (MF ~ reduction of gene diversity) are compared for different metrics: (a) average, (b) maximum, and (c) minimum of triplicates. * denotes Pperm,U < 0.025 or Pperm,L < 0.025 for tree x (x = a, b, c in Fig. S1).
Fig. S3 Miki et al. Axes: threshold T (0.9, 0.7, 0.5) and constrained fraction of variance; series: UW, CW-a, CW-b, CW-c; groups: Avg, Max, and Min of triplicates for each of temporal integration, temporal maximum, and final endpoint.
Fig. S3 Results of db-RDA linking functional composition with month and treatment effects in forest soils, based on binarized data for different calculation methods. Statistical power (constrained fraction of variance) of the distance-based RDA model (functional dissimilarity ~ treatment×month) varies depending on the calculation method. * Pperm,U < 0.025 or Pperm,L < 0.025 for tree x (x = a, b, c in Fig. S1). Vertical and horizontal axes cross at a position corresponding to the average statistical power from the default calculation method (i.e., “Final endpoint and taking average of triplicates”). T values (= 0.9, 0.7, and 0.5) represent quantile-based thresholds for the binarization. The abbreviations are the same as in Fig. 5.
Fig. S4 Miki et al. Axes: R² and integration period (days); series: binary_uw, binary_cw, cont_uw, cont_cwGuniFrac, cont_cwGuniFrac0.5, cont_cwF.
Fig. S4 Sensitivity of db-RDA results to the integration period. Statistical power (R²) of the distance-based RDA model is shown for different integration periods (0–30 days). Maximum values among triplicates of forest soil data were used. For chemically weighted results, tree-c in Fig. S1 was used. Threshold T = 0.8 was used for binary results.
Fig. S5 Miki et al. Panels (a)–(c): trees whose tips are the EcoPlate substrates.
Fig. S5 Chemical similarity trees based on pairwise correlations of color development patterns (color depths) between samples in EcoPlate incubations. Pairwise dissimilarity was calculated as 1 – max(r, 0), where r is the correlation coefficient. Similarity calculated in this way was interpreted to be based on the similarity of microbial responses to different substrates rather than on similarity defined by chemical structure (see Fig. S1). Results were calculated using (a) temporally integrated data and maximum values of triplicates from forest soils, (b) final endpoint data and maximum values of triplicates from microcosm experiments, and (c) combined data from (a) and (b).
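The dissimilarity 1 – max(r, 0) used for Fig. S5 can be illustrated with a short base-R sketch. The color-depth values are invented, and three choices are assumptions not stated in the caption: correlations are taken between substrates across samples, r is the Pearson coefficient, and the tree is built with average linkage. The names `color_depth`, `D_resp`, and `tree_resp` are hypothetical.

```r
# Toy color-depth data: rows = samples, columns = substrates (invented values).
color_depth <- cbind(
  substrate_A = c(0.1, 0.4, 0.8, 1.2),
  substrate_B = c(0.2, 0.8, 1.6, 2.4),  # proportional to A, so r = 1
  substrate_C = c(1.2, 0.8, 0.4, 0.1)   # decreases as A increases, so r < 0
)

r <- cor(color_depth)        # pairwise Pearson correlations between substrates
D_resp <- 1 - pmax(r, 0)     # negative correlations are truncated to 0

# Average-linkage clustering of the response-based dissimilarities (assumed method).
tree_resp <- hclust(as.dist(D_resp), method = "average")
# plot(tree_resp)
```

Truncating negative correlations to zero means that substrates with opposite responses are treated as maximally dissimilar (dissimilarity 1) rather than being given a dissimilarity above 1.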
Fig. S6 Miki et al. Panels (a)–(c): density plots; axes: R² values and density; marked values: R² of UW (0.0963), R² of CW-a (0.1669), R² of CW-b (0.1629), R² of CW-c (0.0731).
Fig. S6 Results of linear models linking multifunctionality with month and treatment effects in forest soils using permuted chemical dissimilarity matrices. Statistical power (R²) of the linear model from chemically weighted multifunctionality (CW-a, b, c) is compared with that from multifunctionality weighted by randomly permuted chemical-dissimilarity trees, to evaluate the effectiveness of chemically weighted methods. We used data from Fig. 6a (temporal maximum with minimum of triplicates, and threshold T = 0.7; highlighted by a blue bar). Probability density distributions are from permuting (a) tree-a, (b) tree-b, and (c) tree-c as shown in Fig. S1. Pperm,U = 0.009, Pperm,U = 0.007, and Pperm,L = 0.107 for (a), (b), and (c), respectively. R² of UW represents statistical power from chemically unweighted indices.