Verh. Internat. Verein. Limnol. 2008, vol. 30, Part 1, p. 100–104, Stuttgart, January 2008 © by E. Schweizerbart’sche Verlagsbuchhandlung 2008
Modelling dinoflagellate dynamics in Lake Kinneret
Natasa Atanasova, Gideon Gal, Boris Kompare
Introduction
Lake Kinneret, the only natural freshwater body in Israel, provides approx. 30% of the country’s drinking water; maintaining high water quality and a stable ecosystem is therefore of prime importance. Until 1994 the lake ecosystem was very stable and dominated by the dinoflagellate Peridinium gatunense. Since the mid-1990s the lake has undergone changes that have led to an unstable ecosystem (ROELKE et al. 2007). In this study we used modelling techniques to identify some of the conditions required for the development of Peridinium blooms. Further, we built a model that predicts bloom concentration 2 weeks in advance. Our approach is data-driven: the models are learned from measured data.
Key words: Dinoflagellate, Lake Kinneret, modelling, machine learning, regression trees
Method
Basic principle of machine learning (ML)
Machine learning methods build models by learning from data. An ML procedure involves a concept (the dependent variable), examples, a learning algorithm and a learning scheme, or model. The main task of ML is to learn a model of the concept from given examples of that concept (MICHALSKI et al. 1998). An example of a concept is typically a record in a database with specific values of its attributes. Using the examples and some background knowledge, the ML algorithm generates the learning scheme, or model, which is a representation of what has been learned (Fig. 1).
Building regression trees
We used the machine learning algorithm M5 for building regression tree models (QUINLAN 1992). Simple linear regression calculates one equation (one weight vector) for the dependent variable, and this equation applies to the entire data set. Piecewise, or tree-structured, regression instead divides the data set into several subsets and applies a constant value or a linear equation to each subset. In this manner the piecewise linear parts can approximate the nonlinear behaviour of the dependent variable much better.
[Fig. 1 diagram: examples that describe a concept (data) and expert knowledge are input to the ML algorithm, which produces a learning scheme (model).]
Fig. 1. Machine learning procedure.
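The claim that piecewise linear parts approximate nonlinear behaviour better than one global equation can be checked with a small sketch. It is not the M5 implementation used in this study. It fits one ordinary least-squares line to a synthetic, V-shaped data set and compares the result with two lines fitted on either side of a fixed split point. A model tree with one split and two linear leaves does the same thing. The data and the split point (x = 0) are hypothetical, and M5 would choose the split itself.

```python
# Sketch (not M5 itself): compare one global least-squares line with a
# two-piece fit split at a fixed, hypothetical threshold. Synthetic data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def piecewise_sse(xs, ys, split):
    """Fit separate lines for x <= split and x > split and return the total
    SSE. Assumes each side holds at least two distinct x values."""
    total = 0.0
    for keep in (lambda x: x <= split, lambda x: x > split):
        px = [x for x in xs if keep(x)]
        py = [y for x, y in zip(xs, ys) if keep(x)]
        a, b = fit_line(px, py)
        total += sse(px, py, a, b)
    return total


xs = [float(x) for x in range(-5, 6)]  # -5.0 ... 5.0
ys = [abs(x) for x in xs]              # nonlinear response y = |x|

a, b = fit_line(xs, ys)
print("single line SSE:", round(sse(xs, ys, a, b), 3))          # 28.182
print("two-piece SSE:  ", round(piecewise_sse(xs, ys, 0.0), 3))  # 0.0
```

The single line can only return the mean of |x|, whereas the two pieces reproduce the response exactly. In a real model tree, the same gain comes from letting the algorithm pick the split attribute and threshold.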
A regression tree model consists of nodes (branching points); branches, which connect the nodes; and leaves, which are terminal nodes where the dependent variable is predicted. The algorithm works recursively. Starting with the entire set of examples (S), it selects the best attribute and the best split of that attribute according to the splitting criterion, i.e. the split that yields the most homogeneous subsets with respect to the dependent variable, and then repeats the procedure on each subset. Further details about the algorithm can be found in QUINLAN (1992). The accuracy of prediction, termed model quality, can be assessed by dividing the entire data set into learning (training) and testing subsets: the tree is constructed from the learning set and evaluated on the test set. A better method avoids over-reliance on any one particular division into training and test sets. It partitions the original dataset in several different ways and computes an average score over the different partitions, i.e. it uses cross-validation. The dataset is partitioned into a chosen number of folds; in turn, each fold is used for testing while the remainder is used for training.
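The split selection and cross-validation described above can be sketched as follows. M5 selects splits by maximising the standard deviation reduction, SDR = sd(T) − Σ |T_i|/|T| · sd(T_i) (QUINLAN 1992). The sketch applies this criterion to a single attribute and stops after one split. It uses constant (mean) leaves instead of M5's linear models, pruning and smoothing. The following choices are assumptions made for illustration:

- the population standard deviation;
- the interleaved assignment of records to folds;
- the step-shaped data.

```python
import statistics


def sd(values):
    """Population standard deviation; 0.0 for fewer than two values."""
    return statistics.pstdev(values) if len(values) > 1 else 0.0


def best_split(xs, ys):
    """Return (threshold, SDR) of the binary split x <= t that maximises the
    standard deviation reduction SDR = sd(T) - sum(|Ti|/|T| * sd(Ti))."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    base = sd([y for _, y in pairs])
    best_t, best_sdr = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sdr = base - (len(left) * sd(left) + len(right) * sd(right)) / n
        if sdr > best_sdr:
            best_t, best_sdr = t, sdr
    return best_t, best_sdr


def fit_stump(xs, ys):
    """One-split regression tree whose two leaves predict the mean of y."""
    t, _ = best_split(xs, ys)
    if t is None:  # no useful split: a single leaf
        m = statistics.fmean(ys)
        return lambda x: m
    lo = statistics.fmean([y for x, y in zip(xs, ys) if x <= t])
    hi = statistics.fmean([y for x, y in zip(xs, ys) if x > t])
    return lambda x: lo if x <= t else hi


def cross_validated_mae(xs, ys, k=5):
    """k-fold cross-validation: each fold is used once for testing while the
    remaining folds are used for training; returns the mean absolute error."""
    n = len(xs)
    errors = []
    for f in range(k):
        test = set(range(f, n, k))  # interleaved fold assignment
        train_x = [xs[i] for i in range(n) if i not in test]
        train_y = [ys[i] for i in range(n) if i not in test]
        model = fit_stump(train_x, train_y)
        errors.extend(abs(model(xs[i]) - ys[i]) for i in sorted(test))
    return statistics.fmean(errors)


xs = list(range(20))
ys = [1.0 if x < 10 else 5.0 for x in xs]   # hypothetical step response
print(best_split(xs, ys))                   # (9.5, 2.0)
print(cross_validated_mae(xs, ys, k=5))
```

On this step-shaped example the SDR criterion recovers the step at x = 9.5. The cross-validated error is not zero, because in one fold the held-out boundary record (x = 10) falls on the wrong side of the threshold learned from the remaining folds.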
Dataset and the experimental setup
The dataset used for modelling includes 4 years of physical, chemical and biological data, with a weekly-to-fortnightly sampling frequency. We used 144 examples (records) of measured variables (attributes; Table 1). Two experiments were performed, resulting in 2 types of models for Peridinium: (1) a knowledge discovery model and (2) a prediction model. The goal of the first experiment was to disclose knowledge about the relations between the measured variables, and thereby to suggest specific conditions governing various aspects of Peridinium behaviour. The second experiment was aimed at constructing a model that can predict the Peridinium concentration in the lake 2 weeks in advance. An additional attribute, the Peridinium concentration after 2 weeks, was introduced into the database and set as the dependent variable. Thus, given the present value of Peridinium and the other measured attributes today, the model calculates the concentration of Peridinium 2 weeks into the future. The models were built using the software WEKA, a shell that includes most of the popular ML algorithms (WITTEN & FRANK 2000).
0368-0770/08/0100 $ 1.25 © 2008 E. Schweizerbart’sche Verlagsbuchhandlung, D-70176 Stuttgart
N. Atanasova et al., Modelling dinoflagellate
Table 1. Measured variables (attributes) in Lake Kinneret, used for model construction.
Attribute       Units        Measured data on 5. 1. 1997
Ca              mg/L         53.00
CO2             mg/L         0.01
Conductivity    µmhos/cm     1082.50
Corg            mg/L         2.70
NH4             mg/L         0.0230
NO3             mg/L         0.2180
pH              Std Units    8.1750
P-ort           mg/L         0.0005
TSS             mg/L         ? (missing)
Turbidity       NTU          2.13
Chlorophyta     g/m2         20.31
Peridinium      g/m2         79.29
Copepoda        g/m2         16.69
Cladocera       g/m2         18.85
Rotifera        g/m2         0.15
Temperature     deg C        18.00
Results and discussion
Knowledge discovery model
Given the dataset of 144 examples containing values of the measured attributes (Table 1), the algorithm constructed a model consisting of branches and 10 leaves (Fig. 2). Each leaf contains a linear model (LM) that calculates the concentration of dinoflagellates (mg m–2). For clarity and because of space restrictions, the leaf equations are not presented in this paper, except for LM 2. The model is read as a set of IF-THEN rules: when an example (a record in the database) reaches a specific leaf, the LM of that leaf is applied to calculate the dinoflagellate concentration. For example, the data measured on 1 May 1997 (Table 1) are classified into leaf LM 2, because pH < 8.17 AND Cladocera > 16.3 g/m2. This means that the Peridinium concentration for this example is calculated by the linear model LM 2, given in equation (1), which results in 12.3 g/m2.
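Reading the tree as IF-THEN rules amounts to following the split conditions from the root to a leaf and then evaluating that leaf's linear model. The sketch below is a hypothetical two-split fragment built only from the two conditions quoted for LM 2 (pH and Cladocera); the full tree in Fig. 2 has 10 leaves. The LM 2 coefficients are those of equation (1). The other leaves are placeholders (constant zero) invented for illustration, and the `<=`/`>` convention at the thresholds is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    """Terminal node holding a linear model: intercept + sum(w * attribute)."""
    name: str
    intercept: float
    weights: Dict[str, float] = field(default_factory=dict)

    def predict(self, record: Dict[str, float]) -> float:
        return self.intercept + sum(w * record[a] for a, w in self.weights.items())


@dataclass
class Split:
    """Internal node: IF record[attribute] <= threshold THEN low ELSE high."""
    attribute: str
    threshold: float
    low: "Node"
    high: "Node"


Node = Union[Leaf, Split]


def route(node: Node, record: Dict[str, float]) -> Tuple[Leaf, List[str]]:
    """Follow the IF-THEN conditions to a leaf; return it and the rule path."""
    path: List[str] = []
    while isinstance(node, Split):
        if record[node.attribute] <= node.threshold:
            path.append(f"{node.attribute} <= {node.threshold}")
            node = node.low
        else:
            path.append(f"{node.attribute} > {node.threshold}")
            node = node.high
    return node, path


# LM 2 uses the coefficients of equation (1); the other leaves are placeholders.
LM2 = Leaf("LM 2", -45907.6743, {
    "Nitrate": -57782.3618, "pH": 11164.4098, "Port": -2984885.0457,
    "Turbidity": 5725.3334, "Chlorophyta": -0.1814,
    "Cladocera": 98.5937, "temp": -1627.3683,
})
tree = Split("pH", 8.17,
             low=Split("Cladocera", 16.3,
                       low=Leaf("placeholder A", 0.0),
                       high=LM2),
             high=Leaf("placeholder B", 0.0))

# Hypothetical record, not a measured one.
record = {"pH": 8.0, "Cladocera": 20.0, "Nitrate": 0.2, "Port": 0.001,
          "Turbidity": 2.0, "Chlorophyta": 20.0, "temp": 18.0}
leaf, rules = route(tree, record)
print(leaf.name, rules)   # LM 2 ['pH <= 8.17', 'Cladocera > 16.3']
print(leaf.predict(record))
```

Keeping the rule path alongside the prediction is what makes a model tree explainable: every prediction can be traced back to the conditions that selected its linear model.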
[Fig. 2 tree diagram: splits on pH, Turbidity, Cladocera, nitrate, temp, NH4, P-ort and Chlorophyta lead to leaves holding linear models (LM); an inset lists the correlation coefficient, mean absolute error and total number of instances.]
Fig. 2. Knowledge discovery model for the dinoflagellate Peridinium gatunense.
[Fig. 3 plot: performance of the knowledge discovery model; measured (DINO MEASURED) and predicted (DINO PREDICTED) dinoflagellate concentration (mg m–2) versus time.]
Fig. 3. Comparison of the measured and simulated data with the knowledge discovery model for P. gatunense concentration.
LM 2:
$$\mathrm{DINO} = -57782.3618 \times \mathrm{Nitrate} + 11164.4098 \times \mathrm{pH} - 2984885.0457 \times \mathrm{Port} + 5725.3334 \times \mathrm{Turbidity} - 0.1814 \times \mathrm{Chlorophyta} + 98.5937 \times \mathrm{Cladocera} - 1627.3683 \times \mathrm{temp} - 45907.6743 \qquad (1)$$
Model evaluation was conducted using strict cross-validation (see Building regression trees). Compared with the measured data, the model performs with satisfactory accuracy (Fig. 2) and correctly simulates the interannual biomass peaks (Fig. 3). The regression tree structure of the model is highly descriptive and agrees well with expert understanding of how the ecosystem functions.

According to the model, pH is the most important attribute for Peridinium in Lake Kinneret. This is expected for two reasons. First, CO2 entered the model only indirectly, through pH. Second, CO2 limitation plays a key role in governing Peridinium blooms in the lake (BERMAN-FRANK et al. 1998, BERMAN-FRANK & EREZ 1996).

Zooplankton, specifically cladoceran species, also play a key role in determining Peridinium blooms. The link between Peridinium and Cladocera has also been identified in previous research (ROELKE et al. 2007). According to the model, this connection is conditional on lower pH values (pH < 8.7). A direct link between Cladocera and Peridinium is unclear, because the large Peridinium cells cannot serve as a food source for cladocerans. A number of indirect links may exist between the lake’s zooplankton and algal assemblage, including nutrient recycling or grazing pressure on competing, yet edible, algal species (ZOHARY 2004). The latter link may also explain the presence of a Chlorophyta condition in the Peridinium model. High cladoceran biomass results in high grazing pressure on edible algal species (e.g., Chlorophyta), which reduces competition between the algal species for limiting resources such as nutrients. While these indirect pathways require further examination, they are plausible.

The nutrient limitation branch suggests that phosphorus becomes a limiting factor only at very low P concentrations, of 0.001 mg L–1 or less. While such conditions occur in the lake, they are typical of summer rather than of spring, when Peridinium typically blooms. According to the nutrient branch of the model, nitrate becomes important once 2 conditions are met: P > 0.001 mg L–1 and temperature > 18 °C. These conditions typically occur in spring, when Peridinium blooms, so the model confirms the expert expectation that nitrate (and not phosphorus) becomes limiting during Peridinium blooms.
Prediction model
The prediction model was constructed to predict Peridinium concentration 2 weeks in advance, based on the present measurements of the attributes (Table 1). The resulting regression tree model is shown in Fig. 4, where DINO denotes the present Peridinium concentration. As in the previous model, the dependent variable (Peridinium concentration 2 weeks ahead) is calculated by the linear models LM 1 to LM 11. Again, the model was evaluated with cross-validation and shows good accuracy, correctly simulating the large interannual variation (Fig. 5).
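The paper does not state how the 2-week-ahead attribute was aligned with the irregular, weekly-to-fortnightly sampling. A minimal sketch under one possible assumption follows. For each sampling date, the target is the Peridinium (DINO) value measured closest to 14 days later, accepted only if it lies within ±3 days; otherwise the target is left missing. The function name, the tolerance and the sample values are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def add_two_week_target(
    samples: List[Tuple[date, float]],
    horizon_days: int = 14,
    tolerance_days: int = 3,
) -> List[Tuple[date, float, Optional[float]]]:
    """Attach to each (date, DINO) sample the DINO value measured closest to
    date + horizon_days, if one lies within +/- tolerance_days; else None."""
    samples = sorted(samples)
    rows = []
    for day, dino in samples:
        goal = day + timedelta(days=horizon_days)
        near = [(abs((d - goal).days), v) for d, v in samples
                if abs((d - goal).days) <= tolerance_days]
        target = min(near, key=lambda c: c[0])[1] if near else None
        rows.append((day, dino, target))
    return rows


# Hypothetical weekly-to-fortnightly series (values invented for illustration).
series = [(date(1997, 1, 5), 10.0), (date(1997, 1, 12), 12.0),
          (date(1997, 1, 19), 15.0), (date(1997, 2, 2), 30.0)]
for row in add_two_week_target(series):
    print(row)
```

Rows whose target is None would be marked as missing before training. In WEKA's ARFF format a missing value is written as "?", the same marker used for TSS in Table 1.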
[Fig. 4 tree diagram: splits on DINO (present Peridinium concentration), N_tot, Cladocera, NH4, nitrate, Chla, P-ort and Chlorophyta lead to leaves holding linear models (LM); an inset lists the correlation coefficient, mean absolute error and total number of instances.]
Fig. 4. Prediction model: given the present values of the measured attributes in the lake (see Table 1), the model predicts P. gatunense concentration 2 weeks ahead.
[Fig. 5 plot: performance of the prediction model; measured (DINO MEASURED) and predicted (DINO PREDICTED) dinoflagellate concentration (mg m–2) versus time.]
Fig. 5. Comparison of the measured and simulated data with the prediction model for Peridinium concentration.
Conclusions and further work
In this study we implemented an ML technique, building regression tree models to extract knowledge from measured data and to predict Peridinium concentration in Lake Kinneret. Compared with the measured data, both models show acceptable accuracy. Moreover, their structure is in line with expert expectations, which makes them clear and explainable. Further work will focus on performing similar experiments with longer time series. These experiments should enable more reliable models to be built for Peridinium and for other variables in the system, providing a solid tool to support the management of the lake.
Acknowledgments We thank the scientists of the Lake Kinneret Limnological Laboratory, especially Ami Nishri, Alon Rimmer and Tamar Zohary, for allowing us to use data they collected.
References
BERMAN-FRANK, I. & J. EREZ. 1996. Inorganic carbon pools in the bloom-forming dinoflagellate Peridinium gatunense. Limnol. Oceanogr. 41: 1780–1789.
BERMAN-FRANK, I., J. EREZ & A. KAPLAN. 1998. Changes in inorganic carbon uptake during the progression of a dinoflagellate bloom in a lake ecosystem. Can. J. Bot. 76: 1043–1051.
MICHALSKI, R. S., I. BRATKO & M. KUBAT. 1998. Machine learning and data mining: methods and applications. John Wiley and Sons Ltd., West Sussex, England.
QUINLAN, J. R. 1992. Learning with continuous classes, p. 343–348. In N. Adams & L. Sterling [eds.], Proceedings AI’92 (Australian Conference on AI), Singapore, World Scientific.
ROELKE, D. L., T. ZOHARY, K. D. HAMBRIGHT & J. V. MONTOYA. 2007. Alternative states in the phytoplankton of Lake Kinneret, Israel (Sea of Galilee). Freshw. Biol. 52: 399–411.
WITTEN, I. H. & E. FRANK. 2000. Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann Publishers.
ZOHARY, T. 2004. Changes to the phytoplankton assemblage of Lake Kinneret after decades of a predictable, repetitive pattern. Freshw. Biol. 49: 1355–1371.
Authors’ addresses:
Natasa Atanasova (corresponding author), Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia.
Gideon Gal, Yigal Alon Kinneret Limnological Laboratory, IOLR, PO Box 447, Migdal 14950, Israel.
Boris Kompare, Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia.