Marine Chemistry 180 (2016) 24–32


Marine Chemistry journal homepage: www.elsevier.com/locate/marchem

Using a Gaussian decomposition approach to model absorption spectra of chromophoric dissolved organic matter

Philippe Massicotte ⁎, Stiig Markager

Aarhus University, Department of Bioscience, Frederiksborgvej 399, DK-4000 Roskilde, Denmark

Article info

Article history: Received 20 July 2015; received in revised form 27 January 2016; accepted 28 January 2016; available online 14 February 2016

Keywords: Spectral slope; Chromophoric dissolved organic matter; Gaussian decomposition; Light absorption; Spectral modeling

Abstract

Chromophoric dissolved organic matter (CDOM) is a significant water constituent that influences the inherent and apparent optical properties of natural waters and plays a key role in ecosystem functioning. The spectral slope (S), which describes the approximate exponential decline in CDOM absorption with increasing wavelength, is widely used for tracing changes in the chemical composition of CDOM. The currently accepted method of characterizing CDOM absorption (i.e., fitting a simple exponential model) can lead to loss of information and large errors. We propose a method for modeling CDOM absorption spectra based on a Gaussian decomposition approach that removes the errors associated with the choice of the spectral range used to estimate S. Using artificially generated spectra with known parameters (n = 1000), we show that our method provides robust estimates of S that closely match the original values. On average, the error on S estimates was 0.16% for the proposed method compared to 27% and 11% for the traditional modeling approach fitted over 300–700 nm and 240–700 nm, respectively. We further demonstrate the ability of the method to decompose and model chromophores present in complex spectra from oceanic water samples from around the world. The proposed method opens avenues for long-term or cross-site comparison studies of the dynamics of the CDOM pool and constitutes a promising supplement to techniques based on CDOM fluorescence. © 2016 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (P. Massicotte), [email protected] (S. Markager).

http://dx.doi.org/10.1016/j.marchem.2016.01.008
0304-4203/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Dissolved organic matter (DOM) is the largest dynamic pool of carbon in both marine (Benner, 2002) and freshwater ecosystems (Cole et al., 2007). DOM influences the functioning of aquatic ecosystems in numerous ways. For example, the DOM pool determines underwater light characteristics through its optical properties (Kirk, 1994) and influences the composition of aquatic microbial communities (Foreman and Covert, 2003; Kritzberg et al., 2006), carbon cycling on local to global scales (Cole et al., 2007) and the mineralization and transport of nitrogen (Markager et al., 2011; Keller and Hood, 2011; Jørgensen et al., 2014). Chemically, the DOM pool is complex and only a small fraction can easily be characterized with chemical methods (Benner, 2002; Seitzinger et al., 2005). Optical techniques such as absorbance and fluorescence have been developed to characterize the DOM pool in aquatic ecosystems (Coble et al., 1990; McKnight et al., 2001). These techniques have their limitations, as it is difficult to link optical characteristics directly to the chemical composition. Nevertheless, they are useful because they are rapid, and therefore cost effective, relative to chemical analyses.

Chromophoric dissolved organic matter (CDOM) is the optically active fraction of the DOM pool. It is well known that optical characteristics of the CDOM pool relate to its biochemical characteristics such as aromaticity (Weishaar et al., 2003) and molecular size (Sharma and Schulman, 1999; Helms et al., 2008). CDOM is responsible for much of the underwater variability in light attenuation (Kirk, 1994) and its optical properties are commonly used as a proxy to trace the origin and the dynamics of the DOM pool over time and space in many aquatic ecosystems (McKnight et al., 2001; Stedmon and Markager, 2001; Baker and Spencer, 2004; Yamashita et al., 2013; Jørgensen et al., 2014). An adequate description of the absorption properties of CDOM is necessary in order to understand photochemical and bio-optical processes such as primary production (Markager et al., 2004; Thrane et al., 2014) and to parametrize dynamic ecosystem models (Massicotte and Frenette, 2013; Maar et al., 2016) and remote sensing applications such as ocean color algorithms (Bélanger et al., 2008).

Given that UV–visible absorption spectra of CDOM decrease approximately exponentially with increasing wavelength, different exponential models have been proposed to extract quantitative information about optical properties of CDOM (reviewed in Twardowski et al. (2004)). Eq. (1) presents the most common approach (Stedmon and Markager, 2001):

$$a_{\mathrm{CDOM}}(\lambda) = a_{\mathrm{CDOM}}(\lambda_0)\, e^{-S(\lambda - \lambda_0)} + K \tag{1}$$

where aCDOM is the absorption coefficient (m−1), λ is the wavelength (nm), λ0 is a reference wavelength (nm), K is a background constant (m−1) accounting for scatter in the cuvette and drift of the instrument and S is the spectral slope (nm−1) that describes the approximate exponential rate of decrease in absorption with increasing wavelength. Higher slopes indicate a more rapid decrease in absorption with increasing wavelength. The S parameter is frequently used as a proxy for tracing photochemical and microbial-induced changes in the CDOM pool (Moran et al., 2000; Twardowski et al., 2004; Helms et al., 2013) or to determine its origin (Stedmon and Markager, 2001).

Eq. (1) assumes that absorption spectra follow a continuous exponential decrease as wavelength increases. If this assumption were true, the spectral range (or the wavelength interval) used to fit the data should not influence the value of S. However, it is common to observe deviations, shoulders or peaks in absorption spectra (Fig. 1A, C). In these situations, the usefulness of S for characterizing DOM is limited by the spectral range over which it is calculated (Helms et al., 2008). Different spectral ranges, e.g., 300–700 nm, 275–295 nm, 350–400 nm and 280–650 nm, have been proposed to estimate S (Twardowski et al., 2004; Helms et al., 2008; Osburn et al., 2009). Using a narrow wavelength range often provides a different result from that obtained with a broader range (Twardowski et al., 2004), but even broad and quite similar spectral ranges (e.g., 240–700 nm vs. 300–700 nm) can produce important differences in the estimation of S (Fig. 1A, B). Therefore, the dependence of S on the spectral range over which it is calculated severely limits our ability to compare results from the literature and hampers our understanding of how S varies in different aquatic ecosystems on the global scale. This is a serious issue since about three-quarters of the variability in S from the literature can be explained by the different spectral ranges used in each study (Twardowski et al., 2004).
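The dependence of S on the fitting range can be illustrated with a minimal sketch. This is not the authors' code: it assumes a noise-free synthetic spectrum built from Eq. (1) with K = 0, to which one hypothetical Gaussian-shaped peak at 300 nm is added, and it estimates S by ordinary least squares on log-transformed absorption rather than by the nonlinear fit of Eq. (1). All function names and parameter values are illustrative assumptions.

```python
import math

def exponential(wl, a0, s, k=0.0, wl0=350.0):
    """Eq. (1): a(wl) = a0 * exp(-s * (wl - wl0)) + k."""
    return a0 * math.exp(-s * (wl - wl0)) + k

def gaussian_peak(wl, phi, mu, sigma):
    """Hypothetical Gaussian-shaped deviation added to the baseline."""
    return phi * math.exp(-((wl - mu) ** 2) / (2.0 * sigma ** 2))

def log_linear_slope(wavelengths, absorption, lower, upper):
    """Estimate S on [lower, upper] as minus the OLS slope of ln(a) vs. wavelength."""
    pairs = [(wl, math.log(a)) for wl, a in zip(wavelengths, absorption)
             if lower <= wl <= upper]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return -sxy / sxx

# 240-700 nm in 0.5 nm steps
wavelengths = [240.0 + 0.5 * i for i in range(921)]
TRUE_S = 0.02  # nm-1, value used to build the synthetic spectra

# Pure exponential spectrum, and the same spectrum with one peak at 300 nm
clean = [exponential(wl, 0.4, TRUE_S) for wl in wavelengths]
peaked = [a + gaussian_peak(wl, 0.5, 300.0, 15.0)
          for wl, a in zip(wavelengths, clean)]

# Spectral ranges quoted in the text
RANGES = [(300, 700), (275, 295), (350, 400), (280, 650)]
s_clean = {r: log_linear_slope(wavelengths, clean, *r) for r in RANGES}
s_peaked = {r: log_linear_slope(wavelengths, peaked, *r) for r in RANGES}

for r in RANGES:
    print(r, round(s_clean[r], 4), round(s_peaked[r], 4))
```

For the pure exponential spectrum the estimate is the same on every range, whereas the spectrum with a peak yields range-dependent values, which is the behavior described above.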
Another shortcoming of the current modeling approaches is that, although high coefficients of determination are often observed (R² > 0.99, Fig. 1), residuals from the models often show patterns that clearly violate the regression assumptions of homoscedastic, uncorrelated and normally distributed residuals (Fig. 1C, D). This suggests that current modeling approaches do not fully capture or exploit all the information provided in CDOM spectra.

In this study, we propose a new method to model CDOM absorption spectra. The first objective is to obtain robust estimates of S that are independent of the spectral range used and therefore more reliable and comparable among studies. The second objective is to develop a method that can identify absorption peaks from specific chromophores in spectra. Such peaks can potentially provide additional information about the CDOM pool and the dynamics of specific chromophores. The underlying hypothesis in our approach is that specific compounds, or structures in larger molecules, present in significant amounts will show up as peaks or shoulders in absorption spectra, causing deviations from the expected exponential decay curve (Fig. 1). The proposed method is based on a Gaussian decomposition approach that identifies and models spectral regions where peaks occur. Simulated artificial spectra with known characteristics were used to evaluate the capability of the method to retrieve the true properties of complex spectra. We also tested the method on 290 measured CDOM spectra from the third Danish Galathea expedition, which circumnavigated the world in 2006–2007.

Fig. 1. Absorption spectra (gray dots) and models (lines) fitted using Eq. (1) on (A) in situ and (B) simulated data (parameters used: a0 = 0.4 m−1, S = 0.02 nm−1, K = 0.01 m−1, λ0 = 295 nm). Fits were obtained using the most common spectral ranges used in the literature (see legend). Residual plots for the model using the complete spectral range (240–700 nm) on (C) in situ and (D) simulated data. Green shaded polygons represent possible deviations not accounted for by the model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2. Methods

2.1. Modeling framework

The new modeling framework is based on two components. The first component models the general exponential decrease in CDOM absorption with increasing wavelength (Eq. (1)). The second component identifies and models regions deviating from the continuous exponential decay curve. A Gaussian decomposition approach is used to model the absorption contribution from individual chromophores. The probability density function of a three-parameter Gaussian curve is given in Eq. (2):

$$f(x; \varphi, \mu, \sigma) = \varphi\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{2}$$

where σ (nm) is the standard deviation controlling the width of the curve, φ (m−1) is the height of the curve peak (φ = 1/(σ√(2π)) for a normalized density) and μ (nm) is the position of the center of the peak (Fig. 2C). The approach allows a linear combination of many individual Gaussian components to be present in a CDOM spectrum. Eq. (3) is obtained by combining Eqs. (1) and (2):

$$a_{\mathrm{CDOM}}(\lambda) = a_{\mathrm{CDOM}}(\lambda_0)\, e^{-S(\lambda - \lambda_0)} + K + \sum_{i=0}^{n} \varphi_i\, e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}} + \varepsilon \tag{3}$$

where i = 0...n denotes a particular Gaussian component and ε is the residual representing the variability not accounted for by the model. If no Gaussian components are modeled (i = 0), Eq. (3) is equivalent to Eq. (1). A necessary initial step is to identify regions in a spectrum deviating from the general exponential decrease in absorption. This improves the estimate of S and provides the starting points for the determination of Gaussian components. This is done by an iterative procedure in which the exponential model (Eq. (1)) is refitted to the spectrum after the wavelengths with the largest residuals have been identified and removed. This is repeated until all remaining residuals are less than C times the mean of the absolute values of the residuals. In this study, we used C = 1 (dimensionless ratio), but the optimal value for this criterion depends on the quality of the data (signal-to-noise ratio). The global procedure used to identify and model Gaussian components is presented in Algorithm 1 and illustrated in Fig. 2.
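Once parameter values are available, Eq. (3) can be evaluated directly. The following is a minimal sketch of the deterministic part of the model (without the residual term ε), not the implementation used in this study. The function names are hypothetical, and the parameter values are those listed for the simulated spectrum of Fig. 2.

```python
import math

def gaussian(x, phi, mu, sigma):
    """Eq. (2): Gaussian curve with height phi, center mu and width sigma."""
    return phi * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def cdom_absorption(wl, a0, s, k, wl0, components=()):
    """Eq. (3) without the residual term.

    components is an iterable of (phi, mu, sigma) tuples; an empty iterable
    reduces the model to the exponential baseline of Eq. (1).
    """
    baseline = a0 * math.exp(-s * (wl - wl0)) + k
    peaks = sum(gaussian(wl, phi, mu, sigma) for phi, mu, sigma in components)
    return baseline + peaks

# Parameter values of the simulated spectrum in Fig. 2
params = dict(a0=0.04, s=0.02, k=0.0, wl0=350.0)
one_peak = [(0.05, 300.0, 15.0)]  # (phi in m-1, mu in nm, sigma in nm)

at_reference = cdom_absorption(350.0, **params)
without_peak = cdom_absorption(300.0, **params)
with_peak = cdom_absorption(300.0, components=one_peak, **params)
```

At the peak center the Gaussian term contributes exactly φ, so the difference between `with_peak` and `without_peak` equals the peak height, and with K = 0 the baseline at λ0 equals aCDOM(λ0).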

The search for optimal Gaussian parameter values (σ, φ, μ) was constrained in order to maintain their physical interpretation. Hence, for each Gaussian component, μ is allowed to vary by ±10 nm around the identified peak, σ is bounded by the full width at half maximum (FWHM) in absorption (FWHM = 2√(2 ln 2) σ) and φ is bounded between 0 and the maximum absorption value of the identified peak.

Algorithm 1. Global modeling procedure.
1. Fit Eq. (1) to the absorption data (Fig. 2A). Outlier data points (residual values that are ≥ C times the mean of absolute residuals) are iteratively removed from the spectrum in order to obtain an initial exponential model (baseline exponential curve) not influenced by possible peaks.
2. Calculate the residuals of the baseline exponential model for the entire spectrum (Fig. 2B).
3. Smooth the residuals using a Savitzky–Golay smoothing filter (Savitzky and Golay, 1964) to remove potential noise in the data (Fig. 2B).
4. Determine the optimal number of Gaussian components that models the residuals from the baseline exponential model. Gaussian parameters for each component are estimated using Eq. (2).
5. Model the complete spectrum using Eq. (3) (Fig. 2C).

The Bayesian information criterion (BIC) was used to identify the optimal number of Gaussian components to model residuals while avoiding over-fitting (Schwarz, 1978). The BIC is based on the principle of parsimony, helping to identify the model that accounts for the most variation with the fewest variables. Such a model selection requires the calculation of the BIC differences (Δi) for all candidate models in the set as Δi = BICi − BICmin (where BICmin is the smallest BIC of all models). The larger the Δi, the less plausible the model. The Gaussian decomposition procedure is only performed on data points below 500 nm since absorption at higher wavelengths often has a low signal-to-noise ratio, being close to the detection limit. We applied a threshold of 0.01 m−1 for φ in order to discard Gaussian components that are likely modeling noise but, as mentioned for the identification of outliers (the C ratio), the value of such a threshold depends on the quality of the data set.

Fig. 2. Schema of the modeling framework illustrating the fitting process on a simulated spectrum containing one Gaussian component. Parameters used to simulate the spectrum are as follows: aCDOM(350) = 0.04 m−1, S = 0.02 nm−1, K = 0.00 m−1, σ = 15 nm, μ = 300 nm, φ = 0.05 m−1. φ is the height of the curve peak (m−1), μ is the position of the center of the peak (nm), σ is the standard deviation controlling the width of the curve (nm). (A) Result of the baseline exponential fitting process. Orange points have been discarded to calculate the baseline exponential curve (see Section 2). (B) Residual plot of the baseline exponential model (observed data − fitted data). (C) The modeling procedure estimates the three parameters associated with all Gaussian components (see Algorithm 1). (D) Residuals from the estimated spectrum (estimated − data). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.2. Simulation procedure

In order to evaluate the capability of the proposed method to retrieve the true properties of complex spectra, we developed an algorithm to generate artificial spectra with a random number of Gaussian components with known characteristics. A total of 1000 spectra were generated according to Algorithm 2. For each spectrum, both the traditional and proposed methods (Eqs. (1) and (3)) were used to estimate the parameters characterizing the generated spectrum. Two spectral ranges were used to estimate S with the traditional model presented in Eq. (1). S240–700 denotes the spectral slope calculated between 240 and 700 nm whereas S300–700 denotes the spectral slope calculated between 300 and 700 nm. We also calculated Slog, which is the slope of the linear regression between log-transformed absorption and wavelength (Helms et al., 2008). SGaussian denotes the spectral slope calculated using the proposed method (Eq. (3)).

2.3. Application of the proposed method on measured spectra

A total of 290 measured CDOM spectra were used to compare the standard and proposed modeling approaches on samples collected during the first part of the third Danish Galathea expedition. The spectra come mainly from the open Atlantic Ocean between latitude 34 and 67° North and between longitude 57° East and 12° West and include filtered seawater (pre-combusted Advantec GF75) from the surface to 2850 m depth. Seawater samples were incubated with a bacterial inoculum for 180 days. Within this dataset, three spectra presenting light, medium and heavy peaks were selected to evaluate the robustness of the proposed modeling method to estimate S on spectra with different shapes.
The CDOM spectra were measured at 0.5 nm intervals from 240 to 700 nm on a Shimadzu UV-2401PC spectrophotometer in 10 cm quartz cuvettes with freshly made MQ water as reference. An MQ spectrum at room temperature was recorded every 10 samples, and samples were at room temperature before measuring. Samples were measured within a few hours after sampling. Absorption coefficients were calculated using the following equation (Kirk, 1994):

$$a_{\mathrm{CDOM}}(\lambda) = \frac{2.303\, A(\lambda)}{L} \tag{4}$$

where aCDOM(λ) is the absorption coefficient (m−1) at wavelength λ, A(λ) is the absorbance at wavelength λ and L is the path length of the optical cell in meters (0.1 m).

Algorithm 2. Simulation procedure.
1. Generate a baseline exponential curve using Eq. (1). The value of each parameter is drawn randomly from a distribution commonly observed in the literature (0.01 ≤ S ≤ 0.04, 0.01 ≤ aCDOM(λ0) ≤ 0.5, −0.005 ≤ K ≤ 0.025, λ0 = 350). Random normal noise is added to the data.
2. Randomly choose the number of Gaussian components (0–5) to add to the baseline exponential curve. For each Gaussian component, randomly choose values for the three parameters (σ, φ, μ) controlling the width, the height and the peak position of the curve (Eq. (2)).
3. Estimate S240–700, S300–700, Slog and SGaussian.
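Steps 1 and 2 of Algorithm 2 can be sketched as follows. This is an illustrative reading of the algorithm, not the code used in this study. The ranges for S, aCDOM(λ0), K and the number of components are those stated in Algorithm 2, whereas the uniform sampling, the noise standard deviation and the ranges for the Gaussian parameters (φ, μ, σ) are assumptions made here because the text does not specify them.

```python
import math
import random

def generate_spectrum(rng, noise_sd=0.001):
    """Generate one artificial CDOM spectrum (steps 1 and 2 of Algorithm 2).

    noise_sd and the (phi, mu, sigma) sampling ranges are assumed values.
    """
    wavelengths = [240.0 + 0.5 * i for i in range(921)]  # 240-700 nm

    # Step 1: baseline parameters, drawn uniformly (assumption) from the stated ranges
    s = rng.uniform(0.01, 0.04)
    a0 = rng.uniform(0.01, 0.5)
    k = rng.uniform(-0.005, 0.025)
    wl0 = 350.0

    # Step 2: between 0 and 5 Gaussian components with random parameters
    components = [
        (rng.uniform(0.01, 0.2),     # phi (m-1), assumed range
         rng.uniform(250.0, 500.0),  # mu (nm), assumed range
         rng.uniform(5.0, 40.0))     # sigma (nm), assumed range
        for _ in range(rng.randint(0, 5))
    ]

    absorption = []
    for wl in wavelengths:
        value = a0 * math.exp(-s * (wl - wl0)) + k
        value += sum(phi * math.exp(-((wl - mu) ** 2) / (2.0 * sigma ** 2))
                     for phi, mu, sigma in components)
        value += rng.gauss(0.0, noise_sd)  # random normal noise
        absorption.append(value)

    truth = {"s": s, "a0": a0, "k": k, "wl0": wl0, "components": components}
    return wavelengths, absorption, truth

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# The study generated 1000 spectra; 50 keeps this sketch fast.
spectra = [generate_spectrum(rng) for _ in range(50)]
```

Keeping the true parameters alongside each spectrum is what makes step 3 possible: each estimate of S can then be compared with the value used to generate the spectrum.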

2.4. Statistical analyses

All statistical analyses were performed in R version 3.1.2 (R Core Team 2015). The Gaussian decomposition of the overlapping-peak signal into its component parts (Eq. (2)) was performed using the fminsearch() function from the neldermead R package. The nlsLM() function from the minpack.lm R package (Elzhov et al., 2013) was used to fit the data (Eq. (3)). We developed an R package entitled cdom that implements the necessary tools to model CDOM spectra according to the proposed method. The package is available on the CRAN website.

3. Results

3.1. Absorption spectra modeled using the traditional exponential approach

The inadequacy of the traditional exponential method for modeling absorption is shown in Fig. 1 for a measured spectrum and an artificial spectrum. Fig. 1A shows a CDOM spectrum with two visible shoulders centered at wavelengths of ~275 nm and ~300 nm, respectively. S calculated with Eq. (1) over different wavelength ranges varied between 0.0162 and 0.0433 nm−1, representing a variation of 167%. An artificial spectrum with one peak centered at 300 nm shows the same phenomenon (Fig. 1B). Here, S varied between 0.0179 and 0.0291 nm−1 (63% variation) whereas a value of 0.0200 nm−1 was used to generate the spectrum. The residuals of the measured spectrum show a pattern with two overlapping peaks (Fig. 1C). For the artificial spectrum, the peak used to generate the spectrum is clearly visible (Fig. 1D). Despite high R² values (> 0.99), these deviations indicate that additional information can be retrieved from the CDOM spectra.

3.2. Analysis of artificial spectra

An analysis of 1000 artificial spectra with known S values shows that the traditional models produced a higher spread in estimated S values (i.e., larger scatter around the regression line) compared to the proposed approach (Fig. 3). The highest errors in S estimates were observed with the Slog method, which uses the slope of the linear regression between log-transformed absorption and wavelength. In this case, the relationship between the true value used to generate the spectrum and the estimated value was hardly visible and very weak, although significant due to the large number of data points (Fig. 3A, R² = 0.014, p = 0.0002). S300–700 estimated using Eq. (1) on data between 300 and 700 nm also showed high variability, but for this technique the relationship to the true S was better (Fig. 3B, R² = 0.384, p < 0.0001), although the error increased proportionally with the value of S. When the complete wavelength range was used (S240–700), the estimated values of S showed lower variability (Fig. 3C, R² = 0.956, p < 0.0001) and the spread of the estimation errors around the regression line did not present any identifiable pattern. On average, Slog was underestimated by 67% (Fig. 3A) whereas the simple exponential model (Eq. (1)) underestimated S300–700 by 27% (Fig. 3B) and S240–700 by 11% (Fig. 3C). Estimates of S using the proposed model showed less variability and the average error was reduced to 0.16% (Fig. 3D, R² = 0.998, p < 0.0001).

The medians and ranges spanned by the residual standard error (RSE) for the 1000 simulations are presented in Fig. 4. The median RSE for the model fitted using the log-transformed relationship was 0.5419 m−1 (lower = 0.0806, upper = 1.1459). The median RSE for the models fitted over 300–700 nm and 240–700 nm were 0.0061 m−1 (lower = 0.0009, upper = 0.0266) and 0.0101 m−1 (lower = 0.0010, upper = 0.0405), respectively. The median error for the models fitted using the proposed method was 0.001 m−1 (lower = 0.0009, upper = 0.0011). RSE values in Fig. 4 are presented on a log scale given the large differences among the models.

Fig. 3. Comparisons between the specified and predicted spectral slopes (S) for 1000 artificial spectra obtained with: (A) the slope of the log-transformed data (Slog), (B) the non-linear traditional model fitted on data between 300 and 700 nm (S300–700), (C) the non-linear traditional model (Eq. (1)) fitted on data between 240 and 700 nm (S240–700) and (D) the proposed model (SGaussian).

Fig. 4. Boxplot showing medians and ranges spanned by the residual standard error (RSE) for the 1000 simulations for Slog, S300–700, S240–700 and SGaussian.

3.3. Analysis of measured spectra: results from the three selected test cases

In order to show how the proposed method performs on individual spectra, we selected three measured spectra presenting light, medium and heavy deviations from an exponential model. Results for the three spectra are shown for the traditional exponential method applied over two wavelength ranges (S300–700, S240–700) as well as for the new method (SGaussian). The new method (Algorithm 1) automatically identified that the spectrum with only a light deviation (Fig. 5A) was best modeled with two Gaussian components whereas the spectra with medium (Fig. 5B) and heavy (Fig. 5C) deviations were best modeled with three Gaussian components. Seven of the eight identified Gaussian components had their peak position (μ) located in the UV range (< 380 nm) and one component was identified in the blue region (μ = 445 nm, Fig. 5B).

Estimates of S were almost always higher when calculated using the new method (SGaussian). For instance, SGaussian was 2.4% and 3.3% higher compared to S240–700 and S300–700, respectively, for the spectrum with a light deviation (Fig. 5A and red bars in Fig. 6). SGaussian was higher by 69% and 51% compared to S240–700 and S300–700 for the spectrum with a medium deviation (Fig. 5B and green bars in Fig. 6). SGaussian deviated by 60% and −6.5% from S240–700 and S300–700, respectively, for the spectrum presenting a heavy deviation (Fig. 5C and blue bars in Fig. 6). These numbers confirm the pattern found in the artificial spectra that the new method provides higher estimates of S (Fig. 3). In all three spectra, the RSE produced by the proposed method was reduced by one to two orders of magnitude compared to the traditional method (Fig. 6B). Globally, for the 290 measured spectra, the error associated with S was lowest for SGaussian followed by S240–700 and S300–700, and this pattern was also observed for the three spectra with different deviation intensities in Fig. 5, once again confirming the patterns observed in Fig. 3. In addition, no identifiable patterns were found in the residuals generated by the proposed method (data not shown but see Fig. 2D for an example).

3.4. Comparing results between artificial and measured spectra

Estimates of S on artificial and measured spectra showed the same pattern, with SGaussian > S240–700 > S300–700 (Fig. 7). S240–700 and S300–700 were similar between the artificial and measured spectra, whereas a larger difference between artificial and measured data was identified for SGaussian (Fig. 7).

3.5. Analysis of 290 measured spectra (Gaussian component distribution)

The results of the Gaussian decomposition performed on the 290 measured spectra are presented in Fig. 8. A total of 885 Gaussian components were found across all spectra. The number of modeled Gaussian components varied between 0 and 5 and the average number of components per spectrum was 3.3 ± 0.57 (mean ± SD). The peak position varied from 250 to 500 nm with a mean value of 312 nm, i.e., a majority of the peaks were found in the UV part of the spectrum. A frequency distribution indicates seven groups centered at 269, 299, 344, 375, 407, 444 and 485 nm (Fig. 8A). The height of the Gaussian components was on average 0.0187 ± 0.0311 m−1. The majority of the components had a height less than 0.025 m−1, but with a long "tail" with values up to about 0.2 m−1 (Fig. 8B). The width of the components was usually less than 30 nm but sometimes much higher (mean = 41 ± 39 nm, Fig. 8C).


Fig. 5. Application of the proposed method on three in situ spectra presenting light (A), medium (C) and heavy (E) deviations from an exponential pattern. Identified Gaussian components in each spectrum and their associated parameters (φ, μ, σ) are shown in panels B, D and F. The gray thick lines in the right panels represent the sum of the identified Gaussian components.

4. Discussion

The general exponential shape of CDOM absorption spectra has been known for decades (Jerlov, 1968), as reflected in the terms "gelbstoff" or "yellow substances". The effects of UV radiation in aquatic ecosystems and the development of remote sensing techniques sparked a great interest in CDOM absorption in the 1990s, and measurements of CDOM are now widely used. Most mathematical formulations used to model CDOM absorption are based on the original exponential approach proposed by Jerlov (1968), although early attempts at using Gaussian decomposition exist (Schwarz et al., 2002). We believe that the proposed new method contributes to this development as it provides a better description of CDOM spectra with lower errors (Figs. 4 and 6) and more precise S values (Figs. 3 and 7) compared to the previously used methods. On average, the new method gives higher S values compared to other methods (Figs. 3 and 7) such as the traditional exponential method (Eq. (1)) or the log method, which could not even reconstruct the "true" values in the artificial spectra (Fig. 3). Different spectral ranges have been proposed to calculate S in order to obtain quantitative information about biological or physical processes that are known to influence the absorption of CDOM at specific wavelengths. In general, S calculated in the UV-B region (< 300 nm) is used to obtain chemical characteristics of DOM whereas S derived from longer wavelengths is typically used in remote sensing applications (Stedmon et al., 2000; Bélanger et al., 2006). There are important compromises in using narrower versus larger spectral ranges to estimate S.

On the one hand, S calculated over a large spectral range had the smallest error (Figs. 3 to 6). But because more data points are used, subtle changes in spectra occurring in specific regions are likely to be missed (see Fig. 5A for an example). Consequently, using a wider spectral range to model CDOM spectra provides only limited insights into the biogeochemical cycling of DOM in aquatic ecosystems. On the other hand, S calculated using a narrower wavelength range (e.g., 275–295 nm, Fig. 1) may provide precise information about specific processes altering the characteristics of CDOM (e.g., photodegradation) but does not fully exploit all the information provided in the spectra. Different strategies, such as the spectral slope ratio (Helms et al., 2008) or the spectral curve (Loiselle et al., 2009), have been proposed to minimize the effect of choosing a spectral range for modeling CDOM spectra, although the choice of the spectral range over which these methods are applied remains rather subjective. As a consequence, about three-quarters of the variability in S from the literature can be explained by the different spectral ranges used in each study (Twardowski et al., 2004). Our approach eliminates the difficulties associated with the choice of the spectral range used to estimate S and further exploits all the information contained in CDOM spectra. The results with artificial spectra showed that the proposed method provided more robust estimates of S compared to the traditional linear (Fig. 3A) or exponential (Fig. 3B, C) methods. Even when using the complete spectral range to model the data (i.e., less sensitive to the influence of peaks), we observed a relatively large spread of the S values when using the traditional methods (Figs. 3 and 6).

Although our approach provides better estimates of S, it also uses more parameters compared to both traditional methods. The number of parameters can be calculated as 3 + (3 × n), where n is the number of Gaussian components in a single spectrum. For example, a total of 18 parameters would have to be estimated for a spectrum containing 5 Gaussian components. As pointed out by Draper and Smith (1998), the number of observations used in a regression analysis should be at least ten times the number of parameters in the model. Hence, a spectrum modeled with 5 Gaussian components should contain at least 180 data points in order to avoid overfitting. This should be systematically verified for every fit. Based on the analysis of 290 measured CDOM spectra, we found that 3.3 Gaussian components were used on average to model a spectrum, meaning that, on average, approximately 130 data points are required to avoid overfitting. This should not be problematic since most spectrophotometers are capable of providing spectra measured at 1 or even 0.5 nm increments, which gives enough data points to avoid overfitting.

Fig. 6. Barplots showing (A) S values estimated on in situ spectra presenting light, medium and heavy deviations (see Fig. 5) and (B) the residual standard error associated with each model. The groups represent the results obtained with the three methods.

4.1. Ecological implications

Remote sensing color algorithms (Bélanger et al., 2008; Massicotte et al., 2013), phytoplankton and chlorophyll-a concentration estimation models (Mitchell et al., 2002; Staehr and Markager, 2004) and attempts to model phytoplankton primary production (Alver et al., 2014; Thrane et al., 2014) often require a precise description of absorption properties to be adequately parametrized. Characterization of CDOM absorption represents a significant challenge for such parametrization (see Helms et al. (2013) and references therein).

4.2. Tracing the chemical characteristics of CDOM

The bulk of DOM is essentially composed of carbon, nitrogen and phosphorus (Benner, 2002). The relative proportion of carbon content (DOC) to other nutrients (DON, DOP) is known to vary among aquatic ecosystems (Bronk, 2002; Karl and Björkman, 2002). Given that absorption peaks of DOC, DON and DOP occur at different wavelengths (Gaffney et al., 1992; Weishaar et al., 2003; Paz et al., 2010), the relative contribution of each nutrient (in significant amount) is likely to show up as peaks in CDOM spectra. In the same manner as parallel factor analysis (PARAFAC) can be used to decompose the complex fluorescence patterns of CDOM into underlying groups of fluorophores (Bro, 1997; Stedmon et al., 2003), the Gaussian decomposition approach used in the proposed method could explicitly identify and characterize these compounds (see Fig. 5 where potential chemical components were identified). 4.3. Tracing the diagnostic state of CDOM Processes such as bacterial degradation or photodegradation can influence the shape of CDOM spectra (Vähätalo and Wetzel, 2004; Helms et al., 2008, 2013; Zhang et al., 2013). A recent study conducted by Helms et al. (2013) has pointed out that the most dynamic regions influenced by photodegradation were 275–295 nm and 350–400 nm and that peak positions were shifted toward lower wavelengths (i.e., aromatic transitions) after irradiation. Based on the Gaussian decomposition of 290 measured samples, we found that 15% (n = 131) of peak positions were located between 275 and 295 nm and 14% (n = 124) between 350 and 400 nm (Fig. 8A). Furthermore, we observed that peak positions were normally distributed in these two regions, possibly reflecting that the analyzed CDOM samples were exposed to a wide range of UV-radiation exposure conditions (i.e., water depth, exposure time, water clarity). 
Assuming that the peak height represents the concentration of a specific compound or structure in DOM molecules, the proposed method might be able to provide information about the dynamics of such compounds, which could in turn be used to study the kinetics of DOM production or degradation in aquatic ecosystems.
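To make the role of the peak height concrete, the sketch below evaluates a spectrum built from an exponential baseline plus Gaussian components described by φ (height), μ (center) and σ (width), the three component parameters listed in Fig. 8. The exact functional form, the baseline parameter names (`a0`, `K`) and the reference wavelength `ref_nm` are assumptions made for illustration; they are chosen only so that the parameter count matches 3 + (3 × n).

```python
import math


def gaussian(wavelength, phi, mu, sigma):
    """Gaussian component: height phi (m^-1), center mu (nm), width sigma (nm)."""
    return phi * math.exp(-((wavelength - mu) ** 2) / (2.0 * sigma ** 2))


def cdom_absorption(wavelength, a0, S, K, components, ref_nm=350.0):
    """Assumed model: exponential decline with slope S, offset K, plus Gaussians.

    components is a list of (phi, mu, sigma) tuples.
    """
    baseline = a0 * math.exp(-S * (wavelength - ref_nm)) + K
    return baseline + sum(gaussian(wavelength, *c) for c in components)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    peaks = [(0.05, 280.0, 10.0), (0.02, 370.0, 15.0)]
    for wl in (280.0, 350.0, 370.0):
        print(wl, cdom_absorption(wl, a0=1.0, S=0.02, K=0.0, components=peaks))
```

At its center a component contributes exactly φ to the absorption, which is why a change in φ between samples could be read as a change in the amount of the corresponding chromophore, under the assumption stated above.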

Fig. 7. Boxplot showing medians and ranges spanned by the spectral slope (S) calculated using both the traditional and new methods for the two datasets used in this study.

P. Massicotte, S. Markager / Marine Chemistry 180 (2016) 24–32


Fig. 8. Distribution of parameters for 885 Gaussian components calculated on 290 in situ CDOM absorption spectra. (A) μ is the position of the center of the peak (nm, bin width ≈5 nm). Numbers above the density curve are peak positions. (B) φ is the height of the curve peak (m−1, bin width ≈0.005 m−1). (C) σ is the standard deviation controlling the width of the curve (nm, bin width ≈6 nm).

5. Conclusions

With the growing awareness of the importance of DOM and CDOM in aquatic ecosystems and the increasing use of remote sensing, for which CDOM absorption is important, we believe that measurements of CDOM absorption spectra will become more frequent in the future. This will in turn increase the demand for a robust method for analyzing CDOM spectra. The new method meets this demand and thereby provides a better characterization of CDOM spectra, which is likely to enhance our understanding of DOM biogeochemical cycling in natural aquatic ecosystems. The method is able to retrieve the true values used to create artificial spectra and eliminates the need for choosing a particular spectral range for estimating S. Furthermore, it can identify specific peaks and thereby pave the way for information about the behavior of CDOM components in natural ecosystems.

Acknowledgments

This work was supported by the IMAGE project (09-067259) from the Danish Council for Strategic Research (Stiig Markager, S.M.). The CDOM data were kindly provided by the project A global perspective on dissolved organic matter funded by the Danish Council for Independent Natural Sciences (project 272-05-0318 to SM) and part of the third Danish Galathea expedition in 2006–2007. This is publication #114 from the Third Galathea Expedition. We acknowledge Ciarán Murray for helpful comments on the manuscript. P. Massicotte was supported by a postdoctoral fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

Alver, M.O., Hancke, K., Sakshaug, E., Slagstad, D., 2014. A spectrally-resolved light propagation model for aquatic systems: steps toward parameterizing primary production. J. Mar. Syst. 130, 134–146.
Baker, A., Spencer, R.G.M., 2004. Characterization of dissolved organic matter from source to sea using fluorescence and absorbance spectroscopy. Sci. Total Environ. 333 (1–3), 217–232.
Bélanger, S., Xie, H.X., Krotkov, N., Larouche, P., Vincent, W.F., Babin, M., 2006. Photomineralization of terrigenous dissolved organic matter in Arctic coastal waters from 1979 to 2003: interannual variability and implications of climate change. Glob. Biogeochem. Cycles 20.
Bélanger, S., Babin, M., Larouche, P., 2008. An empirical ocean color algorithm for estimating the contribution of chromophoric dissolved organic matter to total light absorption in optically complex waters. J. Geophys. Res. 113 (C4), C04027.
Benner, R., 2002. Chemical composition and reactivity. In: Hansell, D.A., Carlson, C.A. (Eds.), Biogeochemistry of Marine Dissolved Organic Matter. Academic Press, San Diego, CA, pp. 59–90.
Bro, R., 1997. PARAFAC. Tutorial and applications. Chemom. Intell. Lab. Syst. 38 (2), 149–171.
Bronk, D.A., 2002. Dynamics of DON. In: Hansell, D.A., Carlson, C.A. (Eds.), Biogeochemistry of Marine Dissolved Organic Matter. Elsevier, San Diego, pp. 153–247. http://dx.doi.org/10.1016/B978-012323841-2/50007-5.
Coble, P.G., Green, S.A., Blough, N.V., Gagosian, R.B., 1990. Characterization of dissolved organic matter in the Black Sea by fluorescence spectroscopy. Nature 348 (6300), 432–435.
Cole, J.J., Prairie, Y.T., Caraco, N.F., McDowell, W.H., Tranvik, L.J., Striegl, R.G., Duarte, C.M., Kortelainen, P., Downing, J.A., Middelburg, J.J., Melack, J., 2007. Plumbing the global carbon cycle: integrating inland waters into the terrestrial carbon budget. Ecosystems 10 (1), 172–185.
Draper, N.R., Smith, H., 1998. Applied Regression Analysis. John Wiley & Sons, Inc., Hoboken, NJ, USA (Wiley).
Elzhov, T.V., Mullen, K.M., Spiess, A.-N., Bolker, B., 2013. Minpack.Lm: R Interface to the Levenberg–Marquardt Nonlinear Least-squares Algorithm Found in MINPACK, Plus Support for Bounds.
Foreman, C., Covert, J., 2003. Linkages between dissolved organic matter composition and bacterial community structure. Aquatic Ecosystems: Interactivity of Dissolved Organic Matter. Elsevier, pp. 343–362 (Chap. 14).
Gaffney, J.S., Marley, N.A., Cunningham, M.M., 1992. Measurement of the absorption constants for nitrate in water between 270 and 335 nm. Environ. Sci. Technol. 26 (1), 207–209.
Helms, J.R., Stubbins, A., Ritchie, J.D., Minor, E.C., Kieber, D.J., Mopper, K., 2008. Absorption spectral slopes and slope ratios as indicators of molecular weight, source, and photobleaching of chromophoric dissolved organic matter. Limnol. Oceanogr. 53 (3), 955–969.
Helms, J.R., Stubbins, A., Perdue, E.M., Green, N.W., Chen, H., Mopper, K., 2013. Photochemical bleaching of oceanic dissolved organic matter and its effect on absorption spectral slope and fluorescence. Mar. Chem. 155, 81–91.
Jerlov, N., 1968. Optical Oceanography. Elsevier Publishing Company, New York, p. 194.
Jørgensen, L., Markager, S., Maar, M., 2014. On the importance of quantifying bioavailable nitrogen instead of total nitrogen. Biogeochemistry 117 (2–3), 455–472.
Karl, D.M., Björkman, K.M., 2002. Dynamics of DOP. In: Hansell, D.A., Carlson, C.A. (Eds.), Biogeochemistry of Marine Dissolved Organic Matter. Elsevier, San Diego, pp. 249–366. http://dx.doi.org/10.1016/B978-012323841-2/50008-7.
Keller, D.P., Hood, R.R., 2011. Modeling the seasonal autochthonous sources of dissolved organic carbon and nitrogen in the upper Chesapeake Bay. Ecol. Model. 222 (5), 1139–1162.
Kirk, J.T.O., 1994. Light and Photosynthesis in Aquatic Ecosystems. second ed. Cambridge University Press, Cambridge [England]; New York, p. xvi (509 pp.).
Kritzberg, E.S., Langenheder, S., Lindstrom, E.S., 2006. Influence of dissolved organic matter source on lake bacterioplankton structure and function — implications for seasonal dynamics of community composition. FEMS Microbiol. Ecol. 56 (3), 406–417.



Loiselle, S.A., Bracchini, L., Dattilo, A.M., Ricci, M., Tognazzi, A., Cézar, A., Rossi, C., 2009. The optical characterization of chromophoric dissolved organic matter using wavelength distribution of absorption spectral slopes. Limnol. Oceanogr. 54 (2), 590–597.
Maar, M., Markager, S., Madsen, K.S., Windolf, J., Lyngsgaard, M.M., Andersen, H.E., Møller, E.F., 2016. The importance of local versus external nutrient loads for Chl a and primary production in the Western Baltic Sea. Ecol. Model. 320, 258–272. http://dx.doi.org/10.1016/j.ecolmodel.2015.09.023.
Markager, S., Stedmon, C.A., Conan, P., 2004. Effects of DOM in marine ecosystems. In: Søndergaard, M., Thomas, D.N. (Eds.), Dissolved Organic Matter (DOM) in Aquatic Ecosystems, pp. 37–42 (Chap. 4).
Markager, S., Stedmon, C.A., Søndergaard, M., 2011. Seasonal dynamics and conservative mixing of dissolved organic matter in the temperate eutrophic estuary Horsens Fjord. Estuar. Coast. Shelf Sci. 92 (3), 376–388.
Massicotte, P., Frenette, J.-J., 2013. A mechanistic-based framework to understand how dissolved organic carbon is processed in a large fluvial lake. Limnol. Oceanogr. Fluids Environ. 3 (3), 139–155.
Massicotte, P., Gratton, D., Frenette, J.-J., Assani, A.A., 2013. Spatial and temporal evolution of the St. Lawrence River spectral profile: a 25-year case study using Landsat 5 and 7 imagery. Remote Sens. Environ. 136, 433–441.
McKnight, D.M., Boyer, E.W., Westerhoff, P.K., Doran, P.T., Kulbe, T., Andersen, D.T., 2001. Spectrofluorometric characterization of dissolved organic matter for indication of precursor organic material and aromaticity. Limnol. Oceanogr. 46 (1), 38–48.
Mitchell, B.G., Kahry, M., Wieldand, J., Stramska, M., 2002. Determination of spectral absorption coefficients of particles, dissolved material and phytoplankton for discrete water samples. In: M. JL, F. GS (Eds.), Ocean Optics Protocols for Satellite Ocean Color Sensor Validation vol. 2. Academic Press, San Diego, pp. 231–258.
Moran, M.A., Sheldon, W.M., Zepp, R.G., 2000. Carbon loss and optical property changes during long-term photochemical and biological degradation of estuarine dissolved organic matter. Limnol. Oceanogr. 45 (6), 1254–1264.
Osburn, C.L., Retamal, L., Vincent, W.F., 2009. Photoreactivity of chromophoric dissolved organic matter transported by the Mackenzie River to the Beaufort Sea. Mar. Chem. 115 (1–2), 10–20.
Paz, A.R., Bravo, J.M., Allasia, D., Collischonn, W., Tucci, C.E.M., 2010. Large-scale hydrodynamic modeling of a complex river network and floodplains. J. Hydrol. Eng. 15 (2), 152–165.
Savitzky, A., Golay, M.J., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639.
Schwarz, G., 1978. Estimating the dimension of a model. Ann. Stat. 6 (2), 461–464.
Schwarz, J.N., Kowalczuk, P., Kaczmarek, S., Cota, G.F., Mitchell, B.G., Kahru, M., Chavez, F.P., Cunningham, A., McKee, D., Gege, P., Kishino, M., Phinney, D.A., Raine, R., 2002. Two models for absorption by coloured dissolved organic matter (CDOM). Oceanologia 44 (2), 209–241.

Seitzinger, S.P., Hartnett, H., Lauck, R., Mazurek, M., Minegishi, T., Spyres, G., Styles, R., 2005. Molecular-level chemical characterization and bioavailability of dissolved organic matter in stream water using electrospray-ionization mass spectrometry. Limnol. Oceanogr. 50 (1), 1–12.
Sharma, A., Schulman, S.G., 1999. Introduction to Fluorescence Spectroscopy. Wiley, New York; Chichester, p. xiv (173 pp.).
Staehr, P.A., Markager, S., 2004. Parameterization of the chlorophyll a-specific in vivo light absorption coefficient covering estuarine, coastal and oceanic waters. Int. J. Remote Sens. 25, 5117–5130.
Stedmon, C.A., Markager, S., 2001. The optics of chromophoric dissolved organic matter (CDOM) in the Greenland Sea: an algorithm for differentiation between marine and terrestrially derived organic matter. Limnol. Oceanogr. 46 (8), 2087–2093.
Stedmon, C.A., Markager, S., Kaas, H., 2000. Optical properties and signatures of chromophoric dissolved organic matter (CDOM) in Danish coastal waters. Estuar. Coast. Shelf Sci. 51 (2), 267–278.
Stedmon, C.A., Markager, S., Bro, R., 2003. Tracing dissolved organic matter in aquatic environments using a new approach to fluorescence spectroscopy. Mar. Chem. 82 (3–4), 239–254 (English).
Thrane, J.-E., Hessen, D.O., Andersen, T., 2014. The absorption of light in lakes: negative impact of dissolved organic carbon on primary productivity. Ecosystems 17 (6), 1040–1052.
Twardowski, M.S., Boss, E., Sullivan, J.M., Donaghay, P.L., 2004. Modeling the spectral shape of absorption by chromophoric dissolved organic matter. Mar. Chem. 89 (1–4), 69–88.
Vähätalo, A.V., Wetzel, R.G., 2004. Photochemical and microbial decomposition of chromophoric dissolved organic matter during long (months-years) exposures. Mar. Chem. 89 (1–4), 313–326 (English).
Weishaar, J.L., Aiken, G.R., Bergamaschi, B.A., Fram, M.S., Fujii, R., Mopper, K., 2003. Evaluation of specific ultraviolet absorbance as an indicator of the chemical composition and reactivity of dissolved organic carbon. Environ. Sci. Technol. 37 (20), 4702–4708 (English).
Yamashita, Y., Boyer, J.N., Jaffé, R., 2013. Evaluating the distribution of terrestrial dissolved organic matter in a complex coastal ecosystem using fluorescence spectroscopy. Cont. Shelf Res. 66, 136–144.
Zhang, Y., Liu, X., Osburn, C.L., Wang, M., Qin, B., Zhou, Y., 2013. Photobleaching response of different sources of chromophoric dissolved organic matter exposed to natural solar radiation using absorption and excitation-emission matrix spectra. PLoS One 8 (10), e77515.
