SUBMITTED TO IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Design of an Optimal Soil Moisture Monitoring Network using SMOS Retrieved Soil Moisture

Kurt C. Kornelsen, Member, IEEE, and Paulin Coulibaly

Abstract— Many methods have been proposed to select sites for grid-scale soil moisture monitoring networks; however, calibration/validation activities also require information about where to place grid representative monitoring sites. In order to design a soil moisture network for this task in the Great Lakes Basin (522 000 km²), the dual entropy multi-objective optimization algorithm (DEMO) was used to maximize the information content and minimize the redundancy of information in a potential soil moisture monitoring network. Soil moisture retrieved from the Soil Moisture and Ocean Salinity (SMOS) mission during the frost free periods of 2010-2013 was filtered for data quality and then used in a multi-objective search to find Pareto optimum network designs based on the joint entropy and total correlation measures of information content and information redundancy, respectively. Differences in the information content of SMOS ascending and descending overpasses resulted in distinctly different network designs. Entropy from the SMOS ascending overpass was found to be spatially consistent, whereas descending overpass entropy had many peaks which coincided with areas of high sub-grid heterogeneity. A combination of both ascending and descending overpasses produced network designs which incorporated aspects of information from each overpass. Initial networks were designed to include 15 monitoring sites, but the addition of network cost as an objective demonstrated that a network with similar information content could be achieved with fewer monitoring stations.

Index Terms—Information entropy, Remote sensing, Soil moisture, Soil Moisture and Ocean Salinity (SMOS)

I. INTRODUCTION

Soil moisture is a key environmental variable which is important for partitioning energy and water at the surface as well as providing a medium for the transport of bio-geo-chemicals [1]-[3]. With increasing recognition of the importance of soil moisture as a geophysical variable [1]-[3], soil moisture monitoring networks began to be established [4]-[7] and large databases of soil moisture information were made available [7]. Most established soil moisture networks are of limited geographical extent and there are only a few networks that cover an area of large extent [4]-[6], [8]. Because of the spatial heterogeneity and variability of soil moisture, capturing the dynamics of this variable with point scale in situ monitoring is difficult [1], [9], [10]. To complement in situ soil moisture monitoring, dedicated remote sensing platforms for soil moisture monitoring such as the Soil Moisture and Ocean Salinity (SMOS) [11] and planned Soil Moisture Active Passive (SMAP) [12] missions were established.

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). These data were obtained from the “Centre Aval de Traitement des Données SMOS” (CATDS), operated for the “Centre National d’Etudes Spatiales” (CNES, France) by IFREMER (Brest, France). K. C. Kornelsen is with the School of Geography and Earth Science, McMaster University, Hamilton, ON L8S 4L8 Canada (e-mail: [email protected]). P. Coulibaly is jointly with the Department of Civil Engineering and the School of Geography and Earth Science, McMaster University, Hamilton, ON L8S 4L8 Canada (e-mail: [email protected]).


Soil moisture observations from non-dedicated platforms are also available from the C Band Advanced Scatterometer (ASCAT) onboard the EUMETSAT satellite [13] and the Advanced Microwave Scanning Radiometer 2 (AMSR2) onboard the Global Change Observation Mission 1st-Water (GCOM-W1) [14]. SMOS carries a microwave radiometer which images the surface at the 1.4 GHz frequency (L-Band) [11]. This frequency is preferred for monitoring soil moisture as the wavelength penetrates the atmosphere to monitor surface emissivity. At the surface, the main factors affecting the observed brightness temperatures (TB) are the soil moisture, vegetation and surface temperature, whereas factors such as surface roughness and soil texture have secondary impacts on TB [15]. Besides soil moisture, vegetation can have a particularly dynamic impact on TB as stored water on the canopy and changes in vegetation water content can both contribute to overall TB and attenuate TB from the soil surface [15]. SMOS has an average ground resolution of 43 km and images the surface at 6 A.M. (ascending half-orbit) and 6 P.M. (descending half-orbit) local solar time every 3 days [11]. SMOS multi-angular TB measurements are used to retrieve surface (~5 cm) soil moisture using a single day level 2 processor [16]. Using similar concepts to the level 2 processor, a level 3 (L3) multi-day soil moisture retrieval is also available on a widely used 25 km EASE (equal-area scalable Earth) grid [16]. SMOS L3 soil moisture products are retrieved for both overpasses every day and aggregated temporally to provide global coverage in a 3 day product [16]. A primary use for many soil moisture networks is the calibration and validation (cal/val) of satellite soil moisture retrievals [8], [17]. Concepts such as temporal stability have been proposed to select a small number of monitoring sites that are most representative of mean areal conditions for monitoring soil moisture [9], [10], [18] within a grid cell. 
What has not been addressed is which grid cells are best to locate these cal/val sites within. Many dedicated cal/val sites are selected because they have good characteristics for soil moisture retrieval (e.g., [8], [10], [18]), and unfortunately an ad hoc approach is often applied for siting many distributed geophysical monitoring networks [19]. Careful selection of grid cells within which to place a cal/val site is important for the improvement of future soil moisture products, since monitoring in many different surface conditions can be expected to provide more insight into the shortcomings of retrieval methods. It will also help in situ networks move beyond cal/val activities and add value to operational soil moisture products. In order to select the best monitoring locations for a large scale network, Mishra and Coulibaly [19] recommend the use of entropy methods and multi-objective optimization. Entropy theory [20] provides a nonparametric measure of the uncertainty of an outcome of a discrete random process and can be used to determine the information content of a particular set of data [19], [20]. The dual-entropy multi-objective optimization (DEMO) system was developed based on the entropy concept for the design of optimal hydrometric networks [21]. DEMO implements the epsilon-dominance hierarchical Bayesian optimization algorithm (ε-hBOA) [22] to find a set of monitoring sites that maximizes the amount of information defined by the joint entropy while minimizing the shared information content [21]. In order to determine the information content in a potential network, DEMO requires estimates or observations of the geophysical variable for which the network is to be designed at each potential monitoring location. DEMO has also been applied in the Great Lakes Basin (GLB) of Canada and the U.S.A. to design an optimum soil moisture monitoring network using SMOS L3 soil moisture for either ascending or descending overpasses [23].
Despite the minor differences of ‘true’ soil moisture expected between the ascending and descending overpasses, differences in satellite geometry and surface conditions have resulted in different soil moisture retrievals between the two overpasses [23], [24], which were found to result in different optimal network designs [23]. In the current study, we will extend the work of [23] by applying the DEMO algorithm using combined information from both overpasses in the network design to determine if a trade-off could be found between both overpasses. The cost of network design was considered as an additional factor in designing an optimum soil moisture monitoring network. An analysis of the entropy of individual grid cells and visual comparison with high resolution optical remote sensing data provided insights about the selection of particular grid cells. The advantage of using SMOS L3 soil moisture in DEMO is that the resulting network design(s) will be synergistic with the SMOS soil moisture product, resulting in a network that is designed for cal/val operations but does not make use of areas for which there is poor correlation between soil moisture and TB, or for which retrievals rarely occur. Also, the potential network designs will contain the maximum amount of information content and reduce the cost of networks by minimizing redundant monitoring sites. Given the shared frequency band, it is also anticipated that a SMOS designed network would be applicable for calibration/validation of SMAP data as well.

II. STUDY AREA

The Great Lakes Basin (GLB) was chosen as a challenging study area in order to evaluate the potential of combining SMOS and DEMO for monitoring network design. The GLB is located on the boundary between Canada and the United States and contains a diverse range of surface conditions over a land area of approximately 522 000 km². The lower GLB is predominantly covered with agricultural land, providing nominal targets for SMOS soil moisture retrieval. The lower portion of the GLB also contains many large urban centers and non-nominal mixed forests for which retrieval of soil moisture is still possible [25]. The upper portion of the GLB is the transition to the lower boundary of the boreal forest and the Canadian Shield, in which the soils become thinner and retrieval of soil moisture by SMOS is expected to be less likely. Another challenge of the presented study area is the presence of open bodies of water, which have a negative impact on SMOS soil moisture retrievals [25]. In order to include all grid cells located on the basin boundary, the study area has been extended by a buffer of 10 km beyond the natural basin boundary. There are currently few long term soil moisture monitoring stations present in the GLB and many are located in areas for which there is a low percentage of successful soil moisture retrievals from SMOS, as seen in Fig. 1.
The represented stations are from a combination of networks including USDA SCAN [5], USCRN [6], FluxNetCanada [26] and the McMaster Mesonet [27].

III. METHOD

The design of potential soil moisture monitoring networks was based solely on the data provided by SMOS. Therefore, it was important to determine if there were differences in network design based on differences between ascending and descending orbits. Differences in diurnal weather patterns, the impact of radio frequency interference (RFI) and errors resulting from surface temperature estimates have all been suggested as possible causes of systematic differences in volumetric soil moisture retrieval between overpasses [24], [28]-[30]. To determine if the overpass scenario makes a difference on soil moisture network design, DEMO will be applied using 1) ascending, 2) descending and 3) combined ascending/descending half-orbits.

A. DEMO

The approach of DEMO is to use an advanced multi-objective evolutionary algorithm (ε-hBOA) [22] to solve the multi-objective optimization problem which consisted of a set of entropy factors. A brief overview of entropy concepts and the evolutionary algorithm as applicable herein will be presented and readers are directed to [21], [22], [31], [32] for further details. The marginal entropy (Shannon entropy) is a measure of the amount of information retained by a variable (site), based on the uncertainty of a particular outcome (x_i) of a finite sample of a discrete random process with n events. The terms marginal entropy, marginal uncertainty and entropy are all synonymous and can be used interchangeably. The marginal entropy (uncertainty) H(X) is measured in bits and defined in (1) as:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{1}$$

where X represents the variable (site), and the probability of the occurrence of the ith event is denoted by p(x_i). The number of events is determined by discretizing the time series data into an arbitrary number of bins. In this study, the retrieved soil moisture x was converted to an integer, where each integer value was considered an individual bin. This allowed the retained entropy to be calculated without any further discretization, providing more precise results, but also retaining noise [19]. Also, since the marginal entropy is determined by the probability of event i in a set X, it is not necessary that the ‘time series’ represented by X be continuous. This allows for the consideration of SMOS soil moisture retrievals which have large gaps during the winter months when soils would be expected to be frozen. Equation (1) can be expanded to the joint entropy (2), which is a measure of the amount of information contained in two variables as follows:

$$H(X, Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \log_2 p(x_i, y_j) \tag{2}$$

where p(x_i, y_j) is the joint probability of variables X and Y with n and m events, respectively. The information retained by a particular network design with N sites of the set X_1, X_2, …, X_N, each having n_1, n_2, …, n_N events, is defined in (3):

$$H(X_1, X_2, \ldots, X_N) = -\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \cdots \sum_{k=1}^{n_N} p(x_1, x_2, \ldots, x_N) \log_2 p(x_1, x_2, \ldots, x_N) \tag{3}$$
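As an illustration of (1)-(3), the following minimal Python sketch estimates marginal and joint entropy from integer-binned series by counting how often each state (or tuple of simultaneous states) occurs. It is not the DEMO implementation. The function names, the percent-based binning in `to_integer_bins` and the sample values are assumptions made for this example only; the text does not specify how the integer conversion was carried out.

```python
from collections import Counter
from math import log2


def entropy_bits(states):
    """Shannon entropy in bits of a sequence of hashable states, as in (1)."""
    counts = Counter(states)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def joint_entropy_bits(*site_series):
    """Joint entropy in bits of equally long series, as in (2) and (3).

    Each time step contributes one tuple holding the simultaneous state
    of every site, so the tuples sample the joint distribution.
    """
    return entropy_bits(zip(*site_series))


def to_integer_bins(values, scale=100):
    """Assumed binning: scale to percent and round to the nearest integer."""
    return [round(v * scale) for v in values]


# Hypothetical volumetric soil moisture (m3/m3) at two sites on four dates.
site_a = to_integer_bins([0.10, 0.10, 0.30, 0.30])
site_b = to_integer_bins([0.10, 0.30, 0.10, 0.30])

h_a = entropy_bits(site_a)                 # two equally likely states
h_ab = joint_entropy_bits(site_a, site_b)  # four equally likely state pairs
```

Because only the frequency of each state matters, the dates do not need to be contiguous, which matches the remark above about gaps during the winter months.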

The multivariate joint entropy, H(X_1, X_2, …, X_N), provides a measure of the total amount of information contained within a particular network design [21], [32]. The information content in (3) is determined from the joint probability distribution of all stations in a potential network and can be interpreted as the variability of states monitored by the network. Therefore, a network where sites had similar states and high co-variability would have low joint entropy. The total correlation [21], [32] is a measure of the nonlinear redundancy of the information contained by the sites as defined in (4):

$$C(X_1, X_2, \ldots, X_N) = \sum_{i=1}^{N} H(X_i) - H(X_1, X_2, \ldots, X_N) \tag{4}$$
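A small Python sketch of (4) may help fix the idea. It evaluates the total correlation directly from the empirical joint distribution, which is a simplification for illustration and not the grouping-property computation used in DEMO. The function names and the three sample series are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(states):
    """Shannon entropy in bits of a sequence of hashable states."""
    counts = Counter(states)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation_bits(site_series):
    """Sum of the marginal entropies minus the joint entropy, as in (4)."""
    site_series = [list(s) for s in site_series]
    marginal_sum = sum(entropy_bits(s) for s in site_series)
    joint = entropy_bits(zip(*site_series))
    return marginal_sum - joint


# Hypothetical integer-binned states at three candidate sites.
site_a = [10, 10, 30, 30]
site_b = [10, 30, 10, 30]  # varies independently of site_a in this sample
site_c = [10, 10, 30, 30]  # duplicates site_a exactly

c_independent = total_correlation_bits([site_a, site_b])
c_redundant = total_correlation_bits([site_a, site_c])
```

In this toy case the independent pair shares no information (total correlation of 0 bits), whereas the duplicated pair shares the full 1 bit carried by either site, which is the kind of redundancy the second objective penalizes.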

Total correlation is equal to mutual information for N=2 stations [32]. It is a measure of the uncertainty in the network that would be lost with knowledge of each particular site, or the amount of information that is shared by the sites. The entropy factors in (1)-(4) were computed using the grouping property of mutual information [21], [32], [33] and all are measured in bits. The multi-objective optimization problem was solved by ε-hBOA [22], where each potential monitoring site in a network was set either on or off (1 or 0). An evolutionary algorithm was used to select those sites which maximized the amount of network information defined by the joint entropy (3), while minimizing the network redundancy defined by the total correlation (4), subject to the total number of stations desired. The parameters used for DEMO can be found in Table I. Each SMOS-EASE grid cell was considered as a potential location for a station in a complete monitoring network, where ε-hBOA evolves a population of potential network designs containing the desired number of grid cells which were evaluated for Pareto non-dominance. Since ε-hBOA is initialized with a random population, the same procedure was re-applied using 50 different random seeds and the final Pareto front was calculated based on the results of all 50 simulations. All of the computational work was performed using an Opteron cluster available in SHARCNET (Shared Hierarchical Academic Research Computer Network), which is a consortium member of Compute/Calcul Canada. Samuel et al. [21] applied DEMO to determine the number of new monitoring sites to be added to an existing network to satisfy World Meteorological Organization (WMO) guidelines [34]. Since many existing sites of soil moisture networks in the GLB are close to the lakes, and therefore have questionable data with respect to SMOS, and guidelines for the size of soil moisture monitoring networks are nonexistent [34], this research will seek to design a network with a total of 15 monitoring stations, disregarding the existence of current stations. To evaluate the impact of network size on total correlation and joint entropy, the combined overpass scenario was also evaluated a second time with a third objective function to minimize the size (cost) of the network, allowing for between 1 and 15 stations to be added.

B. SMOS Data Processing

The calculation of entropy in DEMO requires a regular and consistently sampled time series of the desired geophysical variable. Also important are the length and sampling of the time series. Too short a data record, or one which has undergone significant temporal aggregation, minimizes the inter-site variability, which could lead to poor network design. To best satisfy these criteria, the SMOS L3 3-day product from the Centre Aval de Traitement des Données SMOS (CATDS) was selected. This was because the retrieval of soil moisture with the L3 algorithm has been found to have a higher percentage of successful retrievals [35] and the 3-day product provides regular global coverage. The 25 km L3 soil moisture is also closer to the native 43 km SMOS resolution and has less oversampling than the 15 km SMOS L2 soil moisture data product. This mitigates the evaluation of several potentially redundant grid cells by DEMO. Soil moisture retrievals were acquired for all possible sampling days between the months of April and September of 2010-2013, which approximately corresponds to the frost free season in the study area. Soil moisture data between 2010 and 2012 were a combination of operational and reprocessed data all using version 2.45 of the soil moisture processor.
Soil moisture data from the beginning of April to June 28, 2013 were derived from version 2.52 of the soil moisture processor and data following June 28, 2013 were from processor version 2.60. The use of different processor versions resulted in temporal inconsistency in the dataset. Since entropy in DEMO is calculated based on the joint probability distribution at each instance in time, therefore analyzing spatial differences, the potential errors resulting from this inconsistency were presumed to be minimal. A filtering strategy was required to ensure that the soil moisture monitoring networks were designed with only high quality data. Each grid cell was first filtered by the percentage of successful retrievals. The percentage of successful retrievals was calculated based on the number of L3 data products for which a soil moisture value was available for a particular grid cell, as compared to the potential number of successful retrievals. For each overpass scenario, any grid cell with less than 50% successful retrievals was not considered as a candidate for a soil moisture monitoring station for that scenario. The remaining grid cells were filtered using the SMOS data quality index (DQX), where individual retrievals with a DQX greater than 1.0 were removed and replaced by a missing value. This DQX was relatively high, but retains a higher number of retrievals and allows for the designed network to account for ‘low quality’ retrievals which provides important information for cal/val applications. The entire dataset was filtered a third time, where remaining grid cells with greater than 30% missing values, either from failed retrievals or high DQX, were also discarded from the analysis to minimize potential errors from infilling. A simple temporal interpolation approach was used for each grid cell to infill missing values [36]. The longest period of consecutive missing values was 87 days (12%), although this occurred at only one grid cell. 
The average maximum number of consecutive missing values per grid cell was 18 (3%) with a standard deviation of 14 (2%). The use of interpolation for infilling SMOS data resulted in lower variability at sites with more missing values. This effectively penalized the entropy of those sites, reducing their selection by DEMO. Periods with consecutive missing values most often occurred at the beginning and end of each study year. This coincides with the spring melt/rains and the onset of snow in some northern regions of the study area. The longest period with missing values that impacted much of the study domain occurred near the end of April 2010 when SMOS had a globally low percentage of successful retrievals [35]. The final amount of spatial data (grid cells after filtering; crosses in Figs. 1, 3 and 6) and temporal data (successful retrievals) used for each overpass scenario can be found in Table II.

IV. RESULTS AND DISCUSSION

A. Length of Required Time Series

The main objective functions for the DEMO approach were based on the accurate determination of the marginal entropy for each potential site. Since marginal entropy was calculated from the probability density function (PDF) of the measured variable, it must be established that a complete form of the PDF was being presented to DEMO for evaluation. Lack of complete information will result in a set of non-dominated networks that are based on an inadequate representation of joint entropy and total correlation. Since a definite form of the soil moisture temporal probability distribution is presumed to be unknown, the principle of maximum entropy [37], [38] can be used to determine if the time series are of sufficient length. The principle of maximum entropy states that the probability distribution (in this case determined by sampling the time series) that best represents the current state of knowledge is the one with the largest entropy [37], [38]. Figure 2 demonstrates that as the number of sampled data increased, the entropy for each potential station, represented by the gray lines, tended toward a stable maximum value. Since the data represented in Fig. 2 are organized as a ‘time series’, given a long period of time there was the possibility of introducing new information, such as unusual wet or dry periods. This resulted in a slight increase in entropy as more information was added. Despite this, the approximate maximization of the entropy when using the entire data series suggests the sampled series were of sufficient length to reasonably represent the entropy for all of the SMOS-EASE grid cells. Comparison of the ascending (Fig. 2a) and descending (Fig. 2b) overpasses demonstrates an interesting contrast in the stability of SMOS retrievals over time.
Ignoring the leap day and periods of SMOS data loss, each sampling year (i.e., April-September) covers a period of approximately 180 days. It can be seen that for descending overpasses, the approximate maximum entropy was found by sampling the time series up to (inclusive) the middle of the second sampling year (around data point 300) and that the variance of entropy between grid cells reached a near minimum prior to that point. For the ascending overpasses, the variability and entropy stabilized near a maximum with approximately the same number of data in the sample; however, there was a slight increase in entropy as the number of data points considered increased. This can be partially explained by the updates in the soil moisture processor from version 2.45 to 2.60, which impacted the entropy of both ascending and descending orbits when including data points 600-700, but did not explain the increase in the ascending overpass retrievals prior to that time. A possible explanation for this trend was the higher probability of RFI found for ascending overpasses, particularly during the early period of the SMOS mission [35], which would provide possibly contaminated soil moisture values, therefore inflating the entropy. For the combined overpass scenario (Fig. 2c), the samples from the descending overpasses were added to the end of the dataset of the ascending overpasses. The result was a slight increase in the information content (entropy) of the dataset when more than 700 data points were considered.

B. Entropy of SMOS Retrievals

The marginal entropy of the soil moisture retrievals at each SMOS grid cell was interpolated using inverse distance weighting (IDW) in Fig. 3. The soil moisture retrievals from the ascending overpass had a greater range of uncertainty (2.4 - 5.8 bits) as compared to the descending overpass (3.6 - 6.0 bits). Qualitatively, the ascending entropy was more spatially homogeneous (smooth), as compared to the descending overpass entropy.
To make a further qualitative assessment of the source of marginal uncertainty seen in the soil moisture retrievals of Fig. 3, the entropy was superimposed on a Landsat image of the study area (not shown). As expected, those regions with high entropy were often in areas with high surface heterogeneity. For ascending overpasses, the high entropy regions on the eastern edge of the study area were a mix of forest, meadows, bare rock, small open water bodies and some agriculture. Similarly, the multiple ‘hot spots’ of entropy found in the descending overpass were collocated with areas that would represent high sub-pixel heterogeneity. Focusing on the central portion of the study area (Michigan Peninsula and southern Ontario) allowed comparison of ascending and descending half-orbits. Much of the area that was found to have mid-high entropy in the ascending overpass was dominated by agricultural fields and was relatively homogeneous. A possible explanation of the high uncertainty of the soil moisture state, which is similar to variability [31], may be morning condensation [29] or irrigation inflating the soil moisture range. In contrast, the ‘hot spots’ in the descending overpass largely occur in mixed grid cells where there was both agriculture and forest, urban areas or open water. This may suggest that retrievals from the SMOS descending overpass were more sensitive to sub-grid heterogeneity, possibly caused by heterogeneity in late day temperatures [29], [30]. These considerations are currently speculative and future research will be required to validate some of the inferred conclusions. It should be understood that discrepancies between ascending and descending entropies do not necessarily indicate better accuracy of one overpass retrieval compared to the other, but rather a difference in dispersion of values (uncertainty of a particular value), the reasons for which are not fully assessed here. A combination of ‘hot spots’ in terms of entropy does not necessarily represent good sites for a network, since the joint entropy considers the entropy contained at all sites. However, it is more likely that these regions will contain unique information which would contribute to their preferred inclusion in an optimum network.

C. Design of Optimum Networks

The final result of the DEMO algorithm was an ensemble of network designs which were non-dominated with respect to joint entropy and total correlation.
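To make the notion of non-dominance concrete, the sketch below filters a list of candidate designs, each summarized by its joint entropy (to be maximized) and total correlation (to be minimized). It uses plain Pareto dominance, which is a simplification: ε-hBOA applies ε-dominance within an evolutionary search, and none of that machinery is reproduced here. The function names and the four candidate values are hypothetical.

```python
def dominates(a, b):
    """True if design a is at least as good as b in both objectives
    and strictly better in at least one.

    Each design is a (joint_entropy, total_correlation) pair; higher
    joint entropy and lower total correlation are preferred.
    """
    h_a, c_a = a
    h_b, c_b = b
    return h_a >= h_b and c_a <= c_b and (h_a > h_b or c_a < c_b)


def pareto_front(designs):
    """Keep the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical (joint entropy, total correlation) values in bits.
candidates = [(9.0, 2.0), (9.5, 3.0), (9.2, 3.5), (8.5, 2.0)]
front = pareto_front(candidates)
```

Here the third candidate is dominated by the second (less information and more redundancy) and the fourth by the first (less information at equal redundancy), so only the first two remain. Neither of those two dominates the other, which is why a front rather than a single best network is reported.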
The Pareto frontiers for each overpass scenario resulting from the multi-objective optimization can be seen in Fig. 4. Since each point on the Pareto front represents a network of 15 stations, it is difficult to make a comparison of all stations for all network designs. Following [21], [32], differences between the extreme network designs on the Pareto front, in terms of X and Y axes, and a middle solution (M) will be presented in detail in Fig. 5. Figure 2 shows that there is little difference in entropy between the ascending, descending and combined overpass scenarios; however, the networks designed using data from the combined overpasses had distinctly higher joint entropy than either individual overpass scenario, but a similar range of total correlation to the descending overpass scenario. Since the joint entropy can be considered as a sum of the mutual information at all stations in the network and the information content unique to each station [21], [32], this higher range demonstrates that combining data from both overpasses increased the degree to which each grid cell was unique from other cells. Similarly, the amount of shared information, represented by the total correlation, was similar to the descending orbit scenario, demonstrating that the addition of ascending and descending data did not increase the redundancy of information over the descending dataset alone. As expected, the extremes of the combined overpass scenario captured features of the extremes of both other scenarios. For example, the extreme Y network designed from ascending data (YA) did not select any sites north of Lake Superior, whereas the descending and combined networks (YD & YC) both have sites selected in this area. Also, the YA and YC networks have sites selected in the center of the Michigan Peninsula (between Lakes Michigan and Huron) but YD does not.

D. Probability of Station Selection

In order to make a spatial comparison of each overpass dataset for network design, the probability that an individual grid cell is selected for a Pareto optimum network is shown in Fig. 6. Care should be taken in the interpretation of Fig. 6, as a combination of grid cells with high probability should not be misinterpreted as an optimum network itself. Networks designed from the ascending overpass scenario had a relatively small number of grid cells that were selected in two or more networks, and almost all grid cells were selected in at least one network design [23]. This large number of singularly selected grid cells comes from the XA end of the ascending Pareto front in Fig. 4a where there were 378 network designs found with the same amount of joint entropy (H = 9.4635), but dissimilar total correlation. This result suggests that, for the ascending overpass dataset, the selection of 15 stations to represent the study area was more than was required. The inclusion of a small number of grid cells with high probability of selection was likely responsible for a majority of the measured joint entropy, and inclusion of extra grid cells beyond that high entropy group had little impact on network information content. The reason for the similar information content across grid cells may be due to the stability of the SMOS ascending overpass retrieval [28], [29]. Possible explanations for the ascending retrieval stability are the greater consistency in surface temperature between grid cells during the early morning [29], or the impact of hydrologic redistribution by vegetation to the surface, resulting in more homogeneous soil moisture [1]. The increased stability and slightly greater accuracy of the ascending overpass soil moisture retrieval, as found by [28], [29], may provide more accurate information for soil moisture network design but is limited by decreased heterogeneity. Also, there were fewer potential grid cells to place a monitoring site for the ascending overpass (crosses in Fig. 6a) as more grid cells were removed from analysis by the SMOS data pre-processing step. The presence of condensation on the vegetation canopy complicates the retrieval of soil moisture resulting in either a failed retrieval or poor quality retrieval [15] and the removal of these grid cells from this analysis. This problem was particularly prevalent in the northern part of the study area, which is the southern extent of the boreal forest, and explains why there were fewer grid cells considered north of Lakes Superior and Huron for the ascending overpass scenario [23].
The descending and combined overpass datasets had 18 and 16 network designs, respectively, that were located on the Pareto front. The smaller number of potential network designs was due to greater variability in the entropy between grid cells from the descending overpass, resulting in limited optimum network designs [23]. For all three overpass scenarios, those stations which had the highest probability of being selected for a soil moisture monitoring network were located in grid cells of intermediate entropy that surrounded cells of high entropy. The grid cells selected for network design could, therefore, be interpreted as areas which have high contrast to surrounding grid cells. For the application of DEMO to design an actual soil moisture monitoring network, it is recommended that soil moisture retrievals from both half-orbits (combined) be used. This recommendation partially results from the finding that the probability of network selection from the combined overpass scenario does not appear to be dominated by the information from either single overpass scenario, a distinct advantage compared to the individual retrievals presented in [23]. That is, the combined overpass scenario appears unbiased and was found to be representative of information from both single overpass scenarios. Also, since the entropy from each overpass had a different spatial distribution (Fig. 3) and the accuracy of soil moisture retrieval was considered unknown, preference should not be given to either scenario. Additionally, the mixed selection of stations in the combined overpass scenario resulted in network designs that can be used to assess the differences between retrievals from ascending and descending overpasses. Such information is important for cal/val activities and may provide insights for the development/advancement of soil moisture retrieval algorithms.

E. Effect of Number of Stations and Search Space

To investigate the impact of the number of stations on network design, cost was incorporated as a third objective to be minimized by ε-hBOA, where each station was considered to have a cost of 1. The Pareto front for this DEMO evaluation (Fig. 7) demonstrates that the majority of the joint entropy (H > 10.3) can be captured by 4 to 8 monitoring sites and the inclusion of additional sites in the monitoring network also resulted in an increase in the total correlation. The large number of networks with only 1 site resulted because 1 is the minimum of the third objective dimension. Therefore, any unique value of joint entropy and total correlation with a cost of 1 is non-dominated in the cost dimension. The grid cells with the highest probability of selection when cost is also considered can be seen in Fig. 6d.

The GLB was chosen as a relatively large area with a natural geographic boundary within which to apply DEMO. To inform future uses, the computational limitations of the DEMO approach were also considered. To provide an indication of computational requirements, the serial DEMO algorithm was run on a single processor using data from the combined overpass scenario. The total number of grid cells considered was incrementally increased (200 to 1 100) and an optimum network consisting of 15 stations was sought. As the number of potential stations increased, the memory requirements increased linearly, whereas the CPU time increased exponentially, such that the scenario with 1 100 potential sites required greater than 8 days to run per random seed. The CPU time limitation of the current version of DEMO means that it is impractical to design an optimal global (very large scale) network using DEMO without first sub-dividing the area. Future work will mitigate this limitation by parallelization and algorithmic improvements.

V. CONCLUSION

An ensemble of Pareto non-dominated soil moisture monitoring networks was designed for the Great Lakes Basin by combining entropy methods with an evolutionary algorithm [21] using soil moisture information from the SMOS L3 soil moisture processor [16], [23]. The joint entropy and total correlation of soil moisture retrieved using the SMOS L3 algorithm for ascending, descending and combined overpasses were calculated over a selection of 15 grid cells. By minimizing the total correlation (redundancy) and maximizing the joint entropy (information content) the optimum 15 grid cells could be selected to produce a Pareto optimum monitoring network.
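As a minimal sketch of the two objectives summarized above, the following Python fragment computes joint entropy and total correlation for discretized series. The floor-binning helper, the bin width and the toy series are illustrative assumptions and are not taken from DEMO; total correlation is computed with its standard definition, the sum of the marginal entropies minus the joint entropy.

```python
from collections import Counter
from math import log2


def discretize(series, bin_width):
    """Map each value to an integer bin index (floor binning; an assumed scheme)."""
    return [int(value // bin_width) for value in series]


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def joint_entropy(columns):
    """Joint entropy, in bits, of several equally long discretized series."""
    return entropy(list(zip(*columns)))


def total_correlation(columns):
    """Sum of the marginal entropies minus the joint entropy (redundancy)."""
    return sum(entropy(col) for col in columns) - joint_entropy(columns)


# Toy example with hypothetical, already discretized series for three grid cells.
cell_a = [0, 0, 1, 1]
cell_b = [0, 1, 0, 1]   # independent of cell_a in this sample
cell_c = [0, 0, 1, 1]   # exact copy of cell_a, i.e. fully redundant

print(joint_entropy([cell_a, cell_b]), total_correlation([cell_a, cell_b]))
print(joint_entropy([cell_a, cell_c]), total_correlation([cell_a, cell_c]))
```

In this toy case the independent pair carries more joint entropy and no redundancy, whereas the duplicated pair adds no information and has a positive total correlation, which is the trade-off the two objectives are meant to expose.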
Differences in the information content provided by ascending and descending overpasses were evident, as the ascending overpass had fewer successful soil moisture retrievals and had greater spatial homogeneity in terms of entropy than descending soil moisture retrievals. The percentage of successful soil moisture retrievals was high for the descending overpass, and the spatial distribution of entropy was heterogeneous, having grid cells with distinct peaks of entropy. Areas of high entropy for both the ascending and descending overpass soil moisture retrievals tended to coincide with surface regions of high sub-grid heterogeneity. This was particularly evident for SMOS descending overpasses, which appear more sensitive to sub-grid variability. The spatial dissimilarity between grid cell entropy from the ascending and descending overpasses resulted in distinct network designs for the two overpass scenarios [23]. A combination of data from both overpasses produced Pareto-optimal monitoring networks which did not favor data from either overpass scenario. The result was a designed network which is suitable for cal/val and process understanding of both SMOS overpasses. When designing a network for large geographical areas the computational requirements may make it infeasible to directly apply DEMO to all potential grid cells, requiring future research to address how to pre-determine which grid cells should be considered. The use of SMOS retrieved soil moisture resulted in the design of a soil moisture monitoring network which is synergistic with the satellite retrieved soil moisture. However, such a network will also suffer all of the limitations of the SMOS retrieval algorithm, including under-representation of forested areas, errors in the presence of frozen soils, open water, sub-grid heterogeneity, etc. [15], [16].
This disadvantage may limit the applicability of such a network for activities such as land surface model evaluation, but may prove advantageous with respect to providing information for improving future retrieval algorithms. Future research is required to determine if similar spatial patterns are common to other missions such as SMAP. A limitation of this application of DEMO is that monitoring site locations are selected only at the grid cell scale (25 km × 25 km); it provides no information for locating point scale monitoring sites within the grid cell that are representative of the ‘true’ surface state. Fortunately, this task can be accomplished with accepted analytical methods [6], [9], [11].
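The non-dominance argument made for the three-objective evaluation (joint entropy, total correlation and cost) can be illustrated with a small sketch. The design tuples and their values below are hypothetical, and the exhaustive filter is only an illustration; it is not the ε-hBOA search used by DEMO, which relies on ε-dominance rather than the strict dominance test shown here.

```python
def dominates(p, q):
    """Return True if design p Pareto-dominates design q.

    Each design is a tuple (joint_entropy, total_correlation, cost).
    Joint entropy is maximized; total correlation and cost are minimized.
    """
    h_p, c_p, k_p = p
    h_q, c_q, k_q = q
    no_worse = h_p >= h_q and c_p <= c_q and k_p <= k_q
    strictly_better = h_p > h_q or c_p < c_q or k_p < k_q
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep the designs that no other design dominates (O(n^2) filter)."""
    return [p for p in designs if not any(dominates(q, p) for q in designs)]


# Hypothetical candidate networks: (joint entropy, total correlation, cost).
designs = [
    (10.30, 55.0, 6),
    (10.40, 60.0, 8),
    (10.30, 58.0, 8),   # dominated by the first design
    (6.00, 0.0, 1),     # single site: lowest possible cost
]

front = pareto_front(designs)
print(front)
```

The single-site design survives the filter even though its joint entropy is low, because no design with a higher cost can dominate it; only another design with a cost of 1 could do so.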



ACKNOWLEDGEMENT

The authors would like to thank Y. Kerr, S. Mecklenburg, and the team at CESBIO and ESA for providing training and guidance on the use of SMOS data products. We would also like to thank J. Kollat for providing ε-hBOA and J. Samuel for his contribution in the development of DEMO. We are grateful for the comments from two anonymous reviewers who have helped improve this manuscript.

REFERENCES

[1] S. I. Seneviratne, T. Corti, E. L. Davin, M. Hirschi, E. B. Jaeger, I. Lehner, B. Orlowsky, and A. J. Teuling, “Investigating soil moisture-climate interactions in a changing climate: A review,” Earth Sci. Rev., vol. 99, no. 3-4, pp. 125-161, May 2010.

[2] R. D. Koster, P. A. Dirmeyer, Z. C. Guo, G. Bonan, E. Chan, P. Cox, C. T. Gordon, S. Kanae, E. Kowalczyk, D. Lawrence, P. Liu, C. H. Lu, S. Malyshev, B. McAvaney, K. Mitchell, D. Mocko, O. Taikan, K. Oleson, A. Pitman, Y. C. Sud, C. M. Taylor, D. Verseghy, R. Vasic, Y. K. Xue, and T. Yamada, “Regions of strong coupling between soil moisture and precipitation,” Science, vol. 305, no. 5687, pp. 1138-1140, Aug. 2004.

[3] D. R. Legates, R. Mahmood, D. F. Levia, T. L. DeLiberty, S. M. Quiring, C. Houser, and F. E. Nelson, “Soil moisture: A central and unifying theme in physical geography,” Prog. Phys. Geog., vol. 35, no. 1, pp. 65-86, Feb. 2011.

[4] T. E. Ochsner, M. H. Cosh, R. H. Cuenca, W. A. Dorigo, C. S. Draper, Y. Hagimoto, Y. H. Kerr, K. M. Larson, E. G. Njoku, E. E. Small, and M. Zreda, “State of the art in large-scale soil moisture monitoring,” Soil Sci. Soc. Am. J., vol. 77, no. 6, pp. 1888-1919, Nov. 2013.

[5] G. L. Schaefer, M. H. Cosh, and T. J. Jackson, “The USDA Natural Resources Conservation Service Soil Climate Analysis Network (SCAN),” J. Atmos. Oceanic Technol., vol. 24, no. 12, pp. 2073-2077, Dec. 2007.

[6] J. E. Bell, M. A. Palecki, C. B. Baker, W. G. Collins, J. H. Lawrimore, R. D. Leeper, M. E. Hall, J. Kochendorfer, T. P. Meyers, T. Wilson, and H. J. Diamond, “U.S. Climate Reference Network soil moisture and temperature observations,” J. Hydrometeorol., vol. 14, no. 3, pp. 977-988, Jun. 2013.

[7] W. A. Dorigo, W. Wagner, R. Hohensinn, S. Hahn, C. Paulik, A. Xaver, A. Gruber, M. Drusch, S. Mecklenburg, P. van Oevelen, A. Robock, and T. Jackson, “The International Soil Moisture Network: A data hosting facility for global in situ soil moisture measurements,” Hydrol. Earth Syst. Sci., vol. 15, no. 5, pp. 1675-1698, May 2011.

[8] J. C. Calvet, N. Fritz, F. Froissard, D. Suquia, A. Petitpa, and B. Piguet, “In situ soil moisture observations for the CAL/VAL of SMOS: The SMOSMANIA network,” in Proc. IGARSS, Barcelona, Spain, pp. 1196-1199, Jul. 7-11 2008.

[9] L. Brocca, F. Melone, T. Moramarco, and R. Morbidelli, “Spatial-temporal variability of soil moisture and its estimation across scales,” Water Resour. Res., vol. 46, no. 2, pp. W02516, Feb. 2010.

[10] J. M. Jacobs, B. P. Mohanty, E-C. Hsu, and D. Miller, “SMEX02: Field scale variability, time stability and similarity of soil moisture,” Remote Sens. Environ., vol. 92, no. 4, pp. 436-446, Sep. 2004.

[11] Y. H. Kerr, P. Waldteufel, J.-P. Wigneron, S. Delwart, F. Cabot, J. Boutin, M. J. Escorihuela, J. Font, N. Reul, C. Gruhier, S. E. Juglea, M. R. Drinkwater, A. Hahne, M. Martin-Neira, and S. Mecklenburg, “The SMOS mission: New tool for monitoring key elements of the global water cycle,” Proc. IEEE, vol. 98, no. 5, pp. 666-687, May 2010.

[12] D. Entekhabi, E. G. Njoku, P. E. O’Neill, K. H. Kellogg, W. T. Crow, W. N. Edelstein, J. K. Entin, S. D. Goodman, T. J. Jackson, J. Johnson, J. Kimball, J. R. Piepmeier, R. D. Koster, N. Martin, K. C. McDonald, M. Moghaddam, S. Moran, R. Reichle, J. C. Shi, M. W. Spencer, S. W. Thurman, L. Tsang, and J. van Zyl, “The Soil Moisture Active Passive (SMAP) mission,” Proc. IEEE, vol. 98, no. 5, pp. 704-716, May 2010.

[13] Z. Bartalis, W. Wagner, V. Naeimi, S. Hasenauer, K. Scipal, H. Bonekamp, J. Figa, and C. Anderson, “Initial soil moisture retrievals from the METOP-A Advanced Scatterometer (ASCAT),” Geophys. Res. Lett., vol. 34, pp. L20401, Oct. 2007.

[14] H. Fujii, T. Koike, and K. Imaoka, “Improvement of the AMSR-E algorithm for soil moisture estimation by introducing a fractional vegetation coverage dataset derived from MODIS data,” J. Remote Sens. Soc. Jpn., vol. 29, no. 1, pp. 282-292, Jan. 2009.

[15] Y. H. Kerr, P. Waldteufel, P. Richaume, J. P. Wigneron, P. Ferrazzoli, A. Mahmoodi, A. Al Bitar, F. Cabot, C. Gruhier, S. E. Juglea, D. Leroux, A. Mialon, and S. Delwart, “The SMOS soil moisture retrieval algorithm,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1384-1403, May 2012.

[16] Y. H. Kerr, E. Jacquette, A. Al Bitar, F. Cabot, A. Mialon, P. Richaume, A. Quesney, and L. Berthon, “CATDS SMOS L3 soil moisture retrieval processor: Algorithm theoretical baseline document (ATBD),” CATDS, Toulouse, FR, Rep. SO-TN-CBSA-GS-0029, 2013.

[17] S. Peischl, J. P. Walker, C. Rudiger, N. Ye, Y. H. Kerr, E. Kim, R. Bandara, and M. Allahmoradi, “The AACES field experiments: SMOS calibration and validation across the Murrumbidgee River catchment,” Hydrol. Earth Syst. Sci., vol. 16, pp. 1697-1708, Jun. 2012.

[18] M. Choi and J. M. Jacobs, “Soil moisture variability of root zone profiles within SMEX02 remote sensing footprints,” Adv. Water Resour., vol. 30, no. 4, pp. 883-896, Apr. 2007.

[19] A. K. Mishra and P. Coulibaly, “Developments in hydrometric network design: A review,” Rev. Geophys., vol. 47, no. 2, pp. RG2001, Jun. 2009.



[20] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Technical J., vol. 27, no. 3, pp. 379-423, Jul. 1948.

[21] J. Samuel, P. Coulibaly, and J. Kollat, “CRDEMO: Combined regionalization and dual entropy-multiobjective optimization for hydrometric network design,” Water Resour. Res., vol. 49, no. 12, pp. 1-20, Dec. 2013.

[22] J. B. Kollat, P. M. Reed, and J. R. Kasprzyk, “A new epsilon-dominance hierarchical Bayesian optimization algorithm for large multiobjective monitoring network design problems,” Adv. Water Resour., vol. 31, no. 5, pp. 828-845, May 2008.

[23] K. C. Kornelsen and P. Coulibaly, “Design of an optimum soil moisture monitoring network using SMOS,” in Proc. IEEE IGARSS, Quebec City, PQ, 2014, pp. 3206-3209.

[24] T. L. Rowlandson, B. K. Hornbuckle, L. M. Bramer, J. C. Patton, and S. D. Logsdon, “Comparisons of evening and morning SMOS passes over the Midwest United States,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1544-1555, May 2012.

[25] A. Al Bitar, D. Leroux, Y. H. Kerr, O. Merlin, P. Richaume, A. Sahoo, and E. F. Wood, “Evaluation of SMOS soil moisture products over continental U.S. using the SCAN/SNOTEL network,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1572-1586, May 2012.

[26] Oak Ridge National Laboratory Distributed Active Archive Centre (ORNL DAAC). (2013). FLUXNET Web Page. ORNL DAAC, TN. [Online]. Available: http://fluxnet.ornl.gov.

[27] K. C. Kornelsen and P. Coulibaly, “McMaster Mesonet soil moisture dataset: description and spatio-temporal variability analysis,” Hydrol. Earth Syst. Sci., vol. 17, pp. 1589-1606, Apr. 2013.

[28] T. J. Jackson, R. Bindlish, M. H. Cosh, T. Zhao, P. J. Starks, D. D. Bosch, M. Seyfried, M. S. Moran, D. C. Goodrich, Y. H. Kerr, and D. Leroux, “Validation of Soil Moisture and Ocean Salinity (SMOS) soil moisture over watershed networks in the U.S.,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1530-1543, May 2012.

[29] T. Lacava, P. Matgen, L. Brocca, M. Bittelli, N. Pergola, T. Moramarco, and V. Tramutoli, “A first assessment of the SMOS soil moisture product with in situ and modeled data in Italy and Luxembourg,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1612-1622, May 2012.

[30] T. W. Collow, A. Robock, J. B. Basara, and B. G. Illston, “Evaluation of SMOS retrievals of soil moisture over the central United States with currently available in situ observations,” J. Geophys. Res., vol. 117, no. D9, pp. D09113, May 2012.

[31] V. P. Singh, “The use of entropy in hydrology and water resources,” Hydrol. Process., vol. 11, no. 6, pp. 587-626, May 1997.

[32] L. Alfonso, A. Lobbrecht, and R. Price, “Optimization of water level monitoring network in polder systems using information theory,” Water Resour. Res., vol. 46, no. 12, pp. W12553, Dec. 2010.

[33] A. Kraskov, H. Stoegbauer, R. G. Andrzejak, and P. Grassberger, “Hierarchical clustering based on mutual information,” Europhys. Lett., vol. 70, no. 2, pp. 278-284, Apr. 2005.

[34] World Meteorological Organization (WMO), “Guide to hydrological practices, volume I: Practices hydrology- From measurement to hydrological information,” WMO, Geneva, Switzerland, WMO 168, 16th ed., 2008.

[35] Cesbio. (2013) “SMOS Blog: Performance Monitor” [Online]. Available: http://www.cesbio.ups-tlse.fr/SMOS_blog/?page_id=1235.

[36] K. C. Kornelsen and P. Coulibaly, “Comparison of interpolation, statistical and data-driven methods for imputation of missing values in a distributed soil moisture dataset,” J. Hydrol. Eng., vol. 19, no. 1, pp. 26-43, Jan. 2014.

[37] E. T. Jaynes, “Information theory and statistical mechanics, I,” Phys. Rev., vol. 106, no. 4, pp. 620-630, May 1957.

[38] E. T. Jaynes, “Information theory and statistical mechanics, II,” Phys. Rev., vol. 108, no. 2, pp. 171-190, Oct. 1957.

Kurt C. Kornelsen (M’14) received the B.Sc. degree in geography and the B.Ed. degree in education from Brock University, St. Catharines, Ontario (Canada) in 2006 and 2007, respectively. He is currently pursuing the Ph.D. degree in the School of Geography and Earth Sciences at McMaster University in Hamilton, Ontario (Canada). He formerly taught secondary school science at the District School Board of Niagara, Ontario (Canada) from 2007 to 2010. His research focuses on methods for assimilation of soil moisture and L-band microwave emissions from the Soil Moisture and Ocean Salinity (SMOS) satellite into land surface models, soil moisture modelling and optimizing the design of environmental monitoring networks.



Paulin Coulibaly received a B.ASc. and a M.ASc. from the University of Nice, Sophia (France) and a PhD from Laval University (Canada). He is currently a Professor at McMaster University (Ontario, Canada). His research interests include hydrologic modeling and forecasting; statistical hydrology; hydrologic data assimilation; soil moisture modeling; analysis of climate change and variability impact on water resources; and water monitoring network design and optimization. Dr. Coulibaly is currently leading the NSERC Canadian FloodNet – a Strategic research network for enhancing flood forecasting capacity in Canada.



Fig. 1. Percentage of successful soil moisture retrievals for each SMOS grid cell in the Great Lakes Basin for a) ascending and b) descending half-orbits. Crosses represent the center of each SMOS-EASE grid cell. Existing soil moisture monitoring stations from USDA SCAN, USCRN, FluxNet Canada and the McMaster Mesonet are shown as white crossed circles. These sites are presented for reference but are ignored in this analysis. (See online version for color figure.)

Fig. 2. The mean (black line) and variance (red line) of entropy for each SMOS grid cell (gray lines) of the a) ascending, b) descending and c) combined overpasses as the number of data points in the sample increases. Maximization of the entropy implies the sample pdf is representative of the population. (See online version for color figure.)

Fig. 3. The marginal entropy of the SMOS retrieved soil moisture for each grid cell considered by DEMO after pre-processing. The crosses represent the center of the considered SMOS-EASE grid cells for each overpass. The spatial consistency of SMOS ascending overpass a) as compared to the descending overpass b) is evident when comparing the entropy of the Michigan Peninsula and southern Ontario (lower middle of the study area). (See online version for color figure.)

Fig. 4. Pareto frontier of optimized soil moisture monitoring networks based on the maximization of joint entropy and the minimization of total correlation for a) ascending, b) descending and c) combined SMOS overpasses. For comparison the extreme X and Y and middle (M) soil moisture networks are identified on each Pareto front and correspond to the networks shown in Fig. 5.

Fig. 5. Extreme X, Y and middle (M) soil moisture networks from the Pareto front determined by DEMO based on data from SMOS a) ascending, b) descending and c) combined overpasses. (See online version for color figure.)

Fig. 6. Probability that a single grid cell will be selected for inclusion in a Pareto optimum network by DEMO. A combination of high probability grid cells does not represent an optimum network, but the probabilities do indicate the relative importance of a particular grid cell in the design of a network. Crosses represent the center of the SMOS-EASE grid cells that were included in the DEMO analysis. (See online version for color figure.)

Fig. 7. Pareto frontier of optimized soil moisture monitoring networks based on the maximization of joint entropy and the minimization of total correlation and cost for combined SMOS overpasses. (See online version for color figure.)


TABLE I
DEMO PARAMETERS

Model Parameters                          Ascending    Descending   Combined
Initial population size                   10 000       10 000       10 000
Min. population size                      10 000       10 000       10 000
Max. population size                      100 000      100 000      100 000
Population sizing scheme                  Injection    Injection    Injection
Number of decision variables (n)a         530          751          678
Max. Generation (2n)                      1060         1502         1356
ε for min. total correlation              0.00001      0.00001      0.00001
ε for max. joint entropy                  0.00001      0.00001      0.00001
Crossover probability                     1.0          1.0          1.0
Mutation probability (1/n)                0.0019       0.0013       0.0015
Distribution index for SBX crossover      15           15           15
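The derived rows of Table I follow from the number of decision variables n: the maximum generation is 2n and the mutation probability is 1/n. A short sketch of that arithmetic is given below, assuming the tabulated mutation probabilities are rounded to four decimal places; the function name is hypothetical and is not part of DEMO.

```python
def derived_parameters(n_decision_variables):
    """Derive the n-dependent settings listed in Table I (illustrative helper)."""
    return {
        "max_generation": 2 * n_decision_variables,
        # Rounded to four decimals to match the precision shown in the table.
        "mutation_probability": round(1 / n_decision_variables, 4),
    }


# Number of decision variables (candidate grid cells) per scenario, from Table I.
scenarios = {"ascending": 530, "descending": 751, "combined": 678}

for name, n in scenarios.items():
    print(name, derived_parameters(n))
```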

TABLE II
DESCRIPTION OF SMOS DATA USED

Scenario               Number of Grid Cells   Number of SMOS Retrievals   Mean Missing Values per Grid Cell
Full Study Area        1281                   -                           -
Ascending Overpass     530                    706                         71
Descending Overpass    751                    706                         58
