Theor Appl Climatol DOI 10.1007/s00704-015-1461-7
ORIGINAL PAPER
Local-scale spatial modelling for interpolating climatic temperature variables to predict agricultural plant suitability
Mathew A. Webb 2,4 & Andrew Hall 1,3 & Darren Kidd 2,4 & Budiman Minasny 4
Received: 5 May 2014 / Accepted: 7 April 2015 © Springer-Verlag Wien 2015
Abstract Assessment of local spatial climatic variability is important in the planning of planting locations for horticultural crops. This study investigated three regression-based calibration methods (i.e. traditional versus two optimized methods) to relate short-term 12-month data series from 170 temperature loggers and 4 weather station sites with data series from nearby long-term Australian Bureau of Meteorology climate stations. The techniques trialled to interpolate climatic temperature variables, such as frost risk, growing degree days (GDDs) and chill hours, were regression kriging (RK), regression trees (RTs) and random forests (RFs). All three calibration methods produced accurate results, with the RK-based calibration method delivering the most accurate validation measures: coefficients of determination (R²) of 0.92, 0.97 and 0.95 and root-mean-square errors of 1.30, 0.80 and 1.31 °C, for daily minimum, daily maximum and hourly temperatures, respectively. Compared with the traditional method of calibration using direct linear regression between short-term and long-term stations, the RK-based calibration method improved R² and reduced root-mean-square error (RMSE) by at least 5 % and 0.47 °C for daily minimum temperature, 1 % and 0.23 °C for daily maximum temperature and 3 % and 0.33 °C for hourly temperature. Spatial modelling indicated insignificant differences between the interpolation methods, with the RK technique tending to be the slightly better method due to the high degree of spatial autocorrelation between logger sites.
* Mathew A. Webb, [email protected]
1 School of Environmental Sciences, Charles Sturt University, Albury, NSW, Australia
2 Department of Primary Industries, Parks, Water and Environment, 167 Westbury Road, Prospect, TAS 7250, Australia
3 National Wine & Grape Industry Centre, Charles Sturt University, Wagga Wagga, NSW, Australia
4 Faculty of Agriculture and Environment, The University of Sydney, Eveleigh, NSW, Australia
1 Introduction
Long-term economically successful propagation of horticultural crops often requires specific suitable climatic conditions. If the climate suitability of a region is marginal for a specific horticultural activity, a slight change in location can increase the number of frosts or heat events at phenologically vulnerable times enough to make particular locations economically unviable. Rigorous consideration of the spatial climatic variability prior to plant establishment can therefore lead to greater economic profitability through the lifetime of the horticultural activity. Accurate local-scale climate mapping is an important tool in the planning process (Wratt et al. 2006). It is generally accepted that the optimum method of climate mapping at the local scale (0.1–10 km) involves a dense network of measurement sites (Thom 1976; Skaar 1980; Turner and Fitzharris 1986; Sansom and Tait 2004). However, as Sansom and Tait (2004) found, climate stations with long data series are generally spread too far apart for local-scale climate mapping, and although temporary climate stations can be installed in dense networks, data collection is often limited to a short period of time, typically a single season or year. In light of this, Sansom and Tait (2004) trialled four methods of estimating long-term climate information at locations with short-term data, the two best being linear regression and a percentile-adjusted spline method. The linear regression method, which was first advocated by Turner and Fitzharris (1986), involved the daily
M.A. Webb et al.
mean temperatures at each short-term station being regressed against daily mean temperatures from a nearby long-term station. The equations derived between short-term and long-term stations can subsequently be used to generate long-term datasets at the locations of the short-term stations. The percentile-adjusted spline method used a process whereby the decile temperature estimates based on spline interpolation of the deciles from long-term stations were shifted via percentile adjustment to align with the actual deciles of the short-term stations. This method differs from the regression method in that the splines take into account all available long-term station records at once. In addition, it uses a digital elevation model (DEM), which acts as an explanatory variable in the spline estimation to account for temperature gradients (Hutchinson 1991). In comparing the two methods, Sansom and Tait (2004) found that the percentile-adjusted spline method was superior for estimating long-term temperature datasets at short-term climate stations, particularly for deciles ranging from 2 to 10. However, either method would still give acceptable results with a mean absolute error difference of less than 0.5 °C for deciles 2 and above. The application of the regression method as implemented by Turner and Fitzharris (1986) and Sansom and Tait (2004) was trialled in this study. In addition, a way of optimizing the regression method via the formation of daily/hourly temperature grids as well as daily/hourly temperature equations was implemented by simultaneously incorporating elevation and multiple long-term datasets. Furthermore, using calibration functions between each short-term station (i.e. temperature loggers) and estimates from the daily/hourly temperature grids/equations, a 20-year historical temperature dataset was derived for each of the loggers.
Using these datasets, together with crop-specific temperature parameters that determine successful propagation, spatial modelling was undertaken to assess the feasibility of quantifying climatic temperature variables such as frost risk, growing degree days (GDDs) and chill hours from the long-term data estimated for the temperature loggers. The method of using short-term data to predict long-term temperature variables such as frost risk has been performed in the past by Wratt et al. (2006), using the previously discussed spline method (in combination with infrared satellite imagery). In that study, minimum temperatures derived from the fitted splines and frost incidence were related to the imagery using least-squares regression. In this paper, however, multiple data sources or covariate data were employed to predict incidences of frost. Specifically, three multivariate spatial modelling methods, which have the ability to incorporate multiple covariates into the modelling process, were trialled and contrasted: regression kriging (RK), regression trees (RTs) and random forests (RFs). The approach of utilizing covariate data for multivariate modelling purposes has been widely successful in the field of digital soil mapping (DSM) in combination with
the aforementioned spatial modelling methods (McBratney et al. 2003; Hengl et al. 2004). This paper demonstrates that a similar methodological approach can be applied to the spatial interpolation of climatic temperature variables to achieve a more robust and statistically superior product compared to traditional local-scale climate modelling approaches.
2 Methods
2.1 Study area
The region studied covers 60,000 ha and forms the Meander Valley Irrigation area in central Northern Tasmania, Australia (Fig. 1). The topography of the study region is characterized by undulating plains interspersed with small- to medium-sized hills and ridge tops, with elevation ranging from 90 to 490 m and slopes generally not exceeding 3°. The area supports irrigated agriculture, with beef, sheep and vegetable production.
2.2 Deployment of temperature loggers/weather stations
From 1 August 2011, 170 Tinytag temperature loggers (model no. TGP 4017, manufactured by Gemini Data Loggers in Chichester, West Sussex, UK) and four weather stations (model DL30 Weather Maestro, manufactured by Environdata in Warwick, Queensland, Australia) were deployed to the study area at a density of approximately one logger/station per 250 ha and programmed to record air temperature (°C) at 1.2 m above ground level every 10 min. The temperature loggers were housed in a Datamate Weather Screen (model no. ACS-5050, manufactured by Hastings Data Loggers in Port Macquarie, NSW, Australia), which was protected within a purpose-built steel cage measuring 150 × 30 × 30 cm. The temperature loggers were deployed according to a stratified-random pattern, i.e. stratified by the major topographical features of the landscape within the study area and randomly distributed within each of the features. Fuzzy k-means clustering was employed (Bezdek 1984; Burrough et al. 2000), using the software ‘FuzMe’ (Minasny and McBratney 2002), to form statistically distinct clusters to which loggers/stations could be evenly allocated. The 1-s Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) (Version 1.0, Geoscience Australia, Canberra) and the terrain derivatives of slope and aspect were used as the input data to the clustering.
Results from the cluster analysis and the subsequent distribution of loggers within those clusters are illustrated in Fig. 2. To determine an optimum number of clusters, validity functions such as the fuzzy performance index (FPI) and the modified partition entropy (MPE) were examined (Roubens 1982; Odeh et al. 1992). The FPI is a measure of the degree of fuzziness of the clusters while the MPE indicates the degree
Local-scale spatial modelling Fig. 1 Hill shade map of study region with location of temperature logger/weather station sites
of disorganization in the classification. The number of classes that is both least fuzzy and least disorganized, i.e. that gives the minimum values, is considered suitable. A range of potential homogeneous cluster groups, in this case two to ten, was compared. Based on the FPI and MPE measures, five clusters were deemed sufficient for the study area, registering values of 0.12 and 0.11, respectively. Note that the temperature loggers were placed so that an equal number of loggers resided within each cluster type but were randomly dispersed within it. Hence, the initial logger sampling regime involved 35 temperature loggers randomly located in each of the five clusters.
Fig. 2 Clusters generated from the fuzzy k-means clustering algorithm within which temperature loggers were allocated using a stratified random sampling regime
Final logger placement was constrained by land use; loggers were therefore placed along fence lines to avoid damage by stock or farming operations and away from vegetation and infrastructure that could potentially affect the reliability of results. Due to the placement and access constraints, manual logger allocation resulted in final tallies within cluster groups A, B, C, D and E of 34, 35, 31, 33 and 41, respectively. To statistically test whether the loggers were successfully distributed across all of the topographic variables, the distribution of each topographic variable across the entire study area was
assessed against the distribution of the same variables at the temperature logger locations (i.e. the logger locations intersected with the topographic variables). The medians and quartiles of the two distributions were similar (Table 1), indicating that the distribution of the loggers/stations was representative of the entire distribution of elevation, slope and aspect (Fig. 3). Note that the fuzzy k-means algorithm treats aspect as a continuous rather than a circular variable (i.e. it treats 0° and 360° as dissimilar values). This had an insignificant bearing on the final cluster output, as reflected by the final logger distribution (Fig. 3d), which sufficiently resembled the entire distribution of aspect (Fig. 3c) regardless of whether 0° and 360° are treated as similar values.
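The two validity functions used earlier in this section to choose the number of clusters can be illustrated with a minimal sketch. It assumes the commonly cited definitions (FPI derived from the partition coefficient, MPE as entropy normalized by the logarithm of the number of classes); the exact formulation implemented in the clustering software may differ, and the function names and the membership matrix layout (rows are sites, columns are clusters) are hypothetical.

```python
import numpy as np

def fuzziness_performance_index(memberships):
    """FPI = 1 - (c*F - 1) / (c - 1), with F the partition coefficient (assumed definition)."""
    m = np.asarray(memberships, dtype=float)
    n, c = m.shape
    partition_coefficient = np.sum(m ** 2) / n
    return 1.0 - (c * partition_coefficient - 1.0) / (c - 1.0)

def modified_partition_entropy(memberships):
    """MPE = H / ln(c), with H the mean membership entropy (assumed definition)."""
    m = np.asarray(memberships, dtype=float)
    n, c = m.shape
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(m > 0, m, 1.0)
    entropy = -np.sum(m * np.log(safe)) / n
    return entropy / np.log(c)

# Hypothetical membership matrices for 4 sites and 3 clusters.
crisp = np.eye(3)[[0, 1, 2, 0]]        # every site belongs wholly to one cluster
uniform = np.full((4, 3), 1.0 / 3.0)   # every site is shared equally by all clusters
```

Under these definitions both indices run from 0 (crisp, well-organized classes) to 1 (completely fuzzy classes), which is why the minimum values are sought when comparing candidate numbers of clusters.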
2.3 Adjusting local temperature data using long-term Bureau of Meteorology data
Linear least-squares regression was used to describe the relationships between data from each logger/weather station and the corresponding temperature records from the nearest long-term Australian Bureau of Meteorology (BoM) stations (Launceston Ti Tree Bend, Launceston Airport, Cressy, Liawenee, Sheffield, Devonport Airport and Low Head; refer to Fig. 4). Temperature data (August 2011–August 2012) recorded at 10-min intervals were acquired from the loggers and weather stations in September 2012. From this dataset, for each logger/station location, daily minimum temperature (required for frost risk and GDD modelling), daily maximum temperature (required for GDD modelling) and hourly temperature (required for chill hour modelling) datasets were determined. For each long-term BoM station, data spanning 20 years (from 1992 to 2011), including daily minimum, daily maximum and hourly temperatures, were acquired. In addition, data for the 12 months that coincide with the temperature logger/station readings (i.e. between August 2011 and August 2012) were also acquired. To relate the long-term BoM stations to each one of the short-term temperature logger/stations in the 12-month recording period, three methods were trialled:
(1) formation of daily/hourly temperature grids (BoM grids) using long-term BoM data (where the logger records could undergo linear regression with each estimated cell value at the XY logger location);
(2) formation of daily/hourly BoM equations (between long-term BoM temperatures and elevation, to which the logger records could be related); and
(3) direct linear regression of each logger/station to its nearest long-term BoM station, as advocated by Turner and Fitzharris (1986) and Sansom and Tait (2004).
2.3.1 Method 1: calibration using Bureau of Meteorology grids
The BoM grids were produced using regression kriging (RK), a combination of stepwise linear regression of the dependent variable (i.e. temperature) with auxiliary variables (such as terrain parameters) followed by simple kriging of the regression residuals. It is mathematically equivalent to ‘universal kriging’ and ‘kriging with external drift’, where the covariates are incorporated directly to solve the kriging weights (Hengl et al. 2007; Minasny and McBratney 2007). The method of ordinary kriging to interpolate long-term daily temperature data (after de-trending via topoclimatic analysis) has been used in a similar manner by Jarvis and Stuart (2001b) and was found to produce reliable results. The 9-s (270-m resolution) digital elevation model (DEM) (Hutchinson et al. 2008), reprojected to Geocentric Datum of Australia, Zone 55, was used as the explanatory variable in the RK to explain the temperature gradients associated with elevation (Hutchinson 1991). The resolution of the DEM was deemed appropriate to form the BoM grids partly because of the sparseness of available BoM data but primarily to avoid the considerable computer processing time that higher-resolution DEMs would have required. The GIS software ‘System for Automated Geoscientific Analyses (SAGA version 2.08)’ (Böhner and Conrad 2007) was used to perform the RK, using the Universal Kriging module in SAGA with the option of weighting the matrix globally for all points. The default option of fitting linear models to the semivariogram plots was nominated. All BoM grids were produced at the same resolution as the 9-s SRTM DEM. In total, 14,610 BoM grids of daily minimum and
Table 1 Median and upper (75 %) and lower (25 %) quartile values comparing the entire study area distribution of elevation, slope and aspect versus the distribution coinciding with the logger locations

                         Elevation (m)         Aspect (degrees)      Slope (degrees)
                         Entire    Logger      Entire    Logger      Entire    Logger
Upper (75 %) quartile    272.7     267.5       278.4     288.2       3.5       3.7
Median                   230.7     221.6       159       190.2       1.5       1.4
Lower (25 %) quartile    172       173         67        86          0.6       0.7

Entire = entire distribution; Logger = logger location distribution
Fig. 3 Histograms with nonparametric density curves comparing the entire distribution of the elevation model (a) versus the elevation model distribution coinciding with the logger locations (b), the entire distribution of the aspect model (c) versus the aspect model distribution coinciding with the logger locations (d) and the entire distribution of the slope model (e) versus the slope model distribution coinciding with the logger locations (f)
maximum temperature were produced for the entire 20-year period. Since consistent hourly data were only available from 2002 onwards, BoM grids of hourly temperature were interpolated for the 10-year period from 2002 to 2011 and totalled 87,648 BoM grids. Temperature series of daily minimum, daily maximum and hourly temperatures were derived from the BoM grids for each of the logger/weather station X and Y locations (BoM grid cells intersected with the actual logger location). Linear equations were determined via least-squares regression using actual temperature data from each logger/weather station and the corresponding BoM grid intersects for days that coincided with the logging period (August 2011 to August 2012). The procedure produces the following calibration equation for each logger/weather station that describes the relationship to the BoM grid intersects:
\[ Y = B_1 X + B_0 \tag{1} \]
where $Y$ is the predicted (calibrated) temperature estimate for the logger/weather station, $X$ is the independent temperature intersect value from the BoM grids, $B_0$ represents the intercept and $B_1$ represents the slope. Using the derived equations, the remaining long-term BoM grid estimates (i.e. grid estimates prior to the logging period) were sequentially used as the $X$ variable to derive a history of daily (20 years) and hourly (10 years) temperatures for each logger/weather station:
\[ Y(i,j) = B_{1(i)} X(i,j) + B_{0(i)} \tag{2} \]
where i and j denote the temperature estimate with respect to the logger/weather station (i) and the corresponding date/hour (j) (prior to the logging period). The equations used in this manner enable the long-term BoM grid estimates to be calibrated specifically to each temperature logger/weather station. Subsequently, a new calibrated dataset was created for each logger/station location, encompassing 20-year series of daily minimums and daily maximums and a 10-year series of hourly temperature data. The process is summarized in Fig. 5.
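The calibration in Eqs. 1 and 2 can be sketched for a single logger as follows. This is a minimal illustration rather than the authors' implementation: the function names and all temperature values are hypothetical, and NumPy's `polyfit` stands in for whichever least-squares routine was actually used.

```python
import numpy as np

def fit_calibration(reference_overlap, logger_overlap):
    """Least-squares slope B1 and intercept B0 of Y = B1 * X + B0 (Eq. 1)."""
    x = np.asarray(reference_overlap, dtype=float)
    y = np.asarray(logger_overlap, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)  # coefficients are returned highest degree first
    return b1, b0

def apply_calibration(b1, b0, reference_history):
    """Calibrate reference estimates from before the logging period (Eq. 2)."""
    return b1 * np.asarray(reference_history, dtype=float) + b0

# Hypothetical daily minimum temperatures (deg C) for the overlapping logging period.
grid_overlap = np.array([2.0, 4.0, 6.0, 8.0])    # BoM grid intersects (X)
logger_overlap = np.array([0.5, 2.5, 4.5, 6.5])  # logger records (Y), here exactly X - 1.5

b1, b0 = fit_calibration(grid_overlap, logger_overlap)

# Hypothetical BoM grid estimates from before the logging period.
history = apply_calibration(b1, b0, np.array([0.0, 10.0]))
```

The same two steps apply to all three calibration methods; only the source of the reference series X changes (BoM grid intersects, BoM equation estimates or the nearest BoM station).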
Fig. 4 Long-term Bureau of Meteorology (BoM) stations in the vicinity of the study area: 1, Devonport Airport; 2, Sheffield; 3, Liawenee; 4, Cressy; 5, Launceston Airport; 6, Launceston Ti Tree Bend; 7, Low Head
2.3.2 Method 2: calibration using Bureau of Meteorology temperature and elevation
For each daily/hourly record given by each of the long-term BoM stations, linear least-squares regression using temperature together with the elevation values at each BoM site (derived from the 9-s DEM) was used to form daily/hourly linear equations (i.e. BoM equations; 14,610 and 87,648 equations for daily max/min and hourly BoM temperature data, respectively). Then, by using the elevation values of each of the logger/station locations (acting as the independent X variable), temperature values were estimated for each logger/station by using the coefficients generated from each daily/hourly BoM equation. For days coinciding with the logging period, the estimated values together with the actual logger/station temperatures were then used in a similar linear regression procedure as outlined to produce Eq. 1, where the independent X variable would, in this case, represent the temperature values derived from the BoM equations. BoM equation estimates prior to the logging period were then sequentially used as the X variable in Eq. 2 to derive a history of daily (20 years) and hourly (10 years) temperatures for each logger/weather station (and therefore enable the long-term BoM equation estimates to be calibrated specifically to each temperature logger/weather station). Figure 5 summarizes the procedure.
2.3.3 Method 3: calibration using nearest Bureau of Meteorology station
Each logger/station was partnered to its nearest available long-term BoM station by comparing the distance between each respective logger/station and all available BoM stations in proximity to the study area. The minimum straight-line distance was used as the basis for selecting the BoM station with which each logger/station was partnered, so that their respective data series (for days coinciding with the logging period) could undergo direct linear least-squares regression to derive Eq. 1. BoM readings prior to the logging period were sequentially used as the X variable (using Eq. 2) to calibrate the history of daily (20 years) and hourly (10 years) BoM temperatures to each relevant (i.e. partnered) logger/weather station. Figure 5 outlines the tasks.
2.4 Applying the climatic temperature variables
The following temperature variables were acquired from the Tasmanian Government Wealth from Water Pilot Program (http://dpipwe.tas.gov.au/agriculture/investing-in-irrigation) developed in partnership with the Tasmanian Institute of
Fig. 5 Framework for calibrating temperature loggers to the Bureau of Meteorology (BoM) stations
Agriculture, Tasmania, Australia, which details the soil and climate parameters required for the successful propagation of a range of crops. The program forms part of a land suitability study which is currently being undertaken for the irrigation areas of Tasmania. Climatic temperature variables including frost risk, GDD and chill hours are stipulated for 20 crop types. In this study, the following temperature rules were selected to be modelled and applied to the derived long-term temperature dataset of each logger/weather station using data estimated from calibrations using BoM grids (method 1).
2.4.1 Frost risk
The level of frost risk for wine grapes is determined by the frequency of years in which Tmin, the minimum air temperature at 1.2 m above the ground, falls below −2 °C in the period 15 September to 15 October. Categories of wine grape production suitability with respect to frost risk are defined by Sparrow L, Smith R, Cotching B and Kerslake F (personal communication, 1 September 2012) for frequencies 1 frost per year (>100 % of years) being ‘well suited for wine grape propagation’, ‘suitable for wine grape propagation’, ‘marginally suitable for wine grape propagation’ and ‘unsuitable for wine grape propagation’, respectively. Frost risk frequency was determined for each temperature logger/weather station by counting the years that had at least 1 day with a temperature below −2 °C (for days between 15 September and 15 October) in the 20-year period (i.e. 1992 through to 2011). This count was divided by the total number of years (i.e. 20 for this study) to derive the average value.
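The frost frequency count described above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary input and the sample values are hypothetical, and it assumes that both 15 September and 15 October are included in the window and that every year present in the series counts towards the denominator.

```python
from datetime import date

def frost_risk_frequency(daily_tmin, threshold=-2.0):
    """Fraction of years with at least one day below `threshold` between 15 Sep and 15 Oct.

    daily_tmin maps datetime.date to calibrated daily minimum temperature (deg C).
    """
    years = {day.year for day in daily_tmin}
    frost_years = set()
    for day, tmin in daily_tmin.items():
        # Assumption: the window includes both end dates.
        in_window = date(day.year, 9, 15) <= day <= date(day.year, 10, 15)
        if in_window and tmin < threshold:
            frost_years.add(day.year)
    return len(frost_years) / len(years)

# Hypothetical three-year series for one logger (a real series would span 20 years).
series = {
    date(2009, 9, 20): -2.5,   # frost inside the window: 2009 counts
    date(2009, 10, 1): 1.0,
    date(2010, 9, 30): -1.9,   # not below -2 deg C
    date(2010, 11, 1): -3.0,   # cold, but outside the window: 2010 does not count
    date(2011, 9, 14): -4.0,   # one day before the window opens
    date(2011, 10, 15): -2.1,  # last day of the window: 2011 counts
}
frequency = frost_risk_frequency(series)
```

In this made-up series two of the three years contain a qualifying frost, so the frequency is two thirds.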
2.4.2 Growing degree days
GDDs are a measure of heat accumulation used to predict plant phenology. A GDD unit is quantified for each day by taking the average of the daily maximum and minimum temperatures and subtracting a base temperature, $T_{\mathrm{base}}$ (McMaster and Wilhelm 1997):
\[ \mathrm{GDD} = \frac{T_{\max} + T_{\min}}{2} - T_{\mathrm{base}} \tag{3} \]
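Equation 3 and its accumulation over a season can be sketched as below. The function names and temperatures are hypothetical, and the default base of 10 °C is the value this study uses for wine grapes. Setting negative daily values to zero before summing is a common convention in GDD calculations; it is included here as an optional assumption, since the text does not state how days colder than the base temperature were handled.

```python
def daily_gdd(tmax, tmin, tbase=10.0):
    """Daily GDD unit following Eq. 3: mean of Tmax and Tmin minus the base temperature."""
    return (tmax + tmin) / 2.0 - tbase

def accumulated_gdd(daily_pairs, tbase=10.0, floor_at_zero=True):
    """Sum daily GDD units over the (Tmax, Tmin) pairs supplied for a season."""
    total = 0.0
    for tmax, tmin in daily_pairs:
        gdd = daily_gdd(tmax, tmin, tbase)
        if floor_at_zero:
            # Assumption: days colder than the base temperature contribute zero.
            gdd = max(gdd, 0.0)
        total += gdd
    return total

# Hypothetical (Tmax, Tmin) pairs in deg C for three days.
pairs = [(24.0, 12.0), (14.0, 2.0), (20.0, 10.0)]
season_total = accumulated_gdd(pairs)
raw_total = accumulated_gdd(pairs, floor_at_zero=False)
```

For these three days the daily units are 8, −2 and 5, so the total is 13 with the floor applied and 11 without it.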
For the successful propagation of wine grapes, a base temperature of 10 °C was used to model GDD from October through to April. Categories of wine grape production suitability were defined with reference to Hall and Jones (2010): GDD >800, 750