Theor Appl Climatol DOI 10.1007/s00704-014-1253-5
ORIGINAL PAPER
Statistical downscaling of temperature using three techniques in the Tons River basin in Central India Darshana Duhan & Ashish Pandey
Received: 22 January 2014 / Accepted: 6 August 2014 © Springer-Verlag Wien 2014
Abstract In this study, downscaling models were developed for the projection of monthly maximum and minimum air temperature at three stations, namely, Allahabad, Satna, and Rewa, in the Tons River basin, a sub-basin of the Ganges River in Central India. Three downscaling techniques, namely, multiple linear regression (MLR), artificial neural network (ANN), and least square support vector machine (LS-SVM), were used for the development of the models, and the best identified model was used to simulate the future predictand (temperature) using the third-generation Canadian Coupled Global Climate Model (CGCM3) simulation of the A2 emission scenario for the period 2001–2100. The performance of the models was evaluated based on four statistical performance indicators. To reduce the bias in the monthly projected temperature series, a bias correction technique was employed. The results show that all the models are able to simulate temperature; however, the LS-SVM models perform slightly better than the ANN and MLR models. The best identified LS-SVM models were then employed to project future temperature. The projections show increasing trends in maximum and minimum temperature for the A2 scenario. Further, it is observed that minimum temperature will increase at a greater rate than maximum temperature.
D. Duhan (*) · A. Pandey
Department of Water Resources Development and Management, IIT Roorkee, Roorkee, Uttarakhand, India
e-mail: [email protected]
A. Pandey
e-mail: [email protected]

1 Introduction
The solar radiation absorbed by the atmosphere and the heat emitted by the Earth increase the air temperature, which alters
the evaporation and transpiration processes. The alterations in temperature and atmospheric circulation due to climate change inexorably cause an acceleration of the hydrological cycle and a redistribution of water resources on spatial and temporal scales. This affects the availability of water for domestic use, agriculture, hydropower generation, and the ecological environment in a region and season, which ultimately affects the social economy of the region. The IPCC (2007) reported that the global average surface temperature increased by 0.74±0.18 °C during the last 100 years (1906–2005). During the 21st century, it is likely to increase further by 1.1 to 2.9 °C for the lowest emission scenario and 2.4 to 6.4 °C for the highest emission scenario (IPCC 2007). In previous studies, several researchers have reported temperature rises at different time scales over various parts of India (Chattopadhyay et al. 2011; Duhan et al. 2013; Jhajharia et al. 2012, 2013, 2014; Jhajharia and Singh 2011; Suryavanshi et al. 2014). In order to provide input to regional hydrological models and to assess the impacts of climate change on water resources or natural resources in a region, climate variables and climate change scenarios must be developed on a regional or site-specific scale (Wilby et al. 2002). To obtain these values, projections of climate variables must be downscaled from Global Climate Model (GCM) simulations, because the coarse resolution of GCMs cannot provide the local climate details needed to study climate change impacts on water resources at the drainage-basin scale. Dynamic and statistical downscaling methodologies have been employed in the past for downscaling climatic variables from GCMs. However, statistical downscaling is the most widely used because it is comparatively cheap and computationally more efficient than dynamic downscaling. Statistical downscaling methodologies can be broadly classified into three categories (Murphy 1999; Wilby et al. 2004), namely, weather generators, weather typing, and transfer functions. The most commonly used statistical downscaling
approaches are the transfer functions, which model the relationships between large-scale atmospheric variables (predictors) and local surface variables (predictands) using traditional linear and nonlinear regression methods (Wilby et al. 2002). The traditional regression-based downscaling methods include linear regression, canonical correlation analysis (CCA), and principal component analysis (PCA) (Dibike and Coulibaly 2005), whereas nonlinear regression models include the artificial neural network (ANN) and the support vector machine (SVM). In previous studies, temperature has been downscaled using linear regression (Kostopoulou et al. 2007; Schoof and Pryor 2001; Goyal and Ojha 2010), CCA (Chen and Chen 2003; Kostopoulou et al. 2007; Skourkeas et al. 2010), ANN (Goyal and Ojha 2012; Kostopoulou et al. 2007; Schoof and Pryor 2001), and SVM (Anandhi et al. 2009) in different parts of the world. Kostopoulou et al. (2007) reported that MLR and CCA performed somewhat better than ANNs for the simulation of minimum and maximum temperatures over Greece. Schoof and Pryor (2001) reported the superiority of ANN models over MLR for daily temperature at Indianapolis, USA. Tripathi et al. (2006) reported that the least square support vector machine (LS-SVM) is superior to the multilayer backpropagation ANN for downscaling of precipitation in meteorological subdivisions of India. The objective of this study is to examine the ability of three downscaling techniques to simulate maximum and minimum temperatures at three stations in the Tons River basin in Central India. To the authors' best knowledge, this is the first attempt to compare MLR, ANN, and LS-SVM models for temperature downscaling at the basin scale in India.
2 Materials and method

2.1 Study area
The Tons River is a sub-basin of the Ganges River flowing through the states of Madhya Pradesh (MP) and Uttar Pradesh (UP) in Central India. It originates from the Kamore hills in the Satna district of MP, and the basin lies between 23° 57′ N to 25° 20′ N latitude and 80° 20′ E to 83° 25′ E longitude (Fig. 1). It flows through the fertile districts of Satna and Rewa in MP. A mean annual precipitation of 930–1,116 mm/year and a reference evapotranspiration (ETo) of 1,486–1,578 mm/year characterize the Tons River basin (Darshana et al. 2013). The basin receives about 90 % of its annual precipitation during the monsoon (June to September) season. The hottest month is May and the coolest is January. The main crops grown in the basin are wheat, rice, soybean, millets, and pulses.
2.2 Data description

2.2.1 Temperature data
Daily measured maximum and minimum air temperature data for three stations, namely, Satna, Rewa, and Allahabad, were procured from the India Meteorological Department (IMD), Pune. The details of the stations are given in Table 1. The quality of the data was checked prior to analysis, and a few noticeable errors were found by visual inspection. There were a few missing observations in the maximum and minimum temperature time series, which were substituted with the corresponding long-term means. The daily temperature values of each month were averaged to obtain monthly values for all stations, which were used for further analysis.
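The gap-filling step described above can be sketched as follows (a minimal NumPy illustration, not the study's actual code; the function name and data are hypothetical):

```python
import numpy as np

def fill_with_longterm_mean(monthly, months):
    """Replace missing values (NaN) in a monthly temperature series with the
    long-term mean of the corresponding calendar month, as done for the
    station records in this study."""
    monthly = np.asarray(monthly, dtype=float)
    months = np.asarray(months)
    filled = monthly.copy()
    for m in range(1, 13):
        sel = (months == m)
        if not sel.any():
            continue
        mean_m = np.nanmean(monthly[sel])            # long-term mean for month m
        filled[sel & np.isnan(monthly)] = mean_m     # substitute the gaps
    return filled
```

For example, a missing January value is replaced by the mean of all observed January values in the record.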
2.2.2 Large-scale atmospheric variables
The monthly mean atmospheric variables at different pressure levels were downloaded from the monthly reanalysis dataset of the National Centers for Environmental Prediction (NCEP) at a scale of 2.5° (latitude) × 2.5° (longitude) (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.derived.pressure.html). The data were extracted for the period from January 1969 to December 2008 for nine grid points whose latitudes range from 22.5° N to 27.5° N and longitudes from 80.0° E to 85.0° E. The simulated data of CGCM3, T47 version, were downloaded from the Canadian Centre for Climate Modelling and Analysis (CCCma) (http://www.cccma.ec.gc.ca/data/cgcm3/cgcm3_t47_sresa2.shtml). CGCM3 has a horizontal resolution of 3.75° latitude by 3.75° longitude and a vertical resolution of 31 levels. The data comprise present-day (20C3M) and future simulations forced by four emission scenarios, namely, A1B, A2, B1, and COMMIT. The A2 scenario is based on the assumption that atmospheric CO2 concentrations will reach 850 ppm in the year 2100 in a world characterized by high population growth, medium GDP growth, high energy use, medium/high land use changes, low resource availability, and slow introduction of new and efficient technologies, which matches Indian conditions. Therefore, in the present study, only the A2 scenario data of CGCM3 were used. The monthly climate data of the A2 scenario for nine grid points whose latitudes range from 20.41° N to 27.83° N and longitudes from 78.75° E to 86.25° E for the period January 2001 to December 2100 were downloaded. The nine grid points of the spatial domain for the climatic variables were chosen as suggested by Wilby and Wigley (2000). The CGCM3 data are regridded to the NCEP grid size using the inverse square weighted interpolation method (Willmott et al. 1985).
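The inverse square weighted interpolation used for regridding can be sketched as below (an illustrative simplification: distances are taken as Euclidean distances in degrees rather than great-circle distances, and the function name is hypothetical):

```python
import numpy as np

def regrid_inverse_square(src_lat, src_lon, src_val, tgt_lat, tgt_lon):
    """Interpolate GCM grid-point values to a target (NCEP) grid point with
    inverse-square-distance weights, w_i = 1 / d_i**2 (after Willmott et al.
    1985).  The interpolated value is sum(w_i * v_i) / sum(w_i)."""
    d2 = (src_lat - tgt_lat) ** 2 + (src_lon - tgt_lon) ** 2
    if np.any(d2 == 0):                   # target coincides with a source node
        return float(src_val[np.argmin(d2)])
    w = 1.0 / d2
    return float(np.sum(w * src_val) / np.sum(w))
```

Two source nodes at equal distance from the target therefore contribute equally, and a node coinciding with the target returns its value unchanged.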
Fig. 1 Location map of the Tons River basin
2.3 Methodology
The following methodology has been adopted for the analysis: (1) selection of potential predictors using the cross-correlation method; (2) brief description of the PCA, MLR, ANN, and LS-SVM methods; (3) development of models by training and validation; (4) projection of maximum and minimum temperature from the best identified calibrated and validated model using simulated data of CGCM3, with bias correction of the projected predictands; and (5) trend analysis of the future simulated predictands using the Mann-Kendall (MK) and Sen slope estimator tests. The details of the methods employed are discussed below:

2.3.1 Selection of predictors
One of the most important steps in downscaling is the choice of appropriate predictors (Hewitson and Crane 1996). The selection of predictors varies from region to region based on the predictand and the characteristics of the large-scale
Table 1 Location of temperature stations and descriptive statistics (mean, standard deviation (SD), coefficient of skewness (CS), coefficient of kurtosis (Ck), and coefficient of variation (CV)) during the study period

S.N.  Station name  Lat. (N)  Long. (E)  Alt. (m amsl)  Duration of data  Mean   SD    CS     Ck     CV
1.    Allahabad     25.27     81.44       98            1969–2008         26.12  6.11  −0.41  −0.96  23
2.    Satna         24.34     80.5       317            1969–2008         25.80  5.76  −0.21  −1.04  22
3.    Rewa          24.32     81.18      299            1969–2003         25.14  5.87  −0.24  −1.10  23

Lat., long., alt., and m amsl denote latitude, longitude, altitude, and meters above mean sea level, respectively
atmospheric circulation. Any type of variable can be used as a predictor if a relationship exists between the predictor and the predictand (Wetterhall et al. 2005). As suggested by Wilby et al. (2004), the predictors are selected using the following criteria: (1) the large-scale predictors should be physically relevant to the local-scale features and realistically simulated by the GCMs, (2) the predictors should be readily available from archives of GCM output and reanalysis datasets, and (3) the predictors should be strongly correlated with the predictand. The potential predictors for temperature are selected based on physical processes. The temperature at any place generally depends on circulation variables (represented by geopotential heights or the wind components) and on other variables such as temperature itself (through geopotential heights at various levels) and precipitable water content. Further, the temperature at any location is a result of the net radiation available and the way that radiation is budgeted. The net radiation (latent heat + sensible heat + horizontal heat transfer) depends on gains of solar and terrestrial energy, and the available energy is then used for sensible heat transfer and evaporation. Therefore, in the present study, the downloaded potential predictors are air temperature (°C), geopotential height (m), precipitable water content (kg/m2), and zonal and meridional wind velocities (m/s) at different pressure levels (from 1,000 to 10 mb). Further, the surface flux variables that control the temperature of the Earth's surface, namely, latent heat flux (W/m2), sensible heat flux (W/m2), and net shortwave and longwave radiation (W/m2), were also downloaded, as these have been found to be important predictors in temperature downscaling (Anandhi et al. 2009). The product moment correlation method was used to select appropriate predictors at different pressure levels and grid points.

2.3.2 Description of methods used
The brief descriptions of the methods employed, namely, PCA, MLR, ANN, and LS-SVM, are given below.

Principal component analysis PCA is a statistical procedure to identify the patterns of multidimensional variables and to transform correlated variables into a set of uncorrelated variables. Starting with the set of all variables (GCM outputs), the method generates a new set of variables called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so there is no redundant information. For performing PCA (Gadgil and Iyengar 1980), first, the covariance matrix of the normalized variables is computed. Each variable is normalized by subtracting its mean and dividing the result by its standard deviation. Eigenvectors of the covariance matrix are used for PCA. The eigenvectors are orthonormal, and the indices are arranged so that the first eigenvector corresponds to the largest eigenvalue and, in general, the kth eigenvector to the kth largest eigenvalue λ_k. The kth principal component at time t (pc_kt) is computed as follows:

pc_kt = Σ_q e_kq [(p_t(q) − p̄(q)) / S(q)]   (1)

where p_t(q) is the value of the qth variable (mean sea level pressure/geopotential height at any node) at time t, p̄(q) and S(q) are the mean and standard deviation of the variable p(q), and e_kq is the qth element of the eigenvector corresponding to the kth eigenvalue. The percentage of the total variance ω_k explained by the kth principal component is:

ω_k = (λ_k / Σ_{m=1}^{M} λ_m) × 100   (2)

where M is the dimensionality of the original dataset. The main advantage of PCA is that a small number of principal components can represent most of the variability of the original multivariate dataset. In this study, the principal components that together preserve more than 96 % of the total variance of the original dataset are used.

Multiple linear regression The multiple linear regression (MLR) method fits a linear equation between a dependent variable and two or more independent variables. A good MLR model explains most of the variance of the dependent variable with the minimum number of independent variables (Helsel and Hirsch 2002). The MLR equation is:

Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + β_4 X_4 + ⋯ + β_n X_n   (3)

where Y is the dependent variable (temperature), β_0 is the intercept, and β_i is the coefficient of the ith independent variable X_i (the ith predictor).
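The PCA-plus-MLR transfer function of Eqs. (1)-(3) can be sketched in plain NumPy (a minimal illustration on synthetic data, not the study's code; the 96 % retention threshold follows the text, everything else is an assumption):

```python
import numpy as np

def pca_mlr_downscale(X, y, var_keep=0.96):
    """Standardize the predictors, project them onto the leading principal
    components that preserve `var_keep` of the total variance (Eqs. 1-2),
    then fit the MLR transfer function of Eq. (3) by least squares.
    Returns the fitted coefficients [beta0, beta1, ...] and the scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # normalize each variable
    cov = np.cov(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    frac = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(frac, var_keep) + 1)    # components for >= 96 %
    pcs = Z @ eigvec[:, :k]                         # principal component scores
    A = np.column_stack([np.ones(len(y)), pcs])     # [1, PC1, ..., PCk]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, pcs
```

With a predictand that is exactly linear in the predictors, the fitted equation reproduces it once all components are retained.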
Artificial neural networks ANNs are a form of soft computing motivated by the functioning of the nervous system (ASCE 2000). A multilayer perceptron is used to map relationships between input variables and dependent output variables. The goal of the neural network is to minimize the root mean squared error (RMSE) between the predicted and observed values of the dependent variable. The basic structure of an ANN usually consists of three layers: the input layer (where the data are introduced to the network), the hidden layer(s) (where the data are processed), and the output layer (where the results for given inputs are produced). Typically, the number of hidden nodes is approximately 1.5 times the number of predictor variables (Eberhart and Dobbins 1990). The neural network is trained by adjusting the values of the weights between elements. The incoming data are processed by nonlinear
functions at the hidden and output layers to get the output. The commonly used nonlinear function is the log-sigmoid function (ASCE 2000). Among the various types of neural networks proposed in the literature, the most popular is the feed-forward model. In this study, the ANN was trained using a backpropagation algorithm (Rumelhart et al. 1986). The backpropagation algorithm adjusts the connection weights according to the backpropagated error computed between the observed and the estimated results. This is a supervised training procedure that attempts to minimize the error between the desired and the predicted output (Lek and Guegan 2000). The algorithm repeatedly runs through the training data, comparing the predicted values and the observed values. The backpropagation learning algorithm has two parameters: the learning rate (α) and the momentum factor (η). The learning rate determines how much the weights are allowed to change each time they are updated. The momentum factor determines how much the current weight change is affected by the previous weight change. The weights of the neural network are adjusted as follows:

w_ij(new) = w_ij(old) + α δ_i o_j + η Δw_ij(old)   (4)

where w_ij is the weight associated with the jth node in the ith layer, α is the learning rate, η is the momentum factor, o_j is the output from the jth output node, and δ_i is the error signal determined by:

δ_i = (t_i − o_i) o_i (1 − o_i)   (5)
where t_i is the observed value for the ith output node. The neural network sums the weight adjustments over an epoch and then adjusts the weights. After an initial period of rapid adjustment of the weights, the ANN reaches a stable solution, indicating that the model has "learned" the data structure and may be applied for prognostic analysis of "new" data.

Least square support vector machine LS-SVM is a least squares version of SVMs, which are a set of related supervised learning methods that analyze data, recognize patterns, and are used for classification and regression analysis. In this method, the solution is found by solving a set of linear equations instead of the convex quadratic programming (QP) problem of classical SVMs. LS-SVMs, proposed by Suykens and Vandewalle (1999), are a class of kernel-based learning methods. A finite training sample of N patterns, {(x_i, y_i), i = 1, …, N}, where x_i denotes the ith pattern in N-dimensional space (i.e., x_i = [x_1i, …, x_Ni] ∈ R^N), constitutes the input to the LS-SVM, and y_i ∈ R is the corresponding value of the desired model output. Further, the learning machine is defined by a set of possible mappings x → f(x, w), where f(·) is a deterministic function, which for a given input pattern x and
adjustable parameters w (w ∈ R^N) always gives the same output. The training phase of the learning machine involves adjusting the parameters w, which are estimated by minimizing the cost function Ψ_L(w, e). The LS-SVM optimization problem for function estimation is formulated by minimizing the cost function

Ψ_L(w, e) = (1/2) w^T w + (C/2) Σ_{i=1}^{N} e_i²   (6)

subject to the equality constraints

y_i = ŷ_i + e_i,  i = 1, …, N   (7)

where C is a positive real constant and ŷ is the actual model output. The first term of the cost function represents the weight decay or model complexity penalty; it is used to regularize weight sizes and to penalize large weights. The second term represents the error penalty. The solution of the optimization problem is obtained by considering the Lagrangian

L(w, b, e, α) = (1/2) w^T w + (C/2) Σ_{i=1}^{N} e_i² − Σ_{i=1}^{N} α_i {ŷ_i + e_i − y_i}   (8)

where the α_i are Lagrange multipliers and b is the bias term. The conditions for optimality are given by

∂L/∂w = w − Σ_{i=1}^{N} α_i φ(x_i) = 0
∂L/∂b = Σ_{i=1}^{N} α_i = 0
∂L/∂e_i = α_i − C e_i = 0,  i = 1, …, N
∂L/∂α_i = ŷ_i + e_i − y_i = 0,  i = 1, …, N   (9)

The elimination of w and e yields a linear system instead of a QP problem. The above conditions of optimality can be expressed as the solution to the following set of linear equations after elimination of w and e_i:

[ 0      1^T          ] [ b ]   [ 0 ]
[ 1   Ω + C^(−1) I ] [ α ] = [ y ]   (10)

where

y = [y_1, y_2, …, y_N]^T,  1 = [1, 1, …, 1]^T   (11)
Table 2 Potential predictors selected for maximum temperature and their correlations

S.No.  Potential variable selected  Pressure level  Grid location                                   Correlation with temperature

Allahabad
1      Temperature                  1,000           all nine                                        0.82–0.95
                                    925             all nine                                        0.81–0.85
                                    850             all nine                                        0.79–0.95
                                    700             all nine                                        0.71–0.80
2      Geo-potential height         1,000           all nine                                        (−)0.68–(−)0.78
                                    925             all nine                                        (−)0.64–(−)0.72
3      U wind                       1,000           (1,1)                                           0.55, 0.64
                                    925             (1,1), (3,1)                                    0.57, 0.62
                                    850             (1,1), (3,1)                                    0.55, 0.64
4      V wind                       1,000           (2,3), (3,3)                                    0.51, 0.67
                                    925             (1,3), (2,2), (2,3), (3,2), (3,3)               0.53, 0.70
5      Latent heat flux             surface         all nine                                        0.55
6      Sensible heat flux           surface         (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)        0.52–0.73
7      Net shortwave radiation      surface         all nine                                        (−)0.76–(−)0.92

Satna
1      Temperature                  1,000           all nine                                        0.77–0.97
                                    925             all nine                                        0.75–0.97
                                    850             all nine                                        0.73–0.96
                                    700             all nine                                        0.64–0.78
2      Geopotential height          1,000           all nine                                        (−)0.62–(−)0.73
                                    925             all nine                                        (−)0.57–(−)0.66
3      U wind                       1,000           (1,1), (2,1)                                    0.58, 0.60
                                    925             (1,1), (2,1)                                    0.57, 0.60
                                    850             (1,1)                                           0.65
4      V wind                       1,000           (3,3)                                           0.69
                                    925             (3,3)                                           0.69
5      Latent heat flux             surface         (3,1)                                           0.56
6      Sensible heat flux           surface         (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)        0.57–0.77
7      Net shortwave radiation      surface         all nine                                        (−)0.81–(−)0.95

Rewa
1      Temperature                  1,000           all nine                                        0.80–0.97
                                    925             all nine                                        0.78–0.97
                                    850             all nine                                        0.76–0.96
                                    700             all nine                                        0.68–0.80
2      Geopotential height          1,000           all nine                                        (−)0.65–(−)0.76
                                    925             all nine                                        (−)0.61–(−)0.69
3      U wind                       1,000           (1,1), (2,1)                                    0.59, 0.62
                                    925             (1,1), (2,1)                                    0.57, 0.61
                                    850             (1,1)                                           0.64
4      V wind                       1,000           (3,3)                                           0.70
                                    925             (3,3)                                           0.70
5      Latent heat flux             surface         (3,1)                                           0.56
6      Sensible heat flux           surface         (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)        0.59–0.77
7      Net shortwave radiation      surface         all nine                                        (−)0.80–(−)0.95

"All nine" denotes the full grid (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)
Table 3 Potential predictors selected for minimum temperature and their correlations

S.No.  Potential variable selected  Pressure level  Grid location                                             Correlation with temperature

Allahabad
1      Temperature                  1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.81–0.96
                                    925             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.82–0.97
                                    850             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,3)           0.82–0.96
                                    700             all nine                                                  0.90–0.94
                                    600             (1,1), (1,2), (1,3), (2,2), (2,3)                         0.81–0.90
2      Geo-potential height         1,000           all nine                                                  (−)0.87–(−)0.93
                                    925             (1,1), (1,2)                                              (−)0.84–(−)0.97
3      U wind                       500             all nine                                                  (−)0.80–(−)0.84
                                    400             all nine                                                  (−)0.84–(−)0.87
                                    300             all nine                                                  (−)0.83–(−)0.87
                                    250             all nine                                                  (−)0.83–(−)0.86
                                    200             all nine                                                  (−)0.82–(−)0.86
4      V wind                       1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,3)           0.62–0.79
                                    925             (1,2), (1,3), (2,2), (2,3), (3,3)                         0.65–0.83
                                    850             (1,2), (1,3), (2,3), (3,3)                                0.60–0.79
5      Precipitable water content   surface         all nine                                                  0.54–0.61
6      Latent heat flux             surface         (2,1), (3,1), (3,2), (3,3)                                0.56–0.66
7      Sensible heat flux           surface         (2,1), (3,1)                                              0.50, 0.63
8      Net shortwave radiation      surface         (1,1), (2,1), (3,1)                                       (−)0.68–(−)0.72
9      Net longwave radiation       surface         (3,1), (3,3)                                              (−)0.55, (−)0.56

Rewa
1      Temperature                  1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.82–0.95
                                    925             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.80–0.95
                                    850             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,3)           0.83–0.95
                                    700             all nine                                                  0.89–0.92
                                    600             (1,1), (1,2), (1,3), (2,3)                                0.80–0.87
2      Geo-potential height         1,000           all nine                                                  (−)0.86–(−)0.92
                                    925             all nine                                                  (−)0.83–(−)0.90
                                    850             (1,1), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)           (−)0.80–(−)0.83
3      U wind                       500             (2,1), (3,1), (3,2), (3,3)                                (−)0.80–(−)0.81
                                    400             all nine                                                  (−)0.80–(−)0.83
                                    300             all nine                                                  (−)0.80–(−)0.83
                                    250             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,2), (3,3)    (−)0.80–(−)0.83
                                    200             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  (−)0.81–(−)0.83
4      V wind                       1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,3)           0.63–0.77
                                    925             (1,2), (1,3), (2,3), (3,3)                                0.61–0.79
                                    850             (1,3), (2,3), (3,3)                                       0.70–0.75
5      Precipitable water content   surface         all nine                                                  0.52–0.56
6      Latent heat flux             surface         (2,1), (3,1), (3,2)                                       0.58–0.64
7      Sensible heat flux           surface         (2,1), (3,1)                                              0.54, 0.65
8      Net shortwave radiation      surface         (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)                  (−)0.60–(−)0.75
9      Net longwave radiation       surface         (3,3)                                                     (−)0.51

Satna
1      Temperature                  1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.85–0.97
                                    925             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)                  0.82–0.97
                                    850             (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,3)    0.80–0.97
                                    700             all nine                                                  0.90–0.94
                                    600             (1,1), (1,2), (1,3), (2,3)                                0.80–0.89
2      Geo-potential height         1,000           all nine                                                  (−)0.87–(−)0.94
                                    925             all nine                                                  (−)0.84–(−)0.91
Table 3 (continued)

S.No.  Potential variable selected  Pressure level  Grid location                                             Correlation with temperature

Satna (continued)
3      U wind                       850             (1,1), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)           (−)0.80–(−)0.84
                                    500             (1,1), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)           (−)0.80–(−)0.82
                                    400             all nine                                                  (−)0.82–(−)0.85
                                    300             all nine                                                  (−)0.81–(−)0.85
                                    250             all nine                                                  (−)0.81–(−)0.85
                                    200             all nine                                                  (−)0.80–(−)0.84
4      V wind                       1,000           (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,3)           0.64–0.79
                                    925             (1,2), (1,3), (2,2), (2,3), (3,3)                         0.62–0.82
                                    850             (1,3), (2,3), (3,3)                                       0.71–0.75
5      Precipitable water content   surface         (1,1), (1,2), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)    0.50–0.57
6      Latent heat flux             surface         (2,1), (3,1), (3,2)                                       0.52–0.64
7      Sensible heat flux           surface         (2,1), (3,1)                                              0.53, 0.62
8      Net shortwave radiation      surface         (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)                  (−)0.61–(−)0.75
9      Net longwave radiation       surface         (3,1), (3,3)                                              (−)0.52–(−)0.53

α = [α_1, α_2, …, α_N]^T,  I_N = the N×N identity matrix   (12)
Here, I_N is the N×N identity matrix, and Ω ∈ R^(N×N) is the kernel matrix, obtained from the application of Mercer's theorem:

Ω_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j),  ∀ i, j   (13)
where φ(·) represents the nonlinear transformation function defined to convert a nonlinear problem into a linear problem in a higher-dimensional feature space, and K(x_i, x_j) is the kernel function, whose value equals the inner product of the two vectors x_i and x_j in the feature space, i.e., of φ(x_i) and φ(x_j). The resulting LS-SVM model for function estimation is:

f(x) = Σ_{i=1}^{N} α_i K(x, x_i) + b*   (14)
where K(x, x_i) is the inner product kernel function defined in accordance with Mercer's theorem (Mercer 1909) and b* is the bias. There are several possibilities for the choice of the kernel function, including the linear, polynomial, and radial basis function (RBF) kernels. The linear kernel is a special case of the RBF kernel (Keerthi and Lin 2003). Further, the sigmoid kernel behaves like the RBF kernel for certain parameters (Lin and Lin 2003). They are defined as follows.
Table 4 Total variance explained by principal component analysis for maximum temperature during the study period at different stations

           Allahabad (initial eigenvalues)        Satna (initial eigenvalues)            Rewa (initial eigenvalues)
Component  Total   % of variance  Cumulative %    Total   % of variance  Cumulative %    Total   % of variance  Cumulative %
1          58.06   72.57          72.57           57.08   74.13          74.13           57.39   74.53          74.53
2          12.07   15.09          87.66           12.62   16.39          90.52           12.38   16.08          90.61
3           4.72    5.90          93.56            2.40    3.12          93.64            2.42    3.14          93.75
4           1.61    2.01          95.57            1.60    2.08          95.71            1.57    2.04          95.79
5           0.93    1.16          96.73            1.13    1.46          97.17            1.09    1.41          97.20
6           0.77    0.96          97.69            0.58    0.75          97.92            0.59    0.76          97.97
7           0.46    0.57          98.27            0.41    0.54          98.46            0.39    0.51          98.48
8           0.27    0.34          98.60            0.28    0.37          98.83            0.28    0.36          98.84
9           0.23    0.29          98.90            0.23    0.30          99.13            0.23    0.30          99.15
10          0.20    0.25          99.15            0.16    0.21          99.34            0.16    0.21          99.35
Linear kernel:

K(x_i, x_j) = x_i^T x_j   (15)

Polynomial kernel:

K(x_i, x_j) = (x_i^T x_j + t)^d,  t ≥ 0   (16)

Radial basis function kernel:

K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))   (17)

where t is the intercept, d is the degree of the polynomial, and σ is the width of the RBF kernel, which can be adjusted to control the expressivity of the RBF. In this study, LSSVMlab v1.8 (LSSVMlabv1_8_R2006a_R2009a, http://www.esat.kuleuven.be/sista/lssvmlab/) is used. The LS-SVM model requires the regularization parameter "gam" and the squared kernel parameter ("Sig2" in the case of the RBF kernel). The MATLAB function "tunelssvm" is employed for obtaining the values of the tuning parameters. Three kernel types, namely, linear, polynomial, and RBF, and three optimization algorithms are available for the LS-SVM model: simplex (which works for all kernels), grid search (restricted to two-dimensional tuning parameter optimization), and line search (used with the linear kernel). Each possible combination of kernel type and optimization algorithm is used to identify the optimal values of the tuning parameters based on the minimum value of the cost function. The tuning parameters are tuned via leave-one-out cross-validation or 10-fold cross-validation depending on the size of the dataset: leave-one-out cross-validation is used when the size is less than or equal to 300 points, and 10-fold cross-validation is used when the size is greater than 300 points (De Brabanter et al. 2011). The loss function used for cross-validation is the mean square error (mse).

Table 6 Details of multiple linear regression models at different stations for maximum (Tmax) and minimum (Tmin) temperature

Stations name  Predictand  Equation
Allahabad      Tmax        Tmax = 32.16 + 5.15PC1 + 1.43PC2 − 1.06PC3 + 0.58PC4
               Tmin        Tmin = 19.38 + 6.22PC1 + 2.40PC2 + 0.34PC3 − 0.53PC4
Satna          Tmax        Tmax = 32.26 + 2.74PC1 + 4.21PC2 + 0.44PC3 + 1.25PC4 − 0.09PC5
               Tmin        Tmin = 19.33 + 5.85PC1 + 2.53PC2 + 0.41PC3 − 0.45PC4
Rewa           Tmax        Tmin = 18.12 + 5.88PC1 + 2.64PC2 + 0.30PC3 − 0.50PC4
               Tmin        Tmin = 18.12 + 5.88PC1 + 2.64PC2 + 0.30PC3 − 0.50PC4

PC indicates the principal components of the predictors in the regression equations
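The LS-SVM solution of Eqs. (10), (14), and (17) can be sketched in NumPy (an illustrative reimplementation, not the LSSVMlab toolbox; the `gam` and `sig2` values are untuned placeholders, unlike the cross-validated tuning described above):

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    """RBF kernel of Eq. (17): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sig2))

def lssvm_train(X, y, gam=100.0, sig2=0.5):
    """Solve the LS-SVM linear system of Eq. (10) for the bias b and the
    Lagrange multipliers alpha; `gam` plays the role of C in Eq. (6)."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                                           # 1^T row
    A[1:, 0] = 1.0                                           # 1 column
    A[1:, 1:] = rbf_kernel(X, X, sig2) + np.eye(N) / gam     # Omega + C^-1 I
    rhs = np.concatenate(([0.0], np.asarray(y, float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                                   # b, alpha

def lssvm_predict(Xnew, X, b, alpha, sig2=0.5):
    """Evaluate Eq. (14): f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(np.atleast_2d(Xnew), X, sig2) @ alpha + b
```

Because the problem reduces to one linear solve, no iterative QP optimization is needed, which is the computational appeal of LS-SVM noted in Sect. 2.3.2.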
2.3.3 Model development

For the model development, the temperature and NCEP data were divided into two datasets: one for calibration and the other for validation. The available dataset is randomly partitioned into a training set and a test set following the multifold cross-validation procedure (Haykin 2003). About 70 % of the data are selected for training, and the remaining 30 % are selected for testing the model. For the Allahabad and Satna stations, data from 1969 to 1996 are used for training,
Table 5 Total variance explained by principal component analysis for minimum temperature during the study period at different stations (entries are initial eigenvalue Total / % of variance / Cumulative %)

Component   Allahabad                 Satna                     Rewa
1           239.97 / 80.80 / 80.80    239.17 / 80.26 / 80.26    235.71 / 80.72 / 80.72
2           28.32 / 9.53 / 90.33      29.94 / 10.05 / 90.31     29.14 / 9.98 / 90.70
3           10.91 / 3.67 / 94.01      10.85 / 3.64 / 93.95      10.58 / 3.62 / 94.33
4           5.08 / 1.71 / 95.72       5.08 / 1.71 / 95.65       4.79 / 1.64 / 95.97
5           2.68 / 0.90 / 96.62       2.68 / 0.90 / 96.55       2.87 / 0.98 / 96.95
6           1.72 / 0.58 / 97.20       1.72 / 0.58 / 97.13       1.60 / 0.55 / 97.50
7           1.46 / 0.49 / 97.69       1.49 / 0.50 / 97.63       1.28 / 0.44 / 97.94
8           1.35 / 0.45 / 98.14       1.36 / 0.45 / 98.08       1.09 / 0.37 / 98.31
9           0.87 / 0.29 / 98.44       0.95 / 0.32 / 98.40       0.84 / 0.29 / 98.60
10          0.63 / 0.21 / 98.65       0.67 / 0.23 / 98.63       0.58 / 0.20 / 98.80
D. Duhan, A. Pandey
and data from 1997 to 2008 are used for testing. For Rewa station, data from 1969 to 1993 are used for training, and data from 1994 to 2003 are used for testing. Feature vectors in the training set are used for calibrating the model, and those in the test set are used for validation.
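The random 70/30 partition described above can be sketched as follows (a minimal illustration; the index-shuffling helper and the seed are our own, not taken from the study):

```python
import numpy as np

def random_split(n, train_frac=0.7, seed=42):
    """Randomly partition record indices into ~70 % training and ~30 % testing."""
    idx = np.random.default_rng(seed).permutation(n)
    k = int(round(train_frac * n))
    return idx[:k], idx[k:]

# e.g. 480 monthly records (1969-2008) -> 336 for training, 144 for testing
train_idx, test_idx = random_split(480)
```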
2.3.4 Model performance analysis

In this study, the performance of the calibrated and validated models is evaluated using four performance indexes: correlation coefficient (CC), root mean square error (RMSE), normalized mean square error (NMSE), and Nash-Sutcliffe coefficient (NASH). These performance indexes are defined below:

(1) RMSE

RMSE = sqrt[ (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2 ]    (18)
Table 7 Values of "gam" and the squared kernel parameter obtained using different combinations of kernel types (linear, polynomial, and RBF) and optimization algorithms (simplex, gridsearch, and linesearch) for maximum and minimum temperature, based on the minimum value of the cost function

Maximum temperature
Allahabad:
  RBF, simplex:        cost function 2.3216, gam 210.216, sig2 22.166
  RBF, gridsearch:     cost function 2.3213, gam 401.416, sig2 33.68
  Linear, simplex:     cost function 2.5607, gam 3.008
  Linear, linesearch:  cost function 2.5607, gam 2.9485
  Polynomial, simplex: cost function 2.3244, gam 0.00150852, t 42.0387, d 3
Satna:
  RBF, simplex:        cost function 0.903, gam 27582.86, sig2 60.769
  RBF, gridsearch:     cost function 0.8876, gam 122718.10, sig2 109.84
  Linear, simplex:     cost function 1.05, gam 2.988
  Linear, linesearch:  cost function 1.06, gam 1.497
  Polynomial, simplex: cost function 0.900, gam 0.021, t 7.016, d 3
Rewa:
  RBF, simplex:        cost function 1.224, gam 105704.8, sig2 108.797
  RBF, gridsearch:     cost function 1.223, gam 1591033.67, sig2 276.798
  Linear, simplex:     cost function 1.424, gam 4.04
  Linear, linesearch:  cost function 1.797, gam 3.769
  Polynomial, simplex: cost function 1.225, gam 0.019, t 476.6, d 3

Minimum temperature
Allahabad:
  RBF, simplex:        cost function 1.9219, gam 140.869, sig2 10.08
  RBF, gridsearch:     cost function 1.9022, gam 81631.365, sig2 69.15
  Linear, simplex:     cost function 2.572, gam 2.829
  Linear, linesearch:  cost function 3.148, gam 2.8416
  Polynomial, simplex: cost function 1.883, gam 0.064, t 5.159, d 3
Satna:
  RBF, simplex:        cost function 1.1589, gam 844.23, sig2 21.00
  RBF, gridsearch:     cost function 1.1446, gam 1344334.82, sig2 175.53
  Linear, simplex:     cost function 1.86197, gam 3.3539
  Linear, linesearch:  cost function 2.338, gam 3.75
  Polynomial, simplex: cost function 1.1378, gam 0.0785, t 6.2951, d 3
Rewa:
  RBF, simplex:        cost function 3.5475, gam 631120.51, sig2 108.992
  RBF, gridsearch:     cost function 3.5458, gam 115.149, sig2 8.3433
  Linear, simplex:     cost function 4.389, gam 2.167
  Linear, linesearch:  cost function 4.389, gam 2.1364
  Polynomial, simplex: cost function 3.577, gam 0.1033, t 2.427, d 3

Sig2 is the width of the RBF kernel, "t" is the intercept, and "d" is the degree of the polynomial
(2) NMSE, defined as

NMSE = [ (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2 ] / (S_obs)^2    (19)

(3) Nash-Sutcliffe error estimate, given as

NASH = 1 − [ (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2 ] / [ (1/N) Σ_{i=1}^{N} (y_i − ȳ)^2 ]    (20)

(4) CC

CC = [ N Σ y_i ŷ_i − (Σ y_i)(Σ ŷ_i) ] / sqrt{ [ N Σ y_i^2 − (Σ y_i)^2 ] [ N Σ ŷ_i^2 − (Σ ŷ_i)^2 ] }    (21)
where y_i and ŷ_i are the observed and simulated predictand time series, S_obs is the standard deviation of the observed series, and N is the training or testing sample size. In general, higher NASH and CC values indicate better model prediction accuracy, whereas lower NASH values indicate poor model prediction. Smaller values of RMSE and NMSE represent a smaller discrepancy between the observed and predicted time series and hence higher prediction accuracy.
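The four indexes in Eqs. (18)-(21) can be computed directly from the observed and simulated series; a small NumPy sketch (the function name is ours, and (S_obs)^2 in Eq. 19 is taken as the variance of the observed series):

```python
import numpy as np

def performance_indexes(y, yhat):
    """RMSE (Eq. 18), NMSE (19), NASH (20), and CC (21) for observed y, simulated yhat."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    n = y.size
    rmse = np.sqrt(np.mean((y - yhat) ** 2))
    nmse = np.mean((y - yhat) ** 2) / np.var(y)          # divided by S_obs^2
    nash = 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    cc = (n * np.sum(y * yhat) - y.sum() * yhat.sum()) / np.sqrt(
        (n * np.sum(y ** 2) - y.sum() ** 2) * (n * np.sum(yhat ** 2) - yhat.sum() ** 2))
    return {"RMSE": rmse, "NMSE": nmse, "NASH": nash, "CC": cc}
```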
Table 8 Weights and biases for ANN models using backpropagation algorithm for maximum and minimum temperature at different stations (columns give the input-to-hidden weights for hidden nodes h11-h15)

Allahabad, Maximum:
  i1: −0.154   3.7636   3.5833  −4.955   4.6177
  i2:  2.2653  1.2926   1.1760  −1.248  −4.549
  i3:  0.505  −0.378   −1.164    1.3530 −0.506
  i4:  1.6377  0.3434  −0.674    0.2339  1.1646
  i5:  1.6148  1.3769   0.4258  −0.361   3.3838
Allahabad, Minimum:
  i1:  2.9078 −1.080    1.0541  −0.103   0.1097
  i2:  4.4364  1.0839  −3.562   −0.328   8.7914
  i3:  1.4444  0.4266  −0.456   −0.278  −0.044
  i4:  0.1369  1.9173  −0.727    0.0484 −2.549
  i5:  1.3094  1.6404   0.3020  −0.726   1.304
  i6:  2.0345  0.9868   0.8425   0.5560  0.498
Satna, Maximum:
  i1:  2.2962  0.5058   0.2853   0.4453  2.1056
  i2:  0.1444  1.4451   0.1486   0.3235  1.428
  i3:  0.3419  1.1961  −1.199    1.5734 −0.589
  i4:  1.2074  1.3108   0.4070   0.0371 −2.857
  i5:  0.1714  3.0861   0.5576  −1.133   5.2084
Satna, Minimum:
  i1:  3.2758  1.2218   0.7631  −0.332  −0.006
  i2:  1.9094  1.2373   0.3954   0.0397 −2.865
  i3:  0.9062 −1.047   −1.893    0.2836  2.5694
  i4:  7.1223  4.0057   1.3430  −0.728   6.7698
  i5:  4.0857  3.0958  −0.449   −0.771   6.6073
Rewa, Maximum:
  i1:  2.3649  4.5015  −0.082    0.4746  4.4363
  i2:  2.7858 −1.696   −0.221    0.8492  3.3267
  i3:  0.8429  1.1069  −0.173    0.4459 −0.702
  i4: −0.377   0.4843   0.4139  −0.267   0.0692
Rewa, Minimum:
  i1: −1.0066  2.0359   0.2947  −0.625   2.142
  i2:  3.8180  1.7984  −0.077   −0.836   3.9866
  i3:  4.0228  0.0428   0.7392  −0.051   1.3927
  i4:  0.3924  0.3086  −1.348    1.3293  0.9749
  i5:  3.9328  0.7067  −0.443   −1.562  −0.432

Biases: −0.000  0.0239  −0.402  0.0957  0.2945
2.3.5 Projection of predictands using GCM simulation variables

The feature vectors prepared from the GCM simulations of the A2 emission scenario were run through the best identified calibrated and validated downscaling models to obtain future projections of the predictand. Further, the projected predictand values were separated into nine parts (2011-2020, 2021-2030, 2031-2040, 2041-2050, 2051-2060, 2061-2070, 2071-2080, 2081-2090, and 2091-2100) on a decadal basis to depict the decadal changes in the predictand series.

2.3.6 Bias correction in projected predictands

The goal of the bias correction method is to use the probability of exceedance of the monthly projected values and match this probability to the historical observed climate values (Wood et al. 2007). The steps followed for the bias correction are explained below.

Step 1 Compute the empirical probability of exceedance curve Pobs(T) from the observed predictand values for a particular month.
Step 2 Compute the empirical probability of exceedance curve PGCM(T) from the GCM results, for the same period and the same month as the historical values of step 1.
Step 3 Apply the bias correction for i = 1 to the number of observations:
(a) Select a given value TGCM.
(b) For this particular value TGCM, compute its probability of exceedance from the curve derived in step 2.
(c) Using the probability obtained in step 3b, enter the observed probability of exceedance curve Pobs(T), setting Pobs(Tobs) = PGCM(TGCM).
(d) Using this probability, compute the bias-corrected value Tobs.
Step 4 Repeat steps 1 to 3 for all months.
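For a given calendar month, the exceedance-matching in the steps above is equivalent to empirical quantile mapping. A minimal sketch (our own helper, assuming simple empirical curves without any distribution fitting):

```python
import numpy as np

def bias_correct_month(t_gcm_future, t_gcm_hist, t_obs_hist):
    """For each projected value, find its probability on the historical GCM
    curve (steps 2-3b) and read off the observed value with the same
    probability from the observed curve (steps 3c-3d)."""
    gcm_sorted = np.sort(t_gcm_hist)
    n = gcm_sorted.size
    # Empirical non-exceedance probability of each future value on the GCM curve
    p = np.searchsorted(gcm_sorted, t_gcm_future, side="right") / n
    p = np.clip(p, 0.5 / n, 1.0 - 0.5 / n)
    # Inverse empirical CDF of the observations at the same probability
    return np.quantile(t_obs_hist, p)
```

Applied month by month, this removes a systematic offset between the GCM historical run and the station observations while preserving the projected change signal.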
2.3.7 Trend analysis using Mann-Kendall (MK) and Sen slope estimator tests

Various parametric and nonparametric tests are available for identifying trends; however, the nonparametric MK test (Kendall 1975; Mann 1945) has been widely used in trend detection studies of temperature series (Duhan et al. 2013). Therefore, in the present study, the MK test and the Sen slope estimator test (Sen 1968) are applied to detect the direction and magnitude of trends in the annual and seasonal (i.e., winter (December-February), pre-monsoon (March-May), monsoon (June-September), and post-monsoon (October-November)) time series during 2001-2100. Details of the MK test and the Sen slope estimator test can be found in previous research papers (Darshana et al. 2013; Duhan and Pandey 2013).
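For reference, the MK Z statistic and the Sen slope can be sketched as below (an illustrative version with tie corrections omitted for brevity; real series with repeated values need the tie-adjusted variance of S):

```python
import numpy as np

def mann_kendall_z(x):
    """MK Z statistic from S = sum of sign(x_j - x_i) over i < j (no tie correction)."""
    x = np.asarray(x, float)
    n = x.size
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / np.sqrt(var_s)
    if s < 0:
        return (s + 1) / np.sqrt(var_s)
    return 0.0

def sen_slope(x):
    """Sen's estimator: median of all pairwise slopes (x_j - x_i) / (j - i)."""
    x = np.asarray(x, float)
    n = x.size
    return float(np.median([(x[j] - x[i]) / (j - i)
                            for i in range(n - 1) for j in range(i + 1, n)]))
```

A |Z| above 2.576 corresponds to the 1 % significance level used for the trends reported later in Table 10.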
3 Results and discussion 3.1 Selected predictors for maximum and minimum temperature Pearson CCs between appropriate predictors and predictand (minimum and maximum temperature) are calculated at nine
Table 9 Performance statistics during calibration and validation for maximum and minimum temperature at different stations using multiple linear regression (MLR), artificial neural network (ANN), and least square support vector machine (LS-SVM); values are given as MLR / ANN / LS-SVM

Calibration
Allahabad Tmax (ANN 4-5-1): CC 0.963/0.970/0.973  RMSE 1.579/1.440/1.359  NMSE 0.072/0.060/0.053  NASH 0.928/0.940/0.947
Allahabad Tmin (ANN 5-5-1): CC 0.974/0.984/0.985  RMSE 1.579/1.242/1.211  NMSE 0.051/0.031/0.030  NASH 0.949/0.969/0.970
Satna Tmax (ANN 5-5-1):     CC 0.983/0.988/0.989  RMSE 1.012/0.840/0.809  NMSE 0.034/0.020/0.022  NASH 0.966/0.980/0.978
Satna Tmin (ANN 5-5-1):     CC 0.980/0.990/0.991  RMSE 1.344/0.953/0.917  NMSE 0.040/0.020/0.019  NASH 0.960/0.980/0.981
Rewa Tmax (ANN 4-4-1):      CC 0.977/0.983/0.983  RMSE 1.174/0.997/1.010  NMSE 0.045/0.033/0.034  NASH 0.955/0.967/0.966
Rewa Tmin (ANN 4-5-1):      CC 0.954/0.970/0.968  RMSE 2.061/1.664/1.708  NMSE 0.090/0.059/0.062  NASH 0.910/0.941/0.938

Validation
Allahabad Tmax (ANN 4-5-1): CC 0.966/0.971/0.980  RMSE 1.492/1.382/1.160  NMSE 0.188/0.026/0.040  NASH 0.928/0.938/0.950
Allahabad Tmin (ANN 5-5-1): CC 0.960/0.974/0.982  RMSE 2.060/1.891/1.289  NMSE 0.128/0.034/0.037  NASH 0.906/0.923/0.963
Satna Tmax (ANN 5-5-1):     CC 0.979/0.981/0.988  RMSE 1.092/1.036/0.780  NMSE 0.090/0.017/0.020  NASH 0.955/0.960/0.980
Satna Tmin (ANN 5-5-1):     CC 0.970/0.979/0.986  RMSE 1.764/1.442/1.126  NMSE 0.090/0.019/0.027  NASH 0.933/0.955/0.973
Rewa Tmax (ANN 4-4-1):      CC 0.976/0.975/0.984  RMSE 1.162/1.215/0.935  NMSE 0.123/0.021/0.030  NASH 0.951/0.946/0.968
Rewa Tmin (ANN 4-5-1):      CC 0.971/0.978/0.988  RMSE 1.970/1.899/1.034  NMSE 0.230/0.031/0.023  NASH 0.915/0.923/0.977
grid points for different pressure levels at all stations. The predictors selected and used in the present study are shown in Tables 2 and 3 for maximum and minimum temperature at the Allahabad, Rewa, and Satna stations at different grid points. Most of the predictor variables, which are screened from a pool of possible predictors using their cross-correlations with the predictand, are highly correlated with one another and provide similar information. High dimensionality of the predictors may be computationally complicated (Ghosh and Mujumdar 2006). Therefore, to obtain relevant predictors as input to the downscaling models, PCA, as used by previous researchers (Tripathi et al. 2006; Ghosh and Mujumdar 2006), was performed on the selected predictors. Before the PCA, the predictors were standardized by subtracting the mean from the original values and then dividing the result by the standard deviation of the original variables. The PCA method was then applied to the standardized NCEP predictor variables to extract principal components (PCs), which are orthogonal. The obtained PCs preserve more than 96 % of the variance for maximum temperature (Table 4) and minimum temperature (Table 5) at all the stations. A feature vector is formed for each month of the record using the PCs. This feature vector is used as input to the models, whereas temperature (the predictand) represents the model output.

Fig. 2 Comparison of monthly observed and simulated maximum temperature using the LS-SVM model at a Allahabad station, b Satna station, and c Rewa station

3.2 Model development

As stated previously, about 70 % of the data were randomly selected for calibration, and the remaining 30 % were selected for validation of the models. The developed MLR models for maximum and minimum temperature at the Allahabad, Satna, and Rewa stations are shown in Table 6. The development of the LS-SVM models requires the "gam" and the squared kernel parameter, whose values are obtained using
all possible combinations of the three kernel types (linear, polynomial, and RBF) and the three optimization algorithms (simplex, gridsearch, and linesearch), on the basis of the minimum value of the cost function. It is observed from Table 7 that the RBF kernel with the gridsearch procedure gave the minimum value of the cost function at all the stations for both maximum and minimum temperature. The main reason the RBF kernel is the best among the others is that it maps the training data into a possibly infinite-dimensional space and thus effectively handles situations in which the relationship between the predictors and the predictand is nonlinear. The development of the LS-SVM with the RBF kernel involves selection of the RBF kernel width sig2 and the parameter gam. For maximum temperature, the optimal values of sig2 obtained using the RBF kernel with the gridsearch procedure are 33.68, 109.84, and 276.798 for the Allahabad, Satna, and Rewa stations, respectively (Table 7). Further, the optimal values of gam are 401.416, 122718.10, and 1591033.67 for the Allahabad, Satna, and Rewa stations, respectively (Table 7). For minimum temperature, the obtained optimal values of sig2 are 69.15, 175.53, and 8.3433 for the Allahabad, Satna, and Rewa stations, respectively, and the values of gam are 81631.365, 1344334.82, and 115.149 for the Allahabad, Satna, and Rewa stations, respectively (Table 7). The resulting optimal values of sig2 and gam are used for training the models. The training of the LS-SVM models gives the values of alpha and b, which are used to simulate the training values.

For the ANN model, the architecture of the ANN was decided by a trial-and-error procedure. The incoming data are processed by nonlinear functions at the hidden and output layers to obtain the output. A comprehensive search of the ANN architecture is done by varying the number of nodes in the hidden layer(s) from 1 to 10. Log-sigmoid (ASCE 2000) and linear transfer functions are used in the hidden layer(s) and the output layer, respectively.

Fig. 3 Comparison of monthly observed and simulated minimum temperature using the LS-SVM model at a Allahabad station, b Satna station, and c Rewa station
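The ANN setup used in this study (log-sigmoid hidden layer, linear output, backpropagation with momentum; 2,000 epochs, learning rate 0.01, momentum 0.9) can be sketched as below. This is an illustrative NumPy re-implementation, not the MATLAB code used in the study; the initialization scheme and function names are our own.

```python
import numpy as np

def logsig(x):
    """Log-sigmoid transfer function used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))

def train_ann(X, y, hidden=5, epochs=2000, lr=0.01, momentum=0.9, seed=0):
    """One hidden layer (log-sigmoid), linear output, batch backpropagation
    with momentum, minimizing 0.5 * mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden);      b2 = 0.0
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = 0.0
    for _ in range(epochs):
        H = logsig(X @ W1 + b1)                 # hidden activations
        err = (H @ W2 + b2) - y                 # network output minus target
        gW2 = H.T @ err / n; gb2 = err.mean()
        dH = np.outer(err, W2) * H * (1 - H)    # backpropagated error
        gW1 = X.T @ dH / n; gb1 = dH.mean(axis=0)
        vW2 = momentum * vW2 - lr * gW2; W2 = W2 + vW2
        vb2 = momentum * vb2 - lr * gb2; b2 = b2 + vb2
        vW1 = momentum * vW1 - lr * gW1; W1 = W1 + vW1
        vb1 = momentum * vb1 - lr * gb1; b1 = b1 + vb1
    return W1, b1, W2, b2

def ann_predict(params, X):
    W1, b1, W2, b2 = params
    return logsig(X @ W1 + b1) @ W2 + b2
```

In the study's terms, the columns of W1 correspond to the hidden-node weights h11-h15 tabulated in Table 8, with b1 and b2 as the biases.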
The network was trained using the backpropagation algorithm (Rumelhart et al. 1986) with 2,000 epochs, a learning rate of 0.01, and a momentum constant of 0.9 (Samadi et al. 2013). The network error was computed by comparing the network output with the target output, and the optimal network was selected based on the minimum RMSE and maximum CC between the network output and the target output. The model was subjected to the aforementioned cross-validation procedure. MATLAB 2008 was used to develop the ANN models. Table 8 shows the weights and biases used in ANN model development for maximum and minimum temperature at the Allahabad, Satna, and Rewa stations.

3.3 Comparisons of MLR, ANN, and LS-SVM models

Table 9 shows the performance statistics of the developed MLR, ANN, and LS-SVM models during calibration and validation for the Allahabad, Satna, and Rewa stations. For maximum temperature during calibration (Table 9), the CC varied between 0.963 (MLR) and 0.989 (LS-SVM), and for minimum temperature, the CC varied from 0.954 (MLR) to 0.991 (LS-SVM) across stations. The RMSE of the calibrated maximum temperature models varied between 0.809 (LS-SVM) and 1.579 (MLR), and for minimum temperature, the RMSE ranged from 0.917 (LS-SVM) to 2.061 (MLR) across stations. The calibrated maximum temperature models showed an
NMSE of 0.02 (ANN) to 0.072 (MLR), and for minimum temperature, it is 0.019 (LS-SVM) to 0.090 (MLR) across stations. Further, the NASH efficiency across stations varied from 0.928 (MLR) to 0.980 (ANN) for maximum temperature and from 0.910 (MLR) to 0.981 (LS-SVM) for minimum temperature. During validation (Table 9), the CC varied between 0.966 (MLR) and 0.988 (LS-SVM) for maximum temperature and from 0.960 (MLR) to 0.988 (LS-SVM) for minimum temperature across stations. The RMSE of the validated maximum temperature models varied between 0.78 (LS-SVM) and 1.492 (MLR), and for minimum temperature, it ranged from 1.034 (LS-SVM) to 2.060 (MLR) across stations. For maximum temperature, the models showed an NMSE of 0.017 (ANN) to 0.188 (MLR), and for minimum temperature, 0.019 (ANN) to 0.230 (MLR) across stations. Further, the NASH efficiency across stations varied from 0.928 (MLR) to 0.980 (LS-SVM) for maximum temperature and from 0.906 (MLR) to 0.977 (LS-SVM) for minimum temperature. The calibration and validation results show that all three downscaling models perform well. The highest values of CC and NASH and the lowest values of RMSE and NMSE during calibration and validation show that the LS-SVM models slightly outperform the others, followed by the ANN and MLR models. Therefore, future simulation of
Fig. 4 Box plot depicts the decadal changes in downscaled maximum temperature using LS-SVM model from 2011 to 2100 at a Satna, b Allahabad, and c Rewa stations. The horizontal red line in the box denotes the median. The black square represents the mean value of simulated maximum temperature, while the pink line with circle depicts the mean value of observed maximum temperature
Fig. 5 Box plot depicts the decadal changes in downscaled minimum temperature using LS-SVM model from 2011 to 2100 at a Satna, b Allahabad, and c Rewa stations. The horizontal red line in the box denotes the median value. The black square represents the mean value of simulated minimum temperature, while the pink line with circle depicts the mean value of observed minimum temperature
predictands was performed using only the developed LS-SVM models. Figures 2a-c and 3a-c show the monthly observed and simulated maximum and minimum temperature using the LS-SVM models during the calibration and validation periods for the Allahabad, Satna, and Rewa stations, respectively. For the Allahabad and Satna stations, months 1 to 336 (January 1969 to December 1996) are used for calibration, and the rest (January 1997 to December 2008) are used for validation. For Rewa station, months 1 to 300 (January 1969 to December 1993) are used for calibration, and the rest (January 1994 to December 2003) are used for validation. It is inferred that the observed values of maximum and minimum temperature are quite close to the simulated values at all the stations.
3.4 Projection of maximum and minimum temperature using CGCM3 simulated A2 scenario variables

The GCM simulations of the A2 emission scenario are run through the calibrated and validated LS-SVM downscaling models to obtain future simulations of the predictand. Box plots of 10-year time slices are used to determine patterns in the predictand. The projected maximum and minimum temperatures are shown in Figs. 4a-c and 5a-c for the Satna, Allahabad, and Rewa stations, respectively, for the periods 2011-2020, 2021-2030, 2031-2040, 2041-2050, 2051-2060, 2061-2070, 2071-2080, 2081-2090, and 2091-2100. The middle line of the box shows the median value, whereas the upper and
Table 10 Annual and seasonal trends in temperature for the future (2001-2100); entries are Z / β (°C/year)

Station    Temp   Annual          Winter         Pre-monsoon     Monsoon          Post-monsoon
Allahabad  Tmax   12.20 / 0.048   10.70 / 0.07   10.11 / 0.044    9.51 / 0.029    11.11 / 0.052
Allahabad  Tmin   11.35 / 0.043    4.58 / 0.01    9.42 / 0.062   12.29 / 0.055     5.76 / 0.022
Rewa       Tmax    1.99 / 0.005    5.36 / 0.03    5.39 / 0.018   −3.73 / −0.019    0.20 / 0.001
Rewa       Tmin   12.49 / 0.077    8.96 / 0.04   10.63 / 0.100   12.61 / 0.089    11.67 / 0.079
Satna      Tmax    3.79 / 0.011    7.98 / 0.06    3.45 / 0.009   −5.43 / −0.028    2.36 / 0.013
Satna      Tmin   11.38 / 0.068    5.02 / 0.02   10.93 / 0.141   10.92 / 0.142     6.52 / 0.034

Significant trends at the 1 % significance level are indicated by bold numbers in the original table
lower edges give the 75th and 25th percentiles of the dataset, respectively. The box plots in Fig. 4a-c show that there is no significant change in the median of the future maximum temperature for the A2 scenario at any of the stations. However, the projected increase in the predictand is highest at Allahabad station (Fig. 4b), which is the hottest of the three stations (Fig. 4a, c). The box plots of minimum temperature in Fig. 5a-c show an increase in future minimum temperature for the A2 scenario at the Satna, Allahabad, and Rewa stations, respectively. These results are in conformity with temperature projections for the A2 scenario in other river basins in India (Anandhi et al. 2009; Goyal and Ojha 2012). Further, the projected increase in minimum temperature is highest at Allahabad station, which is hotter than the other two stations. It is also inferred that minimum temperature will increase at a higher rate than maximum temperature. These results follow the pattern found in past studies of historical temperature in the study area (Duhan et al. 2013), which showed that the increase in minimum temperature was greater than that in maximum temperature during 1901-2002.

3.5 Results of Mann-Kendall and Sen slope estimator tests

Table 10 shows the trend analysis results for maximum and minimum temperature using the MK test and the Sen slope estimator test on annual and seasonal scales for the years 2001-2100. Significant increasing trends were obtained in annual maximum and minimum temperature at all three stations. The magnitudes of the annual maximum temperature trends are 0.47, 0.05, and 0.1 °C per decade at the Allahabad, Rewa, and Satna stations, respectively. For annual minimum temperature, the increases in magnitude are 0.42, 0.76, and 0.68 °C per decade at the Allahabad, Rewa, and Satna stations, respectively.
Seasonally, significant increasing trends were observed in all seasons for both maximum and minimum temperature, except for monsoon-season maximum temperature at the Rewa and Satna stations, which shows significant decreasing trends. The increasing trends in maximum temperature varied between 0.09 °C per decade (pre-monsoon) at Satna and 0.7 °C per decade (winter) at Allahabad. For minimum temperature, the increases in magnitude range from 0.1 °C per decade in winter at Allahabad to 1.42 °C per decade in monsoon at Satna. However, the decreases in magnitude are −0.19 and −0.28 °C per decade in the monsoon season at the Rewa and Satna stations, respectively. It can be inferred that the climate of the study area will become warmer in the future and that this warming will be more pronounced during the night than during the day. This temperature increase during the reproductive, grain formation, and ripening phases of crops can be detrimental to the productivity of wheat and other Rabi season crops due to terminal stress (Duhan et al. 2013). The increase in winter temperature around the sowing of Rabi crops in October to November may be detrimental for seed germination.
Similarly, for Rabi crops in the reproductive phase (March to April), it may hasten maturity, reduce grain size and grain number, and thereby reduce yield. Lal et al. (1999) reported that increases in maximum and minimum temperatures of 1 and 1.5 °C, respectively, would reduce the yield of soybean, the main crop in the study area, by 35 % compared to 1998. This may ultimately affect the food security and economy of the state and the country.
4 Conclusions

In this study, three downscaling techniques, namely, MLR, ANN, and LS-SVM, were developed for the projection of monthly maximum and minimum temperature at three stations, namely, Allahabad, Satna, and Rewa, in the Tons River basin. Using the cross-correlation method, the selected predictors, which have high correlation with the predictands, are air temperature, geopotential height, zonal wind, meridional wind, latent heat flux, sensible heat flux, net shortwave radiation, and longwave radiation at different grid points. The RBF kernel with gridsearch is identified as the best kernel for training the LS-SVM. The calibration and validation results indicated that all the developed models perform well; however, the LS-SVM models outperform the ANN and MLR models at all stations for both maximum and minimum temperature. Therefore, future projections of the predictands were carried out by employing only the LS-SVM models with CGCM3 variables for the A2 scenario. The future projections show that maximum and minimum temperature will increase and that minimum temperature will increase at a higher rate than maximum temperature. Further, the increases in the magnitude of annual maximum temperature are 0.47, 0.05, and 0.1 °C per decade at the Allahabad, Rewa, and Satna stations, respectively, during 2001-2100. For annual minimum temperature, the increases in magnitude are 0.42, 0.76, and 0.68 °C per decade at the Allahabad, Rewa, and Satna stations, respectively.

Acknowledgments The authors are thankful to the Department of Science and Technology (DST), New Delhi, for providing financial support during the study period. We are also thankful to the anonymous reviewers for their thoughtful suggestions, which improved this manuscript significantly.
References

Anandhi A, Srinivas VV, Nagesh Kumar D, Nanjundiah RS (2009) Role of predictors in downscaling surface temperature to river basin in India for IPCC SRES scenarios using support vector machine. Int J Climatol 29:583–603 ASCE Task Committee on Application of Artificial Neural Networks in Hydrology (2000) Artificial neural networks in hydrology-I: Preliminary concepts. J Hydrol Eng 5(2):115–123
Chattopadhyay S, Jhajharia D, Chattopadhyay G (2011) Univariate modelling of monthly maximum temperature time series over northeast India: neural network versus Yule-Walker equation based approach. Meteorol Appl 18(1):70–82 Chen DL, Chen YM (2003) Association between winter temperature in China and upper air circulation over East Asia revealed by canonical correlation analysis. Global Planet Change 37:315–325 Darshana, Pandey A, Pandey RP (2013) Analyzing trends in reference evapotranspiration and weather variables in the Tons River Basin in central India. Stoch Environ Res Risk Assess 27(6):1407–1421 De Brabanter K, De Brabanter J, De Moor B (2011) Nonparametric derivative estimation. BNAIC 2011, Gent Dibike YB, Coulibaly P (2005) Hydrologic impact of climate change in the Saguenay watershed: comparison of downscaling methods and hydrologic models. J Hydrol 307(1–4):145–163 Duhan D, Pandey A (2013) Statistical analysis of long term spatial and temporal trends of precipitation during 1901-2002 at Madhya Pradesh, India. Atm Res 122:136–149 Duhan D, Pandey A, Gahalaut KPS, Pandey RP (2013) Spatial and temporal variability in maximum, minimum and mean air temperatures at Madhya Pradesh in central India. CR Geosci 345:3–21 Eberhart R, Dobbins B (1990) Neural network PC tools: a practical guide. Academic Press, San Diego, CA Gadgil S, Iyengar RI (1980) Cluster analysis of rainfall stations of the Indian Peninsula. Q J Roy Meteorol Soc 106:873–886 Ghosh S, Mujumdar PP (2006) Future rainfall scenario over Orissa with GCM projections by statistical downscaling. Current Sci 90(3):396–404 Goyal MK, Ojha CSP (2010) Evaluation of various linear regression methods for downscaling of mean monthly precipitation in arid Pichola watershed. Nat Resour 1(1):11–18 Goyal MK, Ojha CSP (2012) Downscaling of surface temperature for lake catchment in an arid region in India using linear multiple regression and neural networks.
Int J Clim 32:552–566 Haykin S (2003) Neural networks: a comprehensive foundation. Fourth Indian Reprint, Pearson Education, Singapore, pp. 842 Helsel DR, Hirsch RM (2002) Statistical methods in water resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geol. Surv, pp 522 Hewitson BC, Crane RG (1996) Climate downscaling: techniques and application. Clim Res 7:85–95 IPCC (2007) Impacts, adaptation and vulnerability. Contribution of working group II. In: Parry ML, Canziani OF Jhajharia D, Singh VP (2011) Trends in temperature, diurnal temperature range and sunshine duration in northeast India. Int J Climatol 31: 1353–1367 Jhajharia D, Dinpashoh Y, Kahya E, Singh VP, Fakheri-Fard A (2012) Trends in reference evapotranspiration in the humid region of northeast India. Hydrol Process 26:421–435 Jhajharia D, Chattopadhyay S, Choudhary RR, Dev V, Singhe VP, Lal S (2013) Influence of climate on incidences of malaria in the Thar Desert, northwest India. Int J Climatol 33:312–325 Jhajharia D, Dinpashoh Y, Kahya Y, Choudhary RR, Singh VP (2014) Trends in temperature over Godavari river basin in southern peninsular India. Int J Clim 34(5):1369–1384 Keerthi SS, Lin CJ (2003) Asymptotic behaviours of support vector machines with Gaussian kernel. Neural Comp 15(7):1667–1689 Kendall MG (1975) Rank correlation methods, 4th edn. Charles Griffin, London, p 202 Kostopoulou E, Giannakopoulos C, Anagnostopoulou C, Tolika K, Maheras P, Vafiadis M, Founda D (2007) Simulating maximum and minimum temperatures over Greece: a comparison of three downscaling techniques. Theor Appl Clim 90:65–82
Lal M, Singh KK, Srinivasan G, Rathore LS, Naidu D, Tripathi CN (1999) Growth and yield responses of soybean in Madhya Pradesh, India to climate variability and change. Agric For Meteorol 93:53–70 Lek S, Guegan JF (2000) Artificial neuronal networks: application to ecology and evolution. Springer, Berlin Lin HT, Lin CJ (2003) A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report. Department of Computer Science and Information Engineering, National Taiwan University Mann HB (1945) Non-parametric test against trend. Econometrica 13: 245–259 Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans R Soc Lond A 209:415–446 Murphy JM (1999) An evaluation of statistical and dynamical techniques for downscaling local climate. J Climate 12:2256–2284 Rumelhart DE, Hilton GE, Willams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536 Samadi S, Wilson CAME, Moradkhani H (2013) Uncertainty analysis of statistical downscaling models using Hadley Centre Coupled Model. Theor Appl Clim 114:673–690 Schoof JT, Pryor SC (2001) Downscaling temperature and precipitation: a comparison of regression-based methods and artificial neural networks. Int J Climatol 21:773–790 Sen PK (1968) Estimates of the regression coefficient based on Kendall’s tau. J Am Stat Assoc 63:1379–1389 Skourkeas A, Kolyva-Machera F, Maheras P (2010) Estimation of mean maximum summer and mean minimum winter temperatures over Greece in 2070–2100 using statistical downscaling methods. Euro Asian J Sustain Energy Dev Policy 2:33–44 Suryavanshi S, Pandey A, Chaube UC, Joshi N (2014) Long term historic changes in climatic variables of Betwa Basin, India. Theor Appl Clim 117(3–4):403–418 Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. 
Neural Proc Lett 9(3):293–300 Tripathi S, Srinivas VV, Nanjundiah RS (2006) Downscaling of precipitation for climate change scenarios: a support vector machine approach. J Hydrol 330:621–640 Wetterhall F, Halldin S, Xu CY (2005) Statistical precipitation downscaling in central Sweden with the analogue method. J Hydrol 306:174–190 Wilby RL, Wigley TML (2000). Downscaling general circulation model output: a reappraisal of methods and limitations. In Climate Prediction and Agriculture, M.V.K. Sivakumar (ed.). Proceedings of the START/WMO International Workshop, 27-29 September, 1999, Geneva. International START Secretariat, Washington, DC, pp. 39-68 Wilby RL, Dawson CW, Barrow EM (2002) SDSM—a decision support tool for the assessment of regional climate change impacts. Environ Model Software 17:147–159 Wilby RL, Charles SP, Zorita E, Timbal B, Whetton P, Mearns LO (2004) Guidelines for use of climate scenarios developed from statistical downscaling methods. IPCC Data Distribution Centre Report, UEA, Norwich, UK, p 27 Willmott CJ, Rowe CM, Philpot WD (1985) Small-scale climate maps: a sensitivity analysis of some common assumptions associated with grid-point interpolation and contouring. Am Cartog 12:5–16 Wood AW, Maurer E, Kumar A, Lettenmaier D (2007) Uncertainty in hydrologic impacts of climate change in the Sierra Nevada, California, under two emissions scenarios. Clim Change 82: 309–325