Neural Comput & Applic (2011) 20:151–159 DOI 10.1007/s00521-010-0363-y
ORIGINAL ARTICLE
Support vector regression with reduced training sets for air temperature prediction: a comparison with artificial neural networks

Robert F. Chevalier · Gerrit Hoogenboom · Ronald W. McClendon · Joel A. Paz
Received: 16 July 2009 / Accepted: 30 March 2010 / Published online: 22 April 2010
© Springer-Verlag London Limited 2010
Abstract Sudden changes in weather, in particular extreme temperatures, can result in increased energy expenditures, depleted agricultural resources, and even loss of life. However, these ill effects can be reduced with accurate air temperature predictions that provide adequate advance warning. Support vector regression (SVR) was applied to meteorological data collected across the state of Georgia in order to produce short-term air temperature predictions. A method was proposed for reducing the number of training patterns of massively large data sets that does not require lengthy pre-processing of the data. This method was demonstrated on two large data sets: one containing 300,000 cold-weather training patterns collected during the winter months and one containing 1.25 million training patterns collected throughout the year. These patterns were used to produce predictions from 1 to 12 h ahead. The mean absolute error (MAE) for the evaluation set of winter-only patterns ranged from 0.514°C for the 1-h prediction horizon to 2.303°C for the 12-h prediction horizon. For the evaluation set of year-round patterns, the
R. F. Chevalier · G. Hoogenboom · R. W. McClendon
Institute for Artificial Intelligence, The University of Georgia, Athens, GA 30602, USA

R. F. Chevalier
Lockheed Martin Advanced Technology Laboratories, Cherry Hill, NJ 08002, USA

G. Hoogenboom (corresponding author) · J. A. Paz
Department of Biological and Agricultural Engineering, The University of Georgia, Griffin, GA 30223, USA
e-mail: [email protected]

R. W. McClendon
Faculty of Engineering, The University of Georgia, Athens, GA 30602, USA
MAE ranged from 0.513°C for the 1-h prediction horizon to 1.922°C for the 12-h prediction horizon. These results were competitive with previously developed artificial neural network (ANN) models that were trained on the full data sets. For the winter-only evaluation data, the SVR models were slightly more accurate than the ANN models for all twelve of the prediction horizons. For the year-round evaluation data, the SVR models were slightly more accurate than the ANN models for three of the twelve prediction horizons.

Keywords Support vector machines · Artificial neural networks · Air temperature · Frost · Weather modeling
1 Introduction

Accurate air temperature prediction is an essential component in decision-making related to many areas, including agricultural production, energy management, and human and animal health. Extreme air temperatures can cause crop damage that leads to a reduction in yield and production, as well as create dangerous working conditions for those exposed to the elements. In 2007, 50% of Georgia's peach crop and 87% of its blueberry crop were lost due to devastating freezes in early April [7]. Extreme temperatures have also been shown to have an adverse effect on livestock. Heat stress of dairy cows can cause a reduction in milk production of around 20–30% and in some cases can result in their death [15, 23]. Approximately 400 human fatalities each year in the United States alone are attributed to extreme heat [2]. Air temperature prediction is an important factor in providing early warnings for both dangerous frost and heat stress conditions. With sufficient notice, crop loss due to frost can be reduced
through the use of preventative measures such as orchard heaters and irrigation. During times of extreme heat, proper precautions can also be taken to ensure the safety of workers and livestock.

The Georgia Automated Environmental Monitoring Network (AEMN) was created in 1991 [11]. The goal of the AEMN is to collect environmental data across the state of Georgia and to disseminate the weather data and associated data products in near real-time via the World Wide Web. The AEMN is comprised of over 75 stations that are located mainly in rural areas where the National Weather Service does not have a presence. One of the services currently offered on the AEMN website (http://www.georgiaweather.net) is a short-term air temperature prediction tool. This tool offers hourly predictions ranging from 1 to 12 h ahead. These predictions are generated using ANN models [13, 27, 28]. An ANN is a popular data modeling tool that mimics the operation of neurons in the human brain. It is used to express complex functions that perform a nonlinear mapping from a set of inputs to a set of outputs.

In its dual form, the SVR optimization problem can be written as

$$\max_{\alpha,\alpha^{*}}\left\{-\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})\langle x_i,x_j\rangle-\varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^{*})+\sum_{i=1}^{l}y_i(\alpha_i-\alpha_i^{*})\right\}\qquad(5)$$

subject to

$$\sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})=0\quad\text{and}\quad\alpha_i,\alpha_i^{*}\in[0,C]$$
where x_i and y_i are the input–output pairs of the training data, α_i and α_i* are the variables to be discovered, and ε and C are constants. The constant ε represents the SVR algorithm's tolerance for errors. The area within ±ε of the learned function is referred to as the SVR regression tube, and any errors that fall within this tube are ignored. The constant C is referred to as the penalty factor, and it
controls the trade-off between the complexity of the function and the frequency with which errors are allowed to fall outside of the SVR regression tube. Comparisons between SVR and ANNs have been made in several domains. Khan and Coulibaly [16] used SVR for lake water level prediction and demonstrated a slightly better performance than that of a multilayer perceptron. The training set for this work was comprised of approximately 850 patterns. Least squares support vector machines were used by Qin et al. [22] to model water vapor and carbon dioxide fluxes over a cropland. Using training sets that ranged in size from 390 to 1,561 patterns, they were able to achieve better results than those obtained using radial-basis function ANNs. SVR has previously been used by Mori and Kanaoka [19] to predict daily maximum air temperature. The predictions that they obtained using SVR were shown to reduce the average error of 1-day ahead maximum air temperature by 0.8% when compared to a multilayer perceptron. However, these comparisons were drawn between models that were trained on a relatively small training set of 366 patterns and evaluated using 122 patterns. The success of SVR in these instances can be explained by a few key characteristics which differentiate it from the ANN approach. ANNs use empirical risk minimization as a basis for model selection [6]. Since empirical risk is simply a measure of the error over a given set of data, ANNs attempt to fit the training data as closely as possible. This often leads to a solution function which performs extremely well on the training data but performs poorly on patterns not used in training. This is referred to as overfitting. SVR is much less susceptible to overfitting because it is based on the principle of structural risk minimization. Structural risk minimization seeks to minimize not only empirical error but also model complexity [31]. The less complex a model is, the more likely it will be to generalize to patterns not used in training. Another advantage of the SVR algorithm over ANNs is that it always converges to a solution that is globally unique and optimal as a result of being framed as a convex optimization problem. Back propagation ANNs, on the other hand, are susceptible to local minima and will discover different solutions depending upon the initial weights and the order of the training patterns. One critical advantage that ANNs have over SVR is that they are less computationally expensive. In certain instances, SVR training times have been shown to scale somewhere between quadratic and cubic with respect to the number of training patterns [5]. For this reason, they are typically not considered a practical alternative to ANNs for problems involving massively large data sets (defined herein as data sets with more than 100,000 training patterns). When working with massively large data sets, SVR is limited by both the speed and memory constraints of the
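The roles of ε and C described above can be illustrated with a small, self-contained example. The sketch below is not the implementation used in this study; it fits ε-SVR models with scikit-learn on synthetic data (all parameter values are illustrative) and prints how many training points end up as support vectors as the tube is widened.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(500, 1))                 # synthetic inputs
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)   # noisy toy target

for epsilon in (0.01, 0.05, 0.20):
    model = SVR(kernel="rbf", C=25.0, gamma=0.5, epsilon=epsilon).fit(X, y)
    # Errors smaller than epsilon are ignored by the loss, so widening the tube
    # leaves fewer patterns as support vectors (and shortens training).
    print(f"epsilon={epsilon:.2f}  support vectors={len(model.support_)}")
```

Because the learned function depends only on the support vectors, this behaviour is also what makes the training-set reduction strategy described later in this paper plausible.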
available hardware. Several attempts have been made to address both issues. In the original SVR algorithm, the results of kernel calculations were cached in memory. Therefore, as the number of training patterns increases, the memory requirements become excessive. The Kernel Adatron algorithm addresses this problem by removing the cached data and re-evaluating kernel components as necessary [8]. Training time, however, is increased with this algorithm. Chunking [16] and decomposition [3, 14, 21, 32] are methods that address both issues to some degree by incrementally solving subsets of the overall optimization problem. Another approach is to replace the quadratic programming problem with a fast, iterative approach as suggested in [9, 17, 18]. Even with these enhancements to the original algorithm, SVR is typically not thought to be a viable solution to problems with massively large data sets.

One approach to deal with the speed and memory constraints of SVR involves simply removing a portion of the training data and relying on the algorithm's ability to generalize from the remaining training patterns. Since the decision function obtained by the SVR algorithm is determined exclusively by the support vectors, all non-support vector patterns could be removed from the training set without affecting the final solution. However, it is not possible to identify the support vectors without first solving the optimization problem. Nevertheless, several successful heuristic methods for reducing the number of training patterns have been suggested for use with both SVM and SVR [1, 10, 24, 34]. The drawback to these heuristic methods is that they involve analysis and pre-processing of the data, which can be quite time-consuming when working with massively large data sets.

The ANN models previously created for air temperature prediction by Smith et al. [27, 28] were developed using massively large data sets. Twelve models were developed to predict air temperatures during the winter months from 1 to 12 h ahead, using training sets of 300,000 patterns. Additionally, twelve models were developed to predict year-round air temperatures from 1 to 12 h ahead, using training sets of 1.25 million patterns. The study presented herein used subsets of these same data sets to develop models with SVR. The overall goal of this study was to evaluate the capability of SVR in predicting air temperature for horizons from 1 to 12 h. The specific objectives included the following: (1) to develop a methodology for reducing the number of SVR training patterns that does not require lengthy analysis and pre-processing of the data, (2) to develop SVR air temperature models for winter-only and year-round scenarios using the data collected by the AEMN, and (3) to compare the accuracy of these models with the ANN models developed in previous studies.
2 Model development

2.1 Data sets

Smith et al. [27] created separate year-round ANN prediction models for prediction horizons from 1 to 12 h in advance, using historical data from AEMN weather stations. This data was partitioned into three data sets: a training set that was used for ANN model development, a selection set that was used to choose between alternate parameter settings, and an evaluation set that was reserved for a final measure of each ANN model's accuracy. Data from the evaluation set was not used during model development or selection. The training set was comprised of meteorological data collected during the years 1997–2000 from the following AEMN sites: Alma, Arlington, Attapulgus, Blairsville, Fort Valley, Griffin, Midville, Plains, and Savannah. These locations were selected to provide patterns that were both geographically and agriculturally diverse. The selection and evaluation sets were both comprised of meteorological data taken from the following sites: Brunswick, Byron, Cairo, Camilla, Cordele, Dearing, Dixie, Dublin, Homerville, Nahunta, Newton, Valdosta, and Vidalia. The selection set data was collected during the years 2000–2003, while the evaluation set data was collected during the years 2004–2005.

Most AEMN weather stations collect data in 15-min intervals. However, some of them only collect data in hourly intervals. Each station therefore produces either 24 or 96 distinct records per day. The training set for the year-round prediction models was comprised of readings from nine stations, collected over a 4-year period, and contained over 1.25 million data points. The selection set for the year-round prediction models, which was comprised of readings from 13 stations over a 4-year period, also contained about 1.25 million data points. There were approximately 800,000 data points in the year-round evaluation set, which were collected from 13 stations over a 2-year period.

In addition to the year-round ANN prediction models, Smith et al. created winter-only prediction models for prediction horizons from 1 to 12 h in advance [28]. The data sets used to develop these winter-only prediction models were simply a subset of the data sets used to create the year-round models. The winter-only data sets were restricted to only those data points from the first 100 days of the calendar year for which the observed air temperature was 20°C or below. The winter-only training and selection sets therefore consisted of approximately 300,000 data points, while the winter-only evaluation set was comprised of about 200,000 data points.

The AEMN weather stations monitor air temperature, relative humidity, vapor pressure deficit, precipitation,
wind speed, wind direction, solar radiation, and soil temperature at depths of 5, 10, and 20 cm. Some stations are also equipped to monitor open pan evaporation, water temperature of the pan, leaf wetness, and soil surface temperature. Smith et al. [27, 28] used air temperature, solar radiation, wind speed, rainfall, and relative humidity to build the individual patterns in their data sets. Each pattern consisted of the current value for those five variables as well as their values at hourly intervals over the previous 24 h. Smith et al. [27, 28] also found that their ANN models performed better if they explicitly included the hourly difference between the values of each variable. For example, a pattern would include the following values relative to temperature:

$$t,\; t_{1},\; t_{2},\; \ldots,\; t_{24},\; t-t_{1},\; t_{1}-t_{2},\; \ldots,\; t_{24}-t_{25}\qquad(6)$$
where t is the current temperature value, and t_n is the temperature value n hours ago. Smith et al. scaled each of these values in the range [0.1, 0.9] in order to keep the inputs in the more linear range of the ANN's logistic function. The scaling was performed using the following equation:

$$x_{\mathrm{scaled}}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}\times 0.8+0.1\qquad(7)$$
where x_min and x_max were the minimum and maximum values of the variable found in the development set. In addition to these 250 inputs, Smith et al. [27, 28] included eight variables to describe the time-of-day and the day-of-the-year at which the current readings were taken. These were cyclic variables that were created using fuzzy logic-type membership functions. The day-of-the-year was expressed by associating a variable with each of the four seasons of the year and then assigning values in the range [0.0, 1.0] to those variables using triangular membership functions. The membership functions described the degree to which the given date fell within a particular season. Similarly, the time-of-day was expressed using four variables that were associated with the following times of the day: 0600, 1200, 1800, and 2400. The addition of these values increased the total number of inputs in each pattern to 258.

In order to draw a fair comparison with this previous work, inputs to the SVR algorithm were identical to those used by Smith et al. [27, 28] to develop ANN models. Additionally, the data sets were partitioned in exactly the same manner. Due to the algorithm's speed and memory requirements, however, developing SVR models using the full training sets that were used to develop the ANN models was deemed impractical. Instead, the SVR algorithm was provided with subsets of these training sets. Although the SVR models were not allowed to take advantage of the full training sets during model
development, they were evaluated using all of the patterns in the selection and evaluation sets.
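As an illustration of how one input pattern described above might be assembled, the sketch below builds the lagged values and hourly differences of Eq. (6), the scaling of Eq. (7), and triangular time-of-day memberships for a single variable. The function names and the exact membership shape are assumptions made for illustration; the original studies do not publish code, and the day-of-year memberships would be handled analogously.

```python
import numpy as np

def scale(x, x_min, x_max):
    """Eq. (7): map a raw value into [0.1, 0.9] using development-set extremes."""
    return (x - x_min) / (x_max - x_min) * 0.8 + 0.1

def lag_features(series):
    """Eq. (6): current value, 24 hourly lags, and the 25 hourly differences.

    `series` holds the 26 most recent hourly readings of one variable,
    ordered oldest to newest (t-25, ..., t-1, t)."""
    t = np.asarray(series, dtype=float)[::-1]   # t[0] = now, t[n] = n hours ago
    values = t[:25]                             # t, t1, ..., t24
    diffs = t[:25] - t[1:26]                    # t - t1, t1 - t2, ..., t24 - t25
    return np.concatenate([values, diffs])      # 50 inputs per variable

def time_of_day_memberships(hour):
    """Assumed triangular memberships anchored at 0600, 1200, 1800, and 2400.

    Each membership peaks at 1.0 at its anchor and falls linearly to 0.0 six
    hours away, wrapping around midnight."""
    anchors = np.array([6.0, 12.0, 18.0, 24.0])
    dist = np.minimum(np.abs(hour - anchors), 24.0 - np.abs(hour - anchors))
    return np.clip(1.0 - dist / 6.0, 0.0, 1.0)
```

Applying lag_features to each of the five weather variables and scaling each element with Eq. (7) yields the 250 inputs; the four time-of-day and four day-of-year memberships bring each pattern to the 258 inputs described above.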
2.2 A method for reducing training patterns

The major factor that contributed to long SVR run-times was the number of training patterns. Since not all patterns are used by the SVR algorithm, an appropriate reduced training set can decrease the run-time without a loss of accuracy. The ideal reduced training set would contain only those patterns that the SVR algorithm would eventually determine to be support vectors. Without prior knowledge of which patterns comprise this set, a simple methodology was developed for creating an effective reduced training set, which does not require lengthy pre-processing of the data. This methodology was based on the observation that training time is strongly dependent upon both the size of the training set and the width of the SVR regression tube. Although optimal parameter settings cannot be discovered unless all parameters are tuned simultaneously, considerable time savings can be gained by keeping the parameter that controls the width of the regression tube, ε, at a moderately high value while tuning the remaining parameters. For the SVR algorithm, the only other parameters to be considered are the penalty factor, C, and any kernel parameters. The kernel used with the SVR algorithm in these experiments was the radial-basis function. This kernel, which is shown below, was arbitrarily selected because it has been shown to be a good general purpose kernel [12].

$$\exp\bigl(-\gamma\,\lVert u-v\rVert^{2}\bigr)\qquad(8)$$

For the purposes of SVR, u represents a support vector and v represents the pattern that is being evaluated. The only free parameter in this kernel is γ, which controls the width of the kernel. While smaller values of ε prolonged training time, they typically resulted in improved model accuracy, as shown in Fig. 1. Model accuracy was measured for a particular set of examples using the mean absolute error (MAE), shown below.

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert f_i-y_i\rvert\qquad(9)$$

where f_i is the predicted value and y_i is the true value. The selection set MAE was plotted for various models developed for the 4-h prediction horizon using a reduced training set of 60,000 patterns. Each line represents the MAE produced while using a different ε value and varying the value of γ from 0.01 to 0.06. It can be seen that regardless of the γ value, the MAE consistently improved as the width of the regression tube was decreased.

Fig. 1 Mean absolute error versus kernel width (γ) for various values of the regression tube width (ε = 0.05–0.30), selection data sets, C = 25, 4-h prediction horizon

Since training time is less of a concern with high values of ε, several reduced training sets could be sampled from the complete training set and evaluated using the selection set. The SVR algorithm only needed to be applied with smaller values of ε once one of these candidate-reduced sets had been selected. The basic steps in this methodology are summarized below (an illustrative sketch of these steps follows the list):

1. Using a subset of the complete training data and a moderately high value of ε, perform initial parameter tuning of the penalty factor, C, and any kernel parameters.
2. Find a favorable training set by repeatedly sampling from the complete set, applying the SVR algorithm with a moderately high ε value, and evaluating on the selection set.
3. Reduce the value of the ε parameter and apply the SVR algorithm to produce the final model.
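To make these steps concrete, the following is a minimal sketch of the procedure under stated assumptions: it substitutes scikit-learn's SVR for the SVMlight implementation used in the study, the array names (X_train, y_train, X_sel, y_sel) are hypothetical, and the coarse parameter grid is illustrative rather than the authors' actual search (the specific constants echo values reported later in the paper: C = 25, γ = 0.0104, 30 candidate sets of 60,000 patterns, relaxed ε = 0.05, and a final ε of roughly 0.003–0.01).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Assumed to exist: X_train, y_train (complete training set, numpy arrays)
# and X_sel, y_sel (selection set). These names are hypothetical placeholders.

def selection_mae(X, y, X_sel, y_sel, C, gamma, epsilon):
    """Fit an RBF-kernel epsilon-SVR and return its MAE on the selection set."""
    model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon)
    model.fit(X, y)
    return mean_absolute_error(y_sel, model.predict(X_sel)), model

rng = np.random.default_rng(0)

# Step 1: coarse tuning of C and gamma on a small random subset,
# keeping epsilon moderately high (0.05) so each run stays cheap.
tune_idx = rng.choice(len(X_train), size=15_000, replace=False)
best_tune_mae, best_C, best_gamma = np.inf, None, None
for C in (1.0, 25.0, 100.0, 400.0):               # illustrative grid
    for gamma in (0.005, 0.0104, 0.02, 0.05):
        mae, _ = selection_mae(X_train[tune_idx], y_train[tune_idx],
                               X_sel, y_sel, C, gamma, epsilon=0.05)
        if mae < best_tune_mae:
            best_tune_mae, best_C, best_gamma = mae, C, gamma

# Step 2: sample candidate reduced training sets and keep the one whose
# high-epsilon model scores best on the selection set.
best_mae, best_idx = np.inf, None
for _ in range(30):                               # 30 candidates, as in the study
    idx = rng.choice(len(X_train), size=60_000, replace=False)
    mae, _ = selection_mae(X_train[idx], y_train[idx], X_sel, y_sel,
                           best_C, best_gamma, epsilon=0.05)
    if mae < best_mae:
        best_mae, best_idx = mae, idx

# Step 3: retrain on the chosen subset with a smaller regression tube.
final_mae, final_model = selection_mae(X_train[best_idx], y_train[best_idx],
                                       X_sel, y_sel, best_C, best_gamma,
                                       epsilon=0.005)
```

Because step 2 only has to rank candidate subsets, the relaxed tube (ε = 0.05) keeps each of the 30 trial runs short; only the single selected subset is trained with the small final ε.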
All experiments for this research were conducted in a University of Georgia computer lab containing 30 computers, the fastest of which had an Intel Core Duo processor and 2 GB of RAM. SVR run-times varied depending on the training set size and parameter settings. The longest runs took up to 72 h on these machines. The SVR algorithm was implemented using SVMlight [14].
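For reference, a run along these lines could be launched with SVMlight's command-line tools; the sketch below wraps the standard svm_learn/svm_classify calls (regression mode, RBF kernel) in Python. The data file names are hypothetical placeholders, and the files would need to be prepared in SVMlight's sparse "target index:value" format.

```python
import subprocess

# Hypothetical file names; train.dat and selection.dat must already exist in
# SVMlight's sparse "target index:value ..." format.
subprocess.run(
    ["svm_learn",
     "-z", "r",        # regression mode
     "-t", "2",        # RBF kernel
     "-g", "0.0104",   # kernel width gamma
     "-c", "25",       # penalty factor C
     "-w", "0.05",     # width of the regression tube (epsilon)
     "train.dat", "model.dat"],
    check=True,
)
subprocess.run(
    ["svm_classify", "selection.dat", "model.dat", "predictions.txt"],
    check=True,
)
```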
3 Results and discussion

3.1 Year-round models

The initial experiments were conducted on the year-round data. Parameter tuning was performed using a reduced training set of 15,000 randomly sampled patterns for the 4-h prediction horizon. The results were then applied to all
other prediction horizons. While tuning the C and γ parameters, ε was held at a moderately large value of 0.05. Heuristic searches were performed with values of C ranging from 0 to 800 and values of γ ranging from 0 to 1. The searches involved applying the SVR algorithm with varying parameter combinations and then evaluating the resulting model with the selection set. The resulting MAE ranged from 1.48°C, for the most accurate model, to 110.26°C, for a very inaccurate model. Since the models had been built using a training set of only 15,000 patterns, the accuracy of the models was not expected to be remarkable. However, of interest was the accuracy of each model relative to the other models in the search. This relative performance was used as a measure of the effectiveness of the corresponding parameter combinations. While this approach did not guarantee that optimal parameter settings would be found for the final model, it offered a quick method for discovering generally good values for C and γ. Based on these searches, values of 25 and 0.0104 were selected for C and γ, respectively. These were the parameter values used to build the model that achieved an MAE of 1.48°C, and they were used for all subsequent experiments.

The trade-off between model accuracy and training time was the major factor in determining the size of the reduced training sets to be used for final model development. Larger training sets typically produced more accurate models, as demonstrated in Fig. 2. This figure shows the MAE for models created by randomly selecting training patterns from the complete year-round training set of 1.25 million patterns. Thirty separate reduced training sets were created for each data set size of 15,000, 30,000, 60,000, and 120,000 patterns. The model accuracy generally increased with larger training set sizes because the larger data sets are more likely to contain a greater number of support vectors from the complete training set. However, this improved accuracy had to be balanced with the increased training time that was associated with larger training sets.

Fig. 2 Mean absolute error versus training set size (30 trials for each size using year-round data), ε = 0.05, C = 25, γ = 0.0104, 4-h prediction horizon

A run-time limit of 3 days was arbitrarily placed on any single SVR run. This limitation became a factor when building final SVR models that used smaller regression tube widths. As a result, training set sizes were capped at 60,000 patterns. For each year-round prediction horizon, 30 candidate-reduced training sets were created by randomly sampling 60,000 patterns from the complete training set of over 1.25 million patterns. A model was then created from each candidate set using the C and γ parameter values that were determined earlier. The ε parameter was again kept at a moderately high value of 0.05. The candidate set whose model achieved the lowest MAE when applied to the selection set became the training set for that particular prediction horizon. Finally, the ε parameter was adjusted for each prediction horizon in order to obtain the highest
possible accuracy on the selection set. This was accomplished by applying the SVR algorithm with values of ε ranging from 0.003 to 0.01 with a step size of 0.001. The model with the lowest MAE for each prediction horizon was selected and a final performance estimate was obtained by applying it to the evaluation set. The MAE over the year-round evaluation set for the 1-h prediction horizon was 0.513°C. The MAE increased monotonically for each subsequent prediction horizon up to the 12-h prediction horizon, which had an MAE of 1.922°C. This trend in model performance can be seen in Fig. 3, which shows the predicted and observed air temperatures for all patterns in the evaluation set that were collected from the AEMN weather station in Byron, GA. The distribution of prediction errors was centered near zero for all prediction horizons, while the variance of the errors increased relative to the prediction horizon.

The accuracy of the year-round SVR models was compared to that of the ANN models created by Smith et al. [27] and is shown in Table 1. For the selection set, the SVR models performed slightly better than the ANN models on every prediction horizon, with improvements in MAE of up to 0.052°C, or 3.7%, when compared to the ANN models. For the evaluation set, however, the ANN models performed slightly better than the SVR models on nine of the twelve prediction horizons. Again, there was only a slight difference between the methods. The largest difference in MAE occurred for the 9-h prediction horizon, where the ANN model had an MAE that was 0.057°C, or 3.4%, lower than that of the corresponding SVR model.
Fig. 3 Predicted versus observed air temperatures (°C), evaluation data set, Byron, GA. a 1-h, b 4-h, c 8-h, and d 12-h prediction period. Each panel shows the 1:1 line and a fitted regression of predicted on observed temperature: a y = 0.039 + 0.999x, b y = 0.431 + 0.981x, c y = 0.980 + 0.952x, d y = 1.416 + 0.931x; R² values across the four panels ranged from 0.9126 to 0.9914
Table 1 Mean absolute error (MAE) of the year-round models, selection and evaluation sets

             Year-round selection set MAE (°C)                 Year-round evaluation set MAE (°C)
Horizon      ANN        SVR        Percent                     ANN        SVR        Percent
length (h)   model^a    model      improvement (%)             model^a    model      improvement (%)
1            0.525      0.516       1.6                        0.516      0.513       0.5
2            0.834      0.811       2.8                        0.814      0.806       1.0
3            1.046      1.036       0.9                        1.015      1.027      -1.2
4            1.226      1.214       1.0                        1.187      1.203      -1.4
5            1.404      1.352       3.7                        1.356      1.344       0.9
6            1.483      1.466       1.1                        1.432      1.463      -2.2
7            1.577      1.570       0.4                        1.532      1.569      -2.4
8            1.669      1.659       0.6                        1.623      1.664      -2.5
9            1.734      1.727       0.4                        1.686      1.743      -3.4
10           1.801      1.798       0.2                        1.755      1.810      -3.1
11           1.865      1.855       0.5                        1.815      1.875      -3.3
12           1.908      1.906       0.1                        1.873      1.922      -2.6

^a Smith et al. [27]
3.2 Winter models

The methodology used to develop year-round air temperature prediction models was applied to the winter-only data set in order to create models specific to the colder weather months of the year. Instead of being drawn from a complete set of 1.25 million patterns, these candidate-reduced sets were randomly selected from the winter-only complete
set of 300,000 patterns. After tuning the ε parameter, the resulting SVR models were applied to their respective winter-only evaluation set. The MAE over the evaluation set for the 1-h prediction horizon was 0.514°C. The MAE
of each subsequent hourly prediction horizon increased monotonically up to the 12-h prediction horizon, which had an MAE of 2.303°C. These MAE values were slightly higher than those obtained with the year-round models, which was consistent with the findings of Smith et al. [27] when using ANNs. The accuracy of the winter-only SVR models was compared to that of ANN models created by Smith et al. [28] and is shown in Table 2. For both the selection set and the evaluation set, the SVR models performed slightly better than the ANN models for all prediction horizons. The largest improvement in MAE for the evaluation set was for the 2-h prediction horizon, where the SVR model had an improvement in MAE of 0.027°C, or 3.1%, when compared to the ANN model. These results demonstrate the advantage that ANNs have over SVR in terms of their ability to handle massively large data sets. However, the results also demonstrate the ability of SVR to generalize from comparatively smaller training sets. For the year-round data sets, the ANNs were able to take advantage of more than 20 times as many training patterns as the SVR algorithm. Yet, the SVR models were still competitive with the ANN models. For the winter-only data set, the SVR models achieved greater accuracy for all prediction horizons despite the fact that the SVR algorithm used fewer training patterns than the ANNs.
Table 2 Mean absolute error (MAE) of the winter-only models, selection and evaluation sets

             Winter selection set MAE (°C)                     Winter evaluation set MAE (°C)
Horizon      ANN        SVR        Percent                     ANN        SVR        Percent
length (h)   model^a    model      improvement (%)             model^a    model      improvement (%)
1            0.534      0.528       1.1                        0.527      0.514       2.4
2            0.884      0.861       2.7                        0.864      0.837       3.1
3            1.167      1.110       4.9                        1.118      1.110       0.8
4            1.401      1.386       1.1                        1.338      1.329       0.7
5            1.624      1.592       2.0                        1.546      1.518       1.8
6            1.811      1.768       2.4                        1.715      1.692       1.3
7            1.987      1.929       2.9                        1.874      1.832       2.3
8            2.126      2.072       2.6                        2.007      1.964       2.1
9            2.243      2.205       1.7                        2.091      2.078       0.6
10           2.362      2.315       2.0                        2.191      2.173       0.8
11           2.443      2.405       1.5                        2.250      2.245       0.2
12           2.526      2.481       1.8                        2.333      2.303       1.3

^a Smith et al. [28]

4 Summary and conclusions

SVR was shown to be a viable alternative to ANNs for the problem of air temperature prediction. The method proposed in this study offers a quick means of producing a reduced training set from a massively large training set. It works by repeatedly and randomly sampling from the complete training set. Candidate training sets are quickly evaluated by applying the SVR algorithm with relaxed parameter settings, which reduces training time. The more time-consuming SVR experiments are only performed after an appropriate reduced set has been identified. Even with computational limitations on the number of training patterns that the SVR algorithm was able to handle, it produced results that were comparable to, and in some cases more accurate than, those obtained with ANNs. With a data set of 300,000 patterns, the SVR algorithm achieved a higher accuracy than ANNs for all prediction horizons. With a larger data set of 1.25 million patterns, ANNs appeared to have a slight advantage over SVR. As more processing power becomes available, the advantage could likely shift to SVR.

The SVR models described in this research were developed using the radial-basis function kernel. Evaluating other kernels was beyond the scope of this work, although alternate kernels may prove more effective. Additionally, parameter tuning was performed using only the 4-h prediction horizon data. It is likely that model accuracy would be increased if parameters were tuned for each individual prediction horizon and if a consistent grid search method were employed. Future work may also involve exploring the effects of some limited pre-processing of the data. It is possible that a more intelligent method for creating a reduced training set would result in better performance. The method described by Guo and Zhang [10] takes advantage of the observation that target values for support vectors are typically local extremes. Pre-processing is performed on the data using a k-nearest neighbor approach in order to find extremes in each neighborhood of size k. The remaining patterns are then filtered out of the training set. As stated earlier, this pre-processing step will be rather lengthy when dealing with massively large data
sets. However, the extra investment may prove worthwhile. In this work, SVR was implemented using SVMlight [14]. Core vector machines are an alternative implementation of the algorithm that exploits the idea that kernel methods can be formulated as minimum enclosing ball problems [30]. On classification problems, this approach was shown to have a time complexity that is linear with respect to the training set size and a space complexity that is independent of the training set size. It was also reported in [30] that there has been some preliminary success in extending the technique to SVR, which could allow for larger training set sizes and presumably better results.

Acknowledgments This work was funded in part by a partnership between the USDA-Federal Crop Insurance Corporation through the Risk Management Agency and the University of Georgia and by state and federal funds allocated to Georgia Agricultural Experiment Stations Hatch projects GEO00877 and GEO01654.
References

1. Almeida MB, Braga AP, Braga JP (2000) Speeding SVMs learning with a priori cluster selection and k-means. In: Proceedings of the 6th Brazilian symposium on neural networks, Rio de Janeiro, pp 162–167
2. Bernard SM, McGeehin MA (2004) Municipal heat wave response plans. Am J Public Health 94(9):1520–1522
3. Chang C, Lin C (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
4. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20(3):273–297
5. Dong B, Cao C, Lee SE (2005) Applying support vector machines to predict building energy consumption in tropical region. Energy Build 37:545–553
6. Engelbrecht AP (2002) Computational intelligence: an introduction. Wiley, West Sussex
7. Fonsah EG, Taylor KC, Funderburk F (2007) Enterprise cost analysis for middle Georgia peach production. Technical report AGECON-06-118, The University of Georgia Cooperative Extension
8. Friess T-T, Cristianini N, Campbell C (1998) The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. In: 15th international conference on machine learning. Morgan Kaufmann, Madison, pp 188–196
9. Fung G, Mangasarian OL (2003) Finite Newton method for Lagrangian support vector machine classification. Neurocomputing 55:39–55
10. Guo G, Zhang J-S (2007) Reducing examples to accelerate support vector regression. Pattern Recognit Lett 28:2173–2183
11. Hoogenboom G (1993) The Georgia automated environmental monitoring network. In: Hatcher KJ (ed) 1993 Georgia water resources conference. The University of Georgia, Orlando, pp 398–402
12. Hsu C-W, Chang C-C, Lin C-J (2000) A practical guide to support vector classification. Technical report, National Taiwan University
13. Jain A, McClendon RW, Hoogenboom G (2006) Freeze prediction for specific locations using artificial neural networks. Trans ASABE 49(6):1955–1962
14. Joachims T (1999) Making large-scale SVM learning practical. In: Scholkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning, Chap 11. MIT Press, Cambridge, pp 169–184
15. Jones GM, Stallings CC (1999) Reducing heat stress for dairy cattle. Technical report 404-200, Virginia Cooperative Extension
16. Khan MS, Coulibaly P (2006) Application of support vector machine in lake water level prediction. J Hydrol Eng 11(3):199–205
17. Mangasarian OL, Musicant DR (2001) Active set support vector machine classification. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems 13. MIT Press, Cambridge, pp 577–583
18. Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. J Mach Learn Res 1:161–177
19. Mori H, Kanaoka D (2007) Application of support vector regression to temperature forecasting for short-term load forecasting. In: IEEE joint conference on neural networks, IEEE, Orlando, pp 1085–1090
20. Osuna E, Freund R, Girosi F (1997) An improved training algorithm for support vector machines. In: Neural networks for signal processing VII. Proceedings of the 1997 IEEE workshop, IEEE, New York, pp 276–285
21. Platt JC (1999) Fast training of SVMs using sequential minimal optimization. In: Scholkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208
22. Qin Z, Yu Q, Li J, Wu Z-y, Hu B-m (2005) Application of least squares vector machines in modeling water vapor and carbon dioxide fluxes over a cropland. J Zhejiang Univ 6(6):491–495
23. Ravagnolo O, Misztal I, Hoogenboom G (2000) Genetic component of heat stress in dairy cattle, development of heat index. J Dairy Sci 83(9):2120–2125
24. Rychetsky M, Ortmann S, Ullmann M, Glesner M (1999) Accelerated training of support vector machines. In: IJCNN '99 international joint conference on neural networks, vol 2, pp 998–1003
25. Scholkopf B, Burges C, Vapnik V (1996) Incorporating invariances in support vector learning machines. In: von der Malsburg C, Seelen WV, Vorbruggen J, Sendhoff B (eds) ICANN '96 international conference on artificial neural networks, vol 1112. Springer, Berlin, pp 47–52
26. Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
27. Smith BA, Hoogenboom G, McClendon RW (2009) Artificial neural networks for automated year-round temperature prediction. Comput Electron Agric 68:52–61
28. Smith BA, McClendon RW, Hoogenboom G (2006) Improving air temperature prediction with artificial neural networks. Int J Comput Intell 3(3):179–186
29. Smola AJ, Scholkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
30. Tsang IW, Kwok JT, Cheung P (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
31. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
32. Vishwanathan SVN, Smola AJ, Murty MN (2003) SimpleSVM. In: Proceedings of the twentieth international conference on machine learning, Washington, DC, pp 760–767
33. Ward Systems Group (1993) Manual of NeuroShell 2. Ward Systems Group
34. Yang M-H, Ahuja N (2000) A geometric approach to train support vector regression. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Hilton Head, pp 430–437