
Short-Term Electricity-Load Forecasting Using a TSK-Based Extreme Learning Machine with Knowledge Representation

Chan-Uk Yeom and Keun-Chang Kwak *


Department of Control and Instrumentation Engineering, Chosun University, Gwangju 61452, Korea; [email protected]
* Correspondence: [email protected]
Received: 25 September 2017; Accepted: 10 October 2017; Published: 16 October 2017

Abstract: This paper discusses short-term electricity-load forecasting using an extreme learning machine (ELM) with automatic knowledge representation from a given input-output data set. For this purpose, we use a Takagi-Sugeno-Kang (TSK)-based ELM to develop a systematic approach to generating if-then rules, whereas the conventional ELM operates without knowledge information. The TSK-ELM design includes a two-phase development. First, we generate an initial random-partition matrix and estimate cluster centers for random clustering. The obtained cluster centers are used to determine the premise parameters of the fuzzy if-then rules. Next, the linear weights of the TSK fuzzy type are estimated using the least squares estimate (LSE) method. These linear weights are used as the consequent parameters in the TSK-ELM design. The experiments were performed on short-term electricity-load data for forecasting. The electricity-load data were used to forecast hourly day-ahead loads given temperature forecasts, holiday information, and historical loads from the New England ISO. In order to quantify the performance of the forecaster, we use metrics and statistical characteristics such as the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), and R-squared. The experimental results revealed that the proposed method performed well compared with a conventional ELM with four activation functions (sigmoid, sine, radial basis function, and rectified linear unit (ReLU)). It possessed superior prediction performance, provided knowledge information, and required a small number of rules.

Keywords: short-term electricity-load forecasting; extreme learning machine; knowledge representation; TSK fuzzy type; hybrid learning

1. Introduction

The electric power system is often considered the most complex system devised by humans. It consists of dynamic components, including transformers, generators, transmission and distribution lines, nonlinear and linear loads, protective devices, etc. These components must operate synergistically in a manner that ensures the stability and reliability of the electric power system during disturbances. An important aspect of electric-power system operation is that the system performance follows the load requirements. An individual decrease or increase in power generation depends on a respective decrease or increase in the system load. The electric utility operator allocates the system resources through prior knowledge of the load requirements. The ability to forecast electricity-load requirements plays an important role in providing effective power-system management [1]. Thus, accurate electricity-load forecasting is critical for short-term operations and long-term planning for every electric utility. The electricity-load forecasting system influences several decisions, including which generators to commit for a given period [2].


Over the past few decades, many studies have been conducted on short-term electricity-load forecasting. Dong [3] proposed a forecasting model using a data-decomposition approach for short-term electricity-load forecasting. Huang [4] proposed a permutation importance-based feature-selection method for short-term electricity-load forecasting using the random forest learning method, in which the original feature set was used to train a random forest as the original model. Bennett [5] proposed a combined model using an autoregressive integrated moving average with exogenous variables and a neural network with back-propagation (BP) learning for the short-term load forecast of low-voltage residential distribution networks. Hernandez [6] proposed an electric-load forecast architectural model based on multilayer perceptron neural networks for microgrids. Amjady [7] proposed a new neural network approach to short-term load forecasting; the proposed network was based on a modified harmony search technique, which can avoid the overfitting problem and becoming trapped in local minima. Goude [8] proposed local short- and middle-term electricity-load forecasting with semi-parametric additive models to model the electrical load of over 2200 substations in the French distribution network. Guo [9] proposed accelerated continuous conditional random fields that can treat a multi-step-ahead regression task as a sequence-labeling problem for load forecasting. Yu [10] proposed a sparse coding approach to household electricity-load forecasting in smart grids; the proposed methods were tested on a data set of 5000 households in a joint project with the electric power board of Chattanooga, Tennessee. Chen [11] proposed a support-vector regression model to calculate the demand-response baseline for short-term electrical-load forecasting of office buildings. Ren [12] proposed a random-vector functional-link network for short-term electricity-load demand forecasting. Lopez [13] proposed a combined model based on a wavelet neural network, particle swarm optimization, and an ensemble empirical-mode decomposition.

Other studies have focused on extreme learning machines (ELMs), which use fast learning based on the LSE, in contrast to the traditional neural networks with BP learning schemes in the previous literature. ELMs are feedforward neural networks for classification or regression with a single layer of hidden nodes. The weights connecting the inputs to the hidden nodes are randomly assigned and never updated, so the networks are learned in a single step, which essentially amounts to estimating the linear output weights. Zhang [14] performed short-term load forecasting of the Australian national electricity market with an ensemble ELM model based on the ensemble learning strategy. Golestaneh [15] performed very short-term nonparametric probabilistic forecasting of renewable energy generation, with applications to solar energy, using ELMs as a fast regression model. Li [16] proposed a novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection; a wavelet-based ensemble scheme was employed to generate the individual ELM-based forecasters. Cecati [17] proposed a novel radial basis function (RBF) training method for short-term electric-load forecasting and comparative studies including ELMs. Li [18] proposed a hybrid short-term electric-load forecasting strategy based on a wavelet transform, an ELM, and a modified artificial bee-colony algorithm.
Yang [19] proposed a hybrid approach that combines the wavelet transform and a kernel ELM, based on self-adapting particle-swarm optimization. Li [20] proposed an ensemble method for short-term load forecasting based on a wavelet transform, an ELM, and partial least squares regression. Although these ELM methods have demonstrated their superiority in prediction and classification, it is difficult to obtain meaningful information from them. Therefore, we propose a new ELM predictor with knowledge representation, e.g., if-then rules, for short-term electricity-load forecasting. To this end, a TSK-based ELM is proposed as a systematic approach to generating fuzzy if-then rules. The TSK-ELM design includes a two-phase development. First, we generate an initial random-partition matrix and estimate the cluster centers. This is similar to generating random weights between the input and hidden layers in the original ELM design [21]. The obtained cluster centers are used as the premise parameters of the fuzzy if-then rules. Next, the TSK-type linear weights are estimated by LSE. These linear weights are used as the consequent parameters. The electricity-load data are used to forecast hourly day-ahead loads, given temperature forecasts, holiday information, and historical loads from the New England Power Pool region [22,23].


This paper is organized in the following manner. Section 2 describes the architecture and concept of the conventional ELM as an intelligent predictor. Section 3 describes the design method of a TSK-based ELM with knowledge representation for short-term electricity-load forecasting. Section 4 covers the performance measures and simulation results from the electricity-load data. Finally, concluding comments are presented in Section 5.


2. ELM as an Intelligent Predictor

Conventional studies on neural networks have focused on learning and adaptation for classification and regression, and several studies have investigated the development of good learning methods in the past decades. In contrast to well-known neural networks, e.g., the multilayer perceptron (MLP) and the radial basis function network (RBFN), ELMs possess a real-time learning capability and good prediction ability as intelligent predictors [21,24,25].

Figure 1 shows the architecture of an ELM predictor. Given random hidden neurons, which need not be algebraic sums or other standard ELM feature mappings, almost any nonlinear piecewise continuous hidden node can be represented as follows [21]:

$$H_i(x) = G_i(w_i, b_i, x) \quad (1)$$

where $w_i$ and $b_i$ are the weights and biases between the input layer and the hidden layer, respectively.

Figure 1. Architecture of a conventional ELM predictor.

The output function of a generalized single-layer feedforward network (SLFN) is expressed as

$$f_L(x) = \sum_{i=1}^{L} \beta_i \, G_i(w_i, b_i, x) \quad (2)$$

The output function of the hidden-layer mapping is as follows:

$$H(x) = [G_1(w_1, b_1, x), \cdots, G_L(w_L, b_L, x)] \quad (3)$$

The output functions of the hidden nodes can take various forms. The different types of activation function include sigmoid networks, RBF networks, polynomial networks, complex networks, the sine function, ReLU, and wavelet networks, some of which are described as follows:

$$\mathrm{Sigmoid:}\quad G(w_i, b_i, x) = \frac{1}{1 + e^{-(w_i \cdot x + b_i)}}$$


$$\mathrm{RBF:}\quad G(w_i, b_i, x) = e^{-(w_i \cdot x + b_i)^2}$$

$$\mathrm{Sine:}\quad G(w_i, b_i, x) = \sin(w_i \cdot x + b_i)$$

$$\mathrm{ReLU:}\quad G(w_i, b_i, x) = \max(w_i \cdot x + b_i, 0)$$

where $G(w, b, x)$ and $L$ are the output function of a hidden node and the number of hidden nodes, respectively. In what follows, we review the processing procedure of an ELM predictor. An ELM determines the output weights using the following steps:

1. Randomly assign the hidden-node parameters $(w_i, b_i)$, $i = 1, 2, \cdots, N$.
2. Calculate the hidden-layer output matrix $H = [h(x_1); \cdots; h(x_N)]$.
3. Calculate the output weights $\beta$ using the LSE:

$$\beta = H^{*} T \quad (4)$$

where $H^{*}$ is the Moore–Penrose generalized inverse of the matrix H, and T is the target output vector. When $H^T H$ is nonsingular, $H^{*} = (H^T H)^{-1} H^T$ is the pseudo-inverse of H. This is a standard LSE problem, and the best solution for $\beta$ is expressed as follows:

$$\beta = (H^T H)^{-1} H^T T \quad (5)$$

We can extend the ELM by adding a positive value to the diagonal of $H^T H$, as in ridge regression. By doing this, the resulting predictor can obtain better generalization performance [26]. The essence of an ELM can be summarized by the following features. First, the hidden layer does not need to be tuned. Second, the hidden-layer mapping h(x) satisfies the universal approximation conditions [24]. Third, the ELM parameters are estimated to minimize

$$\| H\beta - T \|^2 \quad (6)$$
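To make these steps concrete, the following is a minimal Python/NumPy sketch of the ELM training procedure described above: random hidden parameters, then the LSE of Equation (5) with a ridge term in the spirit of [26]. The function names and the regularization value `reg` are illustrative choices, not part of the original paper.

```python
import numpy as np

def elm_train(X, T, L, reg=1e-3, seed=0):
    """Train a sigmoid ELM: random hidden parameters, then LSE output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input weights w_i (never updated)
    b = rng.standard_normal(L)                 # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden-layer output matrix, Eq. (3)
    # Ridge-stabilized LSE: Eq. (5) with a positive value added to the diagonal
    beta = np.linalg.solve(H.T @ H + reg * np.eye(L), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                            # f_L(x) of Eq. (2)
```

Because the hidden layer is never tuned, the training cost is dominated by the single linear solve, which is what gives the ELM its fast, one-pass learning.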

3. Design of TSK-Based ELM for Short-Term Electricity-Load Forecasting

In this section, we describe a new ELM predictor with knowledge information obtained from TSK-based rules for short-term electricity-load forecasting.

3.1. TSK-ELM Architecture and Knowledge Representation

The proposed TSK-ELM is an innovative approach to constructing a computationally intelligent predictor. This predictor can automatically extract knowledge information from a numerical input-output data set. The knowledge is represented by TSK-type fuzzy rules, in which the consequent part is a polynomial in the input variables. Although the conventional ELM predictor shows good prediction performance with a single forward pass, it is difficult to extract and represent the knowledge information. In the following, we explain the architecture and each layer of the proposed TSK-ELM. For simplicity, we suppose that the TSK-ELM to be considered has two inputs $x_1$ and $x_2$ and one output. The TSK-type rule set is expressed in the following form:

If $x_1$ is $A_i$ and $x_2$ is $B_i$, then $w_i = p_i x_1 + q_i x_2 + r_i$.

This rule is composed of the premise part and the consequent part. Figure 2 shows the architecture of the TSK-ELM predictor. The network consists of four layers. The nodes of the first layer are obtained by random clustering: we generate an initial random-partition matrix and estimate the cluster centers. This process is very similar to the concept of generating random weights between the input and hidden layers in the original ELM design [21]. The obtained cluster centers are used as the premise parameters for the fuzzy if-then rules.

Figure 2. Architecture of TSK-ELM with knowledge information.

The membership function is characterized by a Gaussian membership function, which is specified by two parameters as follows:

$$\mu_A(x) = e^{-\frac{1}{2}\left(\frac{x-c}{\sigma}\right)^2} \quad (7)$$

A Gaussian membership function is determined by the cluster center c and the width σ. Each cluster center $c_i$ is obtained from an initial random partition matrix $u_{ij}$ as follows:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \quad (8)$$

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij}/d_{kj} \right)^{2/(m-1)}} \quad (9)$$

where $u_{ij}$ lies between 0 and 1, $d_{ij} = \| c_i - x_j \|$ is the Euclidean distance between the i-th cluster center and the j-th data point, and $m \in [1, \infty)$ is a weighting exponent.

The width $\sigma_j$ of each input variable is obtained from the statistical data distribution as follows:

$$\sigma_j = \alpha \, \frac{x_{\max} - x_{\min}}{2\sqrt{2}} \quad (10)$$

where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in each input-variable dimension, respectively. The fuzzy set A is a linguistic label and specifies the degree to which the given input satisfies the quantifier. These parameters are referred to as premise parameters. In the second layer, the output is the product of all the incoming signals and represents the firing strength $f_i$ of a rule. The output of the third layer is obtained by combining the firing strength of a rule with the linear weights $w_i$. Here, the consequent parameters are estimated by LSE, i.e., the same method as in the ELM design [21]. The single node in the final layer computes the overall output as the summation of all incoming signals, as follows:

$$\hat{y} = \frac{\sum_{i=1}^{r} f_i w_i}{\sum_{i=1}^{r} f_i} = \frac{\sum_{i=1}^{r} f_i (p_i x_1 + q_i x_2 + r_i)}{\sum_{i=1}^{r} f_i} \quad (11)$$

In what follows, we present a more simplified structure than that of Figure 2. Figure 3 shows the simplified architecture of the TSK-ELM as an ELM structure. This network consists of three layers. Each node in the second layer represents the normalized firing strength $f_i$ of the rule. The weights between the second layer and the third layer can be expressed by a constant or a linear equation. The overall output is calculated by the linear combination of the consequent part and the weights. The linear combination is expressed as follows:

$$\hat{y} = f_1 w_1 + f_2 w_2 = (f_1 x_1) p_1 + (f_1 x_2) q_1 + (f_1) r_1 + (f_2 x_1) p_2 + (f_2 x_2) q_2 + (f_2) r_2 \quad (12)$$

The TSK-ELM structure can be viewed as the integration of an adaptive neuro-fuzzy inference system (ANFIS) and an ELM. In the structural aspect, the TSK-ELM is similar to ANFIS. Conventional ANFIS with grid partition encounters the curse-of-dimensionality problem when there is a moderately large number of inputs, while we use a scatter-partition method based on random clustering with an initial random partition matrix. Furthermore, in the theoretical aspect, we follow the main concept of the conventional ELM. Thus, we use an initial random-partition matrix and estimate cluster centers for random clustering. As for the width of the Gaussian membership function, we obtain statistical values from Equation (10) without the assignment of random values. These cluster centers are used as the premise parameters of the fuzzy if-then rules. The linear weights of the TSK fuzzy type are obtained using the LSE. Recently, the effectiveness of the initial random-partition method has been demonstrated in [27].

Figure 3. Simplified architecture of TSK-ELM.
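As an illustration of how the two design phases fit together, the sketch below gives one non-authoritative reading of Equations (7)-(12); all identifiers and default values (e.g., `alpha`, `m`) are our own. It derives cluster centers from a random partition matrix, fixes the Gaussian widths from the data ranges, and then estimates the consequent linear weights by LSE.

```python
import numpy as np

def tsk_elm_train(X, T, r, m=2.0, alpha=0.5, seed=0):
    """Two-phase TSK-ELM: random-partition premise, LSE consequents."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Phase 1: random partition matrix U (columns sum to 1) -> centers, Eq. (8)
    U = rng.random((r, n))
    U /= U.sum(axis=0)
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)                  # (r, d)
    sigma = alpha * (X.max(axis=0) - X.min(axis=0)) / (2 * np.sqrt(2))  # Eq. (10)
    # Gaussian memberships, Eq. (7); firing strength = product over inputs
    g = np.exp(-0.5 * ((X[:, None, :] - centers) / sigma) ** 2)         # (n, r, d)
    f = g.prod(axis=2)
    f /= f.sum(axis=1, keepdims=True)                                   # normalized strengths
    # Phase 2: design matrix of Eq. (12); consequents [p_i, q_i, ..., r_i] by LSE
    X1 = np.hstack([X, np.ones((n, 1))])
    Phi = (f[:, :, None] * X1[:, None, :]).reshape(n, -1)
    theta, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return centers, sigma, theta

def tsk_elm_predict(X, centers, sigma, theta):
    n = X.shape[0]
    g = np.exp(-0.5 * ((X[:, None, :] - centers) / sigma) ** 2)
    f = g.prod(axis=2)
    f /= f.sum(axis=1, keepdims=True)
    X1 = np.hstack([X, np.ones((n, 1))])
    Phi = (f[:, :, None] * X1[:, None, :]).reshape(n, -1)
    return Phi @ theta                                                  # Eq. (11)
```

Only the partition matrix is random; the widths come from Equation (10), so repeated runs differ only in the premise centers, mirroring the random hidden layer of the original ELM.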

3.2. TSK-ELM's Fast Learning and Hybrid Learning

The consequent parameters are estimated by LSE to obtain the optimal parameters in the same manner as in an ELM. This process is achieved in one pass, based on LSE. To obtain better prediction performance, a hybrid-learning algorithm that combines BP and LSE for fast parameter identification can be applied directly.

Figure 4 shows a diagram of the hybrid-learning method in the TSK-ELM design. In the forward pass, the input values go forward to the second layer, and the consequent parameters are estimated by LSE. In the backward pass, the error signals propagate backward, and the premise parameters are updated by the BP method. The hybrid approach converges much faster, since it reduces the search-space dimensions of the original BP method. Although we can apply BP alone to tune the parameters in the TSK-ELM design, this simple optimization method usually takes a long time to converge. If the input-output data set is large, then the premise and consequent parameters should be fine-tuned, since human-determined parameters are seldom optimal in terms of reproducing the desired output. This learning scheme is equally applicable to the RBFN. Furthermore, this hybrid-learning method using LSE and BP is the well-known scheme in the ANFIS design [28].

Figure 4. Diagram of the hybrid-learning procedure.
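The loop below sketches one epoch of this hybrid scheme, under the same assumptions as the previous sketch. The forward pass refits the consequents by LSE with the premise fixed; for brevity, the backward pass uses a finite-difference gradient on the cluster centers instead of the analytic BP derivatives used in ANFIS-style training, so it is only illustrative.

```python
import numpy as np

def forward_lse(X, T, centers, sigma):
    """Forward pass: premise fixed, consequents refit by LSE; returns the loss."""
    n = X.shape[0]
    f = np.exp(-0.5 * ((X[:, None, :] - centers) / sigma) ** 2).prod(axis=2)
    f /= f.sum(axis=1, keepdims=True)
    Phi = (f[:, :, None] * np.hstack([X, np.ones((n, 1))])[:, None, :]).reshape(n, -1)
    theta, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return 0.5 * np.mean((T - Phi @ theta) ** 2)

def hybrid_epoch(X, T, centers, sigma, lr=1e-2, eps=1e-5):
    """Backward pass: premise centers nudged down a finite-difference gradient."""
    base = forward_lse(X, T, centers, sigma)
    grad = np.zeros_like(centers)
    for idx in np.ndindex(*centers.shape):
        pert = centers.copy()
        pert[idx] += eps
        grad[idx] = (forward_lse(X, T, pert, sigma) - base) / eps
    return centers - lr * grad, base   # updated premise centers, current loss
```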

4. Experimental Results

In this section, we report on the comprehensive experiment set and describe comparative experiments to evaluate the performance of the proposed TSK-ELM for short-term electricity-load forecasting.

4.1. Training and Testing Data Sets

The input variables used in this example are the dry-bulb temperature, dew point, hour of day, day of the week, a flag indicating whether it is a holiday/weekend, the previous day's average load, the load from the same hour the previous day, and the load from the same hour and same day of the previous week, as shown in Figure 5. This example demonstrates building and validating a short-term electricity-load forecasting model. The model considers multiple information sources, including temperatures and holidays, when constructing a day-ahead load forecaster. The original data were obtained from the zonal hourly load data websites of the New England ISO [22,23].


Figure 5. Input variables and output variable.

The numbers of training and testing data points are 35,068 (from 2004 to 2007) and 8784 (from 2008), respectively. The training data set is used for designing the TSK-ELM, while the testing data set is used for validating the generalization performance.

Since most similarity metrics are sensitive to the ranges of elements in the input vectors, each of the input variables must be normalized to within the unit interval [0, 1], except for the output variable. For short-term forecasting, these include the dry-bulb temperature, dew point, hour of the day, day of the week, holiday and weekend indicators, the previous day's average load, the load from the same hour the previous day, and the load from the same hour and same day of the previous week.

Figure 6 visualizes the data distribution between input variables. Figure 7 shows a histogram of the actual output.
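A small sketch of the min-max normalization described above; fitting the ranges on the training split alone is our own precaution, since the paper does not state which split defines the ranges.

```python
import numpy as np

def minmax_fit(X_train):
    """Learn per-feature ranges on the training data."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns

def minmax_apply(X, lo, span):
    """Map every input variable into the unit interval [0, 1]."""
    return (X - lo) / span
```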
Figure 6. Data distribution between input variables: (a) dew point and dry-bulb temperature; (b) average load of the previous day and the load from the same hour the previous day.


Figure 7. Histogram of actual output.


4.2. Experiments and Results

In this experiment, we compared the proposed method with conventional methods. We quantify the performance of the forecaster using metrics such as the RMSE as well as the MAE and MAPE. We also investigate statistical characteristics based on the coefficient of determination (R-squared), as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(t_k - \hat{y}_k\right)^2} \quad (13)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left| t_k - \hat{y}_k \right| \quad (14)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{k=1}^{n}\left| \frac{t_k - \hat{y}_k}{t_k} \right| \quad (15)$$

$$R^2 = 1 - \frac{\sum_{k=1}^{n}\left(t_k - \hat{y}_k\right)^2}{\sum_{k=1}^{n}\left(t_k - \bar{t}\right)^2} \quad (16)$$

where $\hat{y}_k$, $t_k$, and $\bar{t}$ are the model output, the actual target output, and the mean of the actual output, respectively, and n is the number of measured data points.

First, we obtained the RMSE variation as the number of cluster centers increased from 2 to 40. Here, the number of cluster centers is equal to the number of TSK-based fuzzy rules; likewise, the number of Gaussian membership functions per input variable is equal to the number of cluster centers. Figure 8 shows the RMSE values for the training and testing data sets. We observe that the RMSE value diminishes as the number of rules increases. The obtained results showed good generalization performance for the testing data set when the number of rules is 36: the RMSE of the testing error is at its lowest value, 400.86, while the RMSE of the training error is 402.89. Figure 9 shows the RMSE variation when hybrid learning is performed with five rules. The experimental results show a performance similar to the best case shown in Figure 8.
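For reference, a direct NumPy transcription of Equations (13)-(16); the factor of 100 in the MAPE is our addition so that the value matches the percentage figures reported in the tables.

```python
import numpy as np

def forecast_metrics(t, y_hat):
    """RMSE, MAE, MAPE (%), and R-squared of Equations (13)-(16)."""
    err = t - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / t))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((t - t.mean()) ** 2)
    return rmse, mae, mape, r2
```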
Figure 8. RMSE by rule-number variation (TSK-ELM).
Figure 9. RMSE by hybrid learning (num. of rules = 5).

Figure 10 shows the prediction performance for all training data points, and Figure 11 shows the TSK-ELM error for the training data points. As shown in Figures 10 and 11, the proposed TSK-ELM shows a good approximation capability.
Figure 10. Prediction performance for training data points.
Figure 11. TSK-ELM error for training data points.

Figures 12 and 13 visualize the generalization performance and the error between the actual output and the TSK-ELM output for the testing data points. Figure 14 shows the performance for some period (part of January 2008) in the data set. As shown in these figures, the proposed method had good generalization performance.
Figure 12. Prediction performance for testing data points.
Figure 13. TSK-ELM error for testing data points.
Figure 14. Performance for some period of the testing data sets.


Table 1 lists the RMSE results for the previous models and the proposed approach. In the linguistic model (LM) design, we used 10 contexts and 10 clusters per context [29]; thus, the number of rules was 100. This model lacked the adaptability to cover complex nonlinearity. In the RBFN design with context-based fuzzy c-means (CFCM) clustering, we used the same numbers of clusters and contexts as in the LM [30]. Here, the learning rate and the number of epochs are 0.001 and 1000, respectively. In the incremental RBFN (IRBFN) design, we combined linear regression (LR) and an RBFN [31]. The LR and the RBFN are used as the global model and the local granular network, respectively; the error obtained from the LR is compensated by the local RBFN based on CFCM clustering. Here, the weights between the hidden layer and the output layer are estimated by LSE.

Figure 15 shows the linguistic contexts produced from the errors when the number of contexts is 10. These contexts are obtained in the error space based on CFCM clustering [30,31]. The errors obtained by LR are compensated by a collection of radial basis functions that capture more localized nonlinearities of the system as a local RBFN [31]. These contexts are used as the linguistic weights in the design of the RBFN. In order to generate these contexts, we used fuzzy equalization and a probabilistic distribution; here, we use the probabilistic distribution given by the histogram, the probability density function, and the conditional density function, in that order [31,32].

Table 1. Comparison results of RMSE (*: no. of rules).

Method | No. of Nodes | RMSE (Training) | RMSE (Testing)
LR [33] | - | 1044.5 | 928.98
RBFN (CFCM) [30] | 100 | 983.64 | 926.43
LM [29] | 100 * | 1005.61 | 980.59
IRBFN [31] | 450 | 647.10 | 450.60
ELM (ReLU) [34] | 260 | 620.35 | 586.11
ELM (Sigmoid) [21] | 270 | 451.56 | 438.52
ELM (RBF) [21] | 300 | 449.88 | 433.99
ELM (sin) [21] | 270 | 443.16 | 435.07
TSK-ELM (FCM) | 36 * | 468.79 | 450.60
TSK-ELM (SC) | 36 * | 487.38 | 478.79
TSK-ELM (without learning) | 36 * | 402.89 | 400.86
TSK-ELM (with learning) | 5 * | 409.31 | 398.58
TSK-ELM (with learning) | 10 * | 359.24 | 349.61
TSK-ELM (with learning) | 36 * | 292.93 | 331.16

Figure 15. Linguistic contexts obtained by IRBFN.

In the ELM case, we used the ReLU, sigmoid, RBF, and sine functions as activation functions. Figure 16 visualizes the RMSE for the training and testing data sets as the number of nodes increases from 10 to 300. We observed that the ELMs with the sine, sigmoid, and radial basis activation functions showed similar performance, in contrast to the ReLU activation function. However, we could not obtain knowledge information from the ELM structure. Although these models have been successfully demonstrated on various applications, the proposed TSK-ELM outperformed the previous models, as listed in Table 1.

On the other hand, we evaluated the TSK-ELM based on scatter-partition methods such as fuzzy c-means (FCM) clustering and subtractive clustering (SC). In the case of FCM clustering, we used 36 cluster centers as the number of rules; the resulting training and testing RMSE values are 468.79 and 450.60, respectively, as listed in Table 1. In the case of SC, we used 36 cluster centers in the same manner; the resulting RMSE values are 487.38 and 478.79, respectively. As a result, the proposed TSK-ELM showed good performance in comparison with these scatter-partition methods. Thus, the initial random-partition method of the ELM used in this experiment demonstrated its effectiveness, as in previous works [27].
Figure 16. Performance results of the conventional ELM. (a) ReLU; (b) sigmoid; (c) RBF; and (d) sine.

In what follows, we quantified the performance of the TSK-ELM and the previous methods using metrics such as the MAE and MAPE. To obtain further insight into the TSK-ELM forecasting performance, we visualized the percent forecast errors by hour of the day, day of the week, and month of the year, as shown in Figure 17. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles of the MAPE, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the "+" symbol.

Figure 18 shows the MAPE results by the number of nodes in the ELM design with the four activation functions; the experiment was performed as the number of nodes increased from 10 to 300. Figure 19 visualizes the distributions of the error, MAE, and MAPE, respectively. Tables 2 and 3 list the comparison results of MAPE and MAE, respectively. As listed in Tables 2 and 3, these results led us to the conclusion that the proposed TSK-ELM, whether without or with learning, showed good performance in comparison to the previous methods, such as the ELM with four activation functions [21] and ANFIS with grid partition and an input-selection method [35]. We also compared the ELM (sin) with the TSK-ELM using the coefficient of determination (R-squared) as a statistical characteristic: the R-squared values are 0.9783 and 0.9905 for the ELM and TSK-ELM, respectively. These values provide a measure of how well the actual output is predicted by the TSK-ELM model. In addition to the R-squared values, the MAPE values of the ELM are 2.16 and 2.14 for the training and testing data, while those of the TSK-ELM are 1.38 and 1.49, as listed in Table 2. As a result, the proposed TSK-ELM showed better performance than the ELM itself.
Figure 17. RMSE divided by each output point. (a) hour of day; (b) day of week; (c) month of year.

Figure 18. Performance results of MAPE by the number of nodes (ELM with four activation functions). (a) ReLU; (b) sigmoid; (c) RBF; and (d) sine.

Table 2. Comparison results of MAPE (*: no. of rules).

Method | No. of Nodes | MAPE (%), Training | MAPE (%), Testing
ELM (ReLU) [34] | 300 | 2.94 | 2.87
ELM (Sig) [21] | 290 | 2.18 | 2.15
ELM (RBF) [21] | 300 | 2.23 | 2.19
ELM (sin) [21] | 300 | 2.16 | 2.14
ANFIS [35] | 36 * | 3.32 | 3.27
TSK-ELM (without learning) | 36 * | 2.01 | 1.99
TSK-ELM (with learning) | 5 * | 1.96 | 1.95
TSK-ELM (with learning) | 10 * | 1.72 | 1.70
TSK-ELM (with learning) | 36 * | 1.38 | 1.49

Table 3. Comparison results of MAE (*: no. of rules).

Method | No. of Nodes | MAE (MWh), Training | MAE (MWh), Testing
ELM (ReLU) [34] | 300 | 445.54 | 427.08
ELM (Sig) [21] | 290 | 331.06 | 320.69
ELM (RBF) [21] | 300 | 338.91 | 326.21
ELM (sin) [21] | 300 | 328.71 | 319.18
ANFIS [35] | 36 * | 500.84 | 483.11
TSK-ELM (without learning) | 36 * | 306.62 | 298.21
TSK-ELM (with learning) | 5 * | 300.53 | 292.14
TSK-ELM (with learning) | 10 * | 264.65 | 257.19
TSK-ELM (with learning) | 36 * | 213.72 | 227.48

Figure 19. Distribution of (a) error; (b) MAE; and (c) MAPE.

5. Conclusions

We proposed a new design method based on an ELM with automatic knowledge representation from numerical data sets for short-term electricity-load forecasting. The conventional ELM does not use knowledge information. The notable features of the proposed TSK-ELM are that the cluster centers of the membership functions are used to determine the premise parameters, and the linear weights of the TSK type are estimated by LSE.

The experimental results revealed that the proposed TSK-ELM showed good performance for short-term electricity-load forecasting in comparison to previous approaches such as LR, CFCM-RBFN, LM, IRBFN, and the ELM with four activation functions. Furthermore, when a hybrid-learning method using LSE and BP was applied, the proposed model showed better prediction performance and statistical characteristics for RMSE, MAPE, MAE, and R-squared. For further research, we shall design an intelligent predictor by integrating an ELM with deep learning to solve real-world problems.

Acknowledgments: This research was supported by the "Human Resources Program in Energy Technology" of the Korea Institute of Energy Technology Evaluation and Planning (KETEP). Financial resources were granted by the Ministry of Trade, Industry and Energy, Republic of Korea (No. 20174030201620).

Author Contributions: Chan-Uk Yeom suggested the idea of the work and performed the experiments; Keun-Chang Kwak designed the experimental method; both authors wrote and critically revised the paper.

Conflicts of Interest: The authors declare no conflict of interest.


References

1. Kyriakides, E.; Polycarpou, M. Short Term Electric Load Forecasting: A Tutorial. In Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2007; pp. 391–418.
2. Zhang, Y.; Yang, R.; Zhang, K.; Jiang, H.; Jason, J. Consumption behavior analytics-aided energy forecasting and dispatch. IEEE Intell. Syst. 2017, 32, 59–63. [CrossRef]
3. Dong, Y.; Ma, X.; Ma, C.; Wang, J. Research and application of a hybrid forecasting model based on data decomposition for electrical load forecasting. Energies 2016, 9, 1050. [CrossRef]
4. Huang, N.; Lu, G.; Xu, D. A permutation importance-based feature selection method for short-term electricity load forecasting using random forest. Energies 2016, 9, 767. [CrossRef]
5. Bennett, C.; Stewart, R.A.; Lu, J. Autoregressive with exogenous variables and neural network short-term load forecast models for residential low voltage distribution networks. Energies 2014, 7, 2938–2960. [CrossRef]
6. Hernandez, L.; Baladron, C.; Aguiar, J.M.; Carro, B.; Sanchez-Esguevillas, A.J.; Lloret, J. Short-term load forecasting for microgrids based on artificial neural networks. Energies 2013, 6, 1385–1408. [CrossRef]
7. Amjady, N.; Keynia, F. A new neural network approach to short term load forecasting of electrical power systems. Energies 2011, 4, 488–503. [CrossRef]
8. Goude, Y.; Nedellec, R.; Kong, N. Local short and middle term electricity load forecasting with semi-parametric additive models. IEEE Trans. Smart Grid 2014, 5, 440–446. [CrossRef]
9. Guo, H. Accelerated continuous conditional random fields for load forecasting. IEEE Trans. Knowl. Data Eng. 2015, 27, 2023–2033. [CrossRef]
10. Yu, C.N.; Mirowski, P.; Ho, T.K. A sparse coding approach to household electricity demand forecasting in smart grids. IEEE Trans. Smart Grid 2017, 8, 738–748. [CrossRef]
11. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the support vector regression model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670. [CrossRef]
12. Ren, Y.; Suganthan, P.N.; Amaratunga, G. Random vector functional link network for short-term electricity load demand forecasting. Inf. Sci. 2016, 367–368, 1078–1093. [CrossRef]
13. Lopez, C.; Zhong, W.; Zheng, M. Short-term electric load forecasting based on wavelet neural network, particle swarm optimization and ensemble empirical mode decomposition. Energy Procedia 2017, 105, 3677–3682. [CrossRef]
14. Zhang, R.; Dong, Z.Y.; Xu, Y.; Meng, K.; Wong, K.P. Short-term load forecasting of Australian national electricity market by an ensemble model of extreme learning machine. IET Gener. Transm. Distrib. 2013, 7, 391–397. [CrossRef]
15. Golestaneh, F.; Pinson, P.; Gooi, H.B. Very short-term nonparametric probabilistic forecasting of renewable energy generation-with application to solar energy. IEEE Trans. Power Syst. 2016, 31, 3850–3863. [CrossRef]
16. Li, S.; Wang, P.; Goel, L. A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection. IEEE Trans. Power Syst. 2016, 31, 1788–1798. [CrossRef]
17. Cecati, C.; Kolbusz, J.; Rozycki, P.; Siano, P.; Wilamowski, B.M. A novel RBF training algorithm for short-term electric load forecasting and comparative studies. IEEE Trans. Ind. Electron. 2015, 62, 6519–6529. [CrossRef]
18. Li, S.; Wang, P.; Goel, L. Short-term load forecasting by wavelet transform and evolutionary extreme learning machine. Electr. Power Syst. Res. 2015, 122, 96–103. [CrossRef]
19. Yang, Z.; Ce, L.; Lian, L. Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Appl. Energy 2017, 190, 291–305. [CrossRef]
20. Li, S.; Goel, L.; Wang, P. An ensemble approach for short-term load forecasting by ELM. Appl. Energy 2016, 170, 22–29. [CrossRef]
21. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B 2012, 42, 513–529. [CrossRef] [PubMed]
22. ISO New England. Available online: https://en.wikipedia.org/wiki/ISO_New_England (accessed on 13 October 2017).
23. Mathworks. Available online: https://www.mathworks.com/videos/electricity-load-and-price-forecasting-with-matlab-81765.html (accessed on 13 October 2017).
24. Zhang, R.; Lan, Y.; Huang, G.B.; Xu, Z.B. Universal approximation of extreme learning machine with adaptive growth of hidden nodes. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 365–371. [CrossRef] [PubMed]
25. Huang, G.B.; Bai, Z.; Kasun, L.L.C.; Vong, C.M. Local receptive fields based extreme learning machine. IEEE Comput. Intell. Mag. 2015, 10, 18–29. [CrossRef]
26. Neumann, K.; Steil, J.J. Optimizing extreme learning machines via ridge regression and batch intrinsic plasticity. Neurocomputing 2013, 102, 23–30. [CrossRef]
27. Deng, Z.; Choi, K.S.; Cao, L.; Wang, S. T2FELA: Type-2 fuzzy extreme learning algorithm for fast training of interval type-2 TSK fuzzy logic system. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 664–676. [CrossRef] [PubMed]
28. Jang, J.S.R.; Sun, C.T.; Mizutani, E. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence; Prentice Hall: Upper Saddle River, NJ, USA, 1997.
29. Han, Y.H.; Kwak, K.C. A design of an improved linguistic model based on information granules. J. Inst. Electron. Inf. Eng. 2010, 47, 76–82.
30. Pedrycz, W. Conditional fuzzy clustering in the design of radial basis function neural networks. IEEE Trans. Neural Netw. 1998, 9, 601–612. [CrossRef] [PubMed]
31. Lee, M.W.; Kwak, K.C. A design of incremental granular networks for human-centric system and computing. J. Korean Inst. Inf. Technol. 2012, 10, 137–142.
32. Kwak, K.C.; Kim, S.S. Development of quantum-based adaptive neuro-fuzzy networks. IEEE Trans. Syst. Man Cybern. Part B 2010, 40, 91–100.
33. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2008.
34. Han, J.S.; Kwak, K.C. Image classification using convolutional neural network and extreme learning machine classifier based on ReLU function. J. Korean Inst. Inf. Technol. 2017, 15, 15–23.
35. Pal, S.; Sharma, A.K. Short-term load forecasting using adaptive neuro-fuzzy inference system. Int. J. Novel Res. Electr. Mech. Eng. 2015, 2, 65–71.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
