Jan 27, 2015 - The objective of supervised training is to adaptively adjust the weight values (fed to ANs) such that the error signal 'MAPEi' between the target.
A Scalable Short-Term Load Forecasting Model for Micro-grid Communication Networks
By
Ashfaq Ahmad CIIT/FA13-REE-044/ISB MS Thesis In Electrical Engineering
COMSATS Institute of Information Technology Islamabad – Pakistan Spring, 2015
A Scalable Short-Term Load Forecasting Model for Micro-grid Communication Networks A Thesis Presented to
COMSATS Institute of Information Technology, Islamabad
In partial fulfillment of the requirement for the degree of
MS (Electrical Engineering) By
Ashfaq Ahmad CIIT/FA13-REE-044/ISB
Spring, 2015
ii
A Scalable Short-Term Load Forecasting Model for Micro-grid Communication Networks
A Graduate Thesis submitted to Department of Electrical Engineering as partial fulfillment of the requirement for the award of Degree of M.S (Electrical Engineering).
Name
Registration Number
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
Supervisor Dr. Nadeem Javaid, Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology (CIIT), Islamabad Campus.
iii
Final Approval This thesis titled
A Scalable Short-Term Load Forecasting Model for Micro-grid Communication Networks
By
Ashfaq Ahmad CIIT/FA13-REE-044/ISB has been approved For the COMSATS Institute of Information Technology, Islamabad
External Examiner: ___________________________________
Dr. Hasan Mahmood Associate Professor, Department of Electronics, QAU, Islamabad Supervisor: ________________________________________________ Dr. Nadeem Javaid Assistant Professor, Department of Computer Science,
CIIT, Islamabad HoD:_________________________________________________________ Dr. Shahid A. Khan Professor, Department of Electrical Engineering,
CIIT, Islamabad
iv
Declaration I Mr. Ashfaq Ahmad, CIIT/FA13-REE-044/ISB, hereby declare that I have produced the work presented in this thesis, during the scheduled period of study. I also declare that I have not taken any material from any source except referred to wherever due that amount of plagiarism is within acceptable range. If a violation of HEC rules on research has occurred in this thesis, I shall be liable to punishable action under the plagiarism rules of the HEC.
Signature of the student: Date: ____________________________ ____________________________ Ashfaq Ahmad CIIT/FA13-REE-044/ISB
v
Certificate It is certified that Ashfaq Ahmad CIIT/FA13-REE-044/ISB has carried out all the work related to this thesis under my supervision at the Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad and the work fulfills the requirements for the award of the MS degree.
Date: ____________________________ Supervisor: ____________________________
Dr. Nadeem Javaid, Assistant Professor
Head of Department: ____________________________
Dr. Shahid A. Khan Professor, Department of Electrical Engineering.
vi
DEDICATION This thesis is dedicated to my teachers, my family and my friends.
vii
ACKNOWLEDGMENT I am heartily grateful to my supervisor, Dr. Nadeem Javaid, who not only guided me but also motivated me via insightful criticism from the beginning to the final level that enabled me to complete this thesis. I would like to acknowledge my family, my friends, and the cooperative CAST lab attendants. They all kept me motivated and energetic, and this work have not been possible without them. Finally, I offer my regard and blessing to everyone who supported me in any regard during the completion of my thesis.
Ashfaq Ahmad CIIT/FA13-REE-044/ISB
viii
ABSTRACT
A Scalable Short-Term Load Forecasting Model for Micro-grid Communication Networks
The underlying forecast model is one of the most significant strategies that directly affect the economies of energy trade because not only prosumers but also the utilities aim to maximize their benefits. In this regard, most of the existing forecast models trade-off between forecast accuracy and convergence rate. This thesis presents a short term load forecasting model for micro-grid communication networks. Unlike existing short term forecast models, our proposed model factors in accuracy as well as convergence rate. Subject to accuracy improvement, we devise modifications in two popular techniques; mutual information based feature selection, and enhanced differential evolution algorithm based error minimization. On the other hand, convergence rate of the overall forecast strategy is enhanced by devising modifications in the heuristic algorithm. Besides accuracy and convergence rate improvement, we also devise modification in the feature selection technique to make our proposed model scalable. Simulation results show that accuracy of the proposed scalable short term load forecasting model is 99.5%. ix
TABLE OF CONTENTS
1 Introduction 1.1 The Smart Grid . . . . . . 1.2 Towards Localization: The 1.2.1 Load Forecast . . . 1.2.2 Our Contribution .
. . . . . . . . . . . Smart Micro-Grid . . . . . . . . . . . . . . . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
1 2 3 4 5
2 Related Work 7 2.1 Stochastic Distribution Based Strategies . . . . . . . . . . . 8 2.2 ANN based Strategies . . . . . . . . . . . . . . . . . . . . . 8 2.3 Markov Chain Based Strategies . . . . . . . . . . . . . . . . 10 3 Forecast Strategies: Towards Development 3.1 Challenges . . . . . . . . . . . . . . . . . . . 3.2 Influencing Factors . . . . . . . . . . . . . . 3.3 Basic Units of a Generic Forecast Model . . 3.3.1 Feature Selector . . . . . . . . . . . . 3.3.2 Forecaster . . . . . . . . . . . . . . . 3.3.3 Optimizer . . . . . . . . . . . . . . . 4 ANN Based Forecast Strategy 4.1 Data Preparation Module . . . . . 4.2 Feature Selection Module . . . . . . 4.3 Forecast Module . . . . . . . . . . 4.4 Simulation Results . . . . . . . . . 4.4.1 Error Performance . . . . . 4.4.2 Convergence Rate Analysis .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
11 12 12 13 13 14 15
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
16 17 18 21 25 27 30
5 mEDE and ANN Based Forecast Strategy 5.1 Motivation . . . . . . . . . . . . . . . . . . 5.2 The mEDE and ANN Based Forecast . . . 5.2.1 Pre-Processing Module . . . . . . . 5.2.2 Forecast Module . . . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
32 34 34 34 38
x
. . . . . .
. . . . . .
. . . . . .
5.2.3 5.2.4 5.2.5 5.2.6
Optimization Module . . . . Simulation Results . . . . . Error Performance . . . . . Convergence Rate Analysis .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
41 43 44 49
6 Modified Feature Selection, ANN and Modified EDE based Forecast Strategy 6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 The Proposed S-STLF Model . . . . . . . . . . . . . . . . . 6.2.1 Modified MI based Feature Selection . . . . . . . . . 6.2.2 ANN based STLF . . . . . . . . . . . . . . . . . . . . 6.2.3 mEDE Based Forecast Error Minimization . . . . . . 6.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . 6.3.1 Error Performance . . . . . . . . . . . . . . . . . . . 6.3.2 Convergence Rate Analysis . . . . . . . . . . . . . . . 6.3.3 Scalability Analysis . . . . . . . . . . . . . . . . . . .
50 51 51 53 56 58 60 64 67 68
7 Conclusion and Future Work 71 7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 8 References
74
xi
LIST OF FIGURES
1.1
An SMG . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
4.1 4.2 4.3 4.4 4.5
ANN based forecast: Block diagram . . . . . . . . . . . . . . 17 Data preparation module for ANN based forecast . . . . . . 19 An artificial neuron . . . . . . . . . . . . . . . . . . . . . . . 22 ANN based data forecast module . . . . . . . . . . . . . . . 25 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast) . . . . . . . . . . . . 27 4.6 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance) . . . . . . . . . . . . 28 4.7 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis) . . . . . . . . 28 4.8 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast) . . . . . . . . . . . . . . . . . . 29 4.9 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance) . . . . . . . . . . . . . . . . . . 29 4.10 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis) . . . . . . . . . . . . . . 30 5.1 5.2
5.3
5.4
5.5
mEDE and ANN: Block diagram . . . . . . . . . . . . . . DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast) . . . . . . . . . . . . . . . . . . . . . . . . . . . . DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance) . . . . . . . . . . . . . . . . . . . . . . . . . . . DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis) . . . . . . . . . . . . . . . . . . . . . . . . . EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
xii
. 35
. 44
. 45
. 45 . 46
EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance) 5.7 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bilevel forecast and MI+ANN forecast (actual vs forecast) . . 5.9 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bilevel forecast and MI+ANN forecast (error performance) . . 5.10 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bilevel forecast and MI+ANN forecast (convergence rate analysis) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 6.14
52 61 61 62 62 63 63 64 64 66 66 67 67 69
5.6
The proposed S-STLF model . . . . . . PJMW: Actual vs forecast . . . . . . . EKPC: Actual vs forecast . . . . . . . DAYTOWN: Actual vs forecast . . . . FE: Actual vs forecast . . . . . . . . . PJMW: Error performance . . . . . . . EKPC: Error performance . . . . . . . DAYTOWN: Error performance . . . . FE: Error performance . . . . . . . . . PJMW: Convergence rate analysis . . . EKPC: Convergence rate analysis . . . DAYTOWN: Convergence rate analysis FE: Convergence rate analysis . . . . . Impact of load on error performance .
xiii
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
46
47 47 48
LIST OF TABLES
4.1
Simulation parameters of ANN based forecast . . . . . . . . 27
5.1
Simulation parameters of mEDE and ANN based forecast . . 44
6.1
Simulation parameters of mFS, ANN, and mEDE based forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Performance evaluation of the selected forecast strategies . . 70
6.2
xiv
Chapter 1 Introduction
1
1.1
The Smart Grid
In most parts of the world, especially in developed countries, transmission and distribution systems have become aged. Existing/traditional grid system needs renovation not only to bridge the ever increasing gap between demand and supply but also to meet some other essential challenges like grid reliability, grid robustness, customer electricity cost minimization, etc [1]. In this regard, recent integration of the latest information and communication technologies with the existing grid system has gained enormous attention. One of the beauties of this integration is customer engagement that plays a key role in the economies of energy trade. In other words, the old concept of uni-directional energy flow is replaced by the new and smart concept of bi-directional energy flow–transformation from traditional consumer to a smart prosumer or transformation from traditional grid into a smart one (the smart grid) [2]. European technology platform (European Commission, 2006) defines smart grid as, “a smart grid is an electricity network that can intelligently integrate the actions of all users connected to it–generators, consumers and those that do both in order to efficiently deliver sustainable, economic and secure electricity supplies”. Smart grid has revolutionized the performance of all the sections of conventional grid. In case of conventional grid, energy can only flow from generation side to consumer, whereas, in case of smart grid consumer can also sell its extra electricity generated through domestic sources, e.g., solar, wind, etc [3]. Introduction of smart grid infrastructure on distribution section has manifold impact where retailers and consumers are important players of distribution section. Prior and after advanced technology installation, utilities seek for as much return on investment as possible. On the other hand, customers seek for as minimum electricity consumption paying cost as possible. Thus, the nature of not only utilities but also consumers is greedy. Traditional grid was unable to entertain both parties at the same time due to lack of flexibility. In other words, absence of two way communication or bi-directional energy flow between utility and consumer makes the traditional grid inadequate to meet modern day grid challenges like reliability, robustness, etc [4]. It is more likely that the smart grid will integrate new communication technologies, advanced metering, distributed systems, distributed storage, security and safety to achieve considerable robustness and reliability [5, 6, 7]. From this discussion, it is clear that unlike traditional grid where utility was the main/dominant player, smart grid involves customers in energy trade as well–bi-directional energy flow. In smart grids, user engagement via two way communications leads to peak 2
load reduction as optimal decisions are taken by the energy management unit. The resulting/new grid with its advanced metering infrastructure, will affect that how [8]: • to determine and meet the load, • to determine customer engagement with utility, and • integration of the latest technologies will affect the energy trade between customer and utility. Thus, we have two main players in the smart grid; user and utility (every user is a player if more one users are considered). The bi-directional communication or energy flow benefice not only the utilities but also the consumers. More specifically, the consumers are no longer only consumers instead they are prosumers who have the ability to access electricity market both as sellers and buyers. At the same time, the smart utilities have the ability to efficiently manage their resources. Consequently, the demand and supply gap that is ever increasing can be met [6, 9]. Initially, the utility forecasts future load/price signal that is based on past activities of the users. Users then adjust their power usage schedules as per utilities price/load signal while not compromising their comfort levels. However, with the ever growing expectations, accurate forecast strategies and advanced scheduling techniques are of extreme significance that would make the over all operation as optimal as possible. In this regard, many demand side scheduling techniques are proposed [10, 11, 12, 13]. However, there exists sufficient challenges prior to scheduling techniques in terms of stochastic information schemes to predict the future load. Thus, with the growing expectation in the adoption of smart grids, advanced techniques and tools are required to optimize the overall operation. Moreover, this determination would require that the daily operations of a smart grid utility (like strategic decisions to bridge the gap between demand and supply, and fuel resource planning) are properly conveyed. To sum up, all these decisions are highly influenced by the underlying load forecast strategy [14]. 1.2
Towards Localization: The Smart Micro-Grid
By taking into consideration the development of demand response in smart grids, the resulting capacity on the user side (in residential areas) would be small enough such that we can refer it as a micro-grid (refer Fig. 1.1 [15]). It is foreseen that over the next decade, Smart Micro-Grids (SMGs) will significantly grow due to minimized installation cost, higher reliability, 3
Figure 1.1: An SMG
increased support from prosumers and utilities, etc. During disturbances, an SMG can work in islanded mode, i.e., it can disconnect itself from the main distribution system. Thus, the SMG can maintain a high service level. Moreover, islanding in an intentional manner (no disturbances) has the potential to provide high local reliability [16]. Another benefit of SMGs is the exploitation of distributed control to prevent single point of failure. 1.2.1
Load Forecast
In terms of load forecast, the SMGs are more difficult to realize than macro smart grids. This is obvious as load forecast curve exhibits more volatile and non-linear load fluctuations in SMGs as compared macro smart grids. Load forecast is one of the fundamental as well as essential tasks that are needed for proper operation of the micro-grid. On another note, accurate load forecasting leads to enhanced management of resources (renewable and conventional) which in turn directly affects the economies of energy trade. In SMGs, load forecast is of two types; short term and long term. However, in terms of Short Term Load Load Forecast (STLF), the micro-grid is more 4
difficult to realize due to lower similarities (high randomness due to more load fluctuations) in history load curves as compared to that of long term load forecasting [16]. As mentioned earlier, the load of a micro-grid shows more fluctuations as compared to the traditional large power system. In these grids, adaptation to production w.r.t load can be performed in a more dynamic way as compared macro grids. The load curve of an SMG does not always show the same shape due to random power consumption schedules of the prosumers which leads to more variability as compared to the macro grid. However, all these operations are significantly affected by the underlying forecast strategy to predict the future load(s). Due to more volatility in the history load curve, STLF is more challenging than long term load forecast [17]. In literature, many STLF strategies are presented. The authors in [18] use Artificial Neural Network (ANN) and mutual information based technique to forecast load/price of the next day. In their work, Artificial Neurons (ANs) are activated by sigmoid function because of its ability to capture nonlinearity(ies) in the load time series. Apart from its advantages, the major disadvantage of this strategy is the high value of relative error between the actual and forecast curves. Subject to relative error minimization of [18], [16] utilizes Enhanced version of Differential Evolution (EDE) algorithm. This integration minimizes the forecast error very efficiently, however, not only further improvement can be achieved in terms of accuracy but also the execution time of this strategy can be improved which is relatively on the higher side. Another hybrid STLF strategy is presented in [19], however, this strategy is very complex in terms of implementation and its execution time is also very high. 1.2.2
Our Contribution
In this thesis, we present a Scalable-STLF (S-STLF) model for Micro-grid Communication Networks (MCNs). We use a modular strategy where the output of each preceding module is fed into the succeeding module. Overall, our proposition consists of three modules; feature selector, forecaster, and optimizer. Initially, the feature selector receives historical time series of load data as input, and then selects candidate inputs having more relevant information based on our improved version of the mutual information based technique. Thus, the feature selector minimizes the curse of high dimensionality. Followed by the forecaster (note: it consists of ANN) which receives selected candidate inputs from the feature selector. Based on this received data, ANs (activated by sigmoid function) are trained to 5
predict load of the upcoming day. At this stage, the relative error between the actual and forecast curves is high. Thus, the optimizer, which consists of our modified version of the EDE (mEDE) algorithm, minimizes the forecast error. The proposed S-STLF model for MCNs is validated via simulations which show that our proposed S-STLF model performs better than the selected existing strategies in terms accuracy, convergence rate, and scalability. Rest of the thesis is organized as follows. Chapter 2 contains relevant STLF contributions from research community, chapter 3 deals with the basic architecture of a generic forecast model, chapter 4 contains description of ANN based forecast strategy, chapter 5 integrates EDE with the ANN based forecast strategy of chapter 4, chapter 6 integrates feature selection module with ANN+EDE based forecast strategy of chapter 5, and chapter 7 not only concludes the thesis but also provides future research directions. Finally, references are provided at the end of the thesis.
6
Chapter 2 Related Work
7
As accurate load forecasting has a direct impact on the economics of energy trade. So, we discuss some of the previous load forecasting research articles in SMGs as follows. 2.1
Stochastic Distribution Based Strategies
[25] presents a probabilistic approach that is subjected to energy consumption profile generation of household appliances. The proposed approach takes a wide range of appliances into consideration along with a high degree of flexibility. The proposed methodology configures household appliances between holidays and working days. Main assumptions of this work are; (i) gaussian distributed ON-OFF cycles of different appliances, (ii) gaussian distributed appliances’ energy consumption patterns, and (iii) gaussian distributed appliances in terms of their number. In this work, not only a wide range of appliances is considered but also high flexibility degree of appliances is considered. However, absence of closed form solution makes the gaussian based forecast strategy very complex. Moreover, these assumption can not be always true, thus, accuracy of the predicted load-time series is highly questionable. An improvement over [25] is presented in [26]. This research work uses regulizer to overcome the computational complexity of gaussian distribution based STLF strategy in [25]. Moreover, the proposed STLF strategy has the ability to capture heteroscedasity of load in a more efficient way as compared [25]. Simulations are conducted to prove that the proposed STLF strategy performs better than the existing one. To sum up, we conclude that [26] has overcome the complexity of [25] to some extent, however, the basic assumptions (gaussian distribution based on-off cycles of household appliances, number of appliances, and power consumption pattern of appliances) still hold the bases and thus make the proposal highly questionable in terms of accuracy. 2.2
ANN based Strategies
In [18], authors present a hybrid technique subject to short term price forecasting of SMGs. This hybrid technique comprises of two steps; feature selection and prediction. In the first step, a mutual information based technique is implemented to remove redundancy and irrelevancy from the input load time series. In the second step, ANN along with evolutionary 8
algorithm is used to predict the future load time curve. In this process, the authors assume sigmoid activation function for ANs. In addition, the authors fine-tune some adjustable parameters during the first and second steps via an iterative search procedure which is part of this work. Subject to forecast accuracy, this technique is efficient as it embeds various techniques, however, the cost paid is implementation complexity. In [16], the authors study the characteristics of load time series of a micro grid and then compare its differences with that of a traditional power system. More importantly, the authors propose a bi-level (upper and lower) short term load prediction strategy for micro grids. The lower level is a forecaster which utilizes neural network and evolutionary algorithm. The upper level optimizes the performance of the lower level by using the differential evolution algorithm. In terms of effectiveness, the proposed bi-level prediction strategy is evaluated via real time data of a Canadian university. Effectiveness of this work is reflected via MATLAB simulations which demonstrate that the proposed strategy performs STLF in SMGs with a reasonable accuracy. However, its implementation complexity is very high. Another ANN based STLF strategy is presented in [23]. This hybrid methodology completes the STLF task in four steps; data selection, transformation, forecast, and error correction. In step one, some well known techniques of data selection are used to minimize the high dimensionality curse of input load time series characteristics. Step two deals wavelet transformation of the selected characteristics of input load time series to enable redundancy and irrelevancy filter implementation. Followed by step three, which uses ANN and a training algorithm subject to STLF in SMGs. More importantly, they choose sigmoid activation function for ANs due non-linear capturability. Finally, error correcting functions are used in step four to improve the proposed STLF methodology in terms of accuracy. In simulations, this methodology is tested against practical household load which demonstrates that this methodology is very good in terms of accuracy, however, at the cost of complexity. Similarly, another novel strategy is presented in [24] to predict the occurrence of price spikes in SMGs. The proposed strategy utilizes wavelet transformation for input feature selection. An ANN is then used to predict future price spikes based on the training of the selected inputs. In [27], another STLF strategy is presented for SMGs which is completed in five steps: (i) database handling of historical load data, (ii) detection of missing data and its interpolation, (iii) principle component analysis to detect outliers, (iv) ANN based forecast, and (v) display the forecast data on different devices. However, accuracy of [27] is not satisfactory. 9
2.3
Markov Chain Based Strategies
Subject to robustness of STLF forecast strategy, authors in [22] propose a markov chains based strategy. This stochastic strategy aims to tackle load time series fluctuations associated with energy consumption of users in a heterogeneous environment. The markov chains are used to predict the future on-off cycles of household appliances in a robust way due to their memoryless nature (future values only depend on the current values; past values are not considered). This memoryless nature of markov chains not only makes the STLF strategy robust but also relatively less complex in comparison to the aforementioned techniques. However, the memory less nature of markov chains also has a drawback; less accuracy.
10
Chapter 3 Forecast Strategies: Towards Development
11
Subject to daily supply and demand planning of an utility, the daily operations are strongly influenced by price/load forecast strategies. Accurate load forecasting hold basis for spot price management in the system. As a growing interest is shown by utilities towards the implementation of smart grids, so the significance of forecast strategies becomes more important due to expanded application horizon–storage maintenance, demand side management, integration of renewable resources, load scheduling, etc. From customers point of view, accurate forecast strategies means proper understanding of the relationship between price and demand that enable them to properly schedule their usage pattern. 3.1
Challenges
Due to growing awareness of customer participation in smart grids, utilities are enforced to to develop load/price forecast strategy(ies). However, in doing so utilities face many challenges like [20]: • High and varied range of customers consumption data. • Highly volatile nature of the load/price signal. • Highly non-linear characteristics of the load/price signal. • Hybrid customer groups–using traditional meters or smart meters. • High dimensionality curse of identifying factors that may lead to overfitting problem. • Complexity of the identifying parameters. • Lack of data availability for different scenarios. 3.2
Influencing Factors
In addition to the aforementioned challenges, there are some factors that influence load forecasting in smart grid [20]: • Weather conditions: specifically when renewable energy sources are integrated. • Time of the day: electricity consumption significantly varies at different time slots of the day.
12
• Random disturbances: for example, sudden cloudy conditions highly disturb solar generation. • Electricity price market: as per price market the customers consumption pattern varies and vice versa. • Storage cells: at both utility and customer locations would greatly affect the forecast signal. 3.3
Basic Units of a Generic Forecast Model
In view of the aforementioned challenges, the research community has developed many forecast strategies. From these works, we conclude that a forecast strategy comprises of three basic units; feature selector, forecaster, optimizer. In this subsection, these basic units are discussed in detail. 3.3.1
Feature Selector
As per basic assumption of the feature selector, input data not only contain irrelevant features but also redundant features. Irrelevant features are those which do not provide useful information, and redundant features are the duplicate ones that do not provide more information. In this unit, a subset of most relevant features subject to forecast strategy development is selected. Incorporation of feature selector mainly provide three benefits; reduced over-fitting, decreased time during training, and improved model interpretation. In literature, many forecast strategies exist that have utilized the feature selector. For example, [19] uses four indices of load variation and empirical mode decomposition for feature selection. The indices of load variation observe data variation over months, between two adjacent days, between hours of the same day and in between an hour. Followed by the empirical mode decomposition algorithm that gradually decomposes the load/price signal into linear components along with some residue. The decomposed components are then ranked based on different trends and scales. Similarly, [21] applies forward selection algorithm to select reduced number of scenarios that were generated via monte carlo simulation. [22] utilizes multi-scale setting to fine tune information during state aggregation. [9] uses probability distribution function and roulette wheel mechanism to generate several scenarios. Among the generated scenarios, a subset is selected based on
13
scenario reduction process where weibull and gaussian probability distribution functions are utilized. In another work [23], input data is initially classified into schedulable and non-schedulable loads. Then, wavelet transformation is conducted to rank the input data into detailed components (high frequency) and approximate components (low frequency). Finally, [18, 16, 24] use entropy based mutual information technique for feature selection. 3.3.2
Forecaster
The basic purpose of this unit is to forecast the future load/price signal based on learning algorithms. Since the load/price is highly non-linear, the forecaster needs to capture these non-linearities with reasonable accuracy and execution time. The type of learning used here would be supervised learning because history load data is available. An important advantage of the forecaster is its ability to provide valuable information. Based on this valuable information, experts take qualitative as well as quantitative decisions that benefice the energy trade between utility and its customers. Literature review reveals that many strategies have been proposed subject to this unit. For example, [19] uses extreme learning machine with kernel in an artificial neural network environment, and [22] uses markov chains to predict the next state. [16, 18, 24] uses artificial neural network based forecaster. Among the typically used activation functions, these authors prefer sigmoid function for neuron activation due to its ability to handle the non-linearities associated with price/load signal. Subject to training of the network, [24] uses mallat’s algorithm, [23] uses discrete wavelet transformation based technique, and [16, 16] use multi-variate auto regressive model. Some other well known training algorithms are Newton’s Method, Gradient Descent based back propagation, levenbergmarquardt learning algorithm, etc. However, among these training algorithms, the typically used one is levenbergmarquardt learning algorithm because it can train the artificial neural network 10–100 times faster than the classical Newton’s Method and Gradient Descent based back propagation algorithm. Rest of the training algorithms are still unexplored in this area–a potential research area for future. It is worth mentioning that some forecasters like [16, 18] also use an evolutionary algorithm based local optimizer. The key benefit of local optimizer is its ability to escape from trapping in local minima/maxima that may arise during the training process of the artificial neural network.
14
3.3.3
Optimizer
Generally, an optimization problem is written as; Max f0 (x) OR Min f0 (x)
(3.1)
fi (x) ≤ ci ∀i ∈ Z +
(3.2)
subject to: where, the optimization variable is x = (x1 , ..., xn ), the objective function is f0 : Rn → R, the constraints are fi : Rn → R, and the upper bounds are ci . x∗ is an optimal solution of the optimization problem if and only if it has the smallest or greatest objective value among all the possible solution vectors that satisfy the constraints; we have f0 (z) ≥ f0 (z ∗ ) for any z with f1 (z) ≤ c1 , ..., fn (z) ≤ cn . Subject to forecast strategy(ies), the forecaster returns day ahead pric/load signal with some error. The forecast strategy can be further enhanced in terms of accuracy if error minimization is considered as an objective function of the optimizer. However, in this process, surplus execution time is spent. For applications, where accuracy is more important than execution time, the optimizer is of extreme significance. Here, heuristic optimization techniques (like differential evolution, particle swarm optimization, etc.) are preferred over the other optimization techniques (like linear programming, non-linear programming, etc.) due faster convergence rate. In this regard, only few techniques have been proposed that take into consideration the optimizer. For example, [16] uses enhanced differential evolution algorithm and [19] uses particle swarm optimization in the optimizer. To our knowledge, this unit is still unexplored and can be considered as a potential research area (ant algorithms, bee algorithms, genetic algorithms, etc. need to be explored).
15
Chapter 4 ANN Based Forecast Strategy
16
Subject to complex day-ahead load forecast of SGs, any proposed prediction strategy should be capable enough to mitigate the non-linear input/output relationship as efficiently as possible. ANNs are widely used as forecasters that can predict the non-linear behaviour of SG’s load time series with acceptable accuracy. However, prior to ANN based forecasting, input load time series must be made compatible. Therefore, our proposed day-ahead load forecasting model (for SGs) consists of three modules; data preparation module, feature selection module and forecast module (refer figure 4.1). The first module performs pre-processing to make the input data compatible with the feature selection module and the forecast module. The second module removes irrelevant and redundant features from the input data. The third module consists of an ANN to forecast day-ahead load of the SG. Details are as follows.
Figure 4.1: ANN based forecast: Block diagram
4.1
Data Preparation Module
As mentioned earlier, the data preparation module receives the input load time series (historical). Suppose, the input load time series is shown by the
17
following matrix: pdh11 pd2 h1 d3 P = ph 1 .. . pdhn1
pdh12 pdh13 pdh22 pdh23 pdh32 pdh33 .. .. . . dn dn ph 2 ph 3
. . . pdh1m . . . pdh2m . . . pdh3m .. .. . . dn . . . ph m
(4.1)
where, hm is the mth hour, dn is the nth day, and pdhnm is historical power consumption value at mth hour of the nth day. As there are 24 hours in a day, so m = 24. The value of n depends on designer’s choice, i.e., greater value of n leads to fine tuning during the training process of the forecast module because more lagged samples of input data are available. However, it would lead to more execution time. Prior to feed the ANN with input matrix P , the following step wise operations are performed by the data preparation module (refer Fig. 4.4): 1. Local maximum: Initially, a local maximum value is calculated for i each column of the P matrix; pcmax = max{pdh1i , pdh2i , pdh3i , . . . , pdhni }, ∀ i ∈ {1, 2, 3, . . . , n}. 2. Local normalization: In this step, each column of the matrix P is normalized by its respective local maxima such that the resultant matrix is represented by Pnrm . Now, each entry of Pnrm ranges between 0 and 1. 3. Local median: For each column of Pnrm matrix, a local median value Medi is calculated (∀ i ∈ {1, 2, 3, . . . , n}). 4. Binary encoding: Each entry of Pnrm matrix is compared with its respective Medi value. If the entry is less than its respective local median value, then it is encoded with a binary 0, else, it is encoded with a binary 1. In this way, a resultant matrix containing only binary values (0’s and 1’s),Pb , is obtained. At this stage, the Pb matrix is compatible with the forecast module and is thus fed into it. 4.2
Feature Selection Module
Once the data is binary encoded, not only redundant but also irrelevant samples needs to be removed from the lagged input data samples. In removing redundant features, the execution time during the training process 18
Figure 4.2: Data preparation module for ANN based forecast
is minimized. On the other hand, removal of irrelevant features leads to improvement in forecast accuracy because the outliers are removed. In order to remove the irrelevant and redundant features from the binary encoded input data matrix Pb , an entropy based mutual information technique is used in [16, 18] which defines the mutual information between input Q and target T by the following formula, XX p(Qi , Tj ) ∀i, j ∈ {0, 1} (4.2) MI(Q, T ) = p(Qi , Tj )log2 p(Qi )p(Ti ) i j In equation 4.2, MI(Q, T ) = 0 means that Q and T are independent, high value of MI(Q, T ) means that Q and T are strongly related and low value of MI(Q, T ) means that Q and T are loosely related. Thus, the candidate inputs are ranked with respect to the mutual information value between input and target values. In [16],[18], the target values are chosen as the last samples for every hour of the day among all the training samples (for every hour only one target value is chosen that is value of the previous day). Choice of the last sample seems logical as it is the closest value to the upcoming day with respect to time, however, it may lead to serious forecast errors due to inconsideration of the average behaviour. However, consideration of only the average behaviour is also insufficient because the last sample has its own importance. To sum up, we come up with a solution that not only considers the last sample but also the average behaviour. Thus, we modify equation 4.2 for three discrete random variables as,
MI(Q, T, M) =
XXX i
j
p(Qi , Tj , Mk )log2
k
19
p(Qi , Tj , Mk ) p(Qi )p(Ti )p(Mk )
∀i, j ∈ {0, 1} (4.3)
In expanded form, equation 4.3 is written as follows, p(Q = 0, T = 0, M = 0 MI(Q, T, M) = p(Q = 0, T = 0, M = 0) × log2 p(Q = 0)p(T = 0)p(M = 0) p(Q = 0, T = 0, M = 1 + p(Q = 0, T = 0, M = 1) × log2 p(Q = 0)p(T = 0)p(M = 1) p(Q = 0, T = 1, M = 0 + p(Q = 0, T = 1, M = 0) × log2 p(Q = 0)p(T = 1)p(M = 0) p(Q = 0, T = 1, M = 1 + p(Q = 0, T = 1, M = 1) × log2 p(Q = 0)p(T = 1)p(M = 1) p(Q = 1, T = 0, M = 0) + p(Q = 1, T = 0, M = 0) × log2 p(Q = 1)p(T = 0)p(M = 0) p(Q = 1, T = 0, M = 1) + p(Q = 1, T = 0, M = 1) × log2 p(Q = 1)p(T = 0)p(M = 1) p(Q = 1, T = 1, M = 0) + p(Q = 1, T = 1, M = 0) × log2 p(Q = 1)p(T = 1)p(M = 0) p(Q = 1, T = 1, M = 1) + p(Q = 1, T = 1, M = 1) × log2 p(Q = 1)p(T = 1)p(M = 1) (4.4) In order to determine the MI value between Q and T , the joint and independent probabilities needs to be determined. For this purpose, an auxiliary variable Av is introduced. Av = 4T + 2M + Q ∀T, M, Q ∈ {0, 1} It is clear from equation 4.5 A2v , A3v , ..., A7v counts the l data points) for which Av 7, respectively. In this way,
(4.5)
that Av ranges between 0 and 7. A0v , A1v , number of sample data points (out of total = 0, Av = 1, Av = 2, Av = 3,..., Av = we can now easily determine the joint and
20
independent probabilities as follows. p(Q = 0, T = 0, M = 0) = p(Q = 0, T = 0, M = 1) = p(Q = 0, T = 1, M = 0) = p(Q = 0, T = 1, M = 1) = p(Q = 1, T = 0, M = 0) = p(Q = 1, T = 0, M = 1) = p(Q = 1, T = 1, M = 0) = p(Q = 1, T = 1, M = 1) =
p(Q = 0) = p(Q = 1) = p(T = 0) = p(T = 1) = p(M = 0) = p(M = 1) =
A0v l A2v l A4v l A6v l A1v l A3v l A5v l A7v l
A0v + A2v + A4v + A6v l A1v + A3v + A5v + A7v l A0v + A1v + A2v + A3v l A4v + A4v + A5v + A7v l A0v + A1v + A4v + A5v l A2v + A3v + A6v + A7v l
(4.6)
(4.7)
Based on equation 4.4, mutual information between Q and T is calculated, and thus redundancy and irrelevancy is removed from the input samples. This mutual information based technique is computed with reasonable execution time and acceptable accuracy. 4.3
Forecast Module
By evaluating load variations over several months or between two consecutive days or between consecutive hours over a day, [19] concluded that 21
SG’s load-time series signal exhibits strong volatility and randomness. This result is obvious because different users have different energy/power consumption patterns/habits. Thus, in terms of DLF, realization of a SG is more difficult as compared to its realization in terms of long-term load forecast. Therefore, the basic requirement of the forecast module is to forecast the load-time series of a SG by taking into consideration its nonlinear characteristics. In this regard, ANNs are widely used due to two reasons; accurate forecast ability, and the ability to capture the non-linear characteristics. Due to the aforementioned reasons, we choose ANN based implementation in our forecast module. Initially, the forecast module receives selected features SF (.), and then constructs training ‘T S’ and validation samples ‘V S’ from it as follows: T S = SF (i, j), ∀i ∈ {2, 3, . . . , m} and ∀j ∈ {1, 2, 3, . . . , n}
(4.8)
V S = SF (1, j), ∀j ∈ {1, 2, 3, . . . , n}
(4.9)
From equations 4.8 and 4.9, it is clear that the ANN is trained by all the historical load-time series candidates except the last one which is used for validation purpose. This discussion leads us towards the explanation of the training mechanism. However, prior to explanation, it is essential to describe the ANN. An ANN, inspired from the nervous system of humans,
Figure 4.3: An artificial neuron
is a set of Artificial Neurons (ANs) to perform tasks of interest (note: our task of interest is STLF of micro-grids). Usually, an AN performs a nonlinear mapping from RI to [0, 1] that depends on the activation function used. AN fact : RI → [0, 1] (4.10) 22
where I is the vector of input signal to AN. Fig. 4.3 illustrates the structure of an AN that receives I = (I1 , I2 , . . . , In ). In order to either deplete or strengthen the input signal, to each Ii is associated a weight wi . The ANN AN computes I, and uses fact to compute the output signal ‘y’. However, the strength of y is also influenced by a bias value (threshold) ‘b’. So, we can compute I as follows: iX max Ii w i (4.11) I= i=1
AN AN receives I and b to determine y. Generally, fact s are mappings The fact AN AN that monotonically increase (fact (−∞ = 0) and fact (+∞ = 1)). Among AN AN the typically used fact s, we use sigmoid fact .
1
AN fact (I, b) =
1+
e−α(I−b)
(4.12)
AN AN We choose sigmoid fact due to two reasons; fact ∈ (0, 1) and the paramAN eter α has the ability to control steepness of the fact . In other words, AN sigmoid fact choice enables the AN to capture the non-linear characteristic of load time series. Since, this work aims at day-ahead load forecasting for micro-grids, and one day consists of 24 hours. So, the ANN consists of 24 forecasters (one AN for an hour) where each forecaster predicts load of one hour of the next day. In other words, 24 hourly load time-series are separately modeled instead of one complex forecaster. The whole process is repeated every day to forecast load of the next day.
The question that now needs to be answered is how to determine wi and b? The answer is straight forward, i.e., via learning. In our case, prior knowledge of load-time series exists. Thereby, we use supervised learning; adjusting wi and b values until a certain termination criterion is satisfied. The basic objective of supervised training is to adjust wi and b such that the error signal ‘e(k)’ between the target value ‘ˆ y (k)’ and real output of neuron ‘y(k)’ is minimized. Minimize e(k) = y(k) − yˆ(k), ∀k ∈ {1, 2, 3, . . . , m}
(4.13)
We use the method of least squares to determine the parameter matrices, that is given as follows, Minimize J(K) =
m X
eT (k)e(k),
k=1
∀k ∈ {1, 2, 3, . . . , m} 23
(4.14)
Subject to most feasible solution of Eqn. 4.14, we use the multi-variate auto regressive model presented in [28] because it solves the objective function in relatively less time with reasonable accuracy as compared to the typically used learning rules like gradient descent, widrow-hoff, and delta [29]. According to [28], the parameter matrices are given as follows, n X
W (i)R(j − i) = 0 j = {2, 3, . . . , n}
(4.15)
W (i)R(i − j) = 0 j = {2, 3, . . . , n}
(4.16)
i=1
n X i=1
where, W (1) = ID (ID is identity matrix), W (1) = ID , and R is the cross co-relation given as: n−1−i 1 X [x(k) − m][x(k − i) − m]T R(i) = n k=i
(4.17)
In Eqn. 4.17, m is the mean vector of observed data, n
1X m= x(k) n k=i
(4.18)
Based on these equations, [28] defines the following prediction error covariance matrices. P Vt = nk=1 Wt (k)R(−k) Pn V t = k=1 W t (k)R(−k) P (4.19) ∆t = nk=1 Wt (−k)R(t − k + 1) P ∆t = nk=1 W t (k)R(−t + k − 1)
The recursive equations are as follows:
Wt+1 (k) = Wt (k)Wt+1 (t + 1)W t (t − k + 1)
)
W t+1 (k) = W t (k)W t+1 (t + 1)W t (t − k + 1) ) −1 Wt+1 (t + 1) = −∆t V t W t+1 (t + 1) = −∆t Vt−1
(4.20)
(4.21)
In order to find the weights, Eqn. 4.20 and Eqn. 4.21 are solved recursively. For further details about the weight update mechanism, readers are suggested to read [28]. Fig. 4.4 is pictorial representation of the steps involved in data forecast module. 24
Figure 4.4: ANN based data forecast module
Once the weights in Eqn. 4.19 and Eqn. 4.20 are recursively adjusted as per objective function in Eqn. 4.14, the output matrix is ten binary decoded and de-normalized to get the desired load-time series. Stepwise algorithm of the proposed methodology is shown in algorithm 1. 4.4
Simulation Results
We evaluate our proposed DLF model (m(MI+ANN)) by comparing it with an existing MI+ANN model in [16]. In our simulations, historical load time-series data from November (2014) to January (2015) is taken from the publicly available PJM electricity market for two SGs in United States of America; DAYTOWN, and EKPC [30]. November–December (2014) data is used for training and validation purpose, and January (2015) data is used for test purpose. Simulation parameters are shown in Table 4.1, and their justification can be found in [16, 18, 28, 29]. In this paper, we have considered two performance metrics; % error, and execution time (convergence rate). • Error performance: It is the difference between actual and the forecast signal/curve, and is measured in %. • Convergence rate or execution time: The simulation time taken by the system to execute a specific forecast model. Forecast models for which execution time is small are said to converge fastly as compared to the vice versa case. In this paper, execution time is measured in seconds. Figures 4.5 and 4.8 are the graphical illustrations of the fact that how well our proposed ANN based DALF model predicts the target values of an SG. In these figures, the proposed m(MI+ANN) based forecast curve more tightly follows the target curve as compared to the existing MI+ANN 25
Algorithm 1 : Pseudo-code of day-ahead ANN load forecast 1: Pre-conditions: i = # of days, and j = # of hours per day 2: P ← historical load data ci 3: Compute Pmax ∀i ∈ {1, 2, 3, . . . , m} 4: Compute Pnrm 5: Compute Medi ∀i ∈ {1, 2, 3, . . . , m} 6: for all (i ∈ {1, 2, 3, . . . , m}) do 7: for all (j ∈ {1, 2, 3, . . . , n}) do (i,j) 8: if (Pnrm ≤ Medi ) then 9: Pbi,j ← 0 10: else if then 11: Pbi,j ← 1 12: end if 13: end for 14: end for 15: Compute ST and SV 16: Compute y(1) by letting W (1) = I and 17: W (1) = I 18: while Max. # of iterations not reached do 19: if J(k + 1) ≤ J(k) then 20: y(k) ← y(k + 1) 21: else if then 22: Train ANN as per Eqn. 4.20 and Eqn. 4.21 23: Compute y(k + 1) and go back to step (17) 24: end if 25: end while 26: Perform decoding 27: Perform de-normalization
26
Table 4.1: Simulation parameters of ANN based forecast
Parameter Number of forecasters Number of hidden layers Number of neurons in the hidden unit Number of iterations Momentum Initial weights Historical load data Bias value
Value 24 1 5 100 0 0.1 26 days 0
based forecast curve which is justification of the theoretical discussion of our proposed methodology in terms of non-linear forecast ability. Not only AN the sigmoid fact (refer equation) but also the multivariate auto-regressive training algorithm enable the day-ahead ANN based forecast methodology to capture non-linearity(ies) in historical load data. 2300 Actual m(MI+ANN) forecast MI+ANN forecast
2200
Load (KW)
2100
2000
1900
1800
1700
1600
0
5
10
15
20
25
Time (hr)
Figure 4.5: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast)
4.4.1
Error Performance
Figure 4.6 shows the % forecast error when tests are conducted on DAYTOWN grid; our m(MI+ANN) forecasts with 2.9% and the existing MI+ANN 27
X=2 Y = 3.84
4 3.5 3
X=1 Y = 2.9
Error (%)
2.5 2 1.5 1 0.5 0
m(MI+ANN) forecast
MI+ANN forecast
Figure 4.6: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance) X=2 Y = 6.54
7
6
Execution time (sec)
5
4
3
X=1 Y = 2.48
2
1
0
m(MI+ANN) forecast
MI+ANN forecast
Figure 4.7: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
forecasts with 3.84% relative errors, respectively. Similarly, Fig. 4.9 shows the % forecast error when tests are conducted on EKPC grid; our m(MI+ANN) forecasts with 2.88% and the existing MI+ANN forecasts with 3.88% relative errors, respectively. This improvement in terms of relative % error performance by our proposed DALF model is due to the following two reasons; (i) the modified feature selection technique in our proposed DALF 28
1800 Actual m(MI+ANN) forecast MI+ANN forecast
1700
Load (KW)
1600
1500
1400
1300
1200
0
5
10
15
20
25
Time (hr)
Figure 4.8: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast) X=2 Y = 3.88
4 3.5 3
X=1 Y = 2.88
Error (%)
2.5 2 1.5 1 0.5 0
m(MI+ANN) forecast
MI+ANN forecast
Figure 4.9: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance)
model, and (ii) multi variate auto regressive training algorithm. The first reason accounts for the removal of redundant as well as irrelevant features from the input data in a more efficient way as compared to the existing DALF model. By more efficient way we mean that as our proposal considers average sample in the feature selection process as well in addition to the last sample and the target sample. Thus, the margin of outliers 29
X=2 Y = 6.6
7
6
Execution time (sec)
5
4
3
X=1 Y = 2.58
2
1
0
m(MI+ANN) forecast
MI+ANN forecast
Figure 4.10: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
which cause significant relative % error is down-sized. The second reason deals with the selection of an efficient training algorithm, as our proposition trains the ANN via the multi variate auto regressive algorithm and the existing DALF model trains the ANN via levenberg-marquardt algorithm. 4.4.2
Convergence Rate Analysis
As discussed earlier that there exist a trade-off between forecast accuracy and execution time. However, Figs. 4.6–4.7 and 4.9–4.10 show that our proposed DALF model not only results in relatively less % error but also less execution time. As mentioned earlier, our devised modifications in the feature selection process and selection of the multi variate training algorithm cause relative improvement in terms of % error. On the other hand, m(MI+ANN) model converges with a faster rate (less execution time) as compared to the existing MI+AN model due to three reasons; (i) exclusion of the local optimization algorithm subject to error minimization, (ii) modified feature selection process, and (iii) selection of multi variate auto regressive training algorithm. Our proposition selects features from the input data while considering average sample, last sample and the target sample. This means that the chances of outliers in selected features have been significantly decreased, and the local optimization algorithm used by the existing MI+ANN forecast model is not further needed. Our proposed m(MI+ANN) forecast model does not account for the execution time 30
taken by the iterative optimization algorithm. As a result, our proposed DALF model converges with a faster rate as compared to the existing DALF model.
31
Chapter 5 mEDE and ANN Based Forecast Strategy
32
In STLF problems, the stochastic volatility of target values of the time series has a significant impact on the STLF errors. As we know that the load of MG shows larger volatility as compared to that of traditional large power system. Thus, with the objective to correctly discuss the difference between load time series characteristics of a SMG and a large power system, following are the typically used indices [19]. 1. LV in months: In this technique, normalized mean square M to show LV in several months. Initially, sampled values of load time series are normalized into [0, 1] and then variance is calculated. If xi is the ith load sample in total N number of samples, then M is given as follows: N 1 X ′ M= (x i − µ′ i ) N i=1
where, x′i =
xi max(x)
and µ′ =
1 N
PN
i=1
(5.1)
x′i .
2. LV between two consecutive days: LV between two consecutive days can be shown by two sub-indices; maximum difference ’Dmax ’ and average difference ’Davg ’. If Xi is the load of ith day and Nd is the total number of days, then the formulae for Dmax and Davg are as follows: ! r ∆Xi ∆XiT (5.2) Dmax = max L r N d −1 X 1 ∆Xi ∆XiT Davg = (5.3) Nd − 1 i=1 L where, ∆Xi = Xi+1 −Xi , i = 1, . . . , Nd −1 for Eqn. 5.2, i = 1, . . . , Nd x ,...xi×L . for Eqn. 5.3, and Xi = (i−1)×L+1 x(i−1)×L+1
3. LV in a day: As the daily load time series curve shows variations– average value ’¯ xi ’, minimum value ’xmin ’ and maximum value ’xmax ’. i i For a maximum of Nd number of days, the minimum daily load rate is computed as follows: Rmin
Nd 1 X xmin i = . Nd i=1 xmax i
(5.4)
Similarly, the daily load rate can be computed as follows: Nd x¯i 1 X R= max Nd i=1 xi
33
(5.5)
4. LV in an hour: In order to analyze load variation in an hour, maximum slope ’mmax ’ and average slope ’mavg ’ are used. If N samples of load time series are considered then mavg and mmax are given as follows: N −1 1 X x′ i − x′i mavg = || || (5.6) N − 1 i=1 ti+1 − ti ′ x i − x′i || (5.7) mmax = || ti+1 − ti 5.1
Motivation
Owing to the highly volatile nature of DALF in SGs, any forecast proposition must efficiently deal with the non-linear input/output relationship. In this regard, ANNs are widely used as forecasters because these networks can predict the non-linearities of SGs' load with low convergence time. However, the achieved prediction accuracy is sometimes not up to the mark, which leads to the adoption of optimization techniques that can significantly enhance the prediction accuracy of ANNs. The cost paid to achieve this high accuracy is increased convergence time. Therefore, we focus on the development of a DALF strategy that is based on a compromise between prediction accuracy and convergence time.

5.2 The mEDE and ANN Based Forecast
Our proposed DALF strategy consists of three modules: a pre-processing module, a forecast module, and an optimization module (refer to Fig. 5.1). The pre-processing module makes the input load time series compatible with the forecast module and removes redundant and irrelevant features from the input data. Based on the sigmoid activation function and the multivariate auto regressive model, the forecast module (which consists of ANNs) performs DALF of SGs. Finally, the optimization module minimizes prediction errors to improve the accuracy of the overall DALF strategy. A detailed description of each module follows.

5.2.1 Pre-Processing Module
Since the ANN based forecaster uses only binary data to predict the load of the next day, the input data must be made compatible; in other words, the input data must be pre-processed to make it compatible with the forecast module.
Figure 5.1: mEDE and ANN: Block diagram
In addition, redundant and irrelevant samples must be removed from the input data set for two reasons: (i) redundant features do not provide additional information and thus unnecessarily increase the execution time during the training process (discussed later in the forecast module), and (ii) irrelevant features do not provide useful information and act as outliers. A detailed description of the pre-processing module is as follows. As mentioned earlier, the pre-processing module receives the historical input load time series. Suppose the input load time series is represented by the following matrix:

\[
P =
\begin{bmatrix}
p(h_1, d_1) & p(h_2, d_1) & p(h_3, d_1) & \dots & p(h_m, d_1) \\
p(h_1, d_2) & p(h_2, d_2) & p(h_3, d_2) & \dots & p(h_m, d_2) \\
p(h_1, d_3) & p(h_2, d_3) & p(h_3, d_3) & \dots & p(h_m, d_3) \\
p(h_1, d_4) & p(h_2, d_4) & p(h_3, d_4) & \dots & p(h_m, d_4) \\
p(h_1, d_5) & p(h_2, d_5) & p(h_3, d_5) & \dots & p(h_m, d_5) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p(h_1, d_n) & p(h_2, d_n) & p(h_3, d_n) & \dots & p(h_m, d_n)
\end{bmatrix}
\tag{5.8}
\]
where d_n is the nth day, h_m is the mth hour of the day, and p(h_m, d_n) is the power consumption value at the mth hour of the nth day. As per the standard time horizon of one complete day, m = 24. The value of n depends entirely on the designer's choice; increasing n means finer tuning during the training of the forecast module because more historical lagged samples of the input power matrix are available. However, this fine tuning is achieved at the cost of more execution time. Thus, there is a trade-off between convergence rate and accuracy. Before the forecast module is fed with the input matrix P, Algorithm 2 is executed by the pre-processing module to ensure P's compatibility with the forecast module.

Algorithm 2: Pseudo-code of the pre-processing module
1: Pre-conditions: m = # of hours per day, n = # of days
2: P ← historical load data
3: Compute p_max^{c_i} ∀ i ∈ {1, 2, 3, . . . , m}
4: Compute P_nrm
5: Compute Med_i ∀ i ∈ {1, 2, 3, . . . , m}
6: for all (i ∈ {1, 2, 3, . . . , m}) do
7:   for all (j ∈ {1, 2, 3, . . . , n}) do
8:     if (P_nrm^{(i,j)} ≤ Med_i) then
9:       \hat{P}_{i,j} ← 0
10:    else
11:      \hat{P}_{i,j} ← 1
12:    end if
13:  end for
14: end for

First, a local maximum value p_max^{c_i} is calculated for each column of the historical input load matrix P:

\[
p^{c_i}_{max} = \max\big(p(h_i, d_1), p(h_i, d_2), p(h_i, d_3), \dots, p(h_i, d_n)\big), \quad \forall\, i \in \{1, 2, 3, \dots, m\}
\tag{5.9}
\]
Second, local normalization of each column of P is carried out by dividing by its respective local maximum; the results are saved in P_nrm (each entry of P_nrm lies in [0, 1]). Third, a local median value Med_i (∀ i ∈ {1, 2, 3, . . . , m}) is computed for each column of the P_nrm matrix. Fourth, each element of the P_nrm matrix is compared with its respective local median value Med_i, based on which encoding is performed as follows:

\[
\hat{P}(h_i, d_j) =
\begin{cases}
1 & \text{if } P_{nrm}(h_i, d_j) \geq Med_i \\
0 & \text{otherwise}
\end{cases}
\tag{5.10}
\]
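As an illustration only, the following Python/NumPy sketch mirrors the normalization and median-based binary encoding of Algorithm 2; the function name and the array layout (rows as days, columns as hours) are assumptions made here, not part of the original pseudo-code.

import numpy as np

def preprocess(P):
    """Column-wise normalization and median-based binary encoding (Algorithm 2 sketch).

    P : 2-D array of shape (n_days, m_hours); rows are days, columns are hours.
    Returns the binary encoded matrix fed to the forecast module.
    """
    P = np.asarray(P, dtype=float)

    # Step 1: local maximum of every column (Eqn. 5.9)
    p_max = P.max(axis=0)

    # Step 2: local normalization of every column into [0, 1]
    P_nrm = P / p_max

    # Step 3: local median of every normalized column
    med = np.median(P_nrm, axis=0)

    # Step 4: binary encoding against the column median (Eqn. 5.10)
    P_hat = (P_nrm >= med).astype(int)
    return P_hat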
In this way, a resultant matrix \hat{P} consisting of only binary values (0s and 1s) is obtained. This \hat{P} matrix contains not only irrelevant features but also redundant features. In order to remove these two types of features from the matrix \hat{P}, we use the mutual information technique that is proposed in [16] and later used in [18] as well. According to this technique, the mutual information between input X and target T is given as follows:

\[
MI(X, T) = \sum_{i}\sum_{j} p(X_i, T_j)\,\log_2\frac{p(X_i, T_j)}{p(X_i)\,p(T_j)}
\tag{5.11}
\]

In (5.11), MI(X, T) = 0 means that the input and target variables are independent, a high value of MI(X, T) means that the two variables are strongly related, and a low value of MI(X, T) means that the two variables are loosely related. The expanded form of (5.11) is as follows:

\[
\begin{aligned}
MI(X, T) = {} & p(X=0, T=0)\,\log_2\frac{p(X=0, T=0)}{p(X=0)\,p(T=0)} \\
& + p(X=0, T=1)\,\log_2\frac{p(X=0, T=1)}{p(X=0)\,p(T=1)} \\
& + p(X=1, T=0)\,\log_2\frac{p(X=1, T=0)}{p(X=1)\,p(T=0)} \\
& + p(X=1, T=1)\,\log_2\frac{p(X=1, T=1)}{p(X=1)\,p(T=1)}
\end{aligned}
\tag{5.12}
\]

In order to determine the joint and individual probabilities in (5.12), an auxiliary variable V_b is introduced:

\[
V_b = 2T + X, \quad \forall\, T, X \in \{0, 1\}
\tag{5.13}
\]
It is clear from (5.13) that V_b ranges between 0 and 3. Let V_b^0, V_b^1, V_b^2, and V_b^3 count the number of sample data points (out of a total of l data points) for which V_b = 0, V_b = 1, V_b = 2, and V_b = 3, respectively. In this way, we can easily determine the individual and joint probabilities as follows:

\[
p(X=0) = \frac{V_b^0 + V_b^2}{l}, \quad p(X=1) = \frac{V_b^1 + V_b^3}{l}, \quad
p(T=0) = \frac{V_b^0 + V_b^1}{l}, \quad p(T=1) = \frac{V_b^2 + V_b^3}{l}
\tag{5.14}
\]

\[
p(X=0, T=0) = \frac{V_b^0}{l}, \quad p(X=0, T=1) = \frac{V_b^2}{l}, \quad
p(X=1, T=0) = \frac{V_b^1}{l}, \quad p(X=1, T=1) = \frac{V_b^3}{l}
\tag{5.15}
\]
Based on (5.12), the mutual information between X and T is calculated, and thus redundancy and irrelevancy are removed from the input samples. According to [16, 18], this mutual information based technique can be computed with reasonable execution time and acceptable accuracy.
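The following minimal Python sketch, given as an illustration under the assumption that both the input column and the target column are already binary encoded, shows how the counting variable of Eqn. 5.13 can be used to evaluate Eqns. 5.14, 5.15 and 5.12; the function name is chosen here for illustration only.

import numpy as np

def mutual_information_binary(x, t):
    """Mutual information MI(X, T) for two binary vectors (Eqns. 5.11-5.15 sketch)."""
    x = np.asarray(x, dtype=int)
    t = np.asarray(t, dtype=int)
    l = len(x)

    # Auxiliary variable V_b = 2T + X (Eqn. 5.13) and its counts V_b^0..V_b^3
    v = 2 * t + x
    counts = np.array([np.sum(v == k) for k in range(4)], dtype=float)

    # Individual and joint probabilities (Eqns. 5.14 and 5.15)
    p_x = [(counts[0] + counts[2]) / l, (counts[1] + counts[3]) / l]
    p_t = [(counts[0] + counts[1]) / l, (counts[2] + counts[3]) / l]
    p_joint = {(0, 0): counts[0] / l, (0, 1): counts[2] / l,
               (1, 0): counts[1] / l, (1, 1): counts[3] / l}

    # Expanded mutual information (Eqn. 5.12); zero-probability terms contribute 0
    mi = 0.0
    for xi in (0, 1):
        for tj in (0, 1):
            pj = p_joint[(xi, tj)]
            if pj > 0:
                mi += pj * np.log2(pj / (p_x[xi] * p_t[tj]))
    return mi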
5.2.2 Forecast Module
In the literature, many research works have investigated LV in SMGs. The authors in [19] comprehensively examined LV based on four indices: LV in months (refer to Eqn. 5.1), LV between two consecutive days (refer to Eqns. 5.2, 5.3), LV in a day (refer to Eqns. 5.4, 5.5) and LV in an hour (refer to Eqns. 5.6, 5.7). From these works, it is concluded that any forecast strategy must be able to perform STLF of SMGs while ensuring non-linear prediction capability. Therefore, we choose ANNs because they have the ability to capture the highly volatile characteristics of the load time series with reasonable accuracy. For STLF, two strategies are commonly used: direct forecasting and iterative forecasting [16]. However, it is discussed in [31] that the first strategy may introduce significant round-off errors and the second one introduces large forecast errors. In order to overcome these imperfections, [16] introduced the idea of a cascaded strategy. Thus, our proposed forecast module implements the cascaded strategy. Our forecast module consists of an ANN with 24 consecutive cascaded forecasters such that each of the 24 forecasters has a single output to forecast the load of one hour of the upcoming day. It is worth mentioning that the 24 hourly time series forecasters are separately modeled instead of a single complete/complex one. These 24 one-hour-ahead forecasters allow improvement in terms of accuracy [16]. The cascaded ANN forecast structure is a combination of the direct and iterative structures such that the load of each hour of the next day is directly predicted and each forecaster yields exactly one output. In the forecast module, each forecaster is an AN that implements the sigmoid function as its activation function. We have chosen the sigmoid activation function because it enables the AN to capture the highly volatile (non-linear) characteristic of the SMG's load time series. In order to update the weights during the training process of the AN, like [16, 18], we use the multivariate auto regressive algorithm because it can train the ANN faster than the Levenberg-Marquardt algorithm and the gradient descent back propagation algorithm [29]. According to the Kolmogorov theorem, if the ANN is provided with a proper number of ANs then it has the ability to solve a problem by
adopting one hidden layer. Thus, we have considered one hidden layer in the ANN structure of all 24 ANs. In short, due to the aforementioned reasons, our proposed forecast module is basically an ANN that consists of 24 ANs. Each AN is activated by the sigmoid function and is trained by the multivariate auto regressive algorithm. Initially, the forecast module receives the binary encoded matrix \hat{P}, which is the output of the pre-processing module. From this matrix, the forecast module constructs training and validation samples as follows:
\[
S_T = \hat{P}(i, j), \quad \forall\, i \in \{2, 3, \dots, m\} \text{ and } \forall\, j \in \{1, 2, 3, \dots, n\}
\tag{5.16}
\]

\[
S_V = \hat{P}(1, j), \quad \forall\, j \in \{1, 2, 3, \dots, n\}
\tag{5.17}
\]
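A minimal illustration of this training/validation split is sketched below. It is a simplified reading of Eqns. 5.16 and 5.17 that assumes the binary matrix is arranged so that the sample closest to the forecast day is reserved for validation; the helper name and array layout are assumptions of this sketch, not the thesis implementation.

import numpy as np

def split_train_validation(P_hat):
    """Split the binary encoded matrix into training and validation samples.

    Assumes each row of P_hat is one historical day (24 binary hourly values)
    and that the last row is the day closest to the forecast horizon.
    """
    P_hat = np.asarray(P_hat, dtype=int)
    S_T = P_hat[:-1, :]   # training samples: all days except the most recent one
    S_V = P_hat[-1, :]    # validation sample: the day before the forecast day
    return S_T, S_V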
Eqns. 5.16 and 5.17 illustrate that the ANN is trained by all the candidate inputs (historical load time series) except the last one. The last sample of the historical load time series is used for validation purposes. In fact, the validation sample is a part of the training load samples that is removed from them during the training process. Thus, the validation set remains unseen by the ANN. Moreover, the validation error can be used as a measure of the ANN's error over the 24-hour forecast horizon. In order to make the validation error a true representative of the forecast error, the validation sample needs to be as close to the forecast horizon as possible. We take the validation sample as the day before the forecast day because it includes not only the short-run trend but also the daily periodicity characteristics of the load signal [32]. Thus, each of the 24 ANs is trained by the training samples as per the multivariate auto regressive algorithm and is validated by the last/unseen validation sample. The Mean Absolute Percentage Error (MAPE) for each of the 24 validation samples is considered as the validation error in this research work:

\[
MAPE_i = \frac{1}{m}\sum_{j=1}^{m}\frac{\lvert p_{act}(h_i, d_j) - p_{for}(h_i, d_j)\rvert}{p_{act}(h_i, d_j)}
\tag{5.18}
\]

where p_{act}(h_i, d_j) is the actual load value of the ith hour of the jth day, p_{for}(h_i, d_j) is the forecast load value of the ith hour of the jth day, and m is the number of days under consideration. The objective of supervised training is to adaptively adjust the weight values (fed to the ANs) such that the error signal MAPE_i between the target
value and the real output of the neuron is minimized. For the sake of clarity, we represent MAPE_i as MAPE(i):

\[
\text{Minimize } MAPE(i) \quad \forall\, i \in \{1, 2, 3, \dots, m\}
\tag{5.19}
\]

In this research work, the method of least squares is used, thus we can write:

\[
\text{Minimize } J(I) = \sum_{i=1}^{m} MAPE^{T}(i)\,MAPE(i), \quad \forall\, i \in \{1, 2, 3, \dots, m\}
\tag{5.20}
\]
In order to achieve the objective function in Eqn. 5.20, we use the multivariate auto regressive model [28]. We choose this model for two reasons: (i) it provides a solution to the objective function in relatively less time, and (ii) its accuracy is reasonable. It is worth mentioning here that both these reasons follow from a comparison of the multivariate auto regressive model with the typically used learning models such as gradient descent, delta, and Widrow-Hoff [29]. Thus, the parameter matrices are [28]:

\[
\sum_{i=1}^{n} W(i)\,R(j - i) = 0, \quad j = \{2, 3, \dots, n\}
\tag{5.21}
\]

\[
\sum_{i=1}^{n} \overline{W}(i)\,R(i - j) = 0, \quad j = \{2, 3, \dots, n\}
\tag{5.22}
\]

where W(1) = I^D and \overline{W}(1) = I^D (I^D is the identity matrix), and R is the cross-correlation given as:

\[
R(i) = \frac{1}{n}\sum_{k=i}^{n-1-i} [x(k) - x_m][x(k - i) - x_m]^{T}
\tag{5.23}
\]

In Eqn. 5.23, x is the vector of observed data and x_m is the mean of the observed data. Based on these equations, [28] defines the prediction error co-variance matrices as follows:

\[
\begin{aligned}
\Delta_t &= \sum_{k=1}^{n} W_t(-k)\,R(t - k + 1) \\
\overline{\Delta}_t &= \sum_{k=1}^{n} \overline{W}_t(k)\,R(-t + k - 1) \\
V_t &= \sum_{k=1}^{n} W_t(k)\,R(-k) \\
\overline{V}_t &= \sum_{k=1}^{n} \overline{W}_t(k)\,R(-k)
\end{aligned}
\tag{5.24}
\]
The recursive equations are given as follows:

\[
\left.
\begin{aligned}
W_{t+1}(k) &= W_t(k)\,W_{t+1}(t+1)\,\overline{W}_t(t-k+1) \\
\overline{W}_{t+1}(k) &= \overline{W}_t(k)\,\overline{W}_{t+1}(t+1)\,W_t(t-k+1)
\end{aligned}
\right\}
\tag{5.25}
\]

\[
\left.
\begin{aligned}
W_{t+1}(t+1) &= -\Delta_t\,\overline{V}_t^{-1} \\
\overline{W}_{t+1}(t+1) &= -\overline{\Delta}_t\,V_t^{-1}
\end{aligned}
\right\}
\tag{5.26}
\]
In order to find the weights W, the recursive equations are solved. Further details of the weight update process can be found in [28]. Once the weights in Eqn. 5.25 and Eqn. 5.26 are adaptively adjusted in a recursive manner, the forecast module returns the error signal to the optimization module. The stepwise algorithm of the proposed forecast module is shown in Algorithm 3.

Algorithm 3: Pseudo-code of the forecast module
1: Pre-conditions: MAPE(i) is the output of the AN, and i ∈ {1, 2, ..., 24}
2: Receive the \hat{P} matrix from the pre-processing module
3: Compute S_T and S_V
4: Compute MAPE(i) by letting W(1) = I^D and \overline{W}(1) = I^D
5: Compute J(i)
6: while Max. # of iterations not reached do
7:   if J(i + 1) ≤ J(i) then
8:     MAPE(i) ← MAPE(i + 1)
9:   else
10:    Train the ANN as per Eqn. 5.25 and Eqn. 5.26
11:    Compute MAPE(i) and go back to step 6
12:  end if
13: end while
14: Return J(I) to the optimization module
5.2.3 Optimization Module
Based on the nature of the overall forecast strategy, the basic objective of the optimization module is to minimize the forecast error. For this purpose, various choices are available, such as linear programming, non-linear programming, quadratic programming, convex optimization, heuristic optimization, etc. However, the first one is not applicable here because the problem is highly non-linear. The non-linear problem can be converted into a linear one; however, the overall process would become very complex. The second one is applicable here and gives accurate results, however, its execution time is very high. Similarly, the third and fourth ones suffer from slow convergence. It is worth mentioning here that optimization does not imply exact reachability of the optimum set of solutions; rather, near-optimal solution(s) are obtained. To sum up, heuristic optimization techniques are preferred in these situations because they provide near-optimal solution(s) at a relatively faster rate of convergence. Differential evolution is one of the heuristic optimization techniques, proposed in [33], and its enhanced version is used for forecast error minimization in [16]. In this thesis, the EDE algorithm is modified for the sake of accuracy; a detailed discussion is presented in the upcoming paragraphs. According to [16], the trial vector for the ith individual in generation t is given as:

\[
y'^{\,t}_{i,j} =
\begin{cases}
u^{t}_{i,j} & \text{if } rnd(j) \leq FF_N(U_i^t) \\
x^{t}_{i,j} & \text{if } rnd(j) > FF_N(U_i^t)
\end{cases}
\tag{5.27}
\]
where x^t_{i,j} and u^t_{i,j} are the corresponding parent and mutant vectors, respectively. In Eqn. 5.27, FF_N(.) denotes the fitness function (0 < FF_N(.) < 1) and rnd(j) is a uniformly distributed random number in [0, 1]. Between X_i^t and Y_i^t, the corresponding offspring of the next generation X_i^{t+1} is selected as follows:

\[
y^{t}_{i,j} =
\begin{cases}
y'^{\,t}_{i,j} & \text{if } J(y'^{\,t}_{i}) \leq J(x^{t}_{i}) \\
x^{t}_{i,j} & \text{otherwise}
\end{cases}
\tag{5.28}
\]

where J(.) is the objective function. From Eqn. 5.27 and Eqn. 5.28, it is clear that offspring selection depends on the trial vector, which in turn depends on the random number and the fitness function. From this discussion, we conclude that there is a big question mark over the fitness of the selected offspring. We fix this problem by eliminating the influence of the random number on the selection of the offspring, i.e., we modify Eqn. 5.27 as follows:

\[
y'^{\,t}_{i,j} =
\begin{cases}
u^{t}_{i,j} & \text{if } \dfrac{X_i^t}{X_{i_{max}}^t} < FF_N(U_i^t) \\
x^{t}_{i,j} & \text{if } \dfrac{X_i^t}{X_{i_{max}}^t} \geq FF_N(U_i^t)
\end{cases}
\tag{5.29}
\]
From Eqn. 5.29, it is clear that the trial vector no longer depends on the random number; instead, its dependence is now entirely on the mutant vector, which in turn depends on the parent vector. In this way, the selected offspring will be sufficiently fit in terms of accuracy.
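As a rough illustration of the difference between Eqn. 5.27 and Eqn. 5.29, the following Python sketch contrasts the two trial-vector rules; the vector shapes, the normalization by a population-wise maximum, and the fitness value passed in are assumptions of this sketch rather than details fixed by the thesis.

import numpy as np

def trial_vector_ede(x, u, ff_u, rng):
    """Original EDE rule (Eqn. 5.27): selection driven by a random number."""
    r = rng.random(x.shape)                 # rnd(j) drawn per component
    return np.where(r <= ff_u, u, x)

def trial_vector_mede(x, u, ff_u, x_max):
    """Modified rule (Eqn. 5.29): random number replaced by a normalized parent value."""
    ratio = x / x_max                       # X_i^t / X_imax^t per component
    return np.where(ratio < ff_u, u, x)

# Example usage with hypothetical vectors
rng = np.random.default_rng(0)
x = rng.random(24)          # parent vector (one value per hour of the day)
u = rng.random(24)          # mutant vector
y_ede = trial_vector_ede(x, u, ff_u=0.6, rng=rng)
y_mede = trial_vector_mede(x, u, ff_u=0.6, x_max=x.max())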
Algorithm 4: Pseudo-code of the optimization module
1: Pre-conditions: set t = 0 (the generation counter)
2: Randomly initialize the enhanced differential evolution population while respecting the given lower and upper bounds [18, 24]
3: Evaluate the individuals X_i^t
4: while Max. # of generations not reached do
5:   for each individual X_i^t do
6:     Obtain J(X_i^t) from the forecast module
7:     Compute the mutant vector U_i^t
8:     Obtain J(U_i^t) from the forecast module
9:     Compute FF_N(X_i^t) based on Eqn. 3 in [16]
10:    Compute FF_N(U_i^t) based on Eqn. 4 in [16]
11:    Generate Y'_i^t based on Eqn. 5.29
12:    Select the offspring for the next generation based on Eqn. 5.28
13:    i = i + 1
14:  end for
15:  t = t + 1
16: end while
17: Return the best individuals

5.2.4 Simulation Results
In order to evaluate our proposed day-ahead load forecasting model, we conduct simulations. The proposed FS+ANN+mEDE based forecast model is compared with two existing DALF models: the MI+ANN forecast [18] and the Bi-level forecast [16]. Historical load time series values of the PJM electricity market are publicly available on its website [30]. In our simulations, historical load time series data from November 2014 to January 2015 is taken from the publicly available PJM electricity market for three SGs in the United States of America: DAYTOWN, EKPC, and FE. The November–December 2014 data is used for training and validation purposes, and the January 2015 data is used for test purposes. The simulation parameters used in our experiments are listed in Table 5.1; justification of these parameters can be found in [16, 18, 28, 29]. In this chapter, we have considered two performance metrics: accuracy and execution time (convergence rate).
• Accuracy: Accuracy(.) = 100 − EF(.), measured in %.
• Convergence rate or execution time: the time taken by the system during simulations to completely execute a given forecast strategy.
Forecast strategies with a small execution time converge faster than those with a large one. In this thesis, execution time is measured in seconds.

Table 5.1: Simulation parameters of mEDE and ANN based forecast

Parameter                               Value
Number of forecasters                   24
Number of hidden layers                 1
Number of neurons in the hidden unit    5
Number of iterations                    100
Momentum                                0
Initial weights                         0.1
Historical load data                    26 days
Bias value                              0
Figure 5.2: DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.2.5 Error Performance
All the figures in this section graphically compare the proposed FS+ANN+mEDE based forecast model with the two existing DALF models, MI+ANN and Bi-level.
Figure 5.3: DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
Figure 5.4: DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
Figs. 5.2, 5.5 and 5.8 show how well the proposed FS+ANN+mEDE based model predicts the future load of the three selected SGs. The ANN based forecaster captures the non-linearities in the historical load time series. This non-linear prediction capability is due not only to the sigmoid activation function but also to the selected training algorithm, the multivariate auto regressive model.
Figure 5.5: EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
Figure 5.6: EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
Looking at Figs. 5.3, 5.6 and 5.9, the % error of the MI+ANN based forecast model is 3.8%, 3.81% and 3.7% for DAYTOWN, EKPC and FE, respectively. Similarly, the % error of the Bi-level forecast model is 2.2%, 2.23% and 2.1% for DAYTOWN, EKPC and FE, respectively. Finally, the % error of the proposed MI+ANN+mEDE based forecast model is 1.24%, 1.24% and 1.23% for DAYTOWN, EKPC and FE, respectively.
Figure 5.7: EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
Figure 5.8: FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
From these results, it is clear that the existing MI+ANN based forecast model predicts the future load with the highest % error. This result is expected due to the absence of an optimization module in the MI+ANN based forecast model. In order to minimize this forecast error, the Bi-level forecast model uses the EDE algorithm. For further minimization of the forecast error, we have integrated an optimization module into our forecast strategy; in this module, we use our modified version of the EDE (mEDE) algorithm for forecast error minimization.
Figure 5.9: FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
Figure 5.10: FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
The results show that integration of the mEDE algorithm is fruitful: the FS+ANN+mEDE based DALF model follows the target curve more tightly than the other two existing DALF models. These figures show the positive impact of the optimization module on the minimization of the error between the target curve and the forecast curve. The error decreases as the number of generations of the mEDE algorithm is increased, because the proposed FS+ANN+mEDE forecast model compares the forecast curve's error of the next generation with that of the existing generation and updates the weights if the new error is smaller (survival of the fittest). Thus, as expected, the forecast error is significantly reduced as the forecast strategy is subjected to further generations.

5.2.6 Convergence Rate Analysis
During the simulations, we observed that from the 89th to the 100th generation the forecast error does not exhibit significant improvement. Therefore, the proposed and the existing forecast models are not subjected to further generations. As mentioned earlier, there exists a trade-off between forecast accuracy and convergence rate. This trade-off is shown in Figs. 5.3–5.4, 5.6–5.7 and 5.9–5.10. From these figures, it is clear that the Bi-level forecast model improves the accuracy of the MI+ANN forecast model at the cost of relatively more execution time. On the other hand, our proposed MI+ANN+mEDE forecast model modifies the EDE algorithm to further improve the accuracy of the Bi-level forecast model, and in doing so no further cost in terms of execution time is paid. However, the execution time of our proposed forecast model is still greater than that of the MI+ANN forecast model due to the integration of the optimization module.
Chapter 6 Modified Feature Selection, ANN and Modified EDE based Forecast Strategy
6.1 Motivation
In order to optimize the performance of a SMG, especially its distribution part, a decision making entity is needed, where proper decision making leads to a reduction in the electricity cost of the end user(s) along with minimization of the total power losses and the peak-to-average ratio [4]. Keeping these objectives in mind, current research in smart grids mainly focuses on optimization techniques for power scheduling [34, 35, 36]. However, prior to scheduling, an accurate load forecasting model is needed, because accurate load forecasting leads to enhanced management of resources, which in turn directly affects the economics of energy trade. Furthermore, lower similarities (high randomness) and non-linearity in historical load curves make STLF more challenging than long-term load forecasting. In the literature, STLF models do exist [16, 18, 23, 24, 27]; however, either their accuracy is not satisfactory or their convergence rate is slow. In other words, the convergence rate of these models decays rapidly as their accuracy is increased. For example, [19] uses a hybrid approach to achieve accuracy, but in doing so the implementation complexity of the overall strategy is increased, which causes its convergence rate to decrease. In another work [18], the convergence rate is increased, however, at the cost of an increased forecast error. In order to solve the accuracy problem of [18], [16] integrates an optimizer with the strategy in [18]. In this way, accuracy is increased, however, execution time is also increased (the convergence rate is decreased). To sum up, there exists a trade-off between accuracy and convergence rate. Furthermore, the indicated trade-off leads to the non-scalability of these existing works whenever they are subjected to high loads. We mainly focus on the works in [16] and [18], not only to improve the accuracy of the forecast strategy but also to increase its convergence rate. Convergence rate improvement without compromising on accuracy means that the proposed strategy is scalable in terms of handling higher loads. A detailed description of the proposed S-STLF model is given in the following sections.

6.2 The Proposed S-STLF Model
The proposed methodology consists of three modules: a feature selector, a forecaster, and an optimizer. We follow a similar underlying technique as in [16] and [18], however with some modifications to the feature selection technique and the EDE algorithm to achieve the desired objectives. At first, the candidate inputs (load time series) are given to the feature selection module, which uses a mutual information (entropy) based technique. The selected inputs are given to the forecast module (this module consists of an ANN), where training and validation samples are constructed from the selected inputs based on previously observed data (a 26-day observation period). These training and validation samples are given to the ANN, which forecasts the load of the next day and passes it to the optimization module. The optimization module first calculates the error signal between the target and the forecast values; then, using the iteration based mEDE algorithm, the optimizer minimizes this error. In other words, error minimization is the objective function of the optimizer (a given load value with the minimum objective function value has the maximum fitness function value). Fig. 6.1 shows a pictorial diagram of the proposed model.
Figure 6.1: The proposed S-STLF model
6.2.1 Modified MI based Feature Selection
Let P be the matrix representation of the input load time series that is fed into the forecast module:

\[
P =
\begin{bmatrix}
p(1, 1) & p(2, 1) & p(3, 1) & \dots & p(m, 1) \\
p(1, 2) & p(2, 2) & p(3, 2) & \dots & p(m, 2) \\
p(1, 3) & p(2, 3) & p(3, 3) & \dots & p(m, 3) \\
p(1, 4) & p(2, 4) & p(3, 4) & \dots & p(m, 4) \\
p(1, 5) & p(2, 5) & p(3, 5) & \dots & p(m, 5) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p(1, n) & p(2, n) & p(3, n) & \dots & p(m, n)
\end{bmatrix}
\tag{6.1}
\]
where p(1, 1) is the power consumed at the 1st hour of day 1, p(1, 2) is the power consumed at the 1st hour of day 2, and so on, such that p(m, n) is the power consumed at the mth hour of the nth day. Since there are 24 hours in a day, m = 24. The value of n is tightly linked with tuning during the ANN training: a large value of n implies fine tuning and vice versa. However, fine tuning results in a relatively longer convergence time. Thus, there is a performance trade-off between fine tuning and convergence time. In machine learning, a subset of features from a given data set is selected via a process known as feature selection. The selected features are used in the training process of the learning algorithm. The selected best subset contains the minimum number of features that contribute most to accuracy, while the redundant and irrelevant ones are removed. In pre-processing, feature selection is very important because it is one of the ways to avoid the curse of high dimensionality. In this regard, one of the typically used techniques is the entropy based mutual information technique, used in classification problems such as cancer classification, pattern recognition, image processing, etc. In this thesis, for the sake of accuracy, we improve the mutual information based feature selection technique developed in [18] and later used in [16] as well. A detailed description is as follows. The joint entropy of two discrete random variables is defined as:

\[
H(X, Y^t) = -\sum_{i}\sum_{j} p(X_i, Y_j^t)\,\log_2\big(p(X_i, Y_j^t)\big), \quad \forall\, i, j \in \{1, 2\}
\tag{6.2}
\]
where p(X_i, Y_j^t) is the joint probability of the two discrete random variables; X_i is the input discrete random variable and Y_j^t is the target value. Since the common information between the two discrete random variables is very important in feature selection, it is formulated in [18] as follows:

\[
MI(X, Y^t) = \sum_{i}\sum_{j} p(X_i, Y_j^t)\,\log_2\frac{p(X_i, Y_j^t)}{p(X_i)\,p(Y_j^t)}
\tag{6.3}
\]

This common information, MI(X, Y^t), is termed the mutual information. From (6.3), three conclusions can be drawn: (i) if MI(X, Y^t) = 0 then the two discrete random variables are independent (unrelated), (ii) if MI(X, Y^t) is large then the two discrete random variables are closely related, and (iii) if MI(X, Y^t) is small then the two random variables are loosely related. Thus, the candidate inputs are ranked with respect to the mutual information value between the input and target values. In [16, 18], the target values are chosen as the last samples for every hour of the day among all the training samples (for every hour, only one target value is chosen, namely the value of the previous day). The choice of the last sample seems logical as it is closest in time to the upcoming day; however, it may lead to serious forecast errors because the average behaviour is not considered. At the same time, considering only the average behaviour is also insufficient because the last sample has its own importance. To sum up, we adopt a solution that considers not only the last sample but also the average behaviour. Thus, we modify Eqn. 6.3 for three discrete random variables as:

\[
MI(X, Y^t, Y^m) = \sum_{i}\sum_{j}\sum_{k} p(X_i, Y_j^t, Y_k^m)\,\log_2\frac{p(X_i, Y_j^t, Y_k^m)}{p(X_i)\,p(Y_j^t)\,p(Y_k^m)}
\tag{6.4}
\]
where Y_k^m is the second target value, i.e., the mean value.
As the information source is binary encoded, we can expand Eqn. 6.4 as follows:

\[
\begin{aligned}
MI(X, Y^t, Y^m) = {} & p(X=0, Y^t=0, Y^m=0)\,\log_2\frac{p(X=0, Y^t=0, Y^m=0)}{p(X=0)\,p(Y^t=0)\,p(Y^m=0)} \\
& + p(X=0, Y^t=0, Y^m=1)\,\log_2\frac{p(X=0, Y^t=0, Y^m=1)}{p(X=0)\,p(Y^t=0)\,p(Y^m=1)} \\
& + p(X=0, Y^t=1, Y^m=0)\,\log_2\frac{p(X=0, Y^t=1, Y^m=0)}{p(X=0)\,p(Y^t=1)\,p(Y^m=0)} \\
& + p(X=0, Y^t=1, Y^m=1)\,\log_2\frac{p(X=0, Y^t=1, Y^m=1)}{p(X=0)\,p(Y^t=1)\,p(Y^m=1)} \\
& + p(X=1, Y^t=0, Y^m=0)\,\log_2\frac{p(X=1, Y^t=0, Y^m=0)}{p(X=1)\,p(Y^t=0)\,p(Y^m=0)} \\
& + p(X=1, Y^t=0, Y^m=1)\,\log_2\frac{p(X=1, Y^t=0, Y^m=1)}{p(X=1)\,p(Y^t=0)\,p(Y^m=1)} \\
& + p(X=1, Y^t=1, Y^m=0)\,\log_2\frac{p(X=1, Y^t=1, Y^m=0)}{p(X=1)\,p(Y^t=1)\,p(Y^m=0)} \\
& + p(X=1, Y^t=1, Y^m=1)\,\log_2\frac{p(X=1, Y^t=1, Y^m=1)}{p(X=1)\,p(Y^t=1)\,p(Y^m=1)}
\end{aligned}
\tag{6.5}
\]
In order to find the individual as well as the joint probabilities in Eqn. 6.5, we introduce an auxiliary variable U_m such that

\[
U_m = 4Y^t + 2Y^m + X, \quad \forall\, Y^t, Y^m, X \in \{0, 1\}
\tag{6.6}
\]
From Eqn. 6.6, it is clear that U_m ∈ {0, 1, 2, · · · , 7}. U_m^0 counts the number of elements for which U_m = 0 in the column of interest (of length l) of the input data matrix. Similarly, U_m^1 counts the number of ones, U_m^2 the number of twos, and finally U_m^7 counts the number of sevens. This enables us to determine the individual as well as the joint
probabilities in Eqn. 6.5 as follows:

\[
\begin{aligned}
p(X=0) &= \frac{U_m^0 + U_m^2 + U_m^4 + U_m^6}{l}, & p(X=1) &= \frac{U_m^1 + U_m^3 + U_m^5 + U_m^7}{l}, \\
p(Y^m=0) &= \frac{U_m^0 + U_m^1 + U_m^4 + U_m^5}{l}, & p(Y^m=1) &= \frac{U_m^2 + U_m^3 + U_m^6 + U_m^7}{l}, \\
p(Y^t=0) &= \frac{U_m^0 + U_m^1 + U_m^2 + U_m^3}{l}, & p(Y^t=1) &= \frac{U_m^4 + U_m^5 + U_m^6 + U_m^7}{l}
\end{aligned}
\tag{6.7}
\]

\[
\begin{aligned}
p(X=0, Y^t=0, Y^m=0) &= \frac{U_m^0}{l}, & p(X=0, Y^t=0, Y^m=1) &= \frac{U_m^2}{l}, \\
p(X=0, Y^t=1, Y^m=0) &= \frac{U_m^4}{l}, & p(X=0, Y^t=1, Y^m=1) &= \frac{U_m^6}{l}, \\
p(X=1, Y^t=0, Y^m=0) &= \frac{U_m^1}{l}, & p(X=1, Y^t=0, Y^m=1) &= \frac{U_m^3}{l}, \\
p(X=1, Y^t=1, Y^m=0) &= \frac{U_m^5}{l}, & p(X=1, Y^t=1, Y^m=1) &= \frac{U_m^7}{l}
\end{aligned}
\tag{6.8}
\]

In our proposed feature selection method, based on Eqn. 6.7 and Eqn. 6.8, we compute MI(X, Y^t, Y^m) by means of Eqn. 6.5. After computation, the candidate inputs are ranked according to their MI values, and the irrelevant and redundant candidates are excluded from the parent set. Before the forecast module is fed with the selected features from the input matrix P, the selected features are locally normalized and binary encoded.
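A minimal Python sketch of this modified feature ranking is given below; it assumes the candidate column, the last-sample target and the mean-based target have already been binary encoded, and the function and variable names are illustrative only, not taken from the thesis implementation.

import numpy as np

def mutual_information_three(x, y_t, y_m):
    """MI(X, Y^t, Y^m) for three binary vectors (Eqns. 6.4-6.8 sketch)."""
    x, y_t, y_m = (np.asarray(v, dtype=int) for v in (x, y_t, y_m))
    l = len(x)

    # Auxiliary variable U_m = 4*Y^t + 2*Y^m + X (Eqn. 6.6) and counts U_m^0..U_m^7
    u = 4 * y_t + 2 * y_m + x
    counts = np.array([np.sum(u == k) for k in range(8)], dtype=float)

    mi = 0.0
    for xi in (0, 1):
        for yt in (0, 1):
            for ym in (0, 1):
                p_joint = counts[4 * yt + 2 * ym + xi] / l   # Eqn. 6.8
                p_x = np.mean(x == xi)                        # marginals (Eqn. 6.7)
                p_yt = np.mean(y_t == yt)
                p_ym = np.mean(y_m == ym)
                if p_joint > 0:
                    mi += p_joint * np.log2(p_joint / (p_x * p_yt * p_ym))
    return mi

def rank_features(P_hat, y_t, y_m):
    """Rank candidate columns of a binary matrix by their MI value, best first."""
    scores = [mutual_information_three(P_hat[:, j], y_t, y_m) for j in range(P_hat.shape[1])]
    return np.argsort(scores)[::-1]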
6.2.2 ANN based STLF
The primary objective of this module is to devise an architecture that is enabled, via learning, to forecast the future load. However, the load signal shows significant non-linear characteristics. Thus, the designed STLF strategy must not only perform the tedious forecast job but also deal with the non-linearities. Moreover, the forecast values need to be accurate with a reasonable convergence rate. In this regard, many existing research works have devised different architectures, such as dynamic regression and transfer functions [37, 38], and the generalized auto regressive conditional heteroskedastic model [39]. However, these architectures only enable linear prediction, whereas the load time series exhibits complementary characteristics. In order to solve this problem, [16] uses an ANN architecture and [40] uses fuzzy ANNs. Both of these architectures have the ability to forecast the non-linear load time series of SMGs; however, fuzzy ANNs are slower in terms of convergence rate than ANNs. Thus, we choose an ANN based architecture for our forecast unit. The ANN architecture is composed of 24 cascaded ANs such that each AN is a forecaster that predicts the load of the next day for its respective hour. Each AN uses the sigmoid activation function f_act due to its ability to capture non-linearities:

\[
f_{act}(X, b) = \frac{1}{1 + e^{-\beta(X - b)}}
\tag{6.9}
\]

where X is the input signal, b is the bias value, and the parameter β controls the steepness of the activation function.

\[
X = \sum_{i=1}^{i_{max}} X_i\, w_i
\tag{6.10}
\]

where w_i is the weight associated with each input X_i. As mentioned earlier, this network must be enabled via learning to perform the tedious forecast job. Learning is of three types: supervised, unsupervised, and reinforced. Since load time series data is available in our case, we use supervised learning. In the literature, many supervised learning rules exist, such as gradient descent, Levenberg-Marquardt, leap frog, and Newton's method [29]. However, we choose the multivariate auto regressive rule [28] for two reasons: (i) it has a relatively faster convergence rate than the other rules, and (ii) it is used by the MI+ANN and the Bi-level forecast strategies in [18] and [16], respectively. Thus, by using the multivariate auto regressive model, the 24 ANs are trained by the training samples and validated by the last (unseen) validation sample. In our work, the Mean Absolute Percentage Error (MAPE) over the 24 validation samples is considered as the validation error:

\[
MAPE(i) = \frac{1}{m}\sum_{j=1}^{m}\frac{\lvert p_{actual}(i, j) - p_{forecast}(i, j)\rvert}{p_{actual}(i, j)}
\tag{6.11}
\]
where p_actual(i, j) denotes the actual load value, p_forecast(i, j) denotes the forecast load value, and m denotes the number of days under consideration. The multivariate auto regressive rule updates the w_i such that MAPE(i) is minimized. At this stage, the final forecast output of the ANN is the p_forecast(i, j) having the least MAPE(i).
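For illustration, a minimal Python sketch of a single AN's forward pass (Eqns. 6.9, 6.10) and of the MAPE validation error (Eqn. 6.11) is given below. The input values and weights are placeholders, and the training step itself (the multivariate auto regressive update of [28]) is intentionally not reproduced here.

import numpy as np

def an_forward(inputs, weights, bias=0.0, beta=1.0):
    """Single artificial neuron: weighted sum (Eqn. 6.10) + sigmoid activation (Eqn. 6.9)."""
    X = np.dot(inputs, weights)                 # X = sum_i X_i * w_i
    return 1.0 / (1.0 + np.exp(-beta * (X - bias)))

def mape(p_actual, p_forecast):
    """Mean Absolute Percentage Error over the validation samples (Eqn. 6.11)."""
    p_actual = np.asarray(p_actual, dtype=float)
    p_forecast = np.asarray(p_forecast, dtype=float)
    return np.mean(np.abs(p_actual - p_forecast) / p_actual)

# Hypothetical usage: 24 cascaded ANs, one per hour of the forecast day
inputs = np.random.default_rng(1).random(25)    # selected binary-encoded features (example only)
weights = np.full(25, 0.1)                      # initial weight value as listed in Table 6.1
hourly_outputs = [an_forward(inputs, weights) for _ in range(24)]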
6.2.3 mEDE Based Forecast Error Minimization
From the previous subsection, we know that the ANN returns STLF values of the next day with some error, which is the least achievable as per the capabilities of the ANN's activation function and training algorithm. The forecast error MAPE(i) can be further minimized if we integrate an optimization technique with the forecast strategy. In other words, MAPE(i) minimization becomes the objective function of the optimization technique. Mathematically,

\[
\text{Minimize } MAPE(i) \quad \forall\, i \in \{1, 2, \dots, m\}
\tag{6.12}
\]

However, in doing so, surplus time is spent. For applications where execution time is relatively more important than accuracy, integration of an optimization technique with the forecast strategy is not feasible, and vice versa. In particular, heuristic optimization techniques, especially evolutionary algorithms, are preferred over the commonly used optimization techniques such as non-linear programming and linear programming. In this regard, [16] uses an enhanced version of the differential evolution algorithm that was originally proposed in [33]. However, the work in [16] can be improved in two ways: (i) the accuracy of the trial vector generation, and (ii) the convergence rate of the forecast strategy. The ways in which these two improvements are made are discussed as follows. According to [16], the trial vector y for the ith individual in generation t is:

\[
y'_t(i, j) =
\begin{cases}
u_t(i, j) & \text{if } rand(j) \leq FF(U_t(i)) \\
x_t(i, j) & \text{if } rand(j) > FF(U_t(i))
\end{cases}
\tag{6.13}
\]

where u_t(i, j) is the mutant vector corresponding to the parent vector x_t(i, j). In Eqn. 6.13, FF(.), which ranges between 0 and 1, is the fitness function and rand(j) is a uniformly distributed random number in [0, 1]. Between X_t(i) and Y_t(i), the corresponding next generation offspring X_{t+1}(i) is selected as follows:

\[
y_t(i, j) =
\begin{cases}
y'_t(i, j) & \text{if } MAPE(y'_t(i)) \leq MAPE(x_t(i)) \\
x_t(i, j) & \text{otherwise}
\end{cases}
\tag{6.14}
\]
From Eqn. 6.13 and Eqn. 6.14, it is deduced that offspring selection for generation t+1 depends on the trial vector of generation t, which in turn has a strong dependence on rand(.) and FF(.). In other words, the EDE algorithm [16] updates load values if the generated random number (rand(.) ∈ [0, 1]) is less than the fitness function (FF(.) ∈ [0, 1]) of the candidate value for the load update. Thus, there is a big question mark over the fitness of the updated load value. We fix this problem by eliminating the influence of the random number on the selection of the load value to be updated, i.e., the comparison is now between the fitness function of the candidate value for the load update and the previous (parent) value. In this way, the selected load update value will be sufficiently fit in terms of accuracy. Thereby, we modify Eqn. 6.13 as follows:

\[
y'_t(i, j) =
\begin{cases}
u_t(i, j) & \text{if } \dfrac{X_t(i)}{X_t(i_{max})} < FF(U_t(i)) \\
x_t(i, j) & \text{if } \dfrac{X_t(i)}{X_t(i_{max})} \geq FF(U_t(i))
\end{cases}
\tag{6.15}
\]

Now, the fitness functions for the parent and mutant vectors are defined by [16] as:

\[
FF(U_t(i)) = \frac{\dfrac{1}{MAPE(U_t(i))}}{\dfrac{1}{MAPE(U_t(i))} + \dfrac{1}{MAPE(X_t(i))}}
\tag{6.16}
\]

\[
FF(X_t(i)) = \frac{\dfrac{1}{MAPE(X_t(i))}}{\dfrac{1}{MAPE(X_t(i))} + \dfrac{1}{MAPE(U_t(i))}}
\tag{6.17}
\]
From Eqn. 6.16, if we assume that each operation (addition or division) takes 1 time unit during algorithm execution, then calculation of FF(.) takes 5 time units per iteration. As stated in [16], the number of iterations of the enhanced differential evolution algorithm is 100. Thus, the algorithm would take about 500 time units to calculate one fitness function. Moreover, the algorithm calculates two fitness functions (based on Eqn. 6.16 and Eqn. 6.17) during each iteration, giving a total execution time of 1000 units. In order to reduce this time, we re-write the fitness functions as follows:

\[
FF(U_t(i)) = \frac{MAPE(X_t(i))}{MAPE(U_t(i)) + MAPE(X_t(i))}
\tag{6.18}
\]

\[
FF(X_t(i)) = \frac{MAPE(U_t(i))}{MAPE(X_t(i)) + MAPE(U_t(i))}
\tag{6.19}
\]
Similarly, Eqn. 6.18 and Eqn. 6.19 would take 400 time units to calculate the two fitness functions over 100 iterations of the algorithm. In this way, the convergence rate of the EDE algorithm is enhanced.
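The following small Python sketch (illustrative only; the function names are chosen here and the comments restate the operation-count reasoning above rather than measured timings) contrasts the original and the re-written fitness functions:

def fitness_original(mape_u, mape_x):
    """Inverse-MAPE formulation of the fitness functions (Eqns. 6.16-6.17)."""
    inv_u, inv_x = 1.0 / mape_u, 1.0 / mape_x
    ff_u = inv_u / (inv_u + inv_x)   # FF(U_t(i))
    ff_x = inv_x / (inv_x + inv_u)   # FF(X_t(i))
    return ff_u, ff_x

def fitness_rewritten(mape_u, mape_x):
    """Re-written fitness functions with fewer operations (Eqns. 6.18-6.19)."""
    total = mape_u + mape_x
    return mape_x / total, mape_u / total   # FF(U_t(i)), FF(X_t(i))

# Both formulations give the same values, e.g. for MAPE(U)=0.02 and MAPE(X)=0.03
print(fitness_original(0.02, 0.03))   # (0.6, 0.4)
print(fitness_rewritten(0.02, 0.03))  # (0.6, 0.4)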
Algorithm 5 shows the pseudo-code of the proposed S-STLF model in detail.

Algorithm 5: Pseudo-code of the proposed S-STLF model
1: P ← historical time series of load data
2: Compute the local maximum w.r.t. each column of the P matrix
3: Perform local normalization w.r.t. each column of the P matrix
4: MI(X, Y^t, Y^m) ← rank the candidate inputs based on Eqn. 6.5
5: P* ← remove redundant features based on the computed MI(X, Y^t, Y^m) values
6: P* ← remove irrelevant features based on the computed MI(X, Y^t, Y^m) values
7: Compute the local median w.r.t. each column of the P* matrix
8: P^b ← binary encode P*
9: Feed the ANN based STLF module with P^b
10: Perform the training process and return MAPE(.) [28]
11: Randomly initialize the entire mEDE population while respecting the lower and upper limits [18, 24]
12: while Max. no. of generations not reached do
13:   for each individual X_t(i) do
14:     Obtain MAPE(X_t(i)) from the forecast module
15:     Compute the mutant vector U_t(i)
16:     Obtain MAPE(U_t(i)) from the forecast module
17:     Compute FF(X_t(i)) based on Eqn. 6.19
18:     Compute FF(U_t(i)) based on Eqn. 6.18
19:     Generate Y'_t(i) based on Eqn. 6.15
20:     Select the offspring for the next generation based on Eqn. 6.14
21:     i = i + 1
22:   end for
23:   t = t + 1
24: end while
25: Return the best individuals

6.3 Simulation Results
For the performance evaluation of our newly proposed S-STLF model for MCNs, we conduct simulations in MATLAB. In these simulations, the S-STLF model is compared with two existing ANN based STLF strategies: the Bi-level forecast and the MI+ANN forecast. For a fair comparison of the selected existing strategies and our newly proposed S-STLF model, the historical load time series is taken from the publicly available PJM electricity market [30] and the simulation parameters (shown in Table 6.1) are kept the same for all three strategies.
Figure 6.2: PJMW: Actual vs forecast
Figure 6.3: EKPC: Actual vs forecast
The initialization and optimization bounds are given by the following equations:

\[
\text{Initialization bounds} = [\,\text{zeros}(No.\,Of\,Decision\,Variables, 1)\ \ \text{ones}(No.\,Of\,Decision\,Variables, 1)\,]
\tag{6.20}
\]

\[
\text{Optimization bounds} = \text{Initialization bounds}
\tag{6.21}
\]
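Since the simulations are carried out in MATLAB, Eqns. 6.20 and 6.21 simply stack a column of zeros (lower bounds) and a column of ones (upper bounds). The following NumPy lines are an equivalent illustration only, with variable names chosen here:

import numpy as np

n_decision_variables = 1   # as listed in Table 6.1

# Eqn. 6.20: lower bound 0 and upper bound 1 for every decision variable
initialization_bounds = np.column_stack((np.zeros(n_decision_variables),
                                          np.ones(n_decision_variables)))

# Eqn. 6.21: the optimization bounds coincide with the initialization bounds
optimization_bounds = initialization_bounds.copy()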
For justification of all these parameters, refer to [16, 18, 29]. In this thesis, we have considered three performance metrics, which are defined as follows.
Figure 6.4: DAYTOWN: Actual vs forecast
Figure 6.5: FE: Actual vs forecast
• Accuracy: Accuracy = 100 − MAPE. In this thesis, accuracy is measured in %.
• Execution time or convergence rate: the time spent by the system during simulations while executing a specific forecast strategy. For any forecast strategy, a small execution time means that the strategy completes the forecast with a faster convergence rate and vice versa. In this thesis, execution time is measured in seconds.
Figure 6.6: PJMW: Error performance

Figure 6.7: EKPC: Error performance
• Scalability: A forecast strategy is scalable if it does not show significant performance degradation in terms of accuracy and convergence rate whenever subjected to a high load.
Figure 6.8: DAYTOWN: Error performance

Figure 6.9: FE: Error performance
6.3.1 Error Performance
Figs. 6.2–6.5 show a comparison of three forecast strategies: our proposed S-STLF and two existing ones (the Bi-level forecast and the MI+ANN forecast). These figures are obtained by taking historical load data of four USA SMGs: EKPC, DAYTOWN, PJMW, and FE. From these graphical illustrations, it is clear that all three forecast strategies are able to capture the non-linearities in the actual curve. This is expected, as all three forecast strategies use the sigmoid activation function for the ANs.
Table 6.1: Simulation parameters of mFS, ANN, and mEDE based forecast

Parameter                             Value
No. of ANs                            24
No. of hidden layers                  1
No. of ANs in the hidden unit         5
No. of iterations (ANN training)      100
Momentum                              0
Learning rate for hidden unit         5
Learning rate for output unit         2
Initial weights                       0.1
Lagged input load samples             26 days
Bias value (b)                        0
No. of objectives                     1
No. of generations                    100
No. of decision variables             1
Population size                       24
The training algorithms also enable these strategies to capture the non-linearities. From Figs. 6.2–6.5, it is also observed for all the SMGs that our S-STLF model's curve follows the actual curve more closely than the Bi-level and MI+ANN forecast strategies. This observation, in terms of numerical values, is plotted in Figs. 6.6–6.9, which show that the % errors of the S-STLF, Bi-level and MI+ANN strategies are 0.5, 2.4 and 4.1 for PJMW, 0.48, 2.19 and 3.77 for EKPC, 0.49, 2.4 and 4.1 for DAYTOWN, and 0.5, 2.8 and 4.5 for FE, respectively. In terms of error performance, the Bi-level strategy is better than the MI+ANN based forecast strategy. A well-grounded reason for this behaviour is the minimization of the forecast error via integration of the EDE optimization technique (the Bi-level forecast strategy uses an EDE based optimizer whereas the MI+ANN based strategy does not use an optimizer). On the other hand, the S-STLF model's error performance is significantly improved compared to the Bi-level forecast strategy (refer to Figs. 6.6–6.9). In the S-STLF model, modifications to the EDE algorithm (refer to Eqn. 6.13 to Eqn. 6.15) lead to offspring selection without the influence of the random number, whereas the Bi-level forecast strategy leads to offspring selection under the strong influence of the random number. In this way, the next generation offspring of the S-STLF model are more fit in terms of accuracy than those of the Bi-level forecast strategy. Moreover, the S-STLF feature selection process considers two parameters (the last sample and the average behaviour), whereas the Bi-level forecast strategy ignores the
average behaviour while selecting features from the historical load data. Due to these two reasons, the S-STLF model shows improved performance in terms of forecast error when compared to the Bi-level forecast and the MI+ANN based forecast strategies.
Figure 6.10: PJMW: Convergence rate analysis
Figure 6.11: EKPC: Convergence rate analysis
Figure 6.12: DAYTOWN: Convergence rate analysis
Figure 6.13: FE: Convergence rate analysis
6.3.2 Convergence Rate Analysis
As mentioned in the previous subsection, the S-STLF model shows a better % error performance than the Bi-level and the MI+ANN based forecast strategies. However, this error minimization is achieved at the cost of more execution time, as shown in Figs. 6.10–6.13. When the optimization module is integrated, these figures show an increase in execution time from 6.49 seconds to 102 seconds for PJMW, from 6.49 seconds to 101 seconds for EKPC and DAYTOWN, and from 12.4 seconds to 110 seconds for FE, respectively. Thus, there exists a trade-off between forecast accuracy and convergence rate: a surplus of 95.51 seconds is paid to achieve 1.7% more accuracy (for PJMW). Our proposed S-STLF strategy decreases this execution time because it uses the mEDE algorithm (refer to Eqn. 6.16 to Eqn. 6.19). In S-STLF, the two mentioned modifications to the Bi-level forecast strategy lead to a decreased execution time (from 102 seconds to 50.4 seconds, from 101 seconds to 49.3 seconds, and from 110 seconds to 50.7 seconds for the respective SMGs).

6.3.3 Scalability Analysis
We analyse the S-STLF model's scalability in comparison with the two existing forecast strategies, MI+ANN and Bi-level. In this regard, the three forecast strategies are subjected to load variation and their response in terms of error performance (refer to Fig. 6.14) is investigated. Fig. 6.14 shows that the error (%) of both the MI+ANN forecast and the Bi-level forecast significantly increases as the load of the SMG is increased from 188 MW to 808.5 MW. Since an increased load means that the appliances have either increased in number or increased their power consumption, the MI+ANN forecast, the Bi-level forecast, and the S-STLF model are subject to outliers in the training process. Both existing schemes use the MI based feature selection technique, which only considers the last sample to calculate the MI values of the input load candidates. The choice of the last sample seems logical as it is the sample closest in time to the upcoming day; however, it is insufficient to combat outliers that arise due to the increased load. The S-STLF model solves this problem by considering not only the last sample but also the average sample. The modified MI based feature selection technique enables the S-STLF model to combat the outliers, so the S-STLF curve does not show a significant increase in terms of error performance. To complete the picture of the aforementioned discussion, Table 6.2 also summarizes the selected existing forecast strategies (presented earlier in the related work section) in terms of the selected performance metrics.
Figure 6.14: Impact of load on error performance
Table 6.2: Performance evaluation of the selected forecast strategies

Forecast strategy                                        Accuracy   Execution time   Convergence rate   Remarks
Gaussian distribution based forecast [25]                Low        High             Slow               Error performance, convergence rate and scalability need improvement
Regularized Gaussian distribution based forecast [26]    Low        High             Slow               Relatively better than the Gaussian distribution based forecast strategy, however, error performance, convergence rate and scalability still need improvement
MI+ANN forecast [18]                                     Low        Low              Fast               Error performance needs improvement
Markov chain based forecast [22]                         Low        Low              Fast               Error performance needs improvement
Bi-level forecast [16]                                   Moderate   High             Slow               Integration of the optimization technique minimizes the forecast error, however, at the cost of high execution time (slow convergence rate) and scalability
Hybrid ANN based forecast [23]                           Moderate   High             Slow               Forecast error is minimized, however, at the cost of high execution time (slow convergence rate) and scalability
S-STLF model                                             High       Moderate         Moderate           Suggested changes yield fruitful results in terms of accuracy and scalability, at the cost of a fair enough convergence rate
Chapter 7 Conclusion and Future Work
7.1 Conclusion
The conventional grid is outdated and needs renovation because it is no longer flexible enough to meet future challenges. By using the latest technology and implementing new strategies, it is possible to increase grid capacity, efficiency, reliability, power quality, and sustainability. The smart grid has the potential to fulfill these requirements by exploiting two-way communication between consumer and retailer. From this study, we conclude that successful implementation of smart grid technologies depends majorly on consumers' participation. This participation does not mean that they have to compromise their comfort level; it is up to the consumers to decide their degree of participation in these programs. In this regard, the bi-directional communication between prosumer and utility plays a significant role that is highly influenced by the price/load signal. Furthermore, lower similarities (high randomness) and non-linearity in historical load curves make the micro-grid's STLF more challenging than long-term load forecasting. The ongoing research contributions w.r.t. STLF model development for SMGs motivated us to investigate its current body of work. We found that many STLF models exist; however, these models trade off accuracy against convergence rate. For example, the MI+ANN forecast strategy has a faster convergence rate, but at the cost of accuracy. Similarly, the Bi-level forecast strategy minimizes the forecast error, but at the cost of increased execution time. In order to overcome the indicated trade-off, we have proposed the S-STLF model for SMGs. The newly proposed S-STLF model achieves approximately 99.5% accuracy, which is better than the existing Bi-level (97.6%) and MI+ANN (95.9%) based forecast strategies, respectively. On the other hand, w.r.t. convergence rate, the performance order of our proposed model and the two existing strategies is: MI+ANN > S-STLF > Bi-level. To sum up, our proposed S-STLF model achieves the highest relative accuracy at the cost of a moderate execution time. We also conclude that these results justify the correctness of our modifications in the selected modules.

7.2 Future Work
On the basis of this study, our future directions are focused on the way(s) in which efficient and cost-effective decisions are made. To make future smart grid implementation possible, the challenges, advantages and disadvantages of optimization techniques need to be addressed. With regard to price/load signal forecasting, and as a potential future research area, the three basic modules need further exploration in terms of accuracy and execution time. In the future, we are interested in feature selection/extraction via advanced signal processing techniques. We are also interested in expanding our research horizon from forecast strategies alone to a complete load forecast plus scheduling technique. It is, therefore, envisioned that future work in smart grids will focus on techniques and algorithms satisfying all the stakeholders.
Chapter 8 References
Bibliography
[1] L. Gelazanskas and K. A. Gamage, "Demand side management in smart grid: A review and proposals for future direction", Sustainable Cities and Society, Vol. 11, pp. 22–30, 2014.
[2] P. Siano, "Demand response and smart grids: A survey", Renewable and Sustainable Energy Reviews, Vol. 30, pp. 461–478, 2014.
[3] J. Aghaei and M-I. Alizadeh, "Demand response in smart electricity grids equipped with renewable energy sources: A review", Renewable and Sustainable Energy Reviews, Vol. 18, pp. 64–72, 2013.
[4] M. R. Alam, M. B. I. Reaz, and M. A. M. Ali, "A review of smart homes: Past, present, and future", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 6, pp. 1190–1203, 2012.
[5] A. Kailas, V. Cecchi, and A. Mukherjee, "A survey of communications and networking technologies for energy management in buildings and home automation", Journal of Computer Networks and Communications, 2012.
[6] Y. Yan, Y. Qian, H. Sharif, and D. Tipper, "A Survey on Smart Grid Communication Infrastructures: Motivations, Requirements and Challenges", IEEE Communications Surveys & Tutorials, Vol. 15, No. 1, pp. 5–20, 2013.
[7] R. R. Mohassel, A. Fung, F. Mohammadi, and K. Raahemifar, "A survey on advanced metering infrastructure", International Journal of Electrical Power and Energy Systems, Vol. 63, pp. 473–484, 2014.
[8] www.ciosummits.com/media/pdf/solution spotlight/sas forecasting-enhance-smart-grid.pdf (accessed on: March 30, 2015).
[9] S. Mohammadi, S. Soleymani, and B. Mozafari, "Scenario-based stochastic operation management of MicroGrid including Wind, Photovoltaic, Micro-Turbine, Fuel Cell and Energy Storage Devices", Electrical Power and Energy Systems, Vol. 54, pp. 525–535, 2014.
[10] I. Atzeni, L. G. Ordonez, G. Scutari, D. P. Palomar, and J. R. Fonollosa, "Demand-Side Management via Distributed Energy Generation and Storage Optimization", IEEE Transactions on Smart Grid, Vol. 4, No. 2, pp. 866–876, 2013.
[11] C. O. Adika and L. Wang, "Autonomous Appliance Scheduling for Household Energy Management", IEEE Transactions on Smart Grid, Vol. 5, No. 2, pp. 673–682, 2014.
[12] I. Koutsopoulos and L. Tassiulas, "Optimal Control Policies for Power Demand Scheduling in the Smart Grid", IEEE Journal on Selected Areas in Communications, Vol. 30, No. 6, pp. 1049–1060, 2012.
[13] H. Hermanns and H. Wiechmann, "Demand-Response Management for Dependable Power Grids", Embedded Systems for Smart Appliances and Energy Management, Vol. 3, pp. 1–22, 2013.
[14] M. Hashmi, S. Hanninen, and K. Maki, "Survey of smart grid concepts, architectures, and technological demonstrations worldwide", in Proc. IEEE PES Innovative Smart Grid Technologies (ISGT Latin America), 2011.
[15] www.zeitgeistlab.ca/doc/Unveiling the Hidden Connections between E-mobility and Smart Microgrid.html (accessed on: Feb 20, 2015).
[16] N. Amjady, F. Keynia, and H. Zareipour, "Short-Term Load Forecast of Microgrids by a New Bilevel Prediction Strategy", IEEE Transactions on Smart Grid, Vol. 1, No. 3, pp. 286–294, 2010.
[17] H-T. Zhang, F-Y. Xu, and L. Zhou, "Artificial neural network for load forecasting in smart grid", in Proc. International Conference on Machine Learning and Cybernetics (ICMLC), Vol. 6, pp. 3200–3205, 2010.
[18] N. Amjady and F. Keynia, "Day-Ahead Price Forecasting of Electricity Markets by Mutual Information Technique and Cascaded Neuro-Evolutionary Algorithm", IEEE Transactions on Power Systems, Vol. 24, No. 1, pp. 306–318, 2009.
[19] N. Liu, Q. Tang, J. Zhang, W. Fan, and J. Liu, "A Hybrid Forecasting Model with Parameter Optimization for Short-term Load Forecasting of Micro-grids", Applied Energy, Vol. 129, pp. 336–345, 2014.
[20] http://www.energybiz.com/article/13/10/load-forecasting-smart-grid (accessed on: 25th March, 2015).
[21] H. Liang, A. K. Tamang, W. Zhuang, and X. Shen, "Stochastic Information Management in Smart Grid", IEEE Communications Surveys & Tutorials, Vol. 16, No. 3, pp. 1746–1770, 2014.
[22] H. Meidani and R. Ghanem, "Multiscale Markov models with random transitions for energy demand management", Energy and Buildings, Vol. 61, pp. 267–274, 2013.
[23] H. T. Yang, J. T. Liao, and C. I. Lin, "A Load Forecasting Method for HEMS Applications", in Proc. IEEE Grenoble PowerTech, pp. 1–6, 2013.
[24] N. Amjady and F. Keynia, "Electricity market price spike analysis by a hybrid data model and feature selection technique", Electric Power Systems Research, Vol. 80, pp. 318–327, 2010.
[25] J. K. Gruber and M. Prodanovic, "Residential energy load profile generation using a probabilistic approach", in Proc. 2012 UKSim-AMSS 6th European Modelling Symposium, pp. 317–322, 2012.
[26] P. Kou and F. Gao, "A sparse heteroscedastic model for the probabilistic load forecasting in energy-intensive enterprises", Electrical Power and Energy Systems, Vol. 55, pp. 144–154, 2014.
[27] L. Hernandez, C. Baladron, J. M. Aguiar, B. Carro, A. J. Sanchez-Esguevillas, and J. Lloret, "Short-Term Load Forecasting for Microgrids Based on Artificial Neural Networks", Energies, Vol. 6, pp. 1385–1408, 2013.
[28] C. W. Anderson, E. A. Stolz, and S. Shamsunder, "Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks", IEEE Transactions on Biomedical Engineering, Vol. 45, No. 3, pp. 277–286, 1998.
[29] A. P. Engelbrecht, "Computational Intelligence: An Introduction", John Wiley & Sons, second edition, 2007.
[30] www.pjm.com (accessed on: Feb 01, 2015).
[31] N. Amjady and F. Keynia, "Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm", Energy, Vol. 34, No. 1, pp. 46–57, 2009.
[32] R. H. Lasseter and P. Piagi, "Microgrid: A conceptual solution", in Proc. IEEE 35th Annual Power Electronics Specialists Conference, Aachen, Germany, pp. 4285–4290, 2004.
[33] R. Storn and K. Price, "Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces", Journal of Global Optimization, Vol. 11, No. 4, pp. 341–359, 1997.
[34] Q. Zhu, Z. Han, and T. Basar, "A differential game approach to distributed demand side management in smart grid", in Proc. IEEE ICC 2012, pp. 3345–3350, Ottawa, Canada, 10–15 June 2012.
[35] J. Soares, M. Silva, T. Sousa, Z. Vale, and H. Morais, "Distributed energy resource short-term scheduling using Signaled Particle Swarm Optimization", Energy, Vol. 42, No. 1, pp. 466–476, 2012.
[36] Z. Zhu, J. Tang, S. Lambotharan, W. H. Chin, and Z. Fan, "An integer linear programming based optimization for home demand-side management in smart grid", in Proc. IEEE PES ISGT 2012, Washington, DC, USA, 16–20 January 2012.
[37] F. J. Nogales, J. Contreras, A. J. Conejo, and R. Espinola, "Forecasting next-day electricity prices by time series models", IEEE Transactions on Power Systems, Vol. 17, No. 2, pp. 342–348, 2002.
[38] H. Zareipour, C. A. Canizares, K. Bhattacharya, and J. Thomson, "Application of public-domain market information to forecast Ontario's wholesale electricity prices", IEEE Transactions on Power Systems, Vol. 21, No. 4, pp. 1707–1717, 2006.
78
[39] R. C. Garcia, J. Contreras, M. V. Akkeren, and J. B. C. Garcia, “A GARCH forecasting model to predict day-ahead electricity prices”, IEEE Transactions on Power Systems, Vol. 20, No. 2, pp. 867-874, 2005. [40] H. Mao, X. J. Zeng, G. Leng, Y. J. Zhai, and J. A. Keane, “Short-term and midterm load forecasting using a bilevel optimization model”, IEEE Transactions on Power Systems, Vol. 24, No. 2, pp. 1080-1090, 2009.
79