
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 26, NO. 4, NOVEMBER 2011

A Hybrid Multiagent Framework With Q-Learning for Power Grid Systems Restoration

Dayong Ye, Minjie Zhang, Member, IEEE, and Danny Sutanto, Senior Member, IEEE

Manuscript received October 25, 2010; revised October 28, 2010, February 04, 2011, and April 13, 2011; accepted May 11, 2011. Date of publication June 09, 2011; date of current version October 21, 2011. This work was supported by an ARC Linkage project (LP0991428) from the Australian Research Council and TransGrid, Australia. Paper no. TPWRS-00861-2010.

D. Ye and M. Zhang are with the School of Computer Science and Software Engineering, University of Wollongong, Wollongong 2522, Australia (e-mail: [email protected]; [email protected]). D. Sutanto is with the School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia (e-mail: [email protected]).

Digital Object Identifier 10.1109/TPWRS.2011.2157180

Abstract—This paper presents a hybrid multiagent framework with a Q-learning algorithm to support rapid restoration of power grid systems following catastrophic disturbances involving loss of generators. This framework integrates the advantages of both centralized and decentralized architectures to achieve accurate decision making and quick responses when potential cascading failures are detected in power systems. By using this hybrid framework, which does not rely on a centralized controller, the single point of failure in power grid systems can be avoided. Further, the use of the Q-learning algorithm developed in conjunction with the restorative framework can help the agents to make accurate decisions to protect against cascading failures in a timely manner without requiring a global reward signal. Simulation results demonstrate the effectiveness of the proposed approach in comparison with the typical centralized and decentralized approaches based on several evaluation attributes.

Index Terms—Hybrid solution, multiagent systems, Q-learning.

I. INTRODUCTION

In an electricity network, catastrophic disturbances involving the loss of generators can lead to cascading failures across the whole system. A power system is considered to be in a vulnerable state when it faces a threat of widespread outage [1]. Cascading failures in power grid systems cause great inconvenience to people's lives and are harmful to both the power industry and the wider economy. To avoid such failures, it is necessary to include restoration approaches in power grid systems. The general principle of current power restoration approaches is that, when faults occur on one or more particular nodes, e.g., generators, restoration procedures are executed to isolate the faults from the remaining power grid and to take relevant actions to restore supply to as much load as possible. Cascading failures normally develop in a very short time, e.g., less than 30 s. This situation requires nodes in a power grid system to make accurate decisions within a very limited period. Nodes generally include generators, loads, switches, and so on.

Agent technology, as a powerful artificial intelligence technique, has been used in power grid systems for different purposes. An intelligent agent is able to make rational decisions autonomously in a dynamic environment, namely blending pro-activeness and reactiveness, showing rational commitment to decision making, and exhibiting flexibility when facing an uncertain and changing environment [2].

A multiagent system is composed of several intelligent agents, and individual agents may perform different roles. The agents in a multiagent system can work autonomously, make decisions independently, and interact with each other to achieve global goals.

Currently, most multiagent-based approaches for power grid system restoration rely on central controllers that are in charge of various activities of the systems. However, such systems find it very difficult to take system-wide sequential actions in large-scale power grid systems, including communication, analysis, prediction, and decision making, within a very short time. To overcome this limitation, some researchers have proposed decentralized multiagent systems for power grid system restoration, which allow nodes in the system to communicate only with their neighbors to acquire information. Nevertheless, the information obtained by a node may be incomplete if it communicates only with its neighboring nodes, and these individual nodes might therefore not make accurate decisions.

In this paper, we introduce a hybrid multiagent framework with a Q-learning algorithm which can help power grid systems restore quickly when faults occur, so as to avoid cascading failures. Q-learning is one of the simplest algorithms for implementing reinforcement learning, and reinforcement learning is a class of learning methods employed by an agent to learn its optimal action through trial-and-error interactions with a dynamic environment [3]. The benefit of reinforcement learning is that an agent does not need a teacher to learn how to solve a problem. The only signal used by the agent to learn from its actions in dynamic environments is the reward, a number which tells the agent whether its last action was good or not [4].

The contribution of this paper focuses on the following three aspects: 1) the proposed hybrid framework combines the benefits of both centralized and decentralized architectures, which can avoid the single point of failure (i.e., the situation where a broken central controller puts the whole system out of order) and can provide sufficient information for nodes to make accurate decisions; 2) the framework is topology independent and, hence, suitable for various types of power grid systems; 3) in order to realize the framework, an agent Q-learning algorithm is also proposed, which can help agents make accurate decisions in power grid systems.

The rest of this paper is organized as follows. In the next section, an overview of current related research is provided. In Section III, the framework and the algorithm design are proposed. Section IV describes a simulation example to demonstrate how our framework performs. The simulation results regarding the performance of the presented framework and the relevant discussion are given in Section V. Finally, this paper is concluded in Section VI.


II. RELATED WORK

In the last decade, intelligent agent and multiagent technologies have been adopted for various aspects of power grid system management, such as restoration [5], relaying [6], maintenance [7], substation automation [8], and state estimation [9]. In this paper, we focus on power grid system restoration when one or several generators in a power grid system are out of order.

Jung and Liu [1] presented a multiagent framework which could provide real-time information acquisition and interpretation, quick vulnerability evaluation of both power and communication systems, and preventive and corrective self-healing strategies to avoid catastrophic failures of a power system. Later, Jung et al. [10] improved their framework by introducing an adaptive learning method for load shedding. Their framework, however, was centralized in nature, as agents in the highest layer of the framework analyzed and controlled power systems from a global view. In addition, their learning method used a global reward signal, i.e., the total load shedding in the power system, to reinforce the learning process.

Nagata and Sasaki [11] developed a multiagent framework for power system restoration. Their framework was a centralized design, consisting of several bus agents and a single facilitator agent. Bus agents were used to decide a suboptimal target configuration after a fault occurrence, while the facilitator agent acted as a manager in the decision process. Srivastava et al. [12] proposed a similar method, in which a coordinating agent with global information was used for reconfiguration of a shipboard power system. Nagata et al. [13] improved the restoration methodology proposed in [11], but facilitator agents were still required for coordination of the agents. Momoh and Diouf [14] refined Nagata and Sasaki's framework [11] and utilized power generation agents, bus agents, and circuit breaker agents to distribute the reconfiguration functionalities. However, the system proposed in [14] still needs facilitator agents. The methods proposed in [11]–[14] are centralized in nature. This centralized manner exposes the framework to the potential of a single point of failure and limits its scalability.

To overcome the defects of centralized methods, some decentralized methods have been presented as well. Nordman and Lehtonen [9] proposed a new agent concept for managing an electrical distribution network. Their concept consisted of three elements, namely a secondary substation object, decentralized functionalities, and an information access model. However, all secondary substations are homogeneous in terms of agent intelligence, because all secondary substations are copies of the secondary substation object; this homogeneity might restrict the adaptability of the concept. Solanki et al. [5] provided a multiagent framework, with a detailed design of each agent, to restore a power grid system after a fault appears. The framework is decentralized and topology independent, which overcomes the scalability limitations of existing restoration techniques. Although the decentralized manner can avoid the single point of failure, accurate decisions cannot always be guaranteed. This is because each node in the decentralized architecture has only a limited view of the whole environment and makes decisions based only on its limited information.


A multiagent system-based reconfiguration approach for mesh-structured power grid systems was introduced by Huang et al. [15]. Even though the authors claimed that the architecture of the multiagent system was decentralized, some global information was still employed, e.g., the net power of the power system. Moreover, their work overlooked the communication process in the multiagent system, which is very important for a multiagent system to perform properly.

In this research, a hybrid multiagent framework with a Q-learning algorithm for power restoration in power grid systems is devised, which deploys intelligent agents in different power grids, involving generator agents, load agents, and switch agents, to execute monitoring, analysis, and maintenance activities. Agents at different nodes can also interact with each other to exchange their local information. The proposed framework utilizes FIPA [16] for agent communication, as FIPA is sufficient for our purpose and easy to implement. By clustering intelligent agents into grids, higher-level agents are generated which have a broader view of the power grid system than each individual agent. In each grid, a higher-level agent can provide suggestions to the individual agents for their decision making. Compared with centralized approaches, e.g., [11] and [12], this framework, without a central manager (or controller), can avoid the single point of failure. In contrast to current decentralized approaches, such as [5] and [9], this framework can provide sufficient information for nodes to make accurate decisions. Since a node is represented as an agent, the two terms, node and agent, are used interchangeably in the rest of this paper.

III. FRAMEWORK AND ALGORITHM DESIGN

In this section, we first describe the hybrid multiagent framework and then present the agent Q-learning algorithm for power restoration in power grid systems.

A. Framework Architecture

A simplified example of a power grid system is displayed in Fig. 1. There are three types of agents in the system, i.e., Generator Agent, Switch Agent, and Load Agent (abbreviated as Gen, S, and L, respectively). The total number of agents in a power grid system corresponds to the overall number of generators, switches, and loads in that system. The three types of agents are elaborated as follows.

1) Generator Agent: The knowledge of a generator agent comprises the following items.

It includes the agent identification number (ID), which is unique in a power grid system; the current available capacity of the generator; the ID of the generator agent's neighboring switch agent; and the number of loads supplied by the generator, together with the IDs of those load agents. Each generator agent has only one neighboring switch agent; for example, in Fig. 1, the neighboring switch agent of Generator 1 is Switch 1. Likewise, in Fig. 1, the loads supplied by Generator 1 are Loads 1 and 2.


Fig. 1. Simplified power grid system.

2) Switch Agent: The knowledge of a switch agent comprises the following items. The first is the switch status, which is either on or off: on means that the switch is turned on and current can pass through it, while off has the opposite meaning. It should be noted that a switch being on or off does not affect the communication link between the corresponding switch agents. The second is a message pool, which records not only the messages the switch agent currently holds but also previous messages passing into and out of the switch agent. Messages are used as an information source for switch agents; as time passes, previous messages are gradually deleted from the message pool. The remaining items are the number of neighboring switch agents and the IDs of those neighboring switch agents. For example, in Fig. 1, a switch agent can have two neighboring switch agents.

3) Load Agent: The knowledge of a load agent comprises the current amount of the load; the priority of the load, which lies in a predefined range, indicates how vital the load is, and whose determination is left to the user; and the ID of the load agent's neighboring switch agent. The priority of loads is used to calculate the expected reward of the corresponding switch agents, which will be described in detail in Section III-C. Each load agent has only one neighboring switch agent.
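To make the three knowledge representations concrete, the following is a minimal Java sketch of the agent knowledge described above. The class and field names are illustrative assumptions, not the paper's prescribed data structures; they simply mirror the items listed for each agent type (the simulation in Section V is reported to be written in Java).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative knowledge containers for the three agent types (names assumed).
class GeneratorKnowledge {
    int id;                        // unique agent identification number
    double availableCapacityKW;    // current available capacity of the generator
    int neighborSwitchId;          // each generator has exactly one neighboring switch
    List<Integer> suppliedLoadIds; // IDs of the loads currently supplied
}

class SwitchKnowledge {
    int id;
    boolean on;                    // true: current can pass; false: switch open
    Deque<String> messagePool = new ArrayDeque<>(); // current and past messages; old ones expire
    List<Integer> neighborSwitchIds;                // IDs of neighboring switch agents
}

class LoadKnowledge {
    int id;
    double currentLoadKW;          // current amount of the load
    double priority;               // how vital the load is; its value is left to the user
    int neighborSwitchId;          // each load has exactly one neighboring switch
}
```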

Fig. 2. Layered architecture for power grid systems restoration.

To satisfy the requirements of power restoration, the proposed hybrid framework deploys a layered architecture, i.e., a power grid layer and a scheduling layer, to ensure accurate decisions and quick responses. It should be noted that the scheduling layer may itself consist of several layers if necessary. Fig. 2 shows a three-level layered architecture for power grid systems restoration, where a solid line indicates that there exists a bidirectional communication channel between two agents. In Fig. 2, the lowest level is the power grid layer, which is an abstraction of the power grid system (Fig. 1). Each grid in Fig. 1 matches the corresponding grid in the power grid layer in Fig. 2. The grids in power grid systems can be clustered based on some specific measure, e.g., geographical distance. Moreover, in Fig. 2, each grid is scheduled by a scheduler agent in the layers above the power grid layer, i.e., Scheduler Agent i schedules Grid i. Scheduler agents can directly communicate with all the agents in the corresponding grids; the communication links between scheduler agents and the corresponding grids are omitted in Fig. 2 for clarity. The role of a scheduler agent can be played by a dedicated agent or by one of the agents in the power grid layer.

If the number of scheduler agents is large, a group of higher-level scheduler agents can be added, and so on, forming a multilevel scheduling architecture. For example, in Fig. 2, Scheduler Agents 1 and 2 are clustered into a new grid, i.e., Grid 5, and a new scheduler agent, i.e., Scheduler Agent 5, is generated to schedule Grid 5. The same applies to Scheduler Agents 3 and 4. If a scheduler agent fails, another agent in the corresponding grid can be selected to play the role of scheduler agent. In this paper, we stipulate that each agent belongs to only one grid. Two scheduler agents can directly communicate with each other if and only if there exists a direct communication link between their corresponding grids. For example, in Fig. 2, Scheduler Agents 1 and 2 can directly communicate with each other, since there is a communication link between Grids 1 and 2. A scheduler agent's view is based not only on the agents that it schedules, but also on its neighboring scheduler agents. Thus, a scheduler agent can employ its expertise to provide suggestions to the agents in its scheduling grid.
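As an illustration of how grids and scheduler agents might be organized, the following Java sketch clusters agents into grids and lets each grid elect a scheduler. The class names and the election rule (simply the first member) are assumptions for illustration; the paper leaves both the clustering measure (e.g., geographical distance) and the choice of which agent plays the scheduler role open.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative grid/scheduler structure (names and election rule are assumptions).
class Grid {
    int id;
    List<Integer> memberAgentIds = new ArrayList<>(); // generator/switch/load agents in this grid
    Integer schedulerAgentId;                          // agent currently playing the scheduler role
    List<Grid> neighborGrids = new ArrayList<>();      // grids with a direct communication link

    // Elect a scheduler; if the current one fails, another member can take over.
    void electScheduler() {
        if (!memberAgentIds.isEmpty()) {
            schedulerAgentId = memberAgentIds.get(0);  // simplest possible rule, for illustration only
        }
    }

    // Two scheduler agents can talk directly iff their grids are directly linked.
    boolean canSchedulerTalkTo(Grid other) {
        return neighborGrids.contains(other);
    }
}
```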


B. Power Restoration Process

Before introducing the agent Q-learning algorithm, we briefly describe the steps followed for restoration after a fault is detected in a power grid system.

For example, in Fig. 1, suppose there is a fault on Generator 1 in Grid 1, which causes Switch 1 to be turned off; Loads 1 and 2 are then de-energized. The load agents of Loads 1 and 2 send restoration messages to their respective neighboring switch agents. These restoration messages include information about the current load and the priority of the load monitored by the load agents. The neighboring switch agents then forward these restoration messages, through the intermediate switch agents, to the generator agents of Generators 2 and 3. When these switch agents receive the restoration messages, they make copies of them, and the information contained in the messages can then be used as knowledge for their decision making.

Once the generator agents of Generators 2 and 3 receive these restoration messages, they send available remaining capacity (ARC) messages backwards to the switch agents which initially requested restoration. Meanwhile, one of the generator agents, acting as Scheduler Agent 1, sends restoration messages onwards to the scheduler agents of the neighboring grids, both via intermediate switch agents and directly. It is supposed that, in each grid, one agent is selected as the scheduler agent supervising that grid. Then, the switch agents in the power grid system make decisions about turning on or off to supply or shed the corresponding loads. At the same time, scheduler agents provide suggestions to each switch agent in their scheduling domain with regard to the switch agent's decision making.

The objective of the restoration process is to maximize the power supply to as many loads as possible given the priority of each load. Thus, turning switches on or off can simply be understood as the restoration process, as this process decides which loads are energized. In this process, there is no predefined relationship between each load and a specific generator; in our simulation, a load can be supplied by any generator depending on the generators' available capacity and the priority of the load. The issue of which load is supplied by which generator is left as one of our future works.
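The following Java sketch shows one way the restoration and ARC messages just described could be represented. The field names are assumptions for illustration only; the framework itself uses FIPA-compliant messaging [16] rather than these ad hoc classes.

```java
// Illustrative message types for the restoration dialogue (field names assumed).
class RestorationMessage {
    int requestingLoadId;      // load that lost supply
    double loadAmountKW;       // current amount of the de-energized load
    double priority;           // priority of the load
    int originSwitchId;        // switch agent that initiated the request
}

class ArcMessage {             // available remaining capacity (ARC) reply
    int generatorId;
    double remainingCapacityKW;
    int targetSwitchId;        // switch agent that originally requested restoration
}
```

In the flow above, a RestorationMessage travels from a de-energized load's switch agent towards the generator agents (being copied into each switch agent's message pool along the way), and an ArcMessage travels back along the same path.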


C. Algorithm Design

In this paper, the power restoration problem is to energize as much load as possible, given the priority of each load, by turning each switch agent on or off (see Fig. 1); each switch being turned on or off results in the corresponding load being energized or de-energized, respectively. For example, if Switch 2 is turned on, Load 1 will be energized; otherwise, Load 1 will be de-energized. Here, we consider only switch agents, as they are the decision makers for shedding or supplying power from/to loads. Switch agents that are adjacent to generator agents are excluded from consideration; for example, in Fig. 1, the switch agents adjacent to the generators are not included, because their status, on or off, depends entirely on the corresponding generators. A switch agent which is adjacent to a generator is usually in the on status unless there is a fault in the generator (recall the example described in Section III-B). Furthermore, the reward of each switch agent turning on or off depends on how much load, weighted by the priority of each load, could be energized.

Let us consider the example introduced in Section III-B (Figs. 1 and 2), where Generator 1 is out of order, Switch 1 is in the off status, and Loads 1 and 2 are de-energized. It is supposed that the power created by Generators 2, 3, 4, and 5 is not sufficient to supply all the loads in the power grid system. Consider the switch agent adjacent to Load 3. If it stays in the on status, i.e., selects the action turn on, the neighboring load, namely Load 3, will be energized, but Loads 1 and 2 might still be de-energized (because the power is not enough to supply all the loads). Thus, its reward in this case is calculated as p(L3) x l(L3), where p(Li) denotes the priority of Load i and l(Li) indicates the current load of Load i. Since this switch agent stores the restoration messages sent on behalf of Loads 1 and 2 as they pass through it, it knows the current loads and priorities of Loads 1 and 2. On the other hand, if it chooses the action turn off, Load 3 will not be energized, whereas Loads 1 and 2 may have the opportunity to be energized. Hence, in this case, its reward is p(L1) x l(L1) + p(L2) x l(L2).

Switch agents which connect two feeders have to consider the loads of both feeders. For example, if such a switch agent is in the off status, Loads 3 and 4 could be energized but Loads 1 and 2 might not be; the reward in this case is p(L3) x l(L3) + p(L4) x l(L4) + (1/2)[p(L1) x l(L1) + p(L2) x l(L2)]. The coefficient 1/2 indicates that Loads 1 and 2 have a 50% opportunity to be energized, since another tie switch may be turned on. If the switch agent is turned on, Loads 1 and 2 could be energized but Loads 3 and 4 might be de-energized in order to support Loads 1 and 2, as the power supply in the system is insufficient; the reward in this case is computed analogously, with the full priority-weighted terms for Loads 1 and 2 and the halved terms for Loads 3 and 4.

As indicated above, in order to achieve accurate and quick decisions, each grid has a scheduler agent, which can exploit its expertise to provide suggestions to the switch agents in its scheduling grid. A suggestion for a switch agent in the grid scheduled by the scheduler agent is defined by the available action set of that switch agent and a vector deg which gives the suggestion degree for each action. Here, each switch agent has two available actions, turn on and turn off. A suggestion with a negative degree urges a supervised agent not to take the specified action, whereas a suggestion with a positive degree encourages the scheduled agent to take the specified action. The greater the absolute value of the suggestion degree, the stronger the impact of the suggestion on the scheduled agent. For example, a suggestion may moderately discourage a switch agent from taking the action turn off but strongly recommend that it take the action turn on.

The suggestion generated by a scheduler agent for the switch agents in its scheduling grid depends on the reward of each switch agent in that grid. As a scheduler agent has a global view of the corresponding grid, it can calculate the reward of each switch agent and then average these rewards. For those actions whose rewards are above the average, the suggestion degree is positive; for those actions whose rewards are below the average, the suggestion degree is negative. The greater the reward, the higher the positive suggestion degree; the smaller the reward, the lower (more negative) the suggestion degree.
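As a concrete illustration of the reward and suggestion-degree computation just described, the following Java sketch uses a priority-weighted load sum as the reward and centers the suggestion degrees on the grid-average reward. The method names, the linear scaling of the degrees, and the 0.5 weight for loads that only might be energized are assumptions consistent with the text, not the paper's exact formulas; LoadKnowledge refers to the illustrative class sketched in Section III-A.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative reward and suggestion-degree computation (a sketch under assumed formulas).
class RewardCalculator {

    // Priority-weighted sum: fully counted loads plus loads that only *might* be energized (weight 0.5).
    static double reward(List<LoadKnowledge> energized, List<LoadKnowledge> maybeEnergized) {
        double r = 0.0;
        for (LoadKnowledge l : energized)      r += l.priority * l.currentLoadKW;
        for (LoadKnowledge l : maybeEnergized) r += 0.5 * l.priority * l.currentLoadKW;
        return r;
    }

    // Scheduler-side suggestion degrees: positive above the grid average, negative below it.
    // rewardPerAction maps each (switch agent, action) pair in the grid to its expected reward.
    static Map<String, Double> suggestionDegrees(Map<String, Double> rewardPerAction) {
        double avg = 0.0;
        for (double v : rewardPerAction.values()) avg += v;
        avg /= Math.max(1, rewardPerAction.size());
        Map<String, Double> deg = new HashMap<>();
        for (Map.Entry<String, Double> e : rewardPerAction.entrySet()) {
            deg.put(e.getKey(), e.getValue() - avg); // assumed linear scaling of the degree
        }
        return deg;
    }
}
```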


Thus, suggestions are used to help balance the load in the power grid system.

For each switch agent, there is a function f whose value determines the agent's receptivity to suggestions. f allows switch agents to selectively accept suggestions based on the information they currently hold (recall that this information comes from the messages stored in the message pool): when a switch agent holds much information, it becomes confident in its local decisions, and the value of f decreases as learning progresses. In this paper, we set f = \kappa / n(e), where \kappa is a constant and n(e) returns the number of messages possessed by the switch agent during a specific event e. A specific event could be, for instance, a fault occurring in the power grid system (recall Section III-B).

Based on Even-Dar et al.'s investigation [17], Q-learning can be used to solve distributed problems with decent performance. Thus, in this paper, we develop a Q-learning algorithm for the power grid system restoration problem. As described in Section I, Q-learning is one of the most important breakthroughs in reinforcement learning. The simplest form of Q-learning is defined as

    Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]    (1)

where Q(s, a) is an action-value function which gives the value of an agent taking action a in state s; \max_{a'} Q(s', a') is the maximum value that can be reaped by selecting an optimal action in the new state s'; r indicates the reward an agent gains by taking action a in state s; and \alpha and \gamma are coefficients representing the learning rate and the discount factor, respectively. Here, a state is usually represented by the properties of a dynamic environment, and actions can be performed by agents to change the state of the environment. Interested readers can refer to Sutton and Barto's well-known book [18] for more details about Q-learning.

During the learning process, each switch agent improves its action policy as it interacts with other agents. A pure action policy deterministically chooses one action at each learning step, while a stochastic action policy specifies a probability distribution over the available actions at each learning step. Singh et al. [19] have proved that stochastic policies can work better than pure policies in partially observable environments, if both stochastic and pure policies are limited to act based on the current percept. Thus, a stochastic action policy is exploited in this paper. A stochastic action policy can be represented as a function \pi(a), which specifies the probability that an agent will execute action a. Equation (2) is the policy adaptation method used in this paper, where deg(a) is the element of the vector deg that gives the suggestion degree for action a [20]:

    \pi'(a) \leftarrow \pi(a) + f \cdot deg(a)    (2)

Algorithm 1 demonstrates, in pseudocode form, the learning progress of a switch agent, denoted S_x, as it makes a rational decision.

Algorithm 1: Learning Progress of a Switch Agent
1) Initialize the value Q(a) of each available action a arbitrarily;
2) for t = 1 to a predefined integer T do
3)     calculate \pi(a) for each available action a by using (3);
4)     adapt \pi(a) to \pi'(a) by using (2);
5)     \pi \leftarrow \Gamma(\pi');
6)     for each available action a do
7)         Q(a) \leftarrow (1 - \alpha) Q(a) + \alpha r(a);
8)     end for
9) end for
10) a^{*} \leftarrow \arg\max_{a} Q(a);
11) S_x takes the action a^{*};

S_x first arbitrarily initializes the Q-value of each available action (Line 1). Thereafter, S_x launches the learning process for the Q-value of each available action (Lines 2–9). In Line 2, the number of iterations is set to 50, which can yield adequate results. In Line 3, \pi(a) is calculated for each available action a by using (3), which is the \epsilon-greedy exploration method [21], where \epsilon is a small positive number and |A| is the number of available actions of an agent:

    \pi(a) = 1 - \epsilon + \epsilon / |A|,   if a = \arg\max_{a'} Q(a')
    \pi(a) = \epsilon / |A|,                  otherwise    (3)

Then, each \pi(a) is adapted to \pi'(a) by utilizing (2) (Line 4). To normalize \pi' such that it sums to 1 (Line 5), the function \Gamma is introduced; it is defined as \Gamma(\pi') = \arg\min_{\pi'' \in \Pi} \| \pi'' - \pi' \|, where \Pi is the set of valid policies, namely, \Gamma returns a valid policy which has the minimum Euclidean distance to \pi'. In Line 7, \alpha is the learning rate and r(a) is the expected reward received by S_x after taking action a. The formula in Line 7 is different from (1): (1) is only a general form for calculating Q-values, and many other specific formulas have been developed; the formula in Line 7 is a specific one for our problem. When it finishes learning, S_x executes the action which maximizes the Q-value (Lines 10–11).

An attentive reader might point out that the switch agents could directly choose the action which maximizes their rewards and thus would not need the learning process. However, as stated by Kaelbling et al. [3], this kind of algorithm, which always takes the highest-reward action and overlooks the tradeoff between exploitation and exploration, may converge to a sub-optimal state.
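To make Algorithm 1 concrete, the following Java sketch implements the stateless Q-update of Line 7 together with the \epsilon-greedy policy of (3) and the suggestion-based adaptation of (2). It is a minimal illustration under the reconstructions given above, with assumed values for the learning rate and \epsilon, and a simple clip-and-renormalize step standing in for the exact Euclidean projection \Gamma; it is not the authors' implementation.

```java
// Minimal sketch of Algorithm 1 for one switch agent with two actions: 0 = turn off, 1 = turn on.
class SwitchAgentLearner {
    static final int ROUNDS = 50;       // predefined number of learning iterations (as in Line 2)
    static final double ALPHA = 0.1;    // learning rate (assumed value)
    static final double EPSILON = 0.1;  // exploration parameter (assumed value)

    double[] q = new double[2];         // Q-value per action

    int learn(double[] reward, double[] suggestionDeg, double receptivity) {
        for (int t = 0; t < ROUNDS; t++) {
            double[] pi = epsilonGreedy(q);                                   // Eq. (3), Line 3
            for (int a = 0; a < 2; a++) pi[a] += receptivity * suggestionDeg[a]; // Eq. (2), Line 4
            pi = normalize(pi);              // Line 5: stand-in for the projection Gamma
            // pi is the suggestion-biased exploration policy; in this simplified sketch the
            // expected reward of every action is given, so Lines 6-8 update all actions directly.
            for (int a = 0; a < 2; a++) {
                q[a] = (1 - ALPHA) * q[a] + ALPHA * reward[a];                // Line 7 update
            }
        }
        return q[1] >= q[0] ? 1 : 0;                                          // Lines 10-11
    }

    static double[] epsilonGreedy(double[] q) {
        int best = q[1] >= q[0] ? 1 : 0;
        double[] pi = { EPSILON / 2, EPSILON / 2 };
        pi[best] += 1 - EPSILON;
        return pi;
    }

    // Clip to [0, 1] and renormalize so the policy sums to 1 (approximation of Gamma).
    static double[] normalize(double[] pi) {
        double sum = 0;
        for (int a = 0; a < pi.length; a++) { pi[a] = Math.max(0, Math.min(1, pi[a])); sum += pi[a]; }
        if (sum == 0) return new double[] { 0.5, 0.5 };
        for (int a = 0; a < pi.length; a++) pi[a] /= sum;
        return pi;
    }
}
```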


In the following sections, we first provide an example to show the power grid system restoration process using our approach and then empirically demonstrate the performance of the approach.

IV. SIMULATION EXAMPLE

To aid understanding of our power restoration mechanism, this section gives an example which demonstrates how a power grid system restores when a fault occurs on a generator. Suppose that, in Fig. 1, the capacity of each generator is 120 kW, the amount of each load is 50 kW, the priority of Loads 1 and 2 is 0.8, and the priority of the other loads is 0.5. Now, a fault occurs on Generator 1 and Loads 1 and 2 are consequently de-energized. The restoration process is described as follows.

1) The load agents of Loads 1 and 2 send restoration messages to their respective neighboring switch agents. These restoration messages include information about the current load and the priority of the load monitored by the load agents, as described in Section III-B.

2) The neighboring switch agents forward these restoration messages, through the intermediate switch agents, to the generator agents of Generators 2 and 3.

3) The generator agents of Generators 2 and 3 send ARC messages backwards to the requesting switch agents. The remaining capacities of Generators 2 and 3 are both 20 kW, so neither can supply either one of Loads 1 and 2. Thus, agents in Grids 1 and 2 act as scheduler agents for those grids, so as to contact other grids for help.

4) As there are direct communication links to the neighboring grids, the scheduler agents of Grids 1 and 2 can directly contact the scheduler agents of those grids for help. Thereby, the contacted scheduler agents know that two loads are de-energized, that each load needs 50 kW of electricity, and that the priority of both loads is 0.8.

5) After this information transfer, the switch agents in the power grid system start to make decisions about turning on or off, so as to supply or shed power to/from their corresponding loads, by using Algorithm 1. During the learning process, each scheduler agent provides suggestions, via (2), to the switch agents in its scheduling grid about their decisions. In this example, after learning, only the switch agent adjacent to Load 3 chooses the action turn off, whereas the other switch agents choose the action turn on. In that case, all the loads except Load 3 are energized.
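As a rough check of this outcome, assume (as Fig. 1 suggests) that the system has five generators, each normally feeding two 50-kW loads, i.e., ten loads and 500 kW of total demand. With Generator 1 out of order, the remaining capacity is 4 x 120 kW = 480 kW, which falls 20 kW short of the full demand, so at least one 50-kW load must be shed. Shedding a priority-0.5 load such as Load 3 sacrifices a priority-weighted value of 0.5 x 50 = 25, whereas shedding Load 1 or Load 2 would sacrifice 0.8 x 50 = 40, which is consistent with the learned decision to de-energize only Load 3.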


V. SIMULATION

To objectively assess the performance of our approach, we evaluated it in comparison with two representative approaches, namely a Centralized approach and a Decentralized approach [5].

• Centralized approach: A centralized power restoration approach usually needs an external omniscient central manager to maintain information about all the nodes in a power grid system. In addition, the central manager is able to interact with all the nodes in the system. This approach carries the potential of a single point of failure.

• Decentralized approach: The power restoration approach proposed by Solanki et al. [5] is a typical decentralized approach, where each node has knowledge only about its directly linked neighbors. Nodes with such limited knowledge might make inaccurate decisions for power restoration.

A. Simulation Setup

The simulation was conducted in two settings: one in a power grid system of varying scale, and the other in a fixed-scale power grid system with different grid sizes. The grid size is based on how many feeders each grid involves; for example, in Fig. 1, one grid involves two feeders while the other grids involve only one feeder. The average amount of each load in the system is 50 kW and the average capacity of each generator is 120 kW; the exact load and capacity values follow a normal distribution. The priority of each load is a random number in a predefined range. The other parameters, i.e., the learning rate, the suggestion receptivity coefficient, the action selection probability, and the number of learning rounds, are set to values obtained by tuning during the simulations, which achieve adequate simulation results.

1) The scale of the simulated power grid system is varied from 5 generators, where each grid contains one feeder, to 100 generators, where each grid contains 10 feeders. The architecture of the simulated power grid system is similar to Fig. 1. The purpose of this setting is to test the performance of the three approaches at different system scales, namely with an increasing number of generators (from 5 to 100).

2) The grid size, namely the number of generators in each grid, is varied between 5 and 25 with a fixed scale of the power grid system (100 generators). The aim of this setting is to test the suitability of our hybrid framework, i.e., the performance of our approach for different grid sizes with a fixed number of generators.

Moreover, faults are randomly introduced on several generators in the system. The three criteria used to evaluate the performance of the three approaches are the overall energized load given the load priorities, the total number of communication messages created during a power restoration process, and the time consumed to reach decisions. The simulation is implemented in the Java programming language and is run on a Windows XP SP3 system with an Intel Core 2 Duo 3-GHz CPU and 2 GB of RAM. The simulation results are obtained by averaging 100 simulation runs.

B. Simulation Results and Discussion

1) For Setting 1: We first present the simulation results for Setting 1, where the number of generators increases. Fig. 3(a) shows the overall energized load of the Decentralized and our Hybrid approaches at different system scales, as a percentage of the maximum overall energized load obtained by the Centralized approach. As the central manager in the Centralized approach has a global view of the system, it achieves the best performance, as expected; therefore, in Fig. 3(a), we use the Centralized approach as an upper bound. It can be seen that our Hybrid approach consistently performs better than the Decentralized approach and approximates the performance of the Centralized approach. This is because, when the scale of the system increases, each agent in the Decentralized approach has only a small view of the whole system and therefore makes relatively inaccurate decisions, whereas agents in our Hybrid approach can still obtain a broad view of the whole system through the designated scheduler agents, and this broad view helps agents make more accurate decisions.

Fig. 3(b) shows the number of communication messages incurred by all three approaches during a power restoration process. When the system scale is small (5 generators), the Centralized approach creates the fewest communication messages, since the other two approaches have to pass messages around the system in order to accumulate information about it,


Fig. 3. Performance of different approaches in various system scales. (a) Overall energized loads. (b) Communication messages incurred. (c) Time consumption.

Fig. 4. Performance of our Hybrid approach in different grid sizes. (a) Energized loads. (b) Communication messages incurred. (c) Time consumption.

whereas the Centralized approach does not need this step. However, as the system scale rises, the Centralized approach incurs many more communication messages than both the Decentralized and our Hybrid approaches. This can be explained by the fact that, as the system scale increases, the central manager in the Centralized approach has to generate more communication messages to control the behavior of each agent, while in the other two approaches agents need to hold only a small view of the system and thus create fewer communication messages. It should also be noticed that our Hybrid approach consistently incurs fewer messages than the Decentralized approach. This is because agents in our Hybrid approach can acquire information by inquiring only of the scheduler agents, which have a broader view than each individual agent has, instead of delivering many messages around the system as the Decentralized approach does.

In power grid systems, the restoration time is a significant index. Thus, we measured the time consumed by agents to reach rational decisions and deliver messages. Fig. 3(c) displays the time consumed by all three approaches. In this simulation, we programmed a timer to record how many milliseconds are needed by each of the three approaches. It can be observed that the Centralized approach consumes more time as the system scale increases, since the central manager has to manage more nodes in the system and hence has to take more information into account when making a decision. The time consumption of the other two approaches remains relatively steady and at a lower level, because each agent needs to manage only a few nodes' information and uses this information to make decisions. It should also be noted that the time consumption of our Hybrid approach is even lower than that of the Decentralized approach, although our Hybrid approach requires time for the iteration phase of the learning process. This is because our Hybrid approach generates fewer communication messages and thereby saves time, which compensates for the time spent on the learning process.

2) For Setting 2: In addition, for Setting 2, we tested the performance of our Hybrid approach for different grid sizes, as shown in Fig. 4.

It can be seen in Fig. 4(a) that the overall energized load of our Hybrid approach increases as the grid size rises. This is because, when the grid size scales up, each scheduler agent gains a broader view of the system, which helps scheduler agents provide correct suggestions. Nevertheless, as the grid size increases, both the number of communication messages and the time consumption rise sharply [Figs. 4(b) and 4(c)], since each scheduler agent has to maintain more agents' information within a grid. Therefore, there is a trade-off in choosing the grid size.

Moreover, it should be pointed out that the simulation results presented above were obtained in a one-scheduling-layer environment. Simulations were also conducted in two- and three-scheduling-layer environments, but the results did not show significant improvement compared with the one-scheduling-layer architecture. This is because the increase in communication messages between grids in different layers offsets the reduction in communication messages incurred between grids in the same layer. Nonetheless, when the power grid system was further scaled up to 200 generators or more, the multiple-scheduling-layer architecture brought remarkable performance improvement in contrast with the one-scheduling-layer architecture. Thus, it can be concluded that multiple scheduling layers are suitable only for a large power grid system (e.g., more than 200 generators); for a medium or small power grid system (e.g., about 100 generators or fewer), one scheduling layer may be sufficient.

In summary, the overall performance of our Hybrid approach is better than that of the typical Decentralized approach and approximates the upper-bound Centralized approach, with fewer communication messages generated and lower time consumption. Thereby, it can be concluded that our Hybrid approach is a balance between the Centralized and Decentralized approaches with more flexibility.

VI. CONCLUSION

In this paper, a hybrid multiagent framework for power grid system restoration with a Q-learning algorithm was presented. The contribution of this framework is that it combines the benefits of centralized and decentralized architectures.


In addition, the proposed Q-learning algorithm does not require a global reward signal and can converge faster based on higher-level agents' suggestions. The simulation results indicated the potential of applying our approach to power restoration in power grid systems. As claimed in Section I, our framework is topology independent and can thus be used in power grid systems of various structures.

In the future, as described in Section III-B, we would like to address the issue of which generator should supply which load and how much capacity the generator should provide to the load. This issue can potentially be handled by using negotiation protocols between generator agents and load agents; negotiation is an important research issue in multiagent systems, and many efficient negotiation protocols have been developed. Another issue is that, in practice, several other factors have to be taken into account when the reward of each switch agent is calculated, e.g., the power balance between generation and load, the capacity of each generator, line flows, voltage values, and frequency and time constraints. In this paper, we consider only how much load could be energized when we calculate the reward of each switch agent, namely, if an action of a switch agent can potentially energize more loads, this action will have a higher reward. However, in real environments, other factors should also be taken into consideration. For example, an action of a switch agent may indeed energize more loads, but energizing these loads may cause too much delay or lead to significant oscillation of the system voltage; the switch agent may then face a dilemma. We intend to incorporate these system constraints into a weighted linear reward function and assign different weights to different constraints. We are currently working on these two issues and plan to implement and test our framework in some real cases.

REFERENCES

[1] J. Jung and C.-C. Liu, "Multi-agent technology for vulnerability assessment and control," in Proc. IEEE Power Eng. Soc. Summer Meeting, Vancouver, BC, Canada, Jul. 2001, vol. 2, pp. 1287–1292.
[2] D. Poutakidis, L. Padgham, and M. Winikoff, "Debugging multi-agent systems using design artefacts: The case of interaction protocols," in Proc. 1st Int. Workshop Challenges in Open Agent Systems at AAMAS 2002, Bologna, Italy, 2002, pp. 960–967.
[3] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Intell. Res., vol. 4, pp. 237–285, 1996.
[4] C. Eder, Q-Learning, Oct. 2008. [Online]. Available: http://knol.google.com/k/q-learning
[5] J. M. Solanki, S. Khushalani, and N. N. Schulz, "A multi-agent solution to distribution systems restoration," IEEE Trans. Power Syst., vol. 22, no. 3, pp. 1026–1034, Aug. 2007.
[6] D. V. Coury, J. S. Thorp, K. M. Hopkinson, and K. P. Birman, "Agent technology applied to adaptive relay setting for multi-terminal lines," in Proc. IEEE Power Eng. Soc. Summer Meeting, Jul. 2000, pp. 1196–1201.
[7] M. Kezunovic, X. Xu, and D. Wong, "Improving circuit breaker maintenance management tasks by applying mobile agent software technology," in Proc. IEEE Power Eng. Soc. Asia Pacific Transmission and Distribution Conf., Oct. 2002.
[8] D. P. Buse, P. Sun, Q. H. Wu, and J. Fitch, "Agent-based substation automation," IEEE Power Energy, vol. 1, no. 2, pp. 50–55, Mar. 2003.
[9] M. M. Nordman and M. Lehtonen, "Distributed agent-based state estimation for electrical distribution networks," IEEE Trans. Power Syst., vol. 20, no. 2, pp. 652–658, May 2005.
[10] J. Jung, C.-C. Liu, S. T. Tanimoto, and V. Vittal, "Adaptation in load shedding under vulnerable operating conditions," IEEE Trans. Power Syst., vol. 17, no. 4, pp. 1199–1205, Nov. 2002.


[11] T. Nagata and H. Sasaki, "A multi-agent approach to power system restoration," IEEE Trans. Power Syst., vol. 17, no. 2, pp. 457–462, May 2002.
[12] S. K. Srivastava, H. Xiao, and K. L. Butler-Purry, "Multi-agent system for automated service restoration of shipboard power systems," in Proc. 15th Int. Conf. Computer Applications in Industry and Engineering, San Diego, CA, Nov. 2002, pp. 119–123.
[13] T. Nagata, H. Fujita, and H. Sasaki, "Decentralized approach to normal operations for power system network," in Proc. 13th Int. Conf. Intelligent Systems Application to Power Systems, Nov. 2005, pp. 407–412.
[14] J. Momoh and O. Diouf, "Optimal reconfiguration of the navy ship power system using agents," in Proc. IEEE PES Transmission and Distribution Conf. Exhib., May 2006, pp. 562–567.
[15] K. Huang, S. K. Srivastava, D. A. Cartes, and M. Sloderbeck, "Intelligent agents applied to reconfiguration of mesh structured power systems," in Proc. Int. Conf. Intelligent Systems Applications to Power Systems, Toki Messe, Niigata, Japan, Nov. 2007, pp. 1–7.
[16] L. Chiariglione, FIPA 98 Specification. [Online]. Available: http://www.cselt.it/fipa/spec/fipa98
[17] E. Even-Dar, S. Mannor, and Y. Mansour, "PAC bounds for multi-armed bandit and Markov decision processes," in Proc. 15th Annu. Conf. Computational Learning Theory (COLT 2002), Sydney, Australia, Jul. 2002, pp. 255–270.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[19] S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari, "Convergence results for single-step on-policy reinforcement-learning algorithms," Mach. Learn., vol. 39, pp. 287–308, 2000.
[20] C. Zhang, S. Abdallah, and V. Lesser, "Integrating organizational control into multi-agent learning," in Proc. AAMAS'09, Budapest, Hungary, May 2009, pp. 757–764.
[21] E. R. Gomes and R. Kowalczyk, "Dynamic analysis of multiagent Q-learning with ε-greedy exploration," in Proc. ICML'09, Montreal, QC, Canada, 2009.

Dayong Ye received the B.Eng. degree from Hefei University of Technology, Hefei, China, in 2003 and the M.Sc. degree from the University of Wollongong, Wollongong, Australia, in 2009. Currently, he is pursuing the Ph.D. degree in computer science at the University of Wollongong. His research interests focus on self-organizing multiagent systems and their applications.

Minjie Zhang (M’98) received the B.Sc. degree from Fudan University, Shanghai, China, in 1982 and the Ph.D. degree in computer science from the University of New England, Armidale, Australia, in 1996. Currently, she is an Associate Professor of computer science at the University of Wollongong, Wollongong, Australia. Her research interests include multiagent systems, agent-based simulation, and modeling in complex domains.

Danny Sutanto (SM’88) received the B.Eng. and Ph.D. degrees from the University of Western Australia, Perth, Australia, in 1978 and 1981, respectively. Currently, he is a Professor of power engineering at the University of Wollongong, Wollongong, Australia. His research interests include energy storage systems and voltage stability.
