AIAA 2011-1450
Infotech@Aerospace 2011 29 - 31 March 2011, St. Louis, Missouri
Fuzzy-based Approaches to Decision Making and Resource Allocation during Wildland Fires

Nicholas Hanlon1, Dr. Manish Kumar2, Dr. Kelly Cohen3
University of Cincinnati, Cincinnati, OH, 45221

and
Dr. Benjamin Tyler4
EDAptive Computing, Inc., Dayton, OH, 45458

This paper proposes two different methods of implementing fuzzy logic to improve decision-making and resource allocation during a wildland fire. The problem builds on previous work implementing neuro-dynamic programming for the theater-missile defense problem. The scenario was modified to the parameters of a wildland fire and extended to include multiple layers of defense. In addition, three uncertainty cases were introduced with incomplete knowledge of the environment. The control methodologies were evaluated by the remaining health of the assets and by execution time. The neuro-fuzzy dynamic programming and fuzzy-heuristic approaches showed superior results in both the certainty and uncertainty cases while remaining robust to system complexity.
Nomenclature
A = attack vector
A_k = number of fires attacking the kth asset
D^(l) = defense vector at layer l
D^(l)_k = number of defense resources used to defend the kth asset at layer l
g = one-step cost function
H_k = health value of asset k
\bar{H}_k = total health value of asset k
x = reduced state vector
\tilde{J}^*(x) = reduced optimal cost at reduced state x
J^*(s) = optimal expected long-term cost starting at state s
MF = maximum number of fires that can be created at each discrete time step of the simulation
N = number of assets
p(F) = probability of an attacking fire successfully causing damage
p(R) = probability of a fire suppression successfully defending an asset
PV_k = property value of asset k
R_l = number of defense resources at layer l
TF = total number of wildland fires based on 3-month seasonal trend forecasts
\mu^* = optimal control policy
1 Graduate Student, School of Aerospace Systems, AIAA Student Member.
2 Assistant Professor, School of Dynamic Systems.
3 Associate Professor, School of Aerospace Systems, AIAA Associate Fellow.
4 Senior Software Developer II.
Copyright © 2011 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
I. Introduction
Wildland fire, a natural agent of change and one of the basic environmental factors on our planet, is an essential tool in regulating complex forest ecosystems, causing both destruction and birth in plant and animal life in an effort to ensure diversity. These complex ecosystems seek a point of criticality: a state of readiness in which the correct fuel accumulation is primed for ignition, allowing fire to fulfill its global role in our planet's continual survival. This state of readiness provides the balance between destruction and rebirth. The absence of naturally occurring fires allows fuel sources to accumulate to hazardous levels, and the resulting severity and intensity of a fire cause utter destruction, minimizing the benefits that promote plant and animal diversity. A prescribed burn, a method of mimicking the natural occurrence of fire, attempts to restore the natural fire regime and recondition the ecosystem to fires.

The intensity of a wildland fire results from a mixture of variables such as fuel accumulation, humidity, wind speed and direction, and dryness. Once the fire is ignited, a column of smoke and heat rises miles into the atmosphere, creating a void below that rapidly funnels more oxygen into the space, further fueling the fire. This repeated cycle of air movement creates gale-force winds that can blow fire embers up to half a mile, leaping over any fire barrier and starting a new spot fire8.

The human species and nature are not isolated systems but coupled, each playing a vital role in the future of the other, so an uncontrolled wildland fire that encroaches on our lives creates havoc in many facets of our society. Homes, community infrastructure, and ultimately human lives may be lost. Government-funded agencies, along with extensive resources, work to prevent such occurrences; an estimated $10 billion in fire suppression and resources was used to fight over 90,000 wildfires in 2000.1

Emergency situations such as uncontrolled wildland fires are undoubtedly complex events within a partially known environment. It is cumbersome to obtain a precise mathematical model of the spatio-temporal behavior of a wildland fire. Nevertheless, once a fire is deemed hazardous, real-time decision-making concerning resource allocation and control strategy is required, even though we possess only partial information and an inaccurate model. Using terminology borrowed from control systems, the resources available for fire protection include both sensors, which enable information gathering, and actuators, which actively suppress the fire and limit its growth. Ground crews and vehicles, UAVs, satellites, and aerial vehicles are examples of the available resources. Aerial vehicles may act as sensors (NASA's Ikhana UAV and the Global Hawk UAV) to detect fire intensity and direction, or provide fire suppression (C-130 tankers and helicopters).

During wildland fires, decision-makers attempt to maintain an accurate perception of the environment, known as situation awareness. Sensors feed data into the system to convey what is currently happening in the field, in spite of the inherent uncertainty and incomplete information. Ideally, complete information would update the system continuously, but the data collected by sensors arrive in discrete time periods and may be missing elements of information, which adds to the complexity of the system.
Based on the situation awareness and a model of the environmental and geographical factors, we predict the growth of the fire. Decisions and resource allocation can then be made based on the fire model, and the process repeats until the danger has been eliminated. The challenge: given a set of spatially separated fires and a number of resources to suppress them, how do we make decisions and allocate our resources optimally to limit the damage in terms of assets destroyed?
II. Problem Formulation
The resource allocation problem is modeled as an attacker-defender game in which the defender defends its assets while the attacker attempts to deliver maximum destruction to those assets, based on the approach developed by Bertsekas, Homer, Logan, Patek, and Sandell4 for the Theater Missile Defense (TMD) problem. Although fires in nature do not intentionally attack assets, we adopt this assumption to complete the attacker-defender framing. The assets of the system are the economic resources we strive to protect, e.g., structures (governmental, commercial, and private), agriculture, land, etc. Each asset is assigned a property value (level of importance), giving preference to protecting one asset over another. The attack vector comprises the wildland fire hotspots burning through the landscape. The defense vector is the collection of resources protecting the assets, e.g., ground crews and vehicles, aerial vehicles, etc.

Although the realistic situation is a continuous system, our scenario assumes a discrete-time model. At least one fire attack occurs at each time step, ensuring that the simulation terminates in finite time through either the destruction of all assets or the elimination of all fires; proving this special case implies that the solution remains valid for time periods in which no attacking fire occurs. We assume that the attacking fires at each time step are independent of one another and that the asset attacked by a fire is selected probabilistically. In addition, success probability rates are assigned to the attacking fires and to the defense measures for completing their intended purposes, namely p(F) and p(R). Each asset k is assigned a total health value \bar{H}_k. Every time a fire successfully reaches its target, the health of the asset is decremented by one point, given the fire's probability of damage p(F). H_k represents the remaining health of asset k; an asset is completely destroyed once H_k = 0. The key goal is to maximize the health of the surviving assets by the end of the simulation, calculated in Eq. (1). In the following few equations, N represents the number of assets.
V = \sum_{k=1}^{N} PV_k \, H_k / \bar{H}_k    (1)
The state of the system is stored in three key vectors, listed below and updated after each discrete time step.

Reduced State Vector:

x = (H_1, ..., H_N, R_1, R_2, R_3, TF)    (2)

where
  H_k = remaining health of the kth asset
  R_l = number of defense resources at layer l
  TF = total number of wildland fires

Attack Vector:

A = (A_1, ..., A_N)    (3)

subject to the constraint

\sum_{k=1}^{N} A_k \le MF    (4)

where
  A_k = number of fires attacking the kth asset
  MF = maximum number of fires that can be created at each discrete time step of the simulation

Defense Vector (one per layer l):

D^(l) = (D^(l)_1, ..., D^(l)_N)    (5)

subject to the constraint

\sum_{k=1}^{N} D^(l)_k \le R_l    (6)

where
  D^(l)_k = number of fire-retardant resources defending the kth asset at layer l

A key ingredient of a fire is its fuel source, without which a fire cannot burn. A fire typically consumes all fuel sources as it burns through the environment, limiting the ability of subsequent fires to follow the same path. Once an asset is destroyed, all viable fuel paths toward the asset have been consumed. Based on this knowledge, the attack vector will never direct fires toward an asset that has already been destroyed.

A. Three Layer Approach
To closely mirror the wildland fire environment, our scenario incorporates multiple layers of defense to thwart an attack. A fire must successfully elude all three defense layers to destroy the asset. Figure 1 depicts the three-layered defense approach, and a small code sketch of these vectors and a single-layer engagement follows.
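As a concrete illustration, the following minimal Python sketch (our own naming and simplifications, not the authors' implementation; in particular, the assumption that each defense resource engages at most one fire is ours) encodes the scenario constants, the value function of Eq. (1), the budget constraints of Eqs. (4) and (6), and one layer of engagement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scenario constants taken from the paper's test cases
N = 3                                 # number of assets
PV = np.array([5.0, 10.0, 15.0])      # property values PV_k
H_bar = np.array([5, 5, 5])           # total health values
p_F = 1.0                             # p(F): fire causes damage
p_R = 0.4                             # p(R): suppression succeeds

def asset_value(H):
    """Eq. (1): remaining asset health, weighted by property value."""
    return float(np.sum(PV * H / H_bar))

def feasible(vec, budget):
    """Eqs. (4)/(6): non-negative counts within the step/layer budget."""
    return int(np.min(vec)) >= 0 and int(np.sum(vec)) <= budget

def engage_layer(A, D):
    """One defense layer: each allocated resource engages one attacking
    fire and succeeds with probability p_R; survivors move on."""
    survivors = np.array(A, dtype=int).copy()
    for k in range(N):
        engaged = min(int(D[k]), int(survivors[k]))
        survivors[k] -= rng.binomial(engaged, p_R)
    return survivors

def apply_damage(H, leakers):
    """Fires that elude all three layers hit the asset with probability
    p_F, costing one health point per successful fire."""
    for k in range(N):
        H[k] = max(0.0, H[k] - rng.binomial(int(leakers[k]), p_F))
    return H
```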
Figure 1. Three Layer Defense. An asset is attacked by fires from the attack vector. The defense vector at layer 1 is selected to eliminate the attack in the first layer; if the defense cannot eliminate the threat, the surviving attacking fire moves into layer 2, and so forth. An attacking fire that successfully navigates through all three layers reaches the asset. The decision-making processes of the defense vectors at the three layers are independent of one another; each layer is supplied with the updated reduced state vector and attack vector.

B. Receding Horizon Approach
The problem takes the three-layer approach one step further to mirror the real-world situation. In a finite-horizon model5, an agent forgoes current rewards to optimize its reward a fixed number of discrete steps ahead; in subsequent steps, the agent keeps planning over the remaining steps until the reward is one step away. This approach assumes that the agent knows how far away the horizon lies when making its decisions. The receding horizon modifies the finite horizon: the agent continuously makes decisions with the horizon always appearing the same number of discrete steps away. The receding horizon is ideal for the wildland fire because the terminal stage of the scenario is unknown. The method creates a virtually continuous environment in which attacks are initiated at discrete steps and the simulation continues until one of two conditions is satisfied: all fires are extinguished or all resources are depleted.

C. Exclusion of Burnt Land
The majority of fire growth modeling techniques adhere to the same underlying elliptical shape, which under normal conditions grows according to Huygens' principle of wave propagation. Normal conditions mean fire factors (spread, velocity, fuels, topography, etc.) that are spatially and temporally constant, an assumption that is rarely true in the environment9.
Figure 2. Fire Growth over Three Layers. The wildland fire consumes fuel from the land as it approaches an asset, and the area of the ellipse increases exponentially at each layer, labeled 1, 2, and 3 in the figure.

Since the burning of land is a natural agent of change, and is at times encouraged through natural events and prescribed burns, the cost of land burned in the simulation is excluded. The top priority is to minimize the economic impact (loss of assets); any land destroyed during the simulation is regarded as a prescribed burn.

D. Uncertainty Analysis
In a perfect world the information gathered would be complete and accurate, simplifying many of the constraints imposed on the problem. However, we must handle uncertainty within our system, since error may be introduced by the reliability of sensors. Three different uncertainty cases are explored (a simulation-loop sketch incorporating them follows this list):
1. Fire Error Percentage: the number of fires is larger than stated in the reduced state vector.
2. Breakup Percentage: the possibility that a fire jumps and creates an additional attacking fire in the third layer.
3. False Alarm Percentage: the possibility that an attacking fire is a false alarm and poses no threat or destructive capability to the assets. The false alarm always occurs in the third layer.
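A sketch of the receding-horizon scenario loop under these assumptions, reusing the helpers above; the `sample_attack` helper is sketched at the end of Sec. II.F. Uncertainty case 1 would perturb the fire count reported to the policy rather than the loop itself:

```python
def run_episode(policy, R, TF, MF, breakup_pct=0.0, false_alarm_pct=0.0):
    """Receding-horizon scenario loop (a sketch, not the authors' code).
    `policy(H, R, fires_left, A, layer)` returns a defense vector; per the
    scenario setup, each layer's budget R[layer] restocks between steps."""
    H = H_bar.astype(float)
    fires_left = TF
    while fires_left > 0 and H.sum() > 0:
        A = sample_attack(H, min(MF, fires_left))   # at least one fire/step
        fires_left -= int(A.sum())
        for layer in range(3):
            if layer == 2:
                # Uncertainty cases 2 and 3 strike in the third layer
                A = A + rng.binomial(A, breakup_pct)       # breakup
                A = A - rng.binomial(A, false_alarm_pct)   # false alarms
            A = engage_layer(A, policy(H, R, fires_left, A, layer))
        H = apply_damage(H, A)
    return asset_value(H)    # Eq. (1) at termination
```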
This information is unknown to the defender prior to the simulation; any training or planning by the algorithms assumes perfect knowledge of the environment. In the event that a breakup or false alarm occurs in the third layer, the algorithms are responsive to the change in state and can react accordingly.

E. Figures of Merit
The figures of merit for the five control methodologies are based on two parameters: execution time and remaining asset health. The execution time quantifies the requirement for real-time decisions; a faster execution time equates to a quicker reaction time in finding the control policy. The remaining asset health measures how well the control algorithm protected its assets from the attacker; higher remaining asset health indicates a more successful control policy. A promising algorithm is therefore one with execution fast enough for real-time decision-making and high remaining asset health. Multiple scenarios are simulated, each with a unique combination of initial attacker and defender inventories. The same scenarios are repeated with uncertainty in the system, as listed in the uncertainty analysis section.

F. Scenario Description
A series of 14 cases with initial attacker and defender inventories was set up for the simulation, shown in Table 1. R_l represents the number of resources at each layer l. TF is the total number of fires in the simulation and MF is the maximum number of fires that can occur at a given time step. p(F) and p(R) represent the engagement probabilities for the attacker and defender, respectively.
             Defender            Attacker      Engagement Probabilities
Case    R1    R2    R3         TF     MF         p(F)      p(R)
1       3     3     3          12     3          1.0       0.4
2       3     3     3          10     4          1.0       0.4
3       3     3     3          8      4          1.0       0.4
4       3     3     3          15     3          1.0       0.4
5       3     3     3          9      3          1.0       0.4
6       3     3     3          13     4          1.0       0.4
7       3     3     3          10     3          1.0       0.4
8       3     3     3          11     4          1.0       0.4
9       3     3     3          7      4          1.0       0.4
10      3     3     3          14     3          1.0       0.4
11      3     3     3          13     3          1.0       0.4
12      3     3     4          20     4          1.0       0.4
13      3     3     4          25     4          1.0       0.4
14      3     3     4          25     5          1.0       0.4

Table 1. Test Case Setup.

The defense resources at each layer may only be used once per their assigned layer; after dropping their defense measure, they return to their respective base for service and restock and are readily available for the next mission task. In our scenarios, the attacker targets three distinct assets {asset 1, asset 2, asset 3}. The assets have initial property values of {5, 10, 15} and total health values of {5, 5, 5}. The initial asset health of the entire system is valued at 30 points, calculated using Eq. (1). The calculation is essentially dimensionless; "points" merely fills the units placeholder.
For Assets 1, 2, and 3, with property values {5, 10, 15} and full health, Eq. (1) gives the initial system value:

5(5/5) + 10(5/5) + 15(5/5) = 30    (7)
In discrete time steps, the attacker randomly attacks assets. The simulation runs until either all assets are destroyed or all fires have been extinguished. A sketch of the random attack generation follows.
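A minimal sketch of that random attack generation, consistent with the constraint of Eq. (4) and the rule that destroyed assets attract no fires (the helper name and the uniform draw of the fire count are our assumptions):

```python
def sample_attack(H, max_fires):
    """Draw an attack vector: at least one and at most max_fires new fires,
    distributed uniformly at random among surviving assets only."""
    A = np.zeros(N, dtype=int)
    alive = np.flatnonzero(H > 0)
    if alive.size == 0 or max_fires < 1:
        return A
    for _ in range(rng.integers(1, max_fires + 1)):
        A[rng.choice(alive)] += 1
    return A
```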
III. Methodologies
The first three algorithms are based on the TMD problem by Bertsekas4. A fuzzy logic component is then added to both the heuristic and neuro-dynamic programming algorithms.

A. Greedy-based Heuristic
The premise of the heuristic approach is to give preference to the highest-valued assets at all costs. The decision to protect lower-valued assets (sequentially, from the next-highest-valued asset down to the lowest) is based on the difference between the remaining defense resources and the number of fires. A positive surplus of defense resources permits protecting lower-valued assets early in the simulation; when the surplus is negative, excess defense resources are withheld to keep an allotment available for high-valued assets later in the simulation.

B. Dynamic Programming (DP)
Dynamic programming lends itself well to multi-stage decisions in which there is a tradeoff between the current state's cost and future states' costs2. In addition, DP is considered the gold standard with respect to performance because it obtains the optimal solution, albeit at a huge computational cost. The underlying theory is Bellman's Principle of Optimality: an optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision3. The problem is cast as a Markovian decision process, such that p(x' | x, a, d) represents the probability that the new state will be x' given the current reduced state x, attack vector a, and defense vector d. A Markov chain describes the transition probabilities of the system, i.e., the probability p(a | x) that attack a will occur given the current state x. Since the problem is set up as a stochastic shortest path problem, the optimal cost can be formulated as

J^*(x, a) = \min_d E{ g(x, a, d) + \sum_{a'} p(a' | x') J^*(x', a') }    (8)

where J^*(x, a) is the optimal expected long-term cost starting at state (x, a) and p(a | x) is the conditional probability that the next attack is a given the current state x. With the reduced optimal cost \tilde{J}^*(x) = \sum_a p(a | x) J^*(x, a) at reduced state x, Bellman's equation can be rewritten as

\tilde{J}^*(x) = \sum_a p(a | x) \min_d E{ g(x, a, d) + \tilde{J}^*(x') }    (9)

where

g(x, a, d) = E{ \sum_{k=1}^{N} (PV_k / \bar{H}_k)(H_k - H'_k) }    (10)

Equation (10) represents the one-time-step cost: the expected property value destroyed during the step. Thus the goal is to find the defense vector d that minimizes the expected long-term cost given the current state x and attack vector a. Since we know the system terminates in finite time (based on the previous assumptions that at least one attacking fire occurs at each time point and that eventually either the attacking fires are extinguished or the assets are destroyed), Bellman's equation converges to a unique solution, the reduced optimal cost \tilde{J}^*(x), for all states x.
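Because Eq. (10) is an expectation over random engagement outcomes, it can be estimated empirically; the sketch below uses the Sec. II helpers and is a didactic stand-in for the one-step Monte Carlo simulations described later in this section:

```python
def mc_step_cost(H, A, D_layers, samples=1000):
    """Monte Carlo estimate of the one-step cost g of Eq. (10): the expected
    drop in the Eq. (1) asset value over one full engagement (three layers
    of defense followed by damage from any surviving fires)."""
    before = asset_value(H)
    loss = 0.0
    for _ in range(samples):
        Hs, As = H.copy(), A.copy()
        for D in D_layers:          # D_layers: one defense vector per layer
            As = engage_layer(As, D)
        loss += before - asset_value(apply_damage(Hs, As))
    return loss / samples
```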
Theoretically, Eq. (11) can be solved by classical methods: iterating the recursion generates a sequence that converges to the optimal cost \tilde{J}^*(x) for all states x.
\tilde{J}_{t+1}(x) = \sum_a p(a | x) \min_d E{ g(x, a, d) + \tilde{J}_t(x') }    (11)
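A compact sketch of this value-iteration recursion over an enumerated state space (a didactic illustration, not the paper's implementation; the transition and cost models are user-supplied stand-ins):

```python
def value_iteration(states, attacks, defenses, p_attack, step_cost,
                    transition, sweeps=100, tol=1e-6):
    """Iterates Eq. (11): J[x] <- sum_a p(a|x) * min_d E[g(x,a,d) + J(x')].
    `transition(x, a, d)` returns (probability, next_state) pairs;
    `step_cost(x, a, d)` returns the expected one-step cost g of Eq. (10);
    terminal states should return an empty transition list."""
    J = {x: 0.0 for x in states}
    for _ in range(sweeps):
        delta = 0.0
        for x in states:
            new = 0.0
            for a in attacks(x):
                best = min(
                    step_cost(x, a, d)
                    + sum(p * J[x2] for p, x2 in transition(x, a, d))
                    for d in defenses(x)
                )
                new += p_attack(a, x) * best
            delta = max(delta, abs(new - J[x]))
            J[x] = new
        if delta < tol:
            break
    return J
```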
However, the solution by exact methods is computationally expensive due to the large number of states. To overcome this drawback, a series of one-step Monte Carlo simulations is performed on scenario models to collect empirical data for the expected engagement result given the current state x, attack vector a, and defense vector d. This eliminates the need to perform probabilistic analysis to create the Markov chain of transition probabilities.

The advantage of DP is the number of required calculations compared to other methods such as direct enumeration: based on the principle of optimality, paths known not to be optimal are never considered, eliminating many needless calculations. DP is usually the method of choice for small-scale problems where real-time results are not needed and the optimal cost can be computed off-line. The drawback, however, is the computational requirement as the number of states and controls increases, where DP falls prey to Bellman's curse of dimensionality. For high-dimensional systems a real-time solution is required, and we are generally prepared to trade off sub-optimal performance for computational speed3.

C. Neuro-Dynamic Programming (NDP)
The Neuro-Dynamic Programming algorithm is formulated on the concept of reinforcement learning, the idea of "an agent that must learn behavior through trial-and-error interactions with a dynamic environment5". The overall goal of the agent is to select a control policy that maximizes the long-run summation of the reinforcement signal. Learning can be accomplished through systematic trial-and-error algorithms as in reinforcement learning, or by methods such as back-propagation as in Artificial Neural Networks (ANNs)5. The NDP algorithm focuses on approximating the reduced optimal cost function from Eq. (9) with a suitable approximation \tilde{J}(x, r) by utilizing ANNs. The variable r is a vector of parameters used in conjunction with the future state to approximate the cost-to-go function4; in this case, the parameters are the synaptic weights and thresholds of the artificial neural network. The approximation produces sub-optimal results compared to the DP algorithm but can significantly decrease the execution time of the algorithm. We utilized a fully-connected, feed-forward ANN designed specifically for this scenario. The number of inputs is based on the size of the reduced state vector, as depicted in Figure 3; the network contains eight neurons in one hidden layer and one output neuron.
Figure 3. Fully Connected ANN with 7 Input Neurons, 8 Hidden Neurons, and 1 Output Neuron. All the neurons in the network utilize a symmetrical sigmoid function6 for the activation function:
f(u) = \frac{2}{1 + e^{-u}} - 1    (12)
where u represents the input into the neuron. The neural network is trained by approximate policy iteration using Monte Carlo simulations, iterating a predefined number of times over the following four steps; Figure 4 illustrates the cyclical training process, and a forward-pass sketch follows.
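A sketch of the forward pass for the network of Figure 3, using the symmetric sigmoid of Eq. (12); the layer sizes come from the paper, while the weight initialization is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sym_sigmoid(u):
    """Eq. (12): symmetric sigmoid, mapping R -> (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-u)) - 1.0

# 7 inputs (reduced state), 8 hidden neurons, 1 output (cost-to-go estimate)
W1 = rng.normal(scale=0.1, size=(8, 7)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(1, 8)); b2 = np.zeros(1)

def cost_to_go(x):
    """J~(x, r): forward pass approximating the reduced optimal cost; the
    parameter vector r consists of (W1, b1, W2, b2)."""
    h = sym_sigmoid(W1 @ x + b1)
    return float(sym_sigmoid(W2 @ h + b2))
```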
Figure 4. ANN Training Process.

1. Neural Network Approximator
The weight vector r is initialized with random weights and the neural network generates values for the cost-to-go function \tilde{J}(x, r).

2. Policy Update
The control policy \mu is updated from the cost-to-go values generated by the ANN:

\mu(x, a) = \arg\min_d E{ g(x, a, d) + \tilde{J}(x', r) }    (13)

3. Monte Carlo Simulations
An initial reduced state is randomly created and all possible attack vectors are generated for that state. The policy from step 2 generates the defense vector for every (x, a) pair, and the simulation continues generating defense vectors for every subsequent pair until a terminating state is reached. Sample costs are calculated from

c(x_t) = \sum_{i=t}^{T} g(x_i, a_i, d_i)    (14)
The combination of each reduced state and its respective sample cost constitutes the training set for the neural network. The Monte Carlo simulations are executed multiple times so that a sufficient training set is generated.

4. Neural Network Training
The goal of ANN training is to minimize the error between the output of the neural network and the sample costs from the Monte Carlo simulations. The learning algorithm uses backpropagation by the gradient descent method; since gradient descent requires differentiating the activation function to minimize the error function, the sigmoid guarantees continuity and differentiability6. The weights of the network are adjusted by minimizing the squared error:

\min_r \sum_x \sum_{i=1}^{M_x} ( \tilde{J}(x, r) - c_i(x) )^2    (15)
where
  M_x = number of sample cost values obtained for state x
  c_i(x) = sample cost at the ith sample

In addition, predefined parameters are established prior to training:
  Error ratio tolerance: 0.15
  Training iteration limit: 1500
  Learning class: batch
  Learning rate: 0.8
  Momentum rate: 0.5

After completion of the neural network training, the NDP algorithm is primed for production use. Although it takes time to train the ANN, the process listed above is completed offline so that the parameter vector r is adjusted accordingly; subsequently, the ANN becomes a "fixed network" once simulation runs are executed. Since the ANN is trained offline, the training time is excluded from the execution time of the simulation.
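Pulling the four steps together, one schematic view of a single approximate-policy-iteration cycle; `simulate` and `greedy_policy` are hypothetical helpers (a rollout generator and the Eq. (13) argmin policy), and `net.fit` stands in for the batch backpropagation trainer of step 4:

```python
import random

def train_ndp(net, initial_states, cycles=10, rollouts=200):
    """Schematic of the cyclical training process of Fig. 4: evaluate the
    current greedy policy by Monte Carlo rollouts, then refit the
    cost-to-go network on the collected sample costs."""
    for _ in range(cycles):
        X, targets = [], []
        policy = greedy_policy(net)                 # step 2, Eq. (13)
        for _ in range(rollouts):                   # step 3
            x0 = random.choice(initial_states)
            states, costs = simulate(policy, x0)    # rollout to termination
            for t, x in enumerate(states):
                X.append(x)
                targets.append(sum(costs[t:]))      # sample cost, Eq. (14)
        net.fit(X, targets)                         # step 4, Eq. (15)
    return net
```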
D. Neuro-Fuzzy Dynamic Programming (NFDP)
The NDP algorithm assumes complete knowledge of the environment, with a heavy emphasis on the maximum number of wildland fires. Removing these assumptions introduces a new complexity: uncertainty in the inputs, which the prior DP and NDP algorithms struggle to cope with during simulations. The NFDP algorithm seeks to minimize the impact of this uncertainty by providing robustness in the system.

The NDP algorithm maintained a single ANN across all layers. The NFDP approach instead distributes three instances of the neural network, one per layer. Although each ANN makes its decision independently, they must cooperate to achieve the overall goal of minimizing the cost function. The one-step cost function defined previously for the DP and NDP algorithms is the expected cost of assets destroyed during the time period; with the neural networks distributed across layers, the one-step cost must represent the combined cost of the three layers, since a fire causes no damage until it eludes the last layer of defense. Therefore, the one-step cost defined in Eq. (10) is modified to incorporate the uncertainty in the system and to implement the multi-layer cooperation. A fuzzy term w_l estimates the contribution of the step cost at each layer l:

g_l(x, a, d) = \bar{w}_l \, g(x, a, d)    (16)

where the normalized weight is defined as

\bar{w}_l = w_l / \sum_{j=1}^{3} w_j    (17)

with w_l the output of the fuzzy controller at layer l. The defense resources must be used effectively given that the estimate of the maximum number of predicted fires may be inaccurate. The inputs and rule base for the fuzzy inference system (FIS) are based on the rules of engagement in the Strategic Defense Initiative and the terminal-phase Arrow interceptor system7. The framework derives from the theater missile defense problem, and the transition to the wildland fire scenario is largely transparent since the concepts of the two are similar. The NFDP algorithm employs a multi-input, single-output Sugeno FIS, shown in Figure 5.
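A small sketch of the layer weighting in Eqs. (16) and (17); the controller outputs in the example are placeholders:

```python
def layer_step_costs(g, w):
    """Eqs. (16)-(17): split the one-step cost g across the three layers in
    proportion to the fuzzy controller outputs w_l, normalized to sum to 1."""
    total = sum(w)
    return [g * w_l / total for w_l in w]

# Example with placeholder controller outputs for the three layers:
print(layer_step_costs(4.5, [1.0, 2.0, 3.0]))   # [0.75, 1.5, 2.25]
```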
Figure 5. Block Diagram of Fuzzy Controller
(18)
Each input consists of five membership functions based on sigmoid and Gaussian distributions.
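The membership parameters themselves are given only graphically (Figures 6 and 7); below is a sketch of the two primitive shapes with placeholder centers, widths, and slopes (the actual values would be read off the figures):

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function centered at c with width sigma."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sig_mf(x, a, c):
    """Sigmoidal membership function; a sets the slope, c the crossover."""
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

# Illustrative five-level partition of normalized distance on [0, 1];
# all parameters here are assumptions, not the paper's values.
ND_LEVELS = {
    "Very Near": lambda x: sig_mf(x, -20.0, 0.15),
    "Near":      lambda x: gauss_mf(x, 0.30, 0.10),
    "Mid":       lambda x: gauss_mf(x, 0.50, 0.10),
    "Far":       lambda x: gauss_mf(x, 0.70, 0.10),
    "Very Far":  lambda x: sig_mf(x, 20.0, 0.85),
}
```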
Figure 6. ND Membership Functions
Figure 7. AI Membership Functions
The rule base is the application of heuristic rules in a series of IF-THEN statements to specify the output of the FIS. Each rule is a unique combination of the two inputs resulting in a total of 25 rules.
The ND input represents the normalized distance of the fire in relation to the asset. The AI input is the asset importance calculated in Eq. (18); the value is adjusted to fit in a range from 0.85 to 1.00.

                          ND (Normalized Distance)
AI (Asset Importance)   Very Far          Far               Mid               Near              Very Near
Very Low                Zero              Very Light        Light             Slightly Medium   Medium
Low                     Very Light        Light             Slightly Medium   Medium            Very Medium
Medium                  Light             Slightly Medium   Medium            Very Medium       Slightly Heavy
High                    Slightly Medium   Medium            Very Medium       Slightly Heavy    Heavy
Very High               Medium            Very Medium       Slightly Heavy    Heavy             Very Heavy

Table 2. Rule Base for NFDP Algorithm.

The Sugeno FIS output functions are characterized as crisp values:
Output Membership Function    Value (z_j)
Zero                          0.00
Very Light                    0.25
Light                         0.50
Slightly Medium               0.75
Medium                        1.00
Very Medium                   1.25
Slightly Heavy                1.50
Heavy                         2.00
Very Heavy                    3.00
Table 3. Output Membership Functions for NFDP Algorithm.

The defuzzification stage converts the rule-base results into a final crisp output value. The output level of each rule is weighted by the firing strength of the rule, and the final output of the FIS is the weighted average of all rule outputs, computed as

z = \sum_{j=1}^{R} w_j z_j / \sum_{j=1}^{R} w_j    (19)

where
  R = total number of rules
  w_j = firing strength of the jth rule
  z_j = output level of the jth rule

E. Fuzzy-Heuristic
The fuzzy-heuristic methodology is similar to the NFDP method but vastly simpler in design. It takes the straightforward approach of applying expert knowledge in the decision-making process to decide on the allocation of resources. The fuzzy controller is implemented at each layer, making decisions independently of the other layers. When confronted with the decision of how to allocate resources effectively and efficiently, the decision-maker needs to know only the importance of the asset at stake and the distance of the fire from the asset: information that is inherently fuzzy. The two FIS inputs are the same as in the NFDP algorithm, yet the fuzzy-heuristic algorithm employs three membership functions per input, as opposed to five for the NFDP.
Figure 8. ND Membership Functions
Figure 9. AI Membership Functions
Given the fewer membership functions, the rule base comprises only nine rules, compared to 25 for the NFDP.
                        ND (Normalized Distance)
AI (Asset Importance)   Far       Mid       Near
Low                     Zero      Low       Medium
Mid                     Low       Medium    High
High                    Medium    High      Very High

Table 4. Rule Base for Fuzzy-Heuristic.

Similarly, the output membership functions are crisp values based on the Sugeno FIS.
Output Membership Function    Value
Zero                          0.00
Low                           1.25
Medium                        2.50
High                          3.75
Very High                     5.00
Table 5. Output Membership Functions for Fuzzy-Heuristic
The fuzzy controller is used essentially to construct the defense vector at each layer. The output of the FIS is the number of resources used to defend against the incoming attack for each asset. Once the defense vector is constructed, it is checked to ensure that it adheres to the constraints of the problem, namely the number of available resources at the given layer. If the total number of defense resources exceeds the available amount, the defense vector is iteratively trimmed by linguistic reasoning until the constraint is met: a defense resource is deducted from the lowest-priority asset, ensuring that high-valued assets are always covered. A sketch of this construction and trimming follows.
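A compact sketch of the fuzzy-heuristic allocation at one layer, using the rule base of Table 4 and the crisp outputs of Table 5; the membership-function dictionaries `ai_mfs` and `nd_mfs` are placeholders for Figures 8 and 9, and the trimming step enforces Eq. (6):

```python
RULES = {   # Table 4: (AI level, ND level) -> output level
    ("Low", "Far"): "Zero",    ("Low", "Mid"): "Low",     ("Low", "Near"): "Medium",
    ("Mid", "Far"): "Low",     ("Mid", "Mid"): "Medium",  ("Mid", "Near"): "High",
    ("High", "Far"): "Medium", ("High", "Mid"): "High",   ("High", "Near"): "Very High",
}
Z = {"Zero": 0.00, "Low": 1.25, "Medium": 2.50, "High": 3.75, "Very High": 5.00}  # Table 5

def fis_resources(ai, nd, ai_mfs, nd_mfs):
    """Sugeno inference for one asset: each rule's firing strength is the
    min of its two input memberships; the output is the weighted average
    of the crisp levels, per Eq. (19)."""
    num = den = 0.0
    for (ai_lv, nd_lv), out_lv in RULES.items():
        w = min(ai_mfs[ai_lv](ai), nd_mfs[nd_lv](nd))
        num += w * Z[out_lv]
        den += w
    return num / den if den > 0.0 else 0.0

def defense_vector(ai, nd, priorities, budget, ai_mfs, nd_mfs):
    """Build one layer's defense vector from the FIS, then trim from the
    lowest-priority assets until the Eq. (6) budget is met."""
    D = [round(fis_resources(ai[k], nd[k], ai_mfs, nd_mfs)) for k in range(len(ai))]
    by_priority = sorted(range(len(D)), key=lambda k: priorities[k])  # lowest first
    while sum(D) > budget:
        for k in by_priority:
            if D[k] > 0 and sum(D) > budget:
                D[k] -= 1
    return D
```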
IV. Results
The results are grouped into four categories based on the certainty and uncertainty scenarios. Each scenario has two tables giving the remaining asset health and the execution time for the 14 cases; the accompanying graph visualizes the average values. Due to the large range of execution-time values, the y-axis is in logarithmic scale. The preferred methodology approaches the bottom-right corner of the graph, combining high remaining asset health with low execution time.

A. No Uncertainty
The no-uncertainty case assumes the defender has complete knowledge of the environment and is fully informed of the number of fires; all decisions are based on this knowledge.
No Uncertainty: Remaining Asset Health (pts)

Case   Heuristic   Fuzzy-Heuristic   NDP      NFDP     DP
1      23.855      28.341            23.603   27.256   23.501
2      24.918      27.840            24.539   26.980   24.518
3      26.255      28.324            25.633   27.587   25.571
4      22.597      27.873            22.000   26.581   21.879
5      26.017      28.793            25.174   27.960   25.155
6      23.107      27.131            22.922   26.003   22.846
7      25.463      28.643            24.640   27.785   24.576
8      24.229      27.585            23.941   26.632   24.000
9      26.858      28.586            26.181   27.995   26.084
10     23.223      28.060            22.460   26.869   22.430
11     23.888      28.208            23.095   27.089   23.027
12     20.971      26.238            19.252   24.758   19.430
13     18.261      25.222            16.999   23.352   16.906
14     18.008      23.279            16.256   21.335   16.442
Avg    23.404      27.437            22.621   26.299   22.598

Table 6. No Uncertainty: Remaining Asset Health.
No Uncertainty: Execution Time (s)

Case   Heuristic   Fuzzy-Heuristic   NDP       NFDP    DP
1      0.065       0.042             69.472    1.736   59.118
2      0.022       0.019             97.129    1.480   68.084
3      0.019       0.016             75.404    1.186   47.415
4      0.033       0.032             89.599    2.167   77.102
5      0.026       0.019             50.733    1.299   39.111
6      0.024       0.026             131.831   1.929   113.161
7      0.024       0.022             57.235    1.445   40.366
8      0.061       0.022             108.099   1.629   81.108
9      0.016       0.014             63.476    1.028   40.504
10     0.033       0.029             84.139    2.010   64.315
11     0.033       0.028             77.075    1.862   62.925
12     0.036       0.038             342.376   3.201   330.441
13     0.044       0.046             430.694   3.996   445.853
14     0.039       0.043             636.462   4.171   664.130
Avg    0.034       0.028             165.266   2.081   152.402

Table 7. No Uncertainty: Execution Time.
Figure 10. No Uncertainty: RAH vs Execution Time.
B. Fire Error 20%
The fire error represents the percentage increase in attacking fires unbeknownst to the defender; the defender's decision making is based on its estimated number of fires, not the true amount.
Fire Error 20%: Remaining Asset Health (pts)

Case   Heuristic   Fuzzy-Heuristic   NDP      NFDP     DP
1      22.552      27.884            21.969   26.633   21.883
2      24.053      27.316            23.441   26.361   23.370
3      25.398      27.834            24.529   27.017   24.423
4      21.285      27.452            20.312   25.906   20.321
5      25.165      28.494            24.106   27.509   24.063
6      21.782      26.295            21.227   24.904   21.220
7      24.660      28.343            23.525   27.382   23.486
8      22.870      26.863            22.324   25.618   22.275
9      25.911      28.115            25.070   27.191   24.986
10     21.953      27.604            20.927   26.037   20.849
11     22.654      27.760            21.446   26.361   21.361
12     19.147      25.397            17.407   23.593   17.325
13     16.128      24.232            14.491   22.061   14.451
14     15.699      21.989            13.826   19.581   13.850
Avg    22.090      26.827            21.043   25.440   20.990

Table 8. Fire Error 20%: Remaining Asset Health.
Fire Error 20%: Execution Time (s)

Case   Heuristic   Fuzzy-Heuristic   NDP       NFDP    DP
1      0.033       0.031             90.633    2.162   76.300
2      0.023       0.023             119.218   1.786   84.597
3      0.020       0.020             96.432    1.474   62.138
4      0.038       0.037             108.935   2.598   90.986
5      0.024       0.023             63.704    1.601   49.570
6      0.030       0.030             164.638   2.408   139.550
7      0.026       0.027             70.641    1.731   49.054
8      0.027       0.027             142.054   2.066   110.684
9      0.018       0.018             85.463    1.329   59.324
10     0.036       0.035             103.132   2.461   82.557
11     0.034       0.034             96.536    2.299   79.300
12     0.043       0.045             411.613   3.859   394.061
13     0.053       0.055             526.483   4.819   523.542
14     0.047       0.051             768.091   4.979   772.289
Avg    0.032       0.033             203.398   2.541   183.854

Table 9. Fire Error 20%: Execution Time.
Figure 11. Fire Error 20%: RAH vs Execution Time.
C. Breakup 50%
The breakup percentage is the fraction of fires that jump and create another burning hotspot in the landscape upon entering the third layer. This information is unknown to the defender prior to the simulation, but we assume the defender recognizes the fire jump and can account for the increase in fires.
Breakup 50%: Remaining Asset Health (pts)

Case   Heuristic   Fuzzy-Heuristic   NDP      NFDP     DP
1      26.937      26.589            27.112   28.817   27.091
2      27.545      25.798            27.572   28.692   27.610
3      28.174      26.756            28.051   28.973   28.031
4      26.332      25.646            26.356   28.514   26.377
5      28.044      27.546            27.821   29.109   27.803
6      26.597      24.435            26.850   28.288   26.869
7      27.758      27.194            27.551   29.014   27.576
8      21.225      25.412            18.960   23.851   18.972
9      25.088      27.255            22.816   26.303   22.806
10     19.772      26.022            16.297   24.127   16.177
11     20.670      26.301            17.182   24.589   17.062
12     16.728      22.940            11.684   20.530   11.624
13     13.260      21.347            8.371    18.328   8.384
14     12.875      18.934            7.939    15.971   8.153
Avg    22.929      25.155            21.040   25.365   21.038

Table 10. Breakup 50%: Remaining Asset Health.
Breakup 50%: Execution Time (s)

Case   Heuristic   Fuzzy-Heuristic   NDP       NFDP    DP
1      0.027       0.025             70.460    1.749   57.176
2      0.020       0.020             97.190    1.494   91.768
3      0.017       0.016             75.521    1.190   57.288
4      0.033       0.032             89.379    2.174   92.103
5      0.021       0.020             51.470    1.299   40.943
6      0.025       0.026             131.432   1.954   121.966
7      0.023       0.022             57.206    1.450   52.356
8      0.021       0.022             108.221   1.637   105.900
9      0.014       0.014             62.849    1.026   50.370
10     0.030       0.030             82.816    2.024   81.940
11     0.029       0.028             77.205    1.863   68.622
12     0.037       0.039             340.267   3.258   334.395
13     0.048       0.048             432.603   4.095   471.118
14     0.043       0.045             634.811   4.266   701.496
Avg    0.028       0.028             165.102   2.106   166.246

Table 11. Breakup 50%: Execution Time.
Figure 12. Breakup 50%: RAH vs Execution Time.
D. False Alarm 50%
The false alarm percentage represents the fraction of fires that are rendered harmless by naturally occurring events (loss of fuel, a change in environmental conditions, a natural fire barrier, etc.).
False Alarm 50%: Remaining Asset Health (pts)

Case   Heuristic   Fuzzy-Heuristic   NDP      NFDP     DP
1      26.937      29.332            27.112   28.817   27.091
2      27.545      29.173            27.572   28.692   27.610
3      28.174      29.360            28.051   28.973   28.031
4      26.332      29.169            26.356   28.514   26.377
5      28.044      29.511            27.821   29.109   27.803
6      26.597      28.881            26.850   28.288   26.869
7      27.758      29.448            27.551   29.014   27.576
8      27.191      29.060            27.309   28.556   27.339
9      28.424      29.461            28.313   29.149   28.261
10     26.647      29.227            26.652   28.599   26.618
11     27.025      29.276            26.850   28.647   26.830
12     25.481      28.656            25.218   27.820   25.480
13     24.101      28.267            24.377   27.179   24.345
14     23.931      27.474            24.209   26.481   24.207
Avg    26.728      29.021            26.732   28.417   26.745

Table 12. False Alarm 50%: Remaining Asset Health.
False Alarm 50%: Execution Time (s)

Case   Heuristic   Fuzzy-Heuristic   NDP       NFDP    DP
1      0.026       0.024             69.925    1.635   59.779
2      0.019       0.018             96.145    1.396   80.064
3      0.016       0.015             76.104    1.132   49.731
4      0.032       0.031             89.825    2.057   85.053
5      0.020       0.019             51.325    1.276   41.034
6      0.024       0.024             131.760   1.909   116.249
7      0.021       0.021             59.017    1.353   43.229
8      0.024       0.020             108.916   1.509   84.215
9      0.014       0.013             63.848    0.949   41.572
10     0.030       0.028             84.040    1.890   71.430
11     0.033       0.027             76.830    1.764   58.344
12     0.035       0.036             341.869   2.834   300.587
13     0.044       0.045             434.654   3.573   461.398
14     0.043       0.041             630.093   3.620   668.980
Avg    0.027       0.026             165.311   1.921   154.405

Table 13. False Alarm 50%: Execution Time.
Figure 13. False Alarm 50%: RAH vs Execution Time.
E. Summary Results for All Scenarios
Remaining Asset Health (pts)

Scenario          Heuristic   Fuzzy-Heuristic   NDP      NFDP     DP
No Uncertainty    23.404      27.437            22.621   26.299   22.598
Fire Error 20%    22.090      26.827            21.043   25.440   20.990
Breakup 50%       22.929      25.155            21.040   25.365   21.038
False Alarm 50%   26.728      29.021            26.732   28.417   26.745

Table 14. Summary Results for Remaining Asset Health.
Execution Time (s)

Scenario          Heuristic   Fuzzy-Heuristic   NDP       NFDP    DP
No Uncertainty    0.034       0.028             165.266   2.081   152.402
Fire Error 20%    0.032       0.033             203.398   2.541   183.854
Breakup 50%       0.028       0.028             165.102   2.106   166.246
False Alarm 50%   0.027       0.026             165.311   1.921   154.405

Table 15. Summary Results for Execution Time.
V. Conclusion
The dynamic programming algorithm was expected to produce the optimal solution, albeit at a huge computational cost. Although the DP algorithm was indeed slow in execution time, it also did not perform as well as expected. This may be because the system is more stochastic than deterministic due to p(R) = 0.4: the algorithm essentially self-destructs as it reuses solutions from revisited states that have roughly a 50/50 chance of being the optimal choice. Increasing p(R) through improvements in firefighting defense capabilities may improve the results for the certainty scenario, but DP would still be limited by its execution time. The poor performance may also be due to limited learning; we executed 15,000 Monte Carlo simulations to use as training data. Increasing the number of Monte Carlo simulations may improve the remaining asset health, but we do not foresee any improvement in execution time. The tables show that as the complexity of the system increases, the execution time increases exponentially, relinquishing any possibility of a real-time decision.

The objective of the NDP algorithm was to reduce the computational cost of DP by training an artificial neural network to quickly compute the expected cost-to-go function. The algorithm performed roughly as poorly as DP. The parameters of the neural network (error ratio tolerance, learning rate, momentum rate) remained constant throughout the project; adjusting these parameters may improve the execution time, but we do not expect any significant improvement in the remaining asset health values.

The greedy-based heuristic followed simple linguistic rules. The decision tree was simple and the execution time was quick, as expected. It also performed better than expected on the remaining asset health metric, which may be due in large part to the scenario setup and the inventories of each case; adding more varied and complex cases would show whether the algorithm is truly robust to changes.

The key purpose of this paper was to show the ability of fuzzy logic to generate near-optimal results in real time while remaining robust to uncertainties in the environment. The first application applied a fuzzy inference system to the NDP algorithm, which produced a vast improvement in both metrics of interest; the algorithm remained robust, with only a slight increase in execution time as the complexity of the system increased. The goal of the fuzzy-heuristic algorithm was to create a simple fuzzy controller based on expert knowledge, steering clear of heavy mathematical optimization approaches that place greater computational requirements on the system. The controller was fine-tuned until its performance nearly dominated all other methodologies (the NFDP performed better only in the breakup scenario). Additionally, the fuzzy-heuristic execution time was 1 to 2 orders of magnitude better than the NFDP algorithm for all scenarios. There is no guarantee that the fuzzy-heuristic controller provides the optimal control policy, as it is governed by our heuristics. However, we have shown through our certainty and uncertainty scenarios that a fuzzy logic component adds robustness to the system as complexity increases, provides fast execution times, and yields near-optimal results.
Acknowledgments
We would like to thank Dr. Praveen Chawla and EDAptive Computing, Inc. for their support of this project.
References
1 Mandel, J., Chen, M., Franca, L. P., Johns, C., Puhalskii, A., Coen, J. L., Douglas, C. C., Kremens, R., Vodacek, A., and Zhao, W., "A Note on Dynamic Data Driven Wildfire Modeling," Springer-Verlag, 2004, pp. 725-731.
2 Bertsekas, D. P., Dynamic Programming and Optimal Control, 3rd ed., Vol. 1, Athena Scientific, Belmont, 2005.
3 Bertsekas, D. P., and Tsitsiklis, J., Neuro-Dynamic Programming, Athena Scientific, Belmont, 1996.
4 Bertsekas, D. P., Homer, M. L., Logan, D. A., Patek, S. D., and Sandell, N. R., "Missile Defense and Interceptor Allocation by Neuro-Dynamic Programming," IEEE Transactions on Systems, Man and Cybernetics, Vol. 30, No. 1, 2000, pp. 42-51.
5 Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 237-285.
6 Rojas, R., Neural Networks: A Systematic Introduction, Springer, Berlin, 1996.
7 Naveh, B., Levy, E., and Cohen, K., "Theater Ballistic Missile Defense Architecture Development," Theater Ballistic Missile Defense, AIAA, Reston, 2001, pp. 77-97.
8 Walch, B., "The Fire This Time," TIME, November 2007, pp. 14-17, http://www.time.com/time/classroom/glenspring2008/pdfs/Nation.pdf [accessed 10 June 2010].
9 Finney, M. A., "Rocky Mountain Research Station," US Forest Service Research and Development, March 2004, http://www.fs.fed.us/rm/pubs/rmrs_rp004.html [accessed 20 Feb. 2009].