Non-Darwinian evolution for the source detection of atmospheric releases

Guido Cervone a,*, Pasquale Franzese b

a Dept. of Geography and Geoinformation Science, George Mason University, 4400 University Dr., Fairfax, VA, USA
b Center for Earth Observing and Space Research, George Mason University, 4400 University Dr., Fairfax, VA, USA

Atmospheric Environment 45 (2011) 4497–4506

Article history: Received 28 October 2010; received in revised form 18 April 2011; accepted 19 April 2011.

Abstract

A non-Darwinian evolutionary algorithm is presented as a search engine to identify the characteristics of a source of atmospheric pollutants, given a set of concentration measurements. The algorithm iteratively drives a forward dispersion model from tentative sources toward the real source. Unlike in traditional evolutionary algorithms, the solutions of non-Darwinian evolution processes are not generated through pseudo-random operators, but through a reasoning process based on machine learning rule generation and instantiation. The new algorithm is tested with both a synthetic case and the Prairie Grass field experiment. To further test the capability of the algorithm to work in real-world scenarios, the source identification of all Prairie Grass releases was performed with a decreasing number of sensor measurements, and a relationship is found between the precision of the solution, the number of sensors available, and the levels of concentration measured by the sensors. The proposed methodology can be used for a variety of optimization problems, and is particularly suited for problems where the operations needed to evaluate new candidate solutions are computationally expensive. Published by Elsevier Ltd.

Keywords: Non-Darwinian evolution; Source characterization; Evolutionary algorithms; Dispersion modeling

1. Introduction

The problem of designing efficient methodologies to locate and characterize the sources of atmospheric releases is attracting much interest. A promising approach is the coupling of artificial intelligence algorithms with dispersion models. Typically, the dispersion model performs a number of simulations from a series of tentative sources. The purpose of the artificial intelligence algorithm is to determine the series of tentative sources in order to converge efficiently toward the real source. Concentration measurements are necessary to implement the process. For each tentative source, the algorithm analyzes the error between the simulated and the observed values, and prescribes the next tentative source (or the next set of tentative sources). Knowledge of some meteorological variables is not strictly required in principle, but in practice may be essential, especially for large scale dispersion scenarios (i.e., scenarios without a fixed wind). The main appeal of this paradigm is that the dispersion model does not need to be modified. Therefore, the method is easy to implement with any existing dispersion model, independent of its complexity and internal workings.

* Corresponding author. E-mail addresses: [email protected] (G. Cervone), [email protected] (P. Franzese). doi:10.1016/j.atmosenv.2011.04.054

Several popular artificial intelligence algorithms use Bayesian inference coupled with stochastic sampling (e.g., Gelman et al., 2003; Johannesson et al., 2004; Senocak et al., 2008; Delle Monache et al., 2008; Chow et al., 2006). A different approach based on evolutionary algorithms has also been proposed and tested (e.g., Haupt et al., 2007; Allen et al., 2007; Cervone and Franzese, 2010b, 2010a). Evolutionary computation algorithms (of which genetic algorithms are one of the main paradigms, in addition to evolution strategies, evolutionary programming and genetic programming) are iterative stochastic methods that evolve in parallel a set of potential solutions through a trial and error process. Potential solutions are encoded as vectors of values, which can include a number of source characteristics such as, e.g., its geometry and size, location, and emission rate. The potential solutions are evaluated according to an objective function (often referred to as fitness function, cost function, error function, or skill score), which is a measure of the difference between the concentration field simulated by the dispersion model from a tentative source and the available observations. The evolutionary process consists of selecting one or more candidate solutions whose vector values are modified to minimize the objective function. A selection process is invoked that determines which of the new solutions survive into the next generation. While the methodologies and algorithms that are subsumed by this name are numerous, most of them share one fundamental characteristic: they use non-deterministic operators such as mutation and recombination as the main engine of the evolutionary process.


These operators are semi-blind (they use little to no information on previous performance), and the evolution is not guided by knowledge learned in past generations. In fact, most evolutionary computation algorithms are inspired by the principles of Darwinian evolution, defined by "...one general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die" (Darwin, 1859). The Darwinian evolution model is simple and fast to simulate, and it is domain-independent. Because of these features, evolutionary algorithms have been applied to a wide range of optimization problems (Ashlock, 2006).

In this paper we introduce a multi-strategy iterative approach that pairs the traditional Darwinian operators with a non-Darwinian machine learning evolutionary process. This approach provides a different search strategy, where new candidate solutions are generated by an inductive inference reasoning process. The main drawback compared with traditional algorithms is higher algorithm complexity and a possible increase in computational cost.

There have been several attempts to extend the traditional Darwinian operators with statistical and machine learning approaches that use history information from the evolution to guide the search process. The main challenges are to avoid local maxima in the objective function and to increase the rate of convergence. The majority of such methods use some form of memory and/or learning to direct the evolution towards directions considered more promising (Grefenstette, 1987, 1991; Sebag and Schoenauer, 1994; Sebag et al., 1997; Reynolds, 1999; Hamda et al., 2002). Estimation-of-Distribution Algorithms (EDA) are a form of evolutionary algorithm where an entire population may be approximated with a probability distribution (Lozano, 2006). New candidate solutions are not chosen at random, but by using statistical information from the sampling distribution. The aim is to avoid premature convergence (i.e., quickly converging towards a local solution) due to a lack of proper exploration of the search space, and to provide a more compact representation.

Discriminating between the best and worst performing individuals can provide additional information on how to guide the evolutionary process. The Learnable Evolution Model (LEM) includes a machine learning rule induction algorithm to learn attributional rules which discriminate between the best and worst performing candidate solutions (Cervone et al., 2000b, 2000a). New individuals are then generated according to inductive hypotheses discovered by the machine learning program. The main difference with Darwinian-type evolutionary algorithms is the way new individuals are generated. In contrast to the Darwinian operators of mutation and/or recombination, non-Darwinian machine learning conducts a reasoning process in the creation of new individuals. Specifically, at each step (or selected steps) of evolution, a machine learning method generates hypotheses characterizing the differences between high-performing and low-performing individuals. These hypotheses are then instantiated in various ways to generate new individuals. The hypotheses indicate the areas in the search space that are likely to contain high-performing individuals.
New individuals are selected from these areas and then classified as belonging to either a high-performance or a low-performance group, depending on their fitness value. These groups are then differentiated by the machine learning program, yielding a new hypothesis as to the likely location of the global solution.

The possible advantages of generating new individuals using non-Darwinian rather than traditional Darwinian operations mainly depend on the evolution length, defined as the number of function evaluations needed to determine the target solution, and

the evolution time, defined as the execution time required to achieve this solution. A choice between non-Darwinian and Darwinian algorithms is based on the tradeoff between the complexity of the population-generating operators and the evolution length. In our case, the proposed operations of hypothesis generation and instantiation are more computationally costly than mutation and/or crossover, but the evolution length is typically much shorter than in Darwinian evolutionary algorithms. If the evaluation of a new individual is computationally taxing, speeding up the convergence rate becomes particularly important. Therefore, the non-Darwinian method as engine of evolution is only convenient for problems with high objective function evaluation complexity. Non-Darwinian evolution is not a replacement for traditional evolutionary algorithms. It is a new paradigm to speed up a certain class of problems that require particularly complex operations to generate and evaluate new candidate solutions. The source detection of atmospheric pollutants is an ideal problem due to the complexity of the function evaluation, which may require expensive numerical simulations.

This paper describes an application of the combined Darwinian and non-Darwinian evolutionary processes for the source characterization of atmospheric releases. The AQ for Source Detection (AQ4SD) program (Cervone et al., 2010b) was used as the main engine of evolution to generate the hypotheses. The proposed system differs from earlier LEM applications in three ways: a) the reasoning process generates hypotheses (rules) that discriminate between high and low performing individuals; b) the native real-value encoding of variables; and c) the selection mechanism that determines the makeup of the next generation. The new system was tested in two cases: simulated data generated by synthetic releases, and the real concentration measurements from the Prairie Grass controlled field experiment (Barad, 1958).

2. Methodology

The proposed methodology is based on an iterative process guided by hypothesis generation and instantiation. Similarly to an evolutionary computation methodology, it evolves solutions in parallel, and evaluates them according to an objective function. However, the traditional Darwinian birth operators are paired with a non-Darwinian machine learning process that learns hypotheses, in the form of rules, to describe the characteristics of the highest performing candidate solutions. Specifically, at each step of evolution, a population is divided into high-performing (H-group) and low-performing (L-group) individuals, according to the fitness score as determined by the objective function. These groups are selected from the current population, or a combination of the current and past populations. Then a learning program creates general hypotheses distinguishing between these two groups, which are instantiated in various ways to produce new candidate individuals. New births occur in the areas of the search space identified as the regions most favorable for good solutions. A minimal sketch of one such iteration is given below.
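The following Python sketch illustrates the single non-Darwinian iteration just described. It is an illustrative outline under stated assumptions, not the authors' AQ4SD implementation; learn_rules and instantiate_rules are hypothetical placeholders for the hypothesis generation and instantiation steps detailed in Sections 2.1 and 2.2.

```python
def nondarwinian_step(population, objective, learn_rules, instantiate_rules,
                      hpt=0.3, lpt=0.3, n_births=50):
    """One non-Darwinian iteration: rank candidates by the objective
    (lower error is better), split them into H- and L-groups using the
    HPT/LPT population thresholds, learn rules that discriminate the two
    groups, and instantiate the rules into new candidate solutions."""
    ranked = sorted(population, key=objective)
    n_h = max(1, int(hpt * len(ranked)))   # size of the H-group (HPT)
    n_l = max(1, int(lpt * len(ranked)))   # size of the L-group (LPT)
    h_group, l_group = ranked[:n_h], ranked[-n_l:]
    rules = learn_rules(h_group, l_group)            # Section 2.1
    return [instantiate_rules(rules, population)     # Section 2.2
            for _ in range(n_births)]
```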
2.1. Hypotheses generation

The core of the system is the machine learning rule induction algorithm used to generate the hypotheses. In this study, the AQ4SD classifier program was specifically designed and optimized to be run iteratively and to drive a non-Darwinian evolutionary process (Cervone et al., 2010b). AQ is a machine learning classifier methodology which generalizes sets of examples with respect to one or more sets of counter-examples (Michalski, 1969, 1973, 1983; Mitchell, 1997; Cervone et al., 2001, 2010b; Keesee, 2006).


AQ is a form of supervised learning, wherein classified data are generalized to identify the characteristics of the entire class. More details on the AQ classifier are given in Cervone et al. (2010b). In general, a multivariate description is a classified event of the form $x_1, \ldots, x_k$, c, where each x is an attribute value and c is the class it belongs to. For each class c, AQ considers as positive all the events that belong to class c, and as negative all the events belonging to the other classes. In its simplest form, given two sets of multivariate descriptions $P_1, \ldots, P_n$ (positive events) and $N_1, \ldots, N_m$ (negative events), AQ finds rules that cover all P examples and do not cover any of the N examples. The algorithm learns from examples (positives) and counter-examples (negatives) patterns of attribute values (or rules) that discriminate the characteristics of the positive events with respect to the negative events. Such patterns are generalizations of the individual positive events, and depending on AQ's mode of operation may vary from being totally complete (covering all positives) and consistent (not covering any of the negatives), to accepting a tradeoff of coverage to gain simplicity of patterns. AQ uses a highly descriptive language to represent the learned knowledge. In particular, it uses rules to describe patterns in the data. A prototypical AQ rule is defined as the following logical equation:

$$\text{Consequent} \Leftarrow \text{Premise} \qquad (1)$$

where Consequent and Premise are conjunctions of Conditions. A Condition is simply a relation between an attribute and a set of values it can take:

$$[\,\text{Attribute}\;\;\text{Relation}\;\;\text{Value(s)}\,] \qquad (2)$$

Depending on the attribute type, different relations may exist. For example, for unordered categorical attributes the relations '<' and '>' cannot be used, as they are undefined. Typically the consequent consists of a single condition, whereas the premise consists of a conjunction of several conditions. A sample rule used in the source detection problem is:

$$[\text{Group}=\text{H}] \Leftarrow [X=20.12..33.33\ \text{m}]\ [Y<33.44\ \text{m}]\ [\text{Theta}=128\ \text{deg}] : p=61,\ n=3 \qquad (3)$$

where the annotations p and n indicate the number of positive and negative events covered by this rule. This type of rule is called attributional, to distinguish it from more traditional rules which use a simpler representation language. The main difference with traditional rules is that the referee (attribute), relation and reference may include internal disjunctions of attribute values, ranges of values, internal conjunctions of attributes, and other constructs. Such a rich representation language means that very complex concepts can be represented using a compact description. However, attributional rules have the disadvantage of being more prone to overfitting with noisy data. Usually multiple rules must be learned to cover all the examples; together they are called a ruleset. A ruleset is a disjunction of rules, meaning that if even only one rule is satisfied, then the consequent is true. Multiple rules can be satisfied at one time because the learned rules may intersect each other.
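To make the rule semantics concrete, the sketch below encodes a rule's premise as per-variable intervals and tests coverage against an event, using the conditions of the sample rule (3). The encoding and names are illustrative assumptions, not the AQ4SD data structures.

```python
# A rule's premise as a mapping from variable name to an allowed interval;
# variables not cited by the rule are unconstrained.
rule = {"X": (20.12, 33.33), "Y": (float("-inf"), 33.44), "Theta": (128, 128)}

def covers(rule, event):
    """True if every condition of the rule is satisfied by the event."""
    return all(lo <= event[var] <= hi for var, (lo, hi) in rule.items())

def ruleset_covers(ruleset, event):
    """A ruleset is a disjunction of rules: one satisfied rule is enough."""
    return any(covers(r, event) for r in ruleset)

event = {"X": 25.0, "Y": 30.0, "Theta": 128}
print(covers(rule, event))            # True: all three conditions hold
print(ruleset_covers([rule], event))  # True: at least one rule covers it
```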

2.2. Hypotheses instantiation

The learned rules identify a portion of the search space as the area where good solutions are likely to be found. These rules can be easily instantiated to create new candidate solutions within the new boundaries. This is arguably one of the main advantages of using attributional rulesets to drive the evolutionary process. The learned hypotheses (attributional rules) are used to generate new individuals by randomizing variables within the ranges of values defined by the rule conditions. If a rule does not refer to some variable, it means that this variable was not needed to distinguish between the H-group and the L-group. A problem then arises as to what values should be assigned to such variables when generating new individuals. There can be different methods for handling this problem. Early experimental results showed that values can be assigned randomly within the range of the individuals in the current population. This is a conservative method that does not introduce values not already present in the population.

2.3. Evolutionary algorithm

The evolutionary process can use hypothesis generation and instantiation as the sole engine of evolution, or pair it with the traditional Darwinian operators of mutation and recombination. The two strategies are paired because hypothesis creation and instantiation typically displays a faster convergence rate than mutation and/or recombination, but it is computationally more costly (e.g., Cervone et al., 2000a, 2000b). By allowing the interchangeable execution of both Darwinian and non-Darwinian operators, our method can utilize the best features of both paradigms.

Fig. 1. Flowchart of the non-Darwinian evolutionary algorithm.

Fig. 1 presents a general flow diagram of the methodology. The Initialize population module creates individuals randomly or according to a given distribution. The number of individuals generated is specified by the population size parameter. The Mode Selection module switches between non-Darwinian and Darwinian modes when a mode-termination condition is satisfied. Mode termination is prompted when the rate of convergence of the objective function falls below a threshold over a given


maximum number of generations specified by a persistence parameter. The toggling between the two modes continues throughout the evolutionary process.

In non-Darwinian mode, two methods of selecting the H- and L-groups are supported: fitness-based and population-based. In the fitness-based method, the H-group and L-group consist of individuals whose fitness is above the High Fitness Threshold (HFT) and below the Low Fitness Threshold (LFT), respectively. In the population-based method, the H-group and L-group consist of portions of the population defined by the High Population Threshold (HPT) and the Low Population Threshold (LPT), respectively. HPT defines the percentage of the highest performing individuals, and LPT the percentage of the lowest performing individuals, to be selected for the H- and L-group, respectively. HFT, LFT, HPT and LPT are controllable parameters. The H- and L-groups are used as examples and counter-examples in a machine learning classification problem. Although the general methodology is not tailored towards a particular classifier, the AQ4SD learning system (Cervone et al., 2010b) was specifically developed for the source detection problem. The hypothesis generation and instantiation processes are described in Sections 2.1 and 2.2.

In Darwinian mode, new individuals are generated by selecting representative individuals and mutating them. The underlying methodology derives from the Evolution Strategy (ES) algorithms (Rechenberg, 1971; Schwefel, 1974). First, representative individuals (parents) are selected from the current population according to a fitness-proportional selection, in which candidate solutions with a smaller error are more likely to be selected, while at the same time leaving a chance for inferior solutions to create offspring (Goldberg, 1989, p. 11). New individuals are created by mutation (Bäck, 1996), taking as input the selected individual (the parent) and generating one or more offspring (the children) by varying one or more of its variables. In the current work, we used an adaptive mutation operator as described in Cervone et al. (2010a), and sketched below. In brief, the adaptive mutation operator encodes a σ_i parameter as one of the unknowns, and uses this parameter to control the magnitude of mutations. Unlike the case of canonical ES with the one-fifth rule (Schwefel, 1974), there is no global set of σ_i meta-parameters, as each solution has and uses its own copy. The intuition behind this adaptive mechanism is that for a solution to perform well (i.e., be better than the other potential solutions generated) it needs to have been generated both from a good combination of optimized parameters and using an adequate set of meta-parameters. Furthermore, this solution will pass on to its offspring (with variation) both its own good combination of optimized parameters and a good working set of meta-parameters.

The next operations are common to both modes of operation. The Evaluate new individuals module determines the fitness of each individual. Evaluating an individual involves running a transport and dispersion model, and comparing the simulated and the observed concentrations. In our case, this is not a computationally expensive operation because we are using an analytical model. However, if a complex numerical dispersion model is used, evaluating a new individual can be computationally taxing. In this case, the advantage of reducing the number of evaluations can offset the additional complexity of generating and instantiating hypotheses.
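A minimal sketch of such a self-adaptive mutation is given below. It assumes a log-normal update of the per-individual step size σ, a common ES formulation; the exact operator is the one of Cervone et al. (2010a), which may differ in detail, and the variable names are illustrative.

```python
import math
import random

def adaptive_mutation(parent, tau=0.5):
    """Self-adaptive ES-style mutation sketch: each candidate carries its
    own step size sigma, which is perturbed first (log-normally) and then
    used to mutate the decision variables."""
    child = dict(parent)
    child["sigma"] = parent["sigma"] * math.exp(tau * random.gauss(0.0, 1.0))
    for key in ("x", "y", "z", "q"):  # source coordinates and emission rate
        child[key] = parent[key] + child["sigma"] * random.gauss(0.0, 1.0)
    return child
```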
The Population survival selection module combines previous individuals with the newly generated ones. The methodology uses binary tournament selection (Goldberg, 1989). Specifically, old and new candidate solutions are merged into a single population. Then, two solutions are randomly drawn and compared; the one with the higher error is removed from the population, while the other is kept. The process continues until the population is resized to the fixed population size parameter.
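This merge-and-discard survival step translates directly into a short sketch; the error callback is a hypothetical stand-in for the objective function evaluation.

```python
import random

def tournament_survival(old_pop, new_pop, pop_size, error):
    """Binary tournament survival: merge old and new candidates, then
    repeatedly draw two at random and discard the one with the higher
    error, until the population is back to pop_size."""
    merged = old_pop + new_pop
    while len(merged) > pop_size:
        a, b = random.sample(range(len(merged)), 2)
        loser = a if error(merged[a]) > error(merged[b]) else b
        merged.pop(loser)
    return merged
```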

The Adjust parameters module collects statistics of previous generations (e.g., success rate, search domain reduction) and permits the adjustment of the parameter settings for both the non-Darwinian (e.g., generality of hypothesis instantiation, HPT and LPT) and Darwinian (e.g., mutation rate) modes of operation. In the current work, the only dynamically adjusted parameter is the mutation rate for the Darwinian mode. All parameters for the non-Darwinian mode are kept invariant.

The Terminating condition determines when to stop the execution of the program. The condition can be defined by a maximum number of function evaluations, by the number of generations without improvement, or by a limit on the execution time.

2.4. Graphical illustration of the proposed methodology

Fig. 2 shows one iteration of the non-Darwinian evolutionary process described. A simple source detection problem is defined for four variables: the x_1, x_2 and x_3 coordinates of the source in m, and the source emission rate x_4 in g s⁻¹. The continuous variables x_1, ..., x_4 are represented in Fig. 2 as discrete with domain D = {0, ..., 4}. The solution of the problem is found when x_1 = x_2 = x_3 = x_4 = 3. Fig. 2a shows the initial population of candidate solutions. The population is divided into the H- and L-groups according to the evaluation scores (Fig. 2b). In this example, the HPT and LPT parameters were set to 0.4; therefore the H- and L-groups together comprise 80% of the initial population (40% each for the highest and lowest performing individuals), and the remaining 20% are disregarded. The H- and L-groups are used by the machine learning classifier to learn descriptions (rules) that generalize the H-group, and thus reduce the search space to only certain areas of the original space. The result of this learning process is shown in Fig. 2c, and it consists of a ruleset comprising two rules:

$$\begin{aligned}
[\text{Group}=\text{H}] &\Leftarrow [x_1=1..3]\ [x_2=1..4]\ [x_3=1..4] : p=11,\ n=0\\
&\Leftarrow [x_2=3..4]\ [x_4=1..3] : p=11,\ n=0
\end{aligned} \qquad (4)$$

The two rules coincidentally have the same statistical annotations. Given the 15 positive candidate solutions (events) associated with the H-group (Fig. 2b), both rules in Eq. (4) cover 11 H-group and 0 L-group events. This means that the rules are highly intersecting, since together they cover 22 events out of a possible 15. The rules are instantiated by generating new candidate solutions within the new boundaries of the search space. For the variables cited in the rules, random values within the reference set are chosen. For the variables not cited in the rules, values are set equal to those of a randomly selected individual within the original population (the selection can be restricted to only the H-group). Therefore, using Rule 1, a sample candidate solution might look like the following: E = {x_1, x_2, x_3, x_4} = {1, 3, 2, 0}, where x_1, x_2 and x_3 were selected according to the rule and x_4 was taken from a randomly drawn event. The result of the instantiation process is a new population of candidate solutions (Fig. 2d). In this step, all new events lie within the generalized region. The search mechanism described is thus very different from traditional evolutionary operators, since it consists of a progressive reduction of the solution space.
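The instantiation policy just described (rule-cited variables randomized within the rule's intervals, uncited variables copied from an existing individual) can be sketched as follows; the dictionary encoding and variable names are illustrative assumptions.

```python
import random

def instantiate(rule, population, variables):
    """Create one new candidate from a learned rule: variables cited in
    the rule are drawn uniformly within the rule's interval; uncited
    variables are copied from a randomly drawn existing individual."""
    donor = random.choice(population)
    child = {}
    for var in variables:
        if var in rule:
            lo, hi = rule[var]
            child[var] = random.uniform(lo, hi)
        else:
            child[var] = donor[var]  # conservative: reuse an existing value
    return child

# Example with Rule 1 of Eq. (4): x1 in 1..3, x2 in 1..4, x3 in 1..4.
rule1 = {"x1": (1, 3), "x2": (1, 4), "x3": (1, 4)}
pop = [{"x1": 2, "x2": 3, "x3": 1, "x4": 0}]
print(instantiate(rule1, pop, ["x1", "x2", "x3", "x4"]))
```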


2.5. Transport and dispersion simulations

The dispersion model used to perform the forward simulations is an analytical Gaussian model, which is adequate for simple and short range dispersion configurations. More complex scenarios may require more sophisticated numerical models. We use the following Gaussian model with a single reflection at the ground, which determines the predicted mean concentration C_p at a location (x, y, z) generated by a source located at (x_s, y_s, z_s):

$$C_p(x,y,z;x_s,y_s,z_s)=\frac{Q\,\gamma_y\,\gamma_z}{2\pi U\left[\left(\sigma_s^2+\sigma_y^2\right)\left(\sigma_s^2+\sigma_z^2\right)\right]^{1/2}} \qquad (5)$$

with

$$\gamma_y=\exp\left[-\frac{(y-y_s)^2}{2\left(\sigma_s^2+\sigma_y^2\right)}\right] \qquad (6)$$

$$\gamma_z=\exp\left[-\frac{(z-z_s)^2}{2\left(\sigma_s^2+\sigma_z^2\right)}\right]+\exp\left[-\frac{(z+z_s)^2}{2\left(\sigma_s^2+\sigma_z^2\right)}\right] \qquad (7)$$

where Q is the source mass emission rate, U is the wind speed, σ_y(x, x_s; ψ) and σ_z(x, x_s; ψ) are the crosswind and vertical dispersion coefficients (i.e., the plume spreads), ψ describes the atmospheric stability class (from ψ = A to ψ = F), and σ_s² = σ_y²(x_s, x_s; ψ) = σ_z²(x_s, x_s; ψ) is a measure of the area of the source. The result of the simulation is the concentration field generated by the release along an arbitrary wind direction θ. The dispersion coefficients were computed from the tabulated curves of Briggs (Arya, 1999). In this study, U and ψ are assumed to be known, and their values are set according to the reported observations.
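For reference, Eqs. (5)-(7) translate directly into code. The sketch below assumes the plume spreads are supplied by the caller (e.g., precomputed from the Briggs curves) and that coordinates are already rotated into the wind frame; it is an illustration, not the authors' implementation.

```python
import math

def mean_concentration(x, y, z, xs, ys, zs, q, u, sigma_y, sigma_z, sigma_s2):
    """Gaussian plume with a single ground reflection, Eqs. (5)-(7).
    sigma_y and sigma_z are the crosswind and vertical plume spreads at
    the receptor's downwind distance; sigma_s2 is the source-size
    parameter; q is the emission rate and u the wind speed."""
    sy2 = sigma_s2 + sigma_y ** 2
    sz2 = sigma_s2 + sigma_z ** 2
    gy = math.exp(-(y - ys) ** 2 / (2.0 * sy2))                      # Eq. (6)
    gz = (math.exp(-(z - zs) ** 2 / (2.0 * sz2))
          + math.exp(-(z + zs) ** 2 / (2.0 * sz2)))                  # Eq. (7)
    return q * gy * gz / (2.0 * math.pi * u * math.sqrt(sy2 * sz2))  # Eq. (5)
```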

2.6. The Prairie Grass field dispersion experiment

The Prairie Grass experiment is a well documented campaign of dispersion measurements in controlled conditions (Barad, 1958). The experiment consists of 68 releases, of about 10 min each, of the trace gas SO2 from a ground level point source. Mean concentration samplers were deployed along five arcs located at 50 m, 100 m, 200 m, 400 m and 800 m from the source, measuring 10 min average concentrations. Extensive meteorological measurements were collected during the experiments. The stability of the atmosphere depends on the day and time of the release, and experiments were conducted under all stability classes. Fig. 3 shows reconstructed concentration fields averaged over all releases in each atmospheric class, with the wind direction aligned with the horizontal axis.

In addition to the observed Prairie Grass measurements, we created a synthetic dataset simulating each of the 68 releases using the Gaussian model (5). The synthetic dataset is generated using the source, wind characteristics, and atmospheric class of each of the original Prairie Grass experiments, and the simulated concentrations are recorded at the sensor locations of the original experiments. The synthetic dataset is generated for two reasons. First, it allows the assessment of the accuracy of the model simulations with respect to the observed measurements; this gives an indication of how much of the error in the predicted source location of the real case is attributable to the search algorithm and how much is intrinsic to the dispersion model. Second, it provides an ideal dataset against which the search algorithm can be tested and assessed in the absence of noise and of modeling uncertainties.

Fig. 2. Sample illustration of one generation using the proposed methodology: a) initial population; b) division of the population into H- and L-groups; c) learned rules that generalize the H-group; and d) new population generated through rule instantiation. All new candidate solutions are generated within the boundaries of the learned hypotheses.

Fig. 3. Reconstructed concentration fields averaged over all releases in each atmospheric class, with the wind direction aligned with the horizontal axis.

Fig. 4 illustrates the differences between the simulated and observed concentrations for two of the Prairie Grass releases. The left graphs show the concentration observed at each sensor, plotted along with the simulated concentration profile. The sensors are positioned along the five concentric arcs, indicated by alternating white and gray backgrounds. Within each arc, the sensors are sorted counterclockwise. The middle and right graphs show the concentration field reconstructed from the measurements and from the simulated values, respectively. The top graphs refer to release 55, which shows very good agreement between the observed and simulated values. The bottom graphs show release 17, where there is a large discrepancy between the observed and simulated values. The shift in the location of the peak concentration is most likely due to errors in the measurement of the wind direction.

Fig. 4. Comparison of observed and synthetic concentrations for Prairie Grass releases 55 (top) and 17 (bottom). Left: cross-wind profiles of observed and synthetic concentrations plotted as functions of the sensor number. The sensors are positioned along five concentric arcs, located at 50 m, 100 m, 200 m, 400 m and 800 m from the source. Each arc is represented in one of the panels, which are indicated by alternating white and grey backgrounds. For each arc, the sensor numbers are sorted counterclockwise. Center: reconstructed surface concentrations from observed measurements. Right: reconstructed surface concentrations from simulated synthetic concentrations.

The accuracy of the simulations of all Prairie Grass experiments is assessed by calculating the following measure of the error between simulations and observations (Allen et al., 2007):

$$D_c=\sqrt{\frac{\overline{\left[\log_{10}(C_o+1)-\log_{10}\left(C_p+1\right)\right]^2}}{\overline{\left[\log_{10}(C_o+1)\right]^2}}} \qquad (8)$$

where C_o and C_p are the observed and predicted concentrations at the sensor locations, respectively, and the overbar indicates an average over all the observations. Eq. (8) was calculated for each of the 68 Prairie Grass releases. The function D_c was shown to be a suitable objective function for source detection algorithms (Cervone and Franzese, 2010b). In this study, it is also used as the objective function to be minimized by the search algorithm.

Table 1 reports, for each Prairie Grass experiment, the atmospheric class, the experiment ID (as assigned in the original dataset), the heat flux, release rate, mean wind speed and direction, the number of observations (i.e., the number of sensors detecting concentration), the sum of all the concentration values measured at the sensors, and the error D_c as calculated by Eq. (8). In general, unstable atmospheric conditions (class A) correspond to a large number of sensors because of enhanced plume spread. Conversely, stable atmospheric conditions (class F) result in the smallest number of ground measurements, because the footprint of the dispersion is small and fewer sensors are active compared to the other stability classes. The errors associated with the experiments in atmospheric class F are mainly attributable to two factors: first, the smaller number of measurements compared to other atmospheric classes generates a larger uncertainty in the reconstructed field; and second, small errors in the observed wind direction cause large errors in the reconstructed field (e.g., see Fig. 4).
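The objective function of Eq. (8) is straightforward to transcribe; the following sketch assumes the observed and predicted concentrations are given as equal-length sequences over the active sensors.

```python
import math

def dc_error(observed, predicted):
    """Error measure of Eq. (8): normalized RMS difference of the
    log-concentrations, averaged over all sensors (the overbars)."""
    n = len(observed)
    num = sum((math.log10(co + 1.0) - math.log10(cp + 1.0)) ** 2
              for co, cp in zip(observed, predicted)) / n
    den = sum(math.log10(co + 1.0) ** 2 for co in observed) / n
    return math.sqrt(num / den)

print(dc_error([10.0, 5.0, 0.0], [8.0, 6.0, 0.1]))  # small example
```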

Table 1. Characteristics of the 68 Prairie Grass field experiments, and the calculated error Dc between the simulated and observed concentrations.

ψ | ID  | Heat flux (W/m²) | Q (g/s) | U (m/s) | θ (deg) | # Meas. | ΣC (mg/m³) | Dc
A | 15  | 129.90 | 95.5  | 2.90 | 209 | 135 | 5554  | 0.38
A | 16  | 213.54 | 93.0  | 2.96 | 192 | 158 | 3621  | 0.31
A | 25  | 94.99  | 101.4 | 2.48 | 177 | 224 | 5747  | 0.47
A | 47  | 217.33 | 103.1 | 3.02 | 243 | 148 | 4543  | 0.21
A | 52  | 293.13 | 104.0 | 4.04 | 132 | 196 | 3999  | 0.44
B | 1   | 81.50  | 81.5  | 2.39 | 150 | 223 | 4882  | 0.65
B | 2   | 37.12  | 83.9  | 1.74 | 100 | 202 | 5231  | 0.88
B | 7   | 265.13 | 89.9  | 4.02 | 188 | 260 | 3482  | 0.75
B | 10  | 203.11 | 92.1  | 4.15 | 225 | 155 | 3407  | 0.38
B | 48S | 140.17 | 104.0 | 2.77 | 216 | 206 | 3714  | 0.68
C | 5   | 202.83 | 77.8  | 5.15 | 176 | 138 | 2759  | 0.35
C | 8   | 127.33 | 91.1  | 4.06 | 184 | 139 | 4138  | 0.30
C | 9   | 194.07 | 92.0  | 6.11 | 204 | 154 | 3101  | 0.42
C | 19  | 187.53 | 101.8 | 5.33 | 166 | 126 | 3645  | 0.38
C | 27  | 239.83 | 98.8  | 5.40 | 184 | 133 | 3559  | 0.39
C | 43  | 229.18 | 98.9  | 4.68 | 170 | 189 | 4146  | 0.49
C | 44  | 263.20 | 100.7 | 5.39 | 158 | 177 | 3774  | 0.51
C | 49  | 270.34 | 102.0 | 6.03 | 199 | 159 | 3504  | 0.40
C | 50  | 331.86 | 102.8 | 6.06 | 215 | 138 | 3498  | 0.35
C | 62  | 106.64 | 102.1 | 4.61 | 212 | 127 | 4266  | 0.26
D | 6   | 80.10  | 89.5  | 5.99 | 183 | 110 | 3268  | 0.47
D | 11  | 157.05 | 95.9  | 6.77 | 184 | 97  | 2963  | 0.16
D | 12  | 284.41 | 99.1  | 7.21 | 194 | 105 | 2834  | 0.39
D | 17  | 15.71  | 56.5  | 2.87 | 184 | 61  | 5079  | 0.58
D | 20  | 396.31 | 101.2 | 8.26 | 178 | 121 | 2879  | 0.55
D | 21  | 29.26  | 50.9  | 5.31 | 181 | 74  | 2563  | 0.53
D | 22  | 45.95  | 48.4  | 6.39 | 176 | 69  | 1847  | 0.45
D | 23  | 26.78  | 40.9  | 5.37 | 128 | 79  | 1664  | 0.19
D | 24  | 18.40  | 41.2  | 5.21 | 141 | 78  | 1642  | 0.14
D | 26  | 200.05 | 97.6  | 5.68 | 190 | 161 | 3202  | 0.63
D | 29  | 43.29  | 41.5  | 3.40 | 220 | 132 | 3568  | 0.51
D | 30  | 243.64 | 98.4  | 6.28 | 196 | 131 | 3443  | 0.57
D | 31  | 155.75 | 96.0  | 6.91 | 225 | 140 | 3102  | 0.54
D | 33  | 180.74 | 94.7  | 6.90 | 181 | 116 | 2621  | 0.47
D | 34  | 254.66 | 97.4  | 8.46 | 146 | 99  | 2518  | 0.30
D | 35S | 22.23  | 41.8  | 3.40 | 135 | 67  | 3176  | 0.39
D | 37  | 23.12  | 40.3  | 4.00 | 187 | 81  | 2045  | 0.29
D | 38  | 18.13  | 45.4  | 3.70 | 170 | 63  | 2918  | 0.30
D | 42  | 37.95  | 56.4  | 5.27 | 212 | 78  | 2440  | 0.27
D | 45  | 62.81  | 100.8 | 5.31 | 163 | 103 | 4593  | 0.34
D | 46  | 31.19  | 99.7  | 4.86 | 134 | 99  | 5612  | 0.26
D | 48  | 198.21 | 104.1 | 6.91 | 214 | 105 | 2525  | 0.36
D | 51  | 242.16 | 102.4 | 6.18 | 245 | 150 | 3755  | 0.58
D | 54  | 32.86  | 43.4  | 3.40 | 140 | 62  | 2981  | 0.38
D | 55  | 41.27  | 45.3  | 5.17 | 156 | 73  | 2039  | 0.21
D | 56  | 28.52  | 45.9  | 4.15 | 153 | 73  | 2668  | 0.20
D | 57  | 48.22  | 101.5 | 6.42 | 200 | 119 | 3601  | 0.40
D | 60  | 36.20  | 38.5  | 4.04 | 198 | 66  | 2059  | 0.34
D | 61  | 305.16 | 102.1 | 7.00 | 203 | 158 | 3153  | 0.61
D | 65  | 47.79  | 44.1  | 3.93 | 178 | 59  | 2476  | 0.55
D | 67  | 22.77  | 45.0  | 3.85 | 185 | 69  | 2600  | 0.19
E | 18  | 24.52  | 57.6  | 2.68 | 187 | 68  | 5460  | 0.51
E | 28  | 17.25  | 41.7  | 2.12 | 174 | 74  | 4832  | 0.31
E | 41  | 32.58  | 39.9  | 3.16 | 198 | 49  | 2782  | 0.38
E | 66  | 33.58  | 43.1  | 2.56 | 166 | 94  | 4357  | 0.35
E | 68  | 16.88  | 42.8  | 2.19 | 174 | 75  | 5236  | 0.24
F | 3   | 0.68   | 56.3  | 0.66 | 150 | 289 | 5550  | 1.00
F | 4   | 4.54   | 50.5  | 0.91 | 216 | 443 | 10619 | 0.86
F | 13  | 4.38   | 61.1  | 0.92 | 190 | 99  | 7062  | 0.68
F | 14  | 2.13   | 49.1  | 0.74 | 170 | 153 | 8990  | 0.52
F | 32  | 17.72  | 41.4  | 1.60 | 171 | 64  | 7167  | 0.32
F | 35  | 7.45   | 38.8  | 1.10 | 132 | 66  | 7497  | 0.71
F | 36  | 10.00  | 40.0  | 1.37 | 160 | 70  | 7530  | 0.47
F | 39  | 17.95  | 40.7  | 1.69 | 140 | 79  | 3919  | 0.48
F | 40  | 15.39  | 40.5  | 1.58 | 180 | 102 | 4011  | 0.73
F | 53  | 20.44  | 45.2  | 1.56 | 132 | 47  | 6625  | 0.54
F | 58  | 18.44  | 40.5  | 1.65 | 178 | 42  | 6881  | 0.50
F | 59  | 22.24  | 40.2  | 2.02 | 174 | 47  | 5583  | 0.29

3. Results

The proposed methodology was tested by reconstructing the source characteristics for both the synthetic and the observed dataset. A total of 136 reconstructions were performed (68 for each dataset). Each reconstruction is the average of 30 runs, where each run differs only in the initial randomly generated population of candidate solutions. A total of 4080 runs were performed, requiring under 6 h of CPU time on a dual core E8400 3.00 GHz computer. The experiments were run with the following input parameters: the population consists of 500 individuals; 50 new candidate solutions are generated at each iteration; a total of 500 iterations are made; the High and Low Population Thresholds (HPT and LPT) are set to 30%; the mode selection persistence parameter is set to 10 (i.e., if the rate of convergence in 10 consecutive runs is below an internal threshold, the algorithm toggles between non-Darwinian and Darwinian modes); and binary tournament selection was used as the survival selection mechanism (Goldberg, 1989).

The source characterization problem involves identifying five variables, namely the x, y and z coordinates in m, the source size σ_s² in m², and the emission rate Q in g/s. We assume that wind speed and direction, atmospheric class and terrain characteristics are known.

Fig. 5 shows a summary of the predicted distance from the real source using the observed Prairie Grass data and the synthetic data. The atmospheric class of each experiment is also shown. The light color in each pair of columns indicates the synthetic dataset; the darker tone indicates the observed Prairie Grass dataset. The Prairie Grass experiment identifier is reported at the base of each column, and the predicted distance from the source in meters is reported at the top of the columns. In most cases, the results for the synthetic and observed datasets are comparable. The results for atmospheric class F are consistently worse. In general, the results show that the search algorithm performs consistently better with the synthetic data, mostly due to the model's ability to better characterize the source in the absence of noise and measurement errors. However, even with the observed data, the errors are usually very small, in most cases about or smaller than 50 m, which is the distance from the source at which the closest ground measurements are made.

Fig. 5. Summary of the predicted distance from the real source using the observed Prairie Grass data and the synthetic data.

Table 2 summarizes the mean predicted source characteristics for both the synthetic and the Prairie Grass datasets as functions of the atmospheric class. The results are given in terms of absolute errors Δ, where for any variable x, Δx = x_obs − x_pred. The distance from the source D is calculated as D = [(Δx)² + (Δy)² + (Δz)²]^{1/2}.

Table 2. Summary of results for the Prairie Grass dataset and for the synthetic dataset.

Class | Prairie Grass: D (m), ΔQ (g/s), Δσ_s² (m²) | Synthetic: D (m), ΔQ (g/s), Δσ_s² (m²)
A | 15.27, 100.20, 1.78 | 0.89, 0.46, 0.84
B | 49.92, 104.40, 0.00 | 1.43, 3.88, 0.60
C | 24.26, 81.10, 0.36 | 2.21, 5.42, 0.64
D | 27.88, 51.75, 0.48 | 3.97, 12.65, 0.51
E | 24.76, 11.96, 0.74 | 5.70, 15.20, 0.68
F | 328.94, 73.17, 0.97 | 387.88, 69.63, 0.62

3.1. Sensitivity to the number of sensors and their measured concentrations

In the following, we investigate the relationship between the amount of information provided to the search algorithm and the final error in the characteristics of the source. The first test consists of a series of runs using the observed Prairie Grass dataset, but only using a limited number of sensors. The runs were performed using an increasing number of sensors, selected at random. For each number of sensors, 30 different combinations of sensors were used.
Each combination was run 30 times with different initial random candidate solutions. All 68 releases were simulated. Fig. 6 shows the cumulative error for all 68 Prairie Grass experiments as a function of the number of sensors actually used, and as a function of the atmospheric class of the experiments. Each bar in Fig. 6 was

generated by plotting the results of 61,200 runs, and shows the results obtained for 2, 3, 4, 5, 6, 8, 10, 12, 15 and 20 sensors, grouped by atmospheric class (a total of 612,000 runs were performed). The error decreases non-linearly with the number of sensors. In particular, the test shows that beyond a minimum number of sensors, there is little gain in using a larger amount of information. For instance, acceptable results are obtained with as few as 5 sensors selected at random. As expected, Fig. 6 also shows that the reconstructions of experiments conducted in atmospheric class F are noticeably worse than for the other cases.

Fig. 6. Cumulative error for all the 68 Prairie Grass cases as a function of the number of sensors and atmospheric stability class.

Fig. 7. Predicted distance (m) to the real source location as a function of the cumulative concentration (mg m⁻³), when the search algorithm is run using only 3, 4, 6, 8, 10 and 12 sensors.

Further insight into the relation between the amount of information required and the performance of the search algorithm is gained by another experiment, in which the measurements were not chosen at random, but in such a way that the sum of the concentrations over the selected sensors, i.e., the cumulative concentration, would be within a threshold range. Fig. 7 shows the predicted distance to the real source location as a function of the cumulative concentration reading, when the search algorithm is run using only 3, 4, 6, 8, 10 and 12 sensors. For each number of sensors, each release was simulated 30 times with 30 different combinations of sensors for which the cumulative concentration was within the selected bin. Fig. 7 was thus generated by a total of 162,000 simulations. Using a larger number of sensors gives results essentially comparable with the case of 12 sensors, and these are not reported. In order to eliminate the effects of different atmospheric classes, we performed the experiments using only releases of class D, indicating a neutral atmosphere.

The results clearly show that the number of sensors itself is not a determining factor in the performance of the algorithm. While there is an overall slight improvement in the results obtained with a large number of sensors, Fig. 7 shows that the fundamental quantity governing performance is in fact the cumulative concentration over the sensors. Simulations conducted using only three sensors give results comparable to simulations using twelve or more sensors, as long as the cumulative concentration over the sensors is the same. In particular, the distance D of the predicted source from the real source as a function of the cumulative concentration is found to decrease with an approximate −2/3 slope, independent of the number of sensors considered:

$$D \approx k\left(\sum_n C_o^{(n)}\right)^{-2/3} \qquad (9)$$

where k is a constant and $C_o^{(n)}$ is the concentration measured at sensor n. At this stage it is not possible to determine a minimum number of sensors over which Eq. (9) holds. This will likely depend on the characteristics of the problem, and perhaps

on the dispersion model used. In our case, the trend is observed already using 3 sensors, but for more complex scenarios it is possible that the minimum number of sensors necessary for the source characterization algorithm may be larger than 3. Clearly, the question arises whether this result can be observed only for the specific algorithm used, or whether it is independent of the search paradigm. While we did not test Eq. (9) using different algorithms, we expect it to hold. However, we do expect this relationship to be sensitive to the characteristics of the problem. For example, for large scale events where more complex dispersion models need to be used, it is possible that the relationship between cumulative concentration and number of sensors may differ from Eq. (9).
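Given per-run outputs, the exponent in Eq. (9) can be checked with an ordinary least-squares fit in log-log space. The sketch below is a hypothetical post-processing helper, not part of the original study's code.

```python
import math

def fit_power_law_slope(cumulative_conc, distances):
    """Least-squares slope of log(D) versus log(cumulative concentration),
    as a check of the approximate -2/3 exponent in Eq. (9)."""
    xs = [math.log(c) for c in cumulative_conc]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope  # expected to be close to -2/3
```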

4. Conclusions

This paper presented an innovative evolutionary algorithm which pairs traditional Darwinian operators with non-Darwinian machine learning operators, applied to the problem of identifying the location, size and emission rate of an unknown source of atmospheric contaminant. The proposed method is particularly advantageous in problems with complex evaluation functions, where the additional computational complexity introduced by the learning process is offset by the smaller number of evaluations required for convergence.

The methodology was tested using the real concentration and meteorology measurements from the Prairie Grass field experiment. An additional synthetic dataset was also generated to conduct sensitivity studies in a noiseless environment. The solutions were generally very accurate: the source was often located within a few meters of the real source with synthetic data, and within a few tens of meters with real data.

The relationship between the number of sensors and the accuracy of the solution was investigated. A power law relationship was found between the calculated distance from the real source and the cumulative concentrations measured at the sensors. The power law relationship was found to be independent of the number of sensors used, beyond a minimum threshold of 3 sensors.

Acknowledgments

Work performed under this project has been partially supported by the NSF through Award 0849191 and by George Mason University Summer Research Funding.

References

Allen, C.T., Young, G.S., Haupt, S.E., 2007. Improving pollutant source characterization by better estimating wind direction with a genetic algorithm. Atmospheric Environment 41 (11), 2283–2289.
Arya, P.S., 1999. Air Pollution Meteorology and Dispersion. Oxford University Press.
Ashlock, D., 2006. Evolutionary Computation for Modeling and Optimization. Springer-Verlag.
Bäck, T., 1996. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press.
Barad, M., 1958. Project Prairie Grass, a field program in diffusion. Tech. Rep., Geophysical Research Paper No. 59, Report AFCRC-TR-58-235, Air Force Cambridge Research Center, 218 pp.
Cervone, G., Franzese, P., 2010a. Machine learning for the source detection of atmospheric emissions. In: Proceedings of the 8th Conference on Artificial Intelligence Applications to Environmental Science, January 2010, No. J1.7.
Cervone, G., Franzese, P., 2010b. Monte Carlo source detection of atmospheric emissions and error functions analysis. Computers & Geosciences 36 (7), 902–909.
Cervone, G., Franzese, P., Grajdeanu, A., 2010a. Characterization of atmospheric contaminant sources using adaptive evolutionary algorithms. Atmospheric Environment 44, 3787–3796.
Cervone, G., Franzese, P., Keesee, A.P., 2010b. Algorithm quasi-optimal (AQ) learning. WIREs: Computational Statistics 2 (2), 218–236.
Cervone, G., Kaufman, K., Michalski, R., 2000a. Experimental validations of the learnable evolution model. In: Proceedings of the 2000 Congress on Evolutionary Computation, Vol. 2, pp. 1064–1071.
Cervone, G., Michalski, R., Kaufman, K., Panait, L., 2000b. Combining machine learning with evolutionary computation: recent results on LEM. In: Proceedings of the Fifth International Workshop on Multistrategy Learning (MSL-2000), Guimarães, Portugal, pp. 41–58.
Cervone, G., Panait, L., Michalski, R., 2001. The development of the AQ20 learning system and initial experiments. In: Proceedings of the Fifth International Symposium on Intelligent Information Systems, June 18-22, 2001, Zakopane, Poland. Physica Verlag, p. 13.
Chow, F., Kosović, B., Chan, T., 2006. Source inversion for contaminant plume dispersion in urban environments using building-resolving simulations. In: Proceedings of the 86th American Meteorological Society Annual Meeting, Atlanta, GA, pp. 12–22.

Darwin, C., 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. Oxford University Press, London.
Delle Monache, L., Lundquist, J., Kosović, B., Johannesson, G., Dyer, K., Aines, R., Chow, F., Belles, R., Hanley, W., Larsen, S., Loosmore, G., Nitao, J., Sugiyama, G., Vogt, P., 2008. Bayesian inference and Markov Chain Monte Carlo sampling to reconstruct a contaminant source on a continental scale. Journal of Applied Meteorology and Climatology 47, 2600–2613.
Gelman, A., Carlin, J., Stern, H., Rubin, D., 2003. Bayesian Data Analysis. Chapman & Hall/CRC, 668 pp.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, Reading, MA.
Grefenstette, J., 1987. Incorporating problem specific knowledge into genetic algorithms. Genetic Algorithms and Simulated Annealing 4, 42–60.
Grefenstette, J., 1991. Lamarckian learning in multi-agent environments. In: Proceedings of the Fourth International Conference on Genetic Algorithms, p. 8.
Hamda, H., Jouve, F., Lutton, E., Schoenauer, M., Sebag, M., 2002. Compact unstructured representations for evolutionary design. Applied Intelligence 16 (2), 139–155.
Haupt, S.E., Young, G.S., Allen, C.T., 2007. A genetic algorithm method to assimilate sensor data for a toxic contaminant release. Journal of Computers 2 (6), 85–93.
Johannesson, G., Hanley, B., Nitao, J., 2004. Dynamic Bayesian Models via Monte Carlo: An Introduction with Examples. Tech. Rep. UCRL-TR-207173, Lawrence Livermore National Laboratory.
Keesee, A.P., 2006. How Sequential-cover Data Mining Programs Learn. College of Science, George Mason University.
Lozano, J., 2006. Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms. Springer.
Michalski, R., 1969. On the quasi-minimal solution of the general covering problem. In: Proceedings of the Fifth International Symposium on Information Processing (FCIP 69), Vol. A3, pp. 125–128.
Michalski, R., 1973. AQVAL/1: computer implementation of a variable-valued logic system VL1 and examples of its application to pattern recognition. In: First International Joint Conference on Pattern Recognition, pp. 3–17.
Michalski, R., 1983. A theory and methodology of inductive learning. In: Machine Learning: An Artificial Intelligence Approach, Vol. 1, pp. 83–134.
Mitchell, T., 1997. Machine Learning. McGraw-Hill.
Rechenberg, I., 1971. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Ph.D. Thesis, Technical University of Berlin.
Reynolds, R., 1999. Cultural algorithms: theory and applications. In: McGraw-Hill Advanced Topics in Computer Science Series, pp. 367–378.
Schwefel, H.P., 1974. Numerische Optimierung von Computer-Modellen. Ph.D. Thesis, Technical University of Berlin.
Sebag, M., Schoenauer, M., 1994. Controlling crossover through inductive learning. Lecture Notes in Computer Science, 209.
Sebag, M., Schoenauer, M., Ravise, C., 1997. Inductive learning of mutation step-size in evolutionary parameter optimization. Lecture Notes in Computer Science, 247–261.
Senocak, I., Hengartner, N., Short, M., Daniel, W., 2008. Stochastic event reconstruction of atmospheric contaminant dispersion using Bayesian inference. Atmospheric Environment 42 (33), 7718–7727.
