IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 3, JUNE 2010


Multiobjective Optimization of Temporal Processes Zhe Song, Member, IEEE, and Andrew Kusiak, Member, IEEE

Abstract—This paper presents a dynamic predictive-optimization framework for a nonlinear temporal process. Data-mining (DM) and evolutionary strategy algorithms are integrated in the framework for solving the optimization model. DM algorithms learn dynamic equations from the process data. An evolutionary strategy algorithm is then applied to solve the optimization problem guided by the knowledge extracted by the DM algorithm. The concept presented in this paper is illustrated with data from a power plant, where the goal is to maximize the boiler efficiency and minimize the limestone consumption. This multiobjective optimization problem can be transformed either into a single-objective optimization problem through preference aggregation approaches or into a Pareto-optimal optimization problem. The computational results have shown the effectiveness of the proposed optimization framework.

Index Terms—Data mining (DM), dynamic modeling, evolutionary algorithms (EAs), multiobjective optimization, nonlinear temporal process, power plant, predictive control, preference-based optimization.

NOMENCLATURE

x: Vector of the controllable variables of a process.
x_i: ith controllable variable.
v: Vector of the noncontrollable variables of a process.
v_i: ith noncontrollable variable.
y: Vector of the response variables of a process.
y_i: ith response variable.
y^P: Vector of the performance variables of a process; y^P is a subset of y.
y^NP: Vector of the nonperformance variables of a process; y^NP is a subset of y.
Ω_x: Search space of x.
Ω_yNP: Constraint space of y^NP.
f(•): Function capturing the mapping between (x, v) and y.
t: Sampling time stamp.
d_yi, d_xi, d_vi: Maximum possible time delays for y_i, x_i, v_i.
D_yi^{y_i}, D_xi^{y_i}, D_vi^{y_i}: Sets of time delay constants selected for the corresponding variables y_i, x_i, v_i under the response variable y_i.
d_x^{y_i,min}, d_x^{y_i,max}: Minimum and maximum values of the sets D_x1^{y_i}, ..., D_xk^{y_i}.
d_v^{y_i,min}, d_v^{y_i,max}: Minimum and maximum values of the sets D_v1^{y_i}, ..., D_vk^{y_i}.
X: Set of all controllable variables.
X^{1,y_i}, X^{2,y_i}: Two subsets of X, the actionable and nonactionable sets for y_i.
Y^NP: Set of response variables which are not the performance variables.
Y^{1,NP}, Y^{2,NP}: Two subsets of Y^NP; one is affected by changing the controllable variables, the other is not.
α_i(•): Preference function for the performance variable y_i.
y_i(LB), y_i(CP), y_i(UB): Lower bound, center point, and upper bound for the preference function α_i(•).
Δy_i: Small positive constant to evaluate the decrease or increase of y_i.
Ω_p: Preference space.
Region_i: Region i in the preference space characterized by its preference values.
C(•): Cost function of the controllable variables.
R, S: Positive semidefinite matrices.
β(•), w_1, ..., w_lp, w_C: Aggregation function and the weights used in it.
λ: Offspring size.
μ: Parent size or initial population size.
s^i: Solution vector of the ith individual.
σ^i: Mutation vector of the ith individual.
N(•): Normal distribution.
δ: Threshold-distance vector to differentiate two individuals.
X^{*,Region_i}(t): Set of Pareto-optimal solutions leading to Region_i at sampling time t.
X^{Region_i}(t): Set of solutions in the offspring pool leading to Region_i at sampling time t.
n_local: Number of dominated individuals in a preference region.
n_global: Number of dominated individuals in the preference space.

I. INTRODUCTION

Manuscript received November 2, 2008; revised February 8, 2009 and April 25, 2009. First published November 6, 2009; current version published June 16, 2010. This work was supported by the Iowa Energy Center under Grant 07-01. This paper was recommended by Associate Editor Y. S. Ong. The authors are with the Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City, IA 52242-1527 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCB.2009.2030667

OPTIMIZING nonlinear and nonstationary processes with multiple objectives presents a challenge for traditional solution approaches. In this paper, a process is represented as a triplet (x, v, y), where x ∈ R^k is a vector of k controllable variables, v ∈ R^m is a vector of m noncontrollable (measurable) variables, and y ∈ R^l is a vector of l system response variables. The value of a response variable changes due to

1083-4419/$26.00 © 2009 IEEE Authorized licensed use limited to: The University of Iowa. Downloaded on June 20,2010 at 02:29:43 UTC from IEEE Xplore. Restrictions apply.


the controllable and noncontrollable variables. The controllable and noncontrollable variables are considered in this paper as input variables. The underlying relationship is represented as y = f(x, v), where f(•) is a function capturing the process, and it may change in time. y = f(x, v) can be expanded as y_1 = f_1(x, v), y_2 = f_2(x, v), ..., y_l = f_l(x, v), one equation per response variable. Finding the control settings that optimize the process can be formulated as a constrained multiobjective optimization problem. Without loss of generality, the first l_p response variables (l_p ≤ l) are assumed to be the performance metrics to be maximized. Let y^P = [y_1, ..., y_lp]^T be the vector of all performance response variables, and let y^NP = [y_{lp+1}, ..., y_l]^T be the vector of the remaining nonperformance response variables

max_x {y_1, y_2, ..., y_lp}
s.t. x ∈ Ω_x
     y^NP ∈ Ω_yNP.    (1)

In model (1), Ω_x is the feasible search space of the controllable variables x, and Ω_yNP is the constraint space in which y^NP has to stay. In many industrial applications, the noncontrollable variables v, the underlying function f(•), and the search spaces Ω_x and Ω_yNP are time dependent. Thus, the optimization model has to be solved repeatedly. Finding the optimal control settings for nonlinear temporal processes with multiple objectives poses several challenges.
1) It is difficult to derive analytic models describing f(•). For example, modeling the relationship between combustion process efficiency and input variables is not trivial. Thus, it is difficult to solve model (1) with traditional optimization techniques.
2) The function f(•) is nonstationary, so updating it is necessary in practical applications. The function f(•) can be extracted with data-mining (DM) algorithms from the current process data, and, thus, it remains current. For example, a combustor ages over time; regular maintenance and repair change the combustor's properties, thus impacting the combustion process and the function f(•).
3) How should the tradeoffs among multiple objectives be decided, and how can a Pareto-optimal set of candidate solutions be found?
To deal with these challenges, a framework integrating DM and evolutionary algorithms (EAs) is presented. The underlying process is captured with dynamic equations learned by DM algorithms from process data. The optimization model is then solved with EAs. Domain knowledge enters through the definition of a preference function that transforms the performance metrics into the preference space, which is easier for decision makers to interpret. Recent advances in EAs and DM present an opportunity to model and optimize complex systems using operational data. DM algorithms have proved to be effective in applications such as manufacturing, marketing, and combustion processes [4], [20], [25], [33].
The use of DM algorithms to extract process models and then to optimize the models by EA can be traced back to [31] and [37], where neural networks (NNs) were used to identify the process with static equations (i.e., no time

delays were considered). Clustering algorithms were used in [25] to extract patterns leading to higher combustion efficiency in steady states. It has been recognized that DM algorithms can identify dynamic equations from process data; the resulting models can then be optimized by EAs. Numerous successful EA applications have been reported in the literature [1]-[3], [8], [9], [15], [18], [19], [23], [31], [32], [37]-[39], [42], [43]. An EA was applied to solve a multiobjective power dispatch problem [1] and to racing car design optimization [3]. NNs and genetic algorithms were used in [2] to optimize engine emissions, where the objective function was a nonlinear weighted function. An EA was applied to the multiobjective optimization of gas turbine emissions and pressure fluctuation [8]. The research results reported in the literature offer a promising direction for solving complex problems that are difficult to handle with traditional analytical approaches. This paper presents a framework for optimizing temporal processes. The process model is described by a set of dynamic equations identified by DM algorithms. As gradients are usually not available for the dynamic equations assembled in a model, an EA is used to solve the model. The optimization problem discussed in this paper can be considered a dynamic multiobjective optimization problem [15], [44]. The focus of this paper is on modeling processes with dynamic equations and generating solutions at the specific intervals required by the application. The latter parallels solving static optimization problems. From the dynamic optimization perspective, tracking optimal solutions is achieved by iteratively solving the optimization problem and updating the underlying process model with recent process data. The proposed approach integrating DM with evolutionary computation has been applied to optimize a combustion process in a power plant.
Computational results have shown that the models extracted by the DM algorithms are accurate and can be used to control an industrial combustion process. The optimization framework presented is applicable to other processes, e.g., refinery processes and wind energy conversion, where analytical approaches cannot handle the complexity and scope of the models. The proposed approach calls for a large volume of process data representing the process.

II. PROCESS MODELING AND OPTIMIZATION BASED ON DYNAMIC EQUATIONS

A process can be considered a dynamic multiple-input-multiple-output system. Assume that the value of the first performance response variable at time t, i.e., y_1(t), is determined by the values of the previous system status (i.e., the predictors): {y_1(t−1), ..., y_1(t−d_y1)}, {x_1(t−1), ..., x_1(t−d_x1)}, ..., {x_k(t−1), ..., x_k(t−d_xk)}, {v_1(t−1), ..., v_1(t−d_v1)}, ..., {v_m(t−1), ..., v_m(t−d_vm)}. Similarly, y_i(t) can be affected by {y_i(t−1), ..., y_i(t−d_yi)}, {x_1(t−1), ..., x_1(t−d_x1)}, ..., {x_k(t−1), ..., x_k(t−d_xk)}, {v_1(t−1), ..., v_1(t−d_v1)}, ..., {v_m(t−1), ..., v_m(t−d_vm)}, for i = 1 to l. Here, d_yi, d_x1, ..., d_xk, d_v1, ..., d_vm are the maximum possible time delays considered for the corresponding variables, and they are all positive constants. To obtain an accurate dynamic model that can be applied to optimize the



process, selecting the appropriate predictors is important. For example, for the performance variable y_1(t), a predictor selection algorithm selects a set of important predictors among {y_1(t−1), ..., y_1(t−d_y1)}, {x_1(t−1), ..., x_1(t−d_x1)}, ..., {x_k(t−1), ..., x_k(t−d_xk)}, {v_1(t−1), ..., v_1(t−d_v1)}, ..., {v_m(t−1), ..., v_m(t−d_vm)}. DM offers algorithms that can perform such a task. For example, the boosting tree algorithm [16], [17] can be used to determine the importance of the predictors. Wrapper and genetic random search procedures can determine the best set of predictors [14], [35]. Aside from the algorithms for predictor selection, domain knowledge is another important source of information for case-by-case applications. Predictor selection is not discussed in detail in this paper; rather, it is accomplished by domain knowledge and importance ranking. For clarity of presentation, some definitions and observations about the dynamic equations are included in the Appendix.
Since the process is modeled with a set of dynamic equations, model (1) can be reformulated as a one-step predictive-optimization model. The predictive-optimization model resembles the widely used predictive control idea, which has proven successful in industry [18], [21], [26], [27], [36]. At sampling time t, the system status is {y_1(t), ..., y_l(t), x_1(t), ..., x_k(t), v_1(t), ..., v_m(t)}, and all historical information is available. The optimal values of {x_1(t), ..., x_k(t)} are determined by solving

max_{x(t)} { y_1(t + d_x^{y_1,min}), y_2(t + d_x^{y_2,min}), ..., y_lp(t + d_x^{y_lp,min}) }
s.t. x(t) ∈ Ω_x
     [y_{lp+1}(t + d_x^{y_{lp+1},min}), ..., y_l(t + d_x^{y_l,min})]^T ∈ Ω_yNP
     y_i(t + d_x^{y_i,min}) = f_i(..., [x_j(t)]_{x_j ∈ X^{1,y_i}}, [x_j(t − d + d_x^{y_i,min})]_{x_j ∈ X^{2,y_i}, d ∈ D_xj^{y_i}}, ...)
     i = 1, ..., l.    (2)

Thus, the performance metrics {y_1, ..., y_lp} are optimized at sampling times {t + d_x^{y_1,min}, t + d_x^{y_2,min}, ..., t + d_x^{y_lp,min}}. Based on observation 3, the x vector in model (2) is composed of the variables belonging to ∪_{i=1}^{lp} X^{1,y_i}, i.e., x(t) = [x_j(t)]_{x_j ∈ ∪_{i=1}^{lp} X^{1,y_i}}. Note that, in order to improve the solution robustness of model (2), different techniques could be used; interested readers are referred to [12], [19], and [34]. Different DM algorithms can also be used to learn the dynamic equations, form an ensemble [33], and combine the predictions.
Model (2) can be treated either as a single-objective optimization problem or as a Pareto-optimization problem; both are discussed in this paper and illustrated with case studies. The concept of a preference function is introduced, where domain knowledge can be used to simplify decision making and to help design multiobjective evolutionary strategy algorithms that keep the desired individuals in the elite set. The single-objective representation of model (2) is solved with an evolutionary strategy. The strength Pareto EA (SPEA) [38] is used to optimize the Pareto-optimal representation of model (2).
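The one-step predict-then-optimize loop behind model (2) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: a random search stands in for the evolutionary strategy of Section III, and all function and parameter names (`f_perf`, `f_constr`, `sample_x`, etc.) are hypothetical placeholders for the learned dynamic equations and the search/constraint spaces.

```python
import random

def optimize_step(f_perf, f_constr, history, sample_x, in_omega_x, in_omega_np,
                  n_trials=200):
    """At one sampling time, search x(t) against one-step-ahead predictions."""
    best_x, best_val = None, float("-inf")
    for _ in range(n_trials):
        x = sample_x()                  # candidate control settings
        if not in_omega_x(x):
            continue                    # enforce x(t) in Omega_x
        y_np = f_constr(x, history)     # predicted nonperformance responses
        if not in_omega_np(y_np):
            continue                    # enforce y^NP in Omega_yNP
        val = f_perf(x, history)        # predicted performance (scalarized)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

In the temporal setting, this routine would be called once per sampling interval, with the learned equations f_i refreshed from recent process data.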


A. Incorporating Domain Knowledge in Preference Functions

In a multiobjective decision-making process, domain knowledge is important in determining the optimal or satisfactory solution. In this paper, domain knowledge is incorporated into model (2) through preference functions [10], [29], [30].
Definition 1: Preference function α_i(•), i = 1, ..., l_p, transforms performance variable y_i into the interval [0, 1], with "0" denoting complete unacceptability, "1" standing for total satisfaction, and "0.5" denoting "not good and not bad."
Based on the previous assumption, a higher value of y_i(t) means better performance; thus, the derivative dα_i/dy_i should be greater than or equal to zero, i.e., dα_i/dy_i ≥ 0. To characterize the preference function α_i(•), three characteristic points need to be defined.
Definition 2: For α_i(•), let y_i(LB), y_i(CP), and y_i(UB) stand for y_i's lower bound, center point, and upper bound for the preference, i.e., α_i(y_i) = 1 for y_i ≥ y_i(UB), α_i(y_i) = 0.5 for y_i = y_i(CP), and α_i(y_i) = 0 for y_i ≤ y_i(LB), i = 1, ..., l_p.
Model (2) can be expressed as follows by using the preference functions:
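A preference function satisfying Definitions 1 and 2 can be sketched as follows. Definitions 1 and 2 fix only the three characteristic points and monotonicity; the piecewise-linear interpolation between them is an assumption of this sketch.

```python
def preference(y, lb, cp, ub):
    """Map performance value y into [0, 1]: 0 at/below LB, 0.5 at CP, 1 at/above UB.

    Monotonically nondecreasing, so dalpha/dy >= 0 as required.
    The linear segments between the characteristic points are assumed.
    """
    if y <= lb:
        return 0.0
    if y >= ub:
        return 1.0
    if y < cp:                                   # interpolate 0 -> 0.5 on [LB, CP]
        return 0.5 * (y - lb) / (cp - lb)
    return 0.5 + 0.5 * (y - cp) / (ub - cp)      # interpolate 0.5 -> 1 on [CP, UB]
```

For example, with lb = 0.7, cp = 0.8, and ub = 0.9, a boiler efficiency of 0.85 maps to a preference of 0.75.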

max_{x(t)} { α_1(y_1(t + d_x^{y_1,min})), α_2(y_2(t + d_x^{y_2,min})), ..., α_lp(y_lp(t + d_x^{y_lp,min})) }
s.t. x(t) ∈ Ω_x
     [y_{lp+1}(t + d_x^{y_{lp+1},min}), ..., y_l(t + d_x^{y_l,min})]^T ∈ Ω_yNP
     y_i(t + d_x^{y_i,min}) = f_i(..., [x_j(t)]_{x_j ∈ X^{1,y_i}}, [x_j(t − d + d_x^{y_i,min})]_{x_j ∈ X^{2,y_i}, d ∈ D_xj^{y_i}}, ...)
     i = 1, ..., l.    (3)

To continuously improve the performance of the dynamic process, y_i(LB), y_i(CP), and y_i(UB) need to be dynamic. If, at sampling time t, the process is not optimized (it is left alone), the performance response variables are {y_1(t + d_x^{y_1,min}), y_2(t + d_x^{y_2,min}), ..., y_lp(t + d_x^{y_lp,min})}. Thus, y_i(CP) at sampling time t can be defined as y_i(CP) = y_i(t + d_x^{y_i,min}); similarly, y_i(LB) = y_i(t + d_x^{y_i,min}) − Δy_i, and y_i(UB) = y_i(t + d_x^{y_i,min}) + Δy_i, where Δy_i is a small positive constant. The intuitive explanation is that, if, after implementing the optimal control settings, the performance increases by Δy_i, the result is deemed totally satisfactory; if the control settings decrease the performance by Δy_i, the result is totally unacceptable. For ease of discussion, α_i(y_i(t + d_x^{y_i,min})) is abbreviated as α_i(t + d_x^{y_i,min}), and {α_1(y_1(t + d_x^{y_1,min})), α_2(y_2(t + d_x^{y_2,min})), ..., α_lp(y_lp(t + d_x^{y_lp,min}))} is represented as a vector α(t) = [α_1(t + d_x^{y_1,min}), α_2(t + d_x^{y_2,min}), ..., α_lp(t + d_x^{y_lp,min})]^T in the preference space.
Model (3) can be solved by different algorithms. Using the Pareto-optimal set concept to solve model (3) directly is a valid approach; a solution is then selected from the solution set based on domain knowledge. Another approach is to combine the


l_p objectives into a single-objective function. The simplest way is a weighted sum of all the objectives.

B. Pareto-Optimal Set Approach

Since all preferences are between zero and one, the preference space is a unit hypercube with l_p dimensions. It can be divided into several regions with different characteristics.
Definition 3: Let Ω_p be the preference space, i.e., Ω_p = [0, 1]^lp. Let [0.5]_lp be a vector consisting of l_p 0.5's and [1]_lp a vector consisting of l_p 1's. p = [p_1, ..., p_lp]^T is a vector in the preference space. Four regions of Ω_p can be defined

Region_1 = {p | p ∈ Ω_p, p_i ≥ 0.5, i = 1, ..., l_p, p ≠ [0.5]_lp}
Region_4 = {p | p ∈ Ω_p, p_i ≤ 0.5, i = 1, ..., l_p, p ≠ [0.5]_lp}
Region_2 = {p | p ∈ Ω_p, p ∉ Region_1, p^T [1]_lp ≥ 0.5 l_p, p ≠ [0.5]_lp}
Region_3 = {p | p ∈ Ω_p, p ∉ Region_4, p^T [1]_lp < 0.5 l_p, p ≠ [0.5]_lp}.

Observation 1: At sampling time t, for a Pareto-optimal frontier α*(t) and its corresponding solution x*(t): if α*(t) ∈ Region_1, then implementing x*(t) will increase some of the l_p preferences without decreasing any other preference; if α*(t) ∈ Region_4, then implementing x*(t) will decrease some of the l_p preferences without increasing any other preference; if α*(t) ∈ Region_2, then implementing x*(t) will increase some of the preferences at the expense of decreasing others, but the total sum of the l_p preference values will increase or stay the same; if α*(t) ∈ Region_3, then implementing x*(t) will increase some of the preferences at the expense of decreasing others, and the total sum of the l_p preference values will decrease.
Based on observation 1, the solutions leading to Region_1 are the most desirable, and the solutions leading to Region_4 are definitely not desirable. Whether the solutions leading to Region_2 or Region_3 improve the process is not clear; it is difficult to determine without deeper domain knowledge.
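The four regions of Definition 3 can be transcribed directly into a classifier for preference vectors; this sketch returns 0 for the excluded center point [0.5]^lp, and the sequential checks encode the "p not in Region_1" (respectively Region_4) conditions.

```python
def classify_region(p):
    """Classify a preference vector p in [0,1]^lp into Regions 1-4 (Definition 3).

    Returns 0 for the center point [0.5]^lp, which is excluded from all regions.
    """
    lp = len(p)
    if all(abs(pi - 0.5) < 1e-12 for pi in p):
        return 0                 # center point, excluded
    if all(pi >= 0.5 for pi in p):
        return 1                 # no preference decreases
    if all(pi <= 0.5 for pi in p):
        return 4                 # no preference increases
    if sum(p) >= 0.5 * lp:       # p^T [1]_lp >= 0.5 lp
        return 2                 # mixed, total preference does not drop
    return 3                     # mixed, total preference drops
```

For instance, with l_p = 2, the vector [0.9, 0.2] falls into Region_2 (mixed changes, sum 1.1 ≥ 1.0), whereas [0.6, 0.1] falls into Region_3.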
The solutions leading to the boundary between Region_2 and Region_1 could also be considered for decision making. In some cases, sacrificing one performance metric slightly is worthwhile if the improvement of the other performance metrics is significant.
Until now, the input cost has not been considered. However, in any real application, every input incurs a cost in some form of energy; it is therefore desirable to optimize the process at a low input cost.
Definition 4: Let x*(t) be a solution of model (3) at sampling time t. The cost associated with the input x*(t) is defined as

C(x*(t)) = x*(t)^T R x*(t) + (x*(t) − x(t − 1))^T S (x*(t) − x(t − 1))

where R and S are positive semidefinite matrices.
Based on the cost, the Pareto-optimal solutions can be ranked in ascending order. A user may select a Pareto-optimal solution with a small input cost and desirable preference values.
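The quadratic cost of Definition 4 can be sketched with plain lists, so no linear-algebra library is required; the first term penalizes the magnitude of the settings, the second the change from the previous settings.

```python
def input_cost(x_star, x_prev, R, S):
    """C(x*) = x*^T R x* + (x* - x_prev)^T S (x* - x_prev), Definition 4."""
    def quad(v, M):
        # v^T M v for a square matrix M given as a list of rows
        return sum(v[i] * M[i][j] * v[j]
                   for i in range(len(v)) for j in range(len(v)))
    dx = [a - b for a, b in zip(x_star, x_prev)]
    return quad(x_star, R) + quad(dx, S)
```

Ranking the Pareto-optimal solutions in ascending cost order is then, e.g., `sorted(solutions, key=lambda x: input_cost(x, x_prev, R, S))`.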

C. Preference Aggregation Approach

Traditional Pareto-dominance-based EAs are ineffective when the number of objectives becomes large [28], as there could be many Pareto-optimal solutions. A common practice is to reduce the multiple objectives to a single objective. The l_p objectives can be aggregated by some type of aggregation function. Based on previous research in engineering design [10], [29], [30], an aggregation function is adopted here due to its useful properties.
Definition 5: Let β(•) be an aggregation function combining the l_p objectives of model (3), i.e., β(α_1, ..., α_lp, w_1, ..., w_lp) = (w_1 α_1^s + ··· + w_lp α_lp^s)^{1/s}, where w_1, ..., w_lp are the weights, w_1, ..., w_lp ≥ 0, and w_1 + ··· + w_lp = 1. If the input cost is considered as another performance metric to be optimized, the aggregation function is expressed as β(α_1, ..., α_lp, C, w_1, ..., w_lp, w_C) = (w_1 α_1^s + ··· + w_lp α_lp^s + w_C C^{−s})^{1/s}, with w_1, ..., w_lp, w_C ≥ 0, and w_1 + ··· + w_lp + w_C = 1.
Model (3) can be transformed into a single-objective model as follows:

max_{x(t)} β(α_1, ..., α_lp, C, w_1, ..., w_lp, w_C)
s.t. x(t) ∈ Ω_x
     [y_{lp+1}(t + d_x^{y_{lp+1},min}), ..., y_l(t + d_x^{y_l,min})]^T ∈ Ω_yNP
     y_i(t + d_x^{y_i,min}) = f_i(..., [x_j(t)]_{x_j ∈ X^{1,y_i}}, [x_j(t − d + d_x^{y_i,min})]_{x_j ∈ X^{2,y_i}, d ∈ D_xj^{y_i}}, ...)
     i = 1, ..., l    (4)

where

β(α_1, ..., α_lp, C, w_1, ..., w_lp, w_C) = [w_1 α_1(t + d_x^{y_1,min})^s + ··· + w_lp α_lp(t + d_x^{y_lp,min})^s + w_C C(x(t))^{−s}]^{1/s}.

III. EVOLUTIONARY STRATEGY ALGORITHM

Since the dynamic equations f_i are constructed by the DM algorithms, traditional optimization algorithms cannot be applied, as they usually require f_i to be in a specific form. In this paper, different evolutionary strategy algorithms are used to solve optimization models (3) and (4) at time t. Other EAs (e.g., the genetic algorithm) could also be used, and their performance could be studied.
Definition 6: Let λ be the offspring size and μ be both the number of offspring selected and the initial population size. Individuals in the parent population are numbered from 1 to μ, and individuals in the offspring population are numbered from 1 to λ. For ease of discussion, μ is assumed to be divisible by four. Recall that x(t) = [x_j(t)]_{x_j ∈ ∪_{i=1}^{lp} X^{1,y_i}} = [x_{jlow}(t), ..., x_{jhigh}(t)]^T, which is a vector with index j varying from j_low to j_high.
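Returning to model (4), the weighted power-mean aggregation β of Definition 5 can be sketched as follows; the exponent s is a model parameter whose value the paper leaves open, and this sketch assumes s > 0 so that the cost enters through C^{−s} (rewarding small cost).

```python
def aggregate(alphas, weights, s, cost=None, w_cost=0.0):
    """beta of Definition 5: (sum_i w_i * alpha_i^s [+ w_C * C^(-s)])^(1/s)."""
    total = sum(w * a ** s for w, a in zip(weights, alphas))
    if cost is not None:
        total += w_cost * cost ** (-s)   # cost term of the extended beta
    return total ** (1.0 / s)
```

With s = 1, β reduces to a weighted sum of the preferences; other values of s trade off between emphasizing strong and weak objectives.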



Definition 7: The general form of the ith individual in the evolutionary strategy is defined as (s^i, σ^i), where s^i = [x^i_{jlow}(t), ..., x^i_{jhigh}(t)]^T and σ^i = [σ^i_{jlow}, ..., σ^i_{jhigh}]^T. Each element of σ^i is used as the standard deviation of a normal distribution with zero mean.
The basic steps of an evolutionary strategy algorithm [13] for solving a single-objective model are shown next.
Algorithm 1
1) Initialize μ individuals (candidate solutions) to form the initial parent population.
2) Repeat until the stopping criteria are satisfied.
   a) Select and recombine parents from the parent population to generate λ offspring (children).
   b) Mutate the λ children.
   c) Select the best μ children based on the fitness function values.
   d) Use the selected μ children as parents for the next generation.
In an evolutionary strategy algorithm, an individual (s^i, σ^i) can be mutated by following (5) and (6), with σ^i mutated first and s^i mutated next:

σ^i = σ^i • [e^{N(0,τ′) + N_{jlow}(0,τ)}, ..., e^{N(0,τ′) + N_{jhigh}(0,τ)}]^T    (5)

where N(0, τ′) is a random number drawn from the normal distribution with zero mean and standard deviation τ′, and N_{jlow}(0, τ) is a random number drawn from the normal distribution with zero mean and standard deviation τ. N_{jlow}(0, τ) is generated specifically for σ^i_{jlow}, whereas N(0, τ′) is shared by all entries. "•" is the Hadamard matrix product [40]

s^i = s^i + N(0, σ^i)    (6)

where N(0, σ^i) is a vector of the same size as s^i. Each element of N(0, σ^i) is generated from a normal distribution with zero mean and the corresponding standard deviation in vector σ^i.
Definition 8: Let SelectedParents be an index set composed of two unique randomly selected indexes from 1 to μ. SelectedParents changes every time it is generated.
To generate λ children, two parents are selected from the parent population and recombined λ times. Assume that, each time, two parents are selected randomly to produce one child by using

( (1/2) Σ_{i ∈ SelectedParents} s^i, (1/2) Σ_{i ∈ SelectedParents} σ^i ).    (7)
A discrete recombination operator [13] was applied in this research; however, it did not perform as well as the intermediary recombination operator used in (7). In the traditional evolutionary strategy algorithm, λ children are generated, and the best μ of them are selected based on the fitness function value. However, in order to keep the diversity
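The variation operators (5)-(7) can be sketched as follows: log-normal self-adaptation of the step sizes, Gaussian mutation of the solution vector, and intermediary (averaging) recombination of two randomly chosen parents. The strategy parameters τ and τ′ are left free here; their typical dimension-dependent settings are not taken from this paper.

```python
import math
import random

def mutate(s, sigma, tau, tau_prime):
    """Eqs. (5)-(6): mutate sigma first, then s using the new sigma."""
    common = random.gauss(0.0, tau_prime)        # N(0, tau'), shared by all entries
    new_sigma = [sg * math.exp(common + random.gauss(0.0, tau))  # entrywise N(0, tau)
                 for sg in sigma]
    new_s = [x + random.gauss(0.0, sg) for x, sg in zip(s, new_sigma)]
    return new_s, new_sigma

def recombine(parents):
    """Eq. (7): average s and sigma of two distinct randomly selected parents."""
    i, j = random.sample(range(len(parents)), 2)
    (s1, sig1), (s2, sig2) = parents[i], parents[j]
    s = [(a + b) / 2.0 for a, b in zip(s1, s2)]
    sigma = [(a + b) / 2.0 for a, b in zip(sig1, sig2)]
    return s, sigma
```

Calling `recombine` λ times and mutating each child reproduces steps a) and b) of algorithm 1.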


of the population and to prevent the algorithm from converging too fast, a threshold-distance selection operator is used to solve model (4).

A. Solving the Preference Aggregation Model

Model (4) is a constrained optimization problem; thus, an additional constraint-handling technique has to be incorporated into algorithm 1. Although there are different techniques to handle the constraints in evolutionary computation [9], [13], a procedure similar to that in [1] is incorporated into algorithm 1 to check the feasibility of all the individuals, resulting in algorithm 2.
Algorithm 2
1) Initialize μ feasible individuals (candidate solutions) to form the initial parent population.
2) Repeat until the stopping criteria are satisfied.
   a) Select and recombine parents from the parent population to generate λ offspring (children).
   b) Mutate the λ children.
   c) Check the feasibility of all children. If all children are feasible, go to step d); otherwise, go to step a).
   d) Select μ children based on the threshold-distance selection operator.
To avoid selecting similar individuals for the next generation, a threshold-distance selection operator is used in algorithm 2.
Threshold-Distance Selection Operator: An individual s^i is considered similar to another individual s^j if |s^i − s^j| ≤ δ, i.e., [|x^i_{jlow}(t) − x^j_{jlow}(t)|, ..., |x^i_{jhigh}(t) − x^j_{jhigh}(t)|]^T ≤ [δ_{jlow}, ..., δ_{jhigh}]^T, where δ is a threshold-distance vector used to differentiate two individuals. When δ = 0 or δ = ∞ (i.e., large enough), the threshold-distance selection operator reduces to the traditional evolutionary-strategy (ES) selection operator of algorithm 1. The threshold-distance selection operator is formulated next.
Threshold-Distance Selection:
1) Sort the λ children in descending order based on their fitness values.
2) Add the first child to the parent population. Let the current selected individual be the first child.
3) Do:
   a) Select the next child into the parent population if it is not similar to the current selected individual.
   b) Update the current selected individual.
   c) Repeat until the parent population is full, all the remaining children are similar to the current selected individual, or the current selected individual is the last child.
4) If the number of individuals in the parent population is smaller than μ, supplement the parent population by selecting the best children which have not yet been selected.
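The threshold-distance selection steps above can be sketched as follows; individuals are represented as plain vectors, similarity is the componentwise δ test, and step 4 tops the parent population up with the best unselected children.

```python
def threshold_select(children, fitness, mu, delta):
    """Select up to mu children, skipping those within delta of the last pick."""
    order = sorted(range(len(children)), key=lambda i: fitness[i], reverse=True)
    similar = lambda a, b: all(abs(x - y) <= d for x, y, d in zip(a, b, delta))
    selected, current = [order[0]], order[0]     # steps 1-2: best child first
    for idx in order[1:]:                        # step 3: walk down the ranking
        if len(selected) == mu:
            break
        if not similar(children[idx], children[current]):
            selected.append(idx)
            current = idx
    for idx in order:                            # step 4: fill with best leftovers
        if len(selected) == mu:
            break
        if idx not in selected:
            selected.append(idx)
    return [children[i] for i in selected]
```

With δ = [0, ..., 0] every child is dissimilar and the operator reduces to plain truncation selection, matching the remark above.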


B. Solving the Pareto-Optimal Set Model

Major modifications need to be made to algorithm 1 to generate the Pareto-optimal set for model (3). Since not all Pareto-optimal solutions lead to Region_1 or Region_2, the Pareto-optimal solutions leading to the other regions are not stored in the elite external set. In a traditional multiobjective EA, the preference functions are not considered until the Pareto-optimal set has been generated; in that scenario, the Pareto-optimal solutions are scattered across different regions of the preference space, including the undesirable ones. If the preference functions are utilized when designing a multiobjective EA, the preference space can be divided into desired and undesired regions, and specific techniques can be developed within the EA so that the Pareto-optimal set contains only the solutions leading to the desired preference regions.
Definition 9: Let Offspring be the set consisting of the λ children and Parent the set consisting of the μ parents. Let X^{*,Region_1}(t) be a set consisting of the Pareto-optimal solutions leading to Region_1, and let X^{*,Region_2}(t) be a set consisting of the Pareto-optimal solutions leading to Region_2.
Algorithm 3 is the SPEA [11], [38] modified so that the different regions of the preference space are considered and the individuals leading to the desired regions are retained.
Algorithm 3
1) Initialize μ feasible individuals, and save them into Parent. Each individual can be labeled with one of the four regions. Initialize X^{*,Region_1}(t) and X^{*,Region_2}(t) as empty sets.
2) Copy from Parent the nondominated solutions which lead to Region_1 or Region_2 into X^{*,Region_1}(t) and X^{*,Region_2}(t).
3) Remove the solutions in X^{*,Region_1}(t) ∪ X^{*,Region_2}(t) which are dominated by other members of the union.
4) If the number of solutions in X^{*,Region_1}(t) ∪ X^{*,Region_2}(t) exceeds some threshold, use clustering algorithms to reduce it.
5) Select and recombine parents from Parent to generate λ offspring, and save them into Offspring.
6) Mutate the λ children.
7) Check the feasibility of all children. If all children are feasible, go to step 8); otherwise, go to step 5).
8) Label each child with the corresponding region. Calculate the local and global strength of each child in Offspring.
9) Empty Parent. For each region, select for Parent the best μ/4 individuals based on their local strength. If the total number of selected individuals in Parent is smaller than μ, select the remaining individuals from Offspring based on their global strength.
10) If the maximum number of generations is reached, then stop; else, go to step 2).
Local and Global Strength:
Definition 10: Let X^{Region_i}(t) be the set consisting of the individuals from Offspring leading to Region_i, i = 1, ..., 4.
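Steps 8 and 9 of algorithm 3 rely on Pareto dominance over the preference vectors and on SPEA-style strength values, i.e., domination counts normalized by the pool size plus one. A sketch, with individuals reduced to their preference vectors (maximization assumed throughout):

```python
def dominates(a, b):
    """True if preference vector a Pareto-dominates b (all >=, at least one >)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def local_strength(j, region_members):
    """Fraction of the individual's own region it dominates, denominator |X|+1."""
    n_local = sum(dominates(region_members[j], m) for m in region_members)
    return n_local / (len(region_members) + 1)

def global_strength(j, offspring):
    """Same count taken over the whole offspring pool, denominator |Offspring|+1."""
    n_global = sum(dominates(offspring[j], m) for m in offspring)
    return n_global / (len(offspring) + 1)
```

Sorting a region's members by `local_strength` then yields the per-region ranking used to fill the next parent population.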

TABLE I. DATA SET DESCRIPTION

For an individual (s^j, σ^j) in X^{Region_i}(t), its local strength is calculated as

Local_strength_j = n_local / (|X^{Region_i}(t)| + 1)    (8)

where n_local is the number of individuals of X^{Region_i}(t) dominated by individual (s^j, σ^j). Its global strength is calculated as

Global_strength_j = n_global / (|Offspring| + 1)    (9)

where n_global is the number of individuals of Offspring dominated by individual (s^j, σ^j), and | • | is the cardinality operator of a set.
The proposed algorithm 3 heuristically keeps the diversity of the populations and uses the local and global strength to converge quickly. The elite solutions leading to Region_1 and Region_2 are not lost until better ones are found.

IV. INDUSTRIAL CASE STUDY

To validate the concepts introduced in this paper, data from Boiler 11 at The University of Iowa Power Plant (UI PP) were collected. The boiler burns coal and biomass (oat hulls); the ratio of coal to oat hulls changes depending on the availability of the oat hulls. The two performance metrics are the boiler efficiency and the coal-limestone ratio, both to be maximized. Limestone is used to reduce the SO2 emission; thus, the response variable to be constrained is the SO2 emission, which is to be controlled below some level. SO2 could be considered as another performance metric; however, in this study, two metrics are easier to discuss and to visualize in the optimization results.

A. Data Set, Sampling Frequency, and Predictor Selection

From the UI PP data historian, 5729 data points were sampled at 5-min intervals. Data set 1 in Table I is the total data set composed of the 5729 data points starting at 2/1/07 2:50 A.M. and continuing to 2/21/07 11:45 A.M. During this time period, the boiler operations could be described as normal. Considering the noise in the industrial data, data set 1 was denoised by the moving-average approach [41] with a lag of four, and error readings of the SO2-coal ratio were deleted. Data set 1 was divided into two data sets. Data set 2, consisting of 4600 data points, was used to extract a model by the DM algorithms. Data set 3 was used to test the model learned from data set 2.
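The moving-average denoising with a lag of four can be sketched as follows; one common reading, assumed here, is a trailing window that averages the current sample with the three preceding ones (shorter windows are used at the start of the series).

```python
def moving_average(series, lag=4):
    """Trailing moving average with the given lag (window shrinks at the start)."""
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - lag + 1): i + 1]   # up to `lag` trailing samples
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Applied to the 5-min samples, each smoothed point thus summarizes up to 20 min of operation, attenuating sensor noise before model extraction.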
In this paper, the boiler efficiency, the coal–limestone ratio, and the SO2–coal ratio (Table II) are heuristically modeled as functions of the coal-and-primary-air ratio [coal flow (kilopounds per hour)/primary air flow (kilopounds per hour)], the coal-and-secondary-air ratio [coal flow (kilopounds per hour)/secondary air flow (kilopounds per hour)], the coal-to-oat-hull ratio [coal flow (kilopounds per hour)/oat hull flow (kilopounds per hour)], and the coal and oat hull quality (in British thermal units per pound). Other variables could be considered, depending on the application context. Using the coal-and-primary-air ratio, the primary air flow can be determined from the current coal flow; similarly, the secondary air flow can be determined from the current coal flow and the coal-and-secondary-air ratio. The coal-to-oat-hull ratio is considered a noncontrollable variable.

Authorized licensed use limited to: The University of Iowa. Downloaded on June 20,2010 at 02:29:43 UTC from IEEE Xplore. Restrictions apply.

TABLE II PROCESS VARIABLES OF THE DATA SET

The maximum time delays, namely, d_y1, d_y2, d_y3, d_x1, d_x2, d_v1, d_v2, d_v3, are assumed to be nine. In the context of the 5-min sampling intervals, 9 × 5 = 45 min is assumed to be the maximum time delay in the combustion process. For example, if the operator were to change the primary air flow, it would take at most 45 min to observe that this change had some effect on the boiler efficiency.

After running the classification and regression tree (C&R tree) algorithm [7] on data set 1, the importance of each predictor is calculated. Considering Observations 2 and 3, the predictors are heuristically selected for each response variable. The current boiler efficiency y1(t) is then expressed as

y1(t) = f1(y1(t − 1), y1(t − 2), x1(t − 5), x1(t − 6), x1(t − 7), x2(t − 5), x2(t − 6), x2(t − 7), v1(t − 9), v2(t − 9), v3(t − 9)).    (10)

The coal–limestone ratio y2(t) can be written as

y2(t) = f2(y2(t − 1), y2(t − 2), y2(t − 3), x1(t − 1), x1(t − 2), x2(t − 1), x2(t − 2), v1(t − 1), v1(t − 2), v2(t − 9)).    (11)

The SO2–coal ratio y3(t) is

y3(t) = f3(y3(t − 1), y3(t − 2), x1(t − 1), x1(t − 8), x1(t − 9), x2(t − 1), x2(t − 7), x2(t − 9), v1(t − 3), v2(t − 1), v3(t − 6)).    (12)

When the structures of the dynamic equations describing the combustion process are known, the DM algorithms are used to extract the actual equations.

B. Learning Dynamic Equations From the Process Data

In this paper, the C&R tree algorithm [7] and the NN algorithm [5] are used to learn the dynamic equations (10)–(12) from data set 2. The NN performed better in predicting the boiler efficiency and the coal–limestone ratio; the C&R tree algorithm performed better than the NN only in predicting the SO2–coal ratio. The extracted models were tested using data set 3. Table III summarizes the models' prediction accuracy based on data set 3, and Fig. 1(a)–(c) shows the first 200 predicted and observed values of data set 3. In summary, all the models made high-quality predictions on the testing data sets and captured the system dynamics.

The modeling task is simplified by using the DM algorithms. However, updating (online or offline) the learned models with new data points is necessary for a temporal process. Data filtering is used to remove erroneous data, and the model performance is monitored so that the updating procedure can be triggered as needed. These issues, however, are beyond the scope of this paper. Although only the C&R tree and NN algorithms are used in this paper, other DM algorithms (such as the random forest [6], boosting trees [16], [17], or radial basis functions [22]) could be selected to further improve the prediction accuracy.

Based on the selected predictors and dynamic equations (10)–(12), model (2) is instantiated as

max_{x(t)} {y1(t + 5), y2(t + 1)}

s.t. [x1(t), x2(t)]^T ∈ Ωx; y3(t + 1) ∈ ΩyNP;
y1(t + 5) = f1(y1(t + 4), y1(t + 3), x1(t), x1(t − 1), x1(t − 2), x2(t), x2(t − 1), x2(t − 2), v1(t − 4), v2(t − 4), v3(t − 4));
y2(t + 1) = f2(y2(t), y2(t − 1), y2(t − 2), x1(t), x1(t − 1), x2(t), x2(t − 1), v1(t), v1(t − 1), v2(t − 8));
y3(t + 1) = f3(y3(t), y3(t − 1), x1(t), x1(t − 7), x1(t − 8), x2(t), x2(t − 6), x2(t − 8), v1(t − 2), v2(t), v3(t − 5)).    (13)

To solve model (13), preference functions are used.

C. Preference Functions

Based on domain knowledge, improving the boiler efficiency by 0.01 would be considered a significant achievement in any power plant, and an increase of the coal–limestone ratio by one would be regarded as totally satisfactory, i.e., Δy1 = 0.01 and Δy2 = 1. Note that, in different applications, the degrees of satisfaction and dissatisfaction could vary; thus, Δyi should also change based on the domain knowledge.


TABLE III PREDICTION ACCURACY OF DIFFERENT MODELS FOR DATA SET 3

Fig. 2. Linear dynamic preference functions for the boiler efficiency and coal–limestone ratio. (a) Boiler efficiency preference function. (b) Coal–limestone ratio preference function.

Thus, two linear preference functions can be defined based on y1(CP) = y1(t + 5), y1(LB) = y1(t + 5) − 0.01, and y1(UB) = y1(t + 5) + 0.01, i.e., α1(y1) = 50y1 + 0.5 − 50y1(t + 5) for y1(t + 5) − 0.01 ≤ y1 ≤ y1(t + 5) + 0.01. Similarly, α2(y2) = 0.5y2 + 0.5 − 0.5y2(t + 1) for y2(t + 1) − 1 ≤ y2 ≤ y2(t + 1) + 1. Fig. 2(a) and (b) illustrates the shapes of the preference functions for the boiler efficiency and the coal–limestone ratio, respectively. Nonlinear shape functions could also be considered in future research. Once the preference functions are determined, model (13) can be solved using either the preference aggregation or the Pareto-optimal set approach.

Fig. 1. Predicted and observed values of the first 200 data points of data set 3. (a) Boiler efficiency. (b) Coal–limestone ratio. (c) SO2–coal ratio.

D. Solving the Preference Aggregation Model

Once the linear preference functions are assumed, model (4) can be instantiated as follows:

max_{x(t)} β(α1, α2, C, w1, w2, wC)

s.t. [x1(t), x2(t)]^T ∈ Ωx; y3(t + 1) ∈ ΩyNP;
y1(t + 5) = f1(y1(t + 4), y1(t + 3), x1(t), x1(t − 1), x1(t − 2), x2(t), x2(t − 1), x2(t − 2), v1(t − 4), v2(t − 4), v3(t − 4));
y2(t + 1) = f2(y2(t), y2(t − 1), y2(t − 2), x1(t), x1(t − 1), x2(t), x2(t − 1), v1(t), v1(t − 1), v2(t − 8));
y3(t + 1) = f3(y3(t), y3(t − 1), x1(t), x1(t − 7), x1(t − 8), x2(t), x2(t − 6), x2(t − 8), v1(t − 2), v2(t), v3(t − 5))    (14)

where

β(α1, α2, C, w1, w2, wC) = (w1 α1(t + 5)^s + w2 α2(t + 1)^s + wC C(x(t))^(−s))^(1/s).
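The aggregation objective β of model (14) can be sketched as follows. The control cost enters with exponent −s, so larger control effort lowers the objective; the default weights are the baseline values used in this study, and the function name is ours.

```python
def beta(alpha1, alpha2, cost, w1=0.4, w2=0.58, wc=0.02, s=1.0):
    """Aggregate two preference values and a control cost, model (14) style.

    With s = 1 this is a weighted arithmetic mean of alpha1, alpha2,
    and 1/cost; the cost term rewards small control effort.
    """
    return (w1 * alpha1**s + w2 * alpha2**s + wc * cost**(-s)) ** (1.0 / s)

# Example: both preferences at the indifference point 0.5 and a
# control cost of 2 give 0.4*0.5 + 0.58*0.5 + 0.02*0.5 = 0.5.
```

The exponent s controls the compensation between objectives: s = 1 allows full trade-offs, while larger s makes the aggregate more sensitive to the worst objective.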


Fig. 3. ES solving model (14) for different offspring sizes at sampling time "2/21/2007 9:55 A.M."; the objective function value is the best individual's fitness value at each generation.
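The runs behind Fig. 3 use a standard (μ, λ)-ES with log-normal self-adaptive mutation [13]; a minimal sketch of that operator, using the parameter values reported below for this case study (τ, τ', the σ bounds, and the box constraints on x), follows. The function and variable names are ours, and the paper's threshold-distance selection is not shown here.

```python
import math
import numpy as np

B = 2                                    # number of actionable variables
TAU_P = 1.0 / math.sqrt(2 * B)           # global learning rate tau'
TAU = 1.0 / math.sqrt(2 * math.sqrt(B))  # coordinate learning rate tau
SIG_LO, SIG_HI = 0.005, 0.1              # bounds on the step sizes sigma
X_LO = np.array([0.05, 0.05])            # bounds on the controllables
X_HI = np.array([0.2, 0.29])

def mutate(s, sigma, rng):
    """Textbook log-normal self-adaptive ES mutation [13].

    One global normal draw scales all step sizes; a per-coordinate
    draw perturbs each step size individually. Both the step sizes
    and the solution are clipped to the study's bounds.
    """
    g = rng.normal()                      # one global draw per individual
    sigma_new = sigma * np.exp(TAU_P * g + TAU * rng.normal(size=B))
    sigma_new = np.clip(sigma_new, SIG_LO, SIG_HI)
    s_new = np.clip(s + sigma_new * rng.normal(size=B), X_LO, X_HI)
    return s_new, sigma_new

rng = np.random.default_rng(42)
s, sigma = np.array([0.1, 0.15]), np.array([0.05, 0.05])
s2, sigma2 = mutate(s, sigma, rng)
```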

Let w1 = 0.4, w2 = 0.58, and wC = 0.02 (other weight scenarios are discussed later in this paper). Let s = 1 and R = S = I, the 2 × 2 identity matrix; thus, C(x(t)) = x1(t)^2 + x2(t)^2 + (x1(t) − x1(t − 1))^2 + (x2(t) − x2(t − 1))^2. These weights and constants are fixed in this paper to study the impact of the weights of interest, namely, w1 and w2; the impact of the input cost weight wC is not studied. δ is set to [0.01, 0.01]^T based on the analysis of historical data; the impact of δ on the solution is discussed later in this paper. [x1, x2]^T is limited to between [0.05, 0.05]^T and [0.2, 0.29]^T in this case study based on the historical data distribution of x. y3(t + 1) should be smaller than or equal to the current SO2–coal ratio obtained without changing the controllable settings.

The ES parameters τ' and τ are determined heuristically as τ' = 1/√(2b) and τ = 1/√(2√b), with b = 2 being, in this case, the number of actionable variables [13]. The lower and upper bounds for the standard deviation values of σi are set to 0.005 and 0.1 based on the analysis of the historical data. Two parents are selected to generate one child. For the initial population at sampling time t, si is generated by drawing random numbers uniformly between [0.05, 0.05]^T and [x1(t) + 0.1, x2(t) + 0.1]^T, which allows the algorithm to perform a local search. From an application point of view, finding local solutions is preferred, as dramatically different solutions could make the process unstable. Similarly, σi is generated by drawing random numbers uniformly between [0.005, 0.005]^T and [0.1, 0.1]^T. Although the research suggests a selection pressure of μ/λ = 1/7 [13], [24], numerous experiments were conducted, and it was determined that μ/λ = 20/120 produced satisfactory results. The algorithm converged to a local optimum in 25 generations (see Fig. 3). A sampling time stamp (test set) was randomly selected between "2/17/07 11:45 A.M." and "2/21/07 11:45 A.M.", and model (14) was then solved for different values of the selection pressure. Similar patterns were observed by solving the optimization model for other test sets.

To evaluate the effect of the initial parent size, the selection pressure is fixed at μ/λ = 1/6. Fig. 4 shows that a parent size of 20 is enough for the algorithm to converge to local optima; smaller parent sizes lead to unstable results.

Fig. 4. ES solving model (14) for different parent sizes at sampling time "2/21/2007 9:55 A.M."; the objective function value is the best individual's fitness value at each generation.

To evaluate the impact of the threshold-distance selection operator, vector δ is varied from [0, 0]^T to [0.3, 0.3]^T. It can be observed from Fig. 5 that δ = [0.005, 0.005]^T performs a little better. When δ is [0.3, 0.3]^T, the threshold-distance selection operator is reduced to the traditional ES selection process, i.e., the case δ = [0, 0]^T.

Fig. 5. ES solving model (14) with μ/λ = 20/120 for different threshold-distance selection operators at sampling time "2/21/2007 9:55 A.M."; the objective function value is the best individual's fitness value at each generation.

Fig. 6 illustrates the impact of different weight combinations on the aggregation function. The vertical axis is the coal–limestone ratio preference value, and the horizontal axis is the boiler efficiency preference value. The five points in Fig. 6 correspond to five different weight combinations; from left to right, they are

w1 = 0.08, w2 = 0.90, wC = 0.02, s = 1;
w1 = 0.38, w2 = 0.60, wC = 0.02, s = 1;
w1 = 0.40, w2 = 0.58, wC = 0.02, s = 1;
w1 = 0.60, w2 = 0.38, wC = 0.02, s = 1;
w1 = 0.90, w2 = 0.08, wC = 0.02, s = 1.

As w1 increases, the best solution tends to emphasize the efficiency optimization. It is also apparent that the coal–limestone ratio is hard to optimize: for w1 = 0.08 and w2 = 0.9, the algorithm cannot find solutions that improve the coal–limestone ratio significantly. A possible reason is that even a slight increase in the coal–limestone ratio may sacrifice too much boiler efficiency, leading to an overall decrease in the objective function. Preference aggregation allows the two objectives to be balanced with the weights; this approach is effective when deep domain knowledge about the process is available to determine them. The Pareto-optimal approach, in contrast, produces a solution set.
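The threshold-distance selection operator evaluated in Fig. 5 can be sketched as follows. This is our reading of the operator (skip a candidate that lies within the componentwise threshold δ of an already selected individual, falling back to plain truncation when too few diverse candidates remain), not the paper's exact implementation; the function name is ours.

```python
import numpy as np

def threshold_distance_select(pop, fitness, mu, delta):
    """Select mu parents, skipping near-duplicates of already chosen ones.

    A candidate is skipped when it differs from every already selected
    individual by at most `delta` in every coordinate; this keeps the
    parent set spread out in the search space.
    """
    order = np.argsort(fitness)[::-1]          # best fitness first
    chosen = []
    for idx in order:
        if all(np.any(np.abs(pop[idx] - pop[j]) > delta) for j in chosen):
            chosen.append(idx)
        if len(chosen) == mu:
            break
    # Fall back to plain truncation if too few diverse candidates exist.
    for idx in order:
        if len(chosen) == mu:
            break
        if idx not in chosen:
            chosen.append(idx)
    return [pop[j] for j in chosen]

pop = np.array([[0.10, 0.10], [0.1001, 0.1001], [0.20, 0.20], [0.30, 0.30]])
fitness = np.array([4.0, 3.0, 2.0, 1.0])
# The near-duplicate of the best individual is skipped for a distant one.
parents = threshold_distance_select(pop, fitness, mu=2,
                                    delta=np.array([0.01, 0.01]))
```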


Fig. 6. ES solving model (14) with μ/λ = 20/120 for different weights at sampling time "2/19/2007 7:25 A.M."; the plotted solution is the best individual at the 25th generation.

E. Solving the Optimization Model With the Pareto-Optimal Set Approach

Only the most important results are reported in this paper. Model (13) is solved by Algorithm 3 for μ/λ = 20/120 without restricting the size of the two elite sets. The preference values are generated with the DM algorithms. Based on Fig. 7, it is easy to observe that the coal–limestone ratio preference (vertical axis) and the boiler efficiency preference (horizontal axis) are two competing performance metrics that are hard to optimize at the same time. The randomly generated initial population shows that increasing the coal–limestone ratio leads to a total dissatisfaction of the boiler efficiency, whereas increasing the boiler efficiency sacrifices the coal–limestone ratio only slightly. Ten generations are not enough for the algorithm to find enough Pareto-optimal solutions; as the generation number increases from 25 to 50, the algorithm tends to find more similar solutions. The initial population mostly spans Region2 and Region3. After several generations, the algorithm converges to the Pareto-optimal front, as shown in Fig. 7.

The computational results reported in Sections IV-D and IV-E show that both optimization models can provide satisfactory solutions. The preference aggregation model achieves the goal by adjusting the weights and calls for domain knowledge about the process. The Pareto-optimal set approach does not rely on domain knowledge in solving the optimization model and provides a set of potential solutions; however, a decision is then needed to select the final solution.

V. CONCLUSION

DM algorithms and EAs were integrated within a framework to optimize a complex time-dependent process. The underlying process dynamic equations were identified by the DM approach; the equations can be updated with the current process data if their prediction accuracy decreases. Modified evolutionary strategy algorithms were applied to solve the optimization models, either through preference aggregation or through a Pareto-optimal set approach.

Fig. 7. Pareto-optimal set approach to solve model (13) at sampling time "2/21/2007 9:55 A.M." (a) Initial population. (b) Pareto-optimal front at the tenth generation. (c) Pareto-optimal front at the 25th generation. (d) Pareto-optimal front at the 50th generation.


To solve the preference aggregation model, a traditional evolutionary strategy algorithm was modified by introducing a threshold-distance selection operator, which prevented rapid convergence by preserving the diversity of the population. For the Pareto-optimal set approach, a traditional evolutionary strategy algorithm was modified according to the SPEA. The local and global strengths of each individual were used to select the offspring for the next generation. The diversity of the population was guaranteed by classifying the individuals into different regions of the preference space. The solutions leading to the undesired regions were not kept in the Pareto-optimal set, which significantly reduced the size of the elite set.

The industrial case study illustrated the effectiveness of the proposed approach and the possibility of applying this framework to other similar processes. The combustion efficiency and the coal–limestone ratio were optimized by adjusting two controllable variables. The computational results showed that small improvements of the coal–limestone ratio produced a significant decrease in the combustion efficiency. It is conceivable that other controllable variables that were not considered in this research could be used to optimize the coal–limestone ratio without an adverse impact on the boiler efficiency.

APPENDIX

Definition 11: For response variable y1, D_y1^{y1} = {d_y1^{y1,low}, ..., d_y1^{y1,high}} is a set of integers selected from {1, ..., d_y1}, related to y1's previous values and arranged in ascending order, d_y1^{y1,low} ≤ d_y1^{y1,high}. Similarly, D_x1^{y1} = {d_x1^{y1,low}, ..., d_x1^{y1,high}} is a set selected from {1, ..., d_x1} for predictors related to x1, and D_v1^{y1} = {d_v1^{y1,low}, ..., d_v1^{y1,high}} is a set selected from {1, ..., d_v1} for predictors related to v1. In total, there are 1 + k + m such sets for y1: D_y1^{y1}, D_x1^{y1}, ..., D_xk^{y1}, D_v1^{y1}, ..., D_vm^{y1}. For any response variable yi, there are 1 + k + m sets: D_yi^{yi}, D_x1^{yi}, ..., D_xk^{yi}, D_v1^{yi}, ..., D_vm^{yi}, i = 1 to l.

Based on Definition 11, for i = 1 to l, yi = fi(x, v) can be written as the following dynamic equation:

yi(t) = fi([yi(t − d)]_{d ∈ D_yi^{yi}}, [x1(t − d)]_{d ∈ D_x1^{yi}}, ..., [xk(t − d)]_{d ∈ D_xk^{yi}}, [v1(t − d)]_{d ∈ D_v1^{yi}}, ..., [vm(t − d)]_{d ∈ D_vm^{yi}})

where [yi(t − d)]_{d ∈ D_yi^{yi}}, [x1(t − d)]_{d ∈ D_x1^{yi}}, ..., [vm(t − d)]_{d ∈ D_vm^{yi}} expand by enumerating all elements of the corresponding sets.

Definition 12: For i = 1 to l, let d_x^{yi,min} = min{d_x1^{yi,low}, ..., d_xk^{yi,low}} be the smallest time delay of the controllable variables x for response variable yi, d_x^{yi,max} = max{d_x1^{yi,high}, ..., d_xk^{yi,high}} be the largest time delay of the controllable variables x for response variable yi, d_v^{yi,min} = min{d_v1^{yi,low}, ..., d_vm^{yi,low}} be the smallest time delay of the noncontrollable variables v for response variable yi, and d_v^{yi,max} = max{d_v1^{yi,high}, ..., d_vm^{yi,high}} be the largest time delay of the noncontrollable variables v for response variable yi.
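The predictor expansion [z(t − d)], d ∈ D, used in the dynamic equation above can be sketched as follows; the function name is ours, and the example lag sets are taken from the boiler-efficiency equation (10).

```python
def expand_predictors(t, lag_sets):
    """Enumerate the predictors of y_i(t) from its time-delay sets.

    `lag_sets` maps a variable name to its delay set D (Definition 11);
    the expansion [z(t - d)] for d in D is returned as (name, time)
    pairs, one per lagged predictor.
    """
    return [(var, t - d) for var, ds in sorted(lag_sets.items()) for d in ds]

# Example mirroring (10): the 11 predictors of y1(t) at t = 20.
lags = {"y1": [1, 2], "x1": [5, 6, 7], "x2": [5, 6, 7],
        "v1": [9], "v2": [9], "v3": [9]}
preds = expand_predictors(20, lags)
```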


Observation 2: If there exists a constant d ∈ ∪_{j=1}^{m} D_vj^{yi} such that d < d_x^{yi,min}, there is not sufficient information about the noncontrollable variables to predict the future yi(t + d_x^{yi,min}) at sampling time t.

To explain Observation 2, consider an illustrative example, y(t) = f(x1(t − 3), v1(t − 1)). At sampling time t, y(t + 3) is to be determined, i.e., y(t + 3) = f(x1(t), v1(t + 2)). As v1(t + 2) is not known, it is difficult to predict y(t + 3); nevertheless, to optimize y(t + 3), one can still change x1(t).

Definition 13: Let X = {x1, ..., xk} be the set of all controllable variables. For a response variable yi, define X^{1,yi} = {xj | d_xj^{yi,low} = d_x^{yi,min}, j = 1, ..., k}. It is obvious that X^{1,yi} ⊆ X. Let X^{2,yi} = X − X^{1,yi}; for any controllable variable xj ∈ X^{2,yi}, d_xj^{yi,low} > d_x^{yi,min} holds.

Observation 3: At sampling time t, one can only modify the controllable variables in X^{1,yi} to optimize (or change) yi. X^{1,yi} is called the actionable variable set of yi.

Observation 3 is explained with the following example. For y(t) = f(x1(t − 3), x2(t − 4)), at sampling time t, y(t + 3) is optimized by modifying the values of the controllable variables, i.e., y(t + 3) = f(x1(t), x2(t − 1)). As x2(t − 1) has already happened, y(t + 3) is optimized with x1(t) only. To make sure that all controllable variables are used to optimize yi, Observation 3 offers hints for selecting the predictors and the corresponding time delays. From now on, it is assumed, based on Observation 2, that for each response variable yi there is enough information to predict its values at sampling time t.

Definition 14: Let Y^{NP} = {y_{lp+1}, ..., y_l} be the set of all nonperformance response variables. Y^{1,NP} = {yj | (∪_{i=1}^{lp} X^{1,yi}) ∩ X^{1,yj} ≠ ∅, j = lp + 1, ..., l}, and Y^{2,NP} = Y^{NP} − Y^{1,NP}. Y^{1,NP} is the set of nonperformance response variables that will be affected when the controllable variables are changed to optimize the process. Y^{2,NP} is the set of nonperformance response variables that will not be affected during the optimization process.
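The actionable variable set of Definition 13 can be computed directly from the smallest time delays; a minimal sketch follows (the function name is ours).

```python
def actionable_set(low_delays):
    """Actionable variable set X^{1,yi} of Definition 13.

    `low_delays` maps a controllable variable name to its smallest
    time delay d_xj^{yi,low} for response yi; the actionable set
    contains the variables attaining the overall minimum delay.
    """
    d_min = min(low_delays.values())
    return {xj for xj, d in low_delays.items() if d == d_min}

# Observation 3 example: y(t) = f(x1(t-3), x2(t-4)) -> only x1 is actionable.
acts = actionable_set({"x1": 3, "x2": 4})
```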

REFERENCES

[1] M. A. Abido, "Multiobjective evolutionary algorithms for electric power dispatch problem," IEEE Trans. Evol. Comput., vol. 10, no. 3, pp. 315–329, Jun. 2006.
[2] J. M. Alonso, F. Alvarruiz, J. M. Desantes, L. Hernández, V. Hernández, and G. Moltó, "Combining neural networks and genetic algorithms to predict and reduce diesel engine emissions," IEEE Trans. Evol. Comput., vol. 11, no. 1, pp. 46–55, Feb. 2007.
[3] A. Benedetti, M. Farina, and M. Gobbi, "Evolutionary multiobjective industrial design: The case of a racing car tire-suspension system," IEEE Trans. Evol. Comput., vol. 10, no. 3, pp. 230–244, Jun. 2006.
[4] M. J. A. Berry and G. S. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. New York: Wiley, 2004.
[5] C. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.
[6] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.
[7] L. Breiman, J. H. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. Monterey, CA: Wadsworth, 1984.
[8] D. Büche, P. Stoll, R. Dornberger, and P. Koumoutsakos, "Multiobjective evolutionary algorithm for the optimization of noisy combustion processes," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 4, pp. 460–473, Nov. 2002.


[9] Z. Cai and Y. Wang, "A multiobjective optimization-based evolutionary algorithm for constrained optimization," IEEE Trans. Evol. Comput., vol. 10, no. 6, pp. 658–675, Dec. 2006.
[10] Z. Dai and M. J. Scott, "Effective product family design using preference aggregation," Trans. ASME, J. Mech. Des., vol. 128, no. 4, pp. 659–667, 2006.
[11] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. New York: Wiley, 2001.
[12] K. Deb and H. Gupta, "Introducing robustness in multi-objective optimization," Evol. Comput., vol. 14, no. 4, pp. 463–494, Dec. 2006.
[13] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computation. New York: Springer-Verlag, 2003.
[14] J. Espinosa, J. Vandewalle, and V. Wertz, Fuzzy Logic, Identification and Predictive Control. London, U.K.: Springer-Verlag, 2005.
[15] M. Farina, K. Deb, and P. Amato, "Dynamic multiobjective optimization problems: Test cases, approximations, and applications," IEEE Trans. Evol. Comput., vol. 8, no. 5, pp. 425–442, Oct. 2004.
[16] J. H. Friedman, "Stochastic gradient boosting," Comput. Stat. Data Anal., vol. 38, no. 4, pp. 367–378, Feb. 2002.
[17] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Ann. Stat., vol. 29, no. 5, pp. 1189–1232, 2001.
[18] H. Ghezelayagh and K. Y. Lee, "Intelligent predictive control of a power plant with evolutionary programming optimizer and neuro-fuzzy identifier," in Proc. Congr. Evol. Comput., 2002, pp. 1308–1313.
[19] C. K. Goh and K. C. Tan, "An investigation on noisy environments in evolutionary multiobjective optimization," IEEE Trans. Evol. Comput., vol. 11, no. 3, pp. 354–381, Jun. 2007.
[20] J. A. Harding, M. Shahbaz, S. Srinivas, and A. Kusiak, "Data mining in manufacturing: A review," Trans. ASME, J. Manuf. Sci. Eng., vol. 128, no. 4, pp. 969–976, 2006.
[21] V. Havlena and J. Findejs, "Application of model predictive control to advanced combustion control," Control Eng. Pract., vol. 13, no. 6, pp. 671–680, Jun. 2005.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.
[23] J. S. Heo, K. Y. Lee, and R. Garduno-Ramirez, "Multiobjective control of power plants using particle swarm optimization techniques," IEEE Trans. Energy Convers., vol. 21, no. 2, pp. 552–561, Jun. 2006.
[24] T. Jansen, K. A. De Jong, and I. Wegener, "On the choice of the offspring population size in evolutionary algorithms," Evol. Comput., vol. 13, no. 4, pp. 413–440, Dec. 2005.
[25] A. Kusiak and Z. Song, "Combustion efficiency optimization and virtual testing: A data-mining approach," IEEE Trans. Ind. Informat., vol. 2, no. 3, pp. 176–184, Aug. 2006.
[26] X. J. Liu and C. W. Chan, "Neural-fuzzy generalized predictive control of boiler steam temperature," IEEE Trans. Energy Convers., vol. 21, no. 4, pp. 900–908, Dec. 2006.
[27] C. H. Lu and C. C. Tsai, "Generalized predictive control using recurrent fuzzy neural networks for industrial processes," J. Process Control, vol. 17, no. 1, pp. 83–92, Jan. 2007.
[28] F. Pierro, S. Khu, and D. A. Savic, "An investigation on preference order ranking scheme for multiobjective evolutionary optimization," IEEE Trans. Evol. Comput., vol. 11, no. 1, pp. 17–45, Feb. 2007.
[29] M. J. Scott, "Formalizing negotiation in engineering design," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1999.
[30] M. J. Scott and E. K. Antonsson, "Compensation and weights for trade-offs in engineering design: Beyond the weighted sum," Trans. ASME, J. Mech. Des., vol. 127, no. 6, pp. 1045–1055, Nov. 2005.
[31] C. K. Tan, S. Kakietek, S. J. Wilcox, and J. Ward, "Constrained optimisation of pulverised coal fired boiler using genetic algorithms and artificial neural networks," Int. J. COMADEM, vol. 9, no. 3, pp. 39–46, 2006.
[32] K. C. Tan, T. H. Lee, D. Khoo, and E. F. Khor, "A multiobjective evolutionary algorithm toolbox for computer-aided multiobjective optimization," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 4, pp. 537–556, Aug. 2001.
[33] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Reading, MA: Addison-Wesley, 2006.
[34] S. Tsutsui and A. Ghosh, "Genetic algorithms with a robust solution searching scheme," IEEE Trans. Evol. Comput., vol. 1, no. 3, pp. 201–208, Sep. 1997.
[35] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco, CA: Morgan Kaufmann, 2005.
[36] T. Zhang and J. Lu, "A PSO-based multivariable fuzzy decision-making predictive controller for a once-through 300-MW power plant," Cybern. Syst., vol. 37, no. 5, pp. 417–441, Jul./Aug. 2006.
[37] H. Zhou, K. Cen, and J. Fan, "Modeling and optimization of the NOx emission characteristics of a tangentially fired boiler with artificial neural networks," Energy, vol. 29, no. 1, pp. 167–183, Jan. 2004.
[38] E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach," IEEE Trans. Evol. Comput., vol. 3, no. 4, pp. 257–271, Nov. 1999.
[39] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Fonseca, "Performance assessment of multiobjective optimizers: An analysis and review," IEEE Trans. Evol. Comput., vol. 7, no. 2, pp. 117–132, Apr. 2003.
[40] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[41] Accessed Jan. 28, 2009. [Online]. Available: http://en.wikipedia.org/wiki/Weighted_moving_average#Weighted_moving_average
[42] K. C. Tan, Q. Yu, and T. H. Lee, "A distributed evolutionary classifier for knowledge discovery in data mining," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 2, pp. 131–142, May 2005.
[43] D. Liu, K. C. Tan, C. K. Goh, and W. K. Ho, "A multiobjective memetic algorithm based on particle swarm optimization," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 1, pp. 42–50, Feb. 2007.
[44] C. K. Goh and K. C. Tan, "A competitive-cooperative coevolutionary paradigm for dynamic multiobjective optimization," IEEE Trans. Evol. Comput., vol. 13, no. 1, pp. 103–127, Feb. 2009.

Zhe Song (M’08) received the B.S. and M.S. degrees from China University of Petroleum, Dongying, China, in 2001 and 2004, respectively, and the Ph.D. degree from The University of Iowa, Iowa City, in 2008. He is currently an Associate Professor with the Business Administration Department, School of Business, Nanjing University. He has published technical papers in journals sponsored by IEEE, ESOR, and IFPR. His research concentrates on modeling energy systems, control and optimization, data mining, computational intelligence, decision theory, control theory, and statistics.

Andrew Kusiak (M'89) received the B.S. and M.S. degrees in engineering from the Warsaw University of Technology, Warsaw, Poland, in 1972 and 1974, respectively, and the Ph.D. degree in operations research from the Polish Academy of Sciences, Warsaw, in 1979. He is currently a Professor with the Intelligent Systems Laboratory, Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City. He speaks frequently at international meetings, conducts professional seminars, and does consultation for industrial corporations. He has served on the editorial boards of over 40 journals. He is the author or coauthor of numerous books and technical papers in journals sponsored by professional societies, such as the Association for the Advancement of Artificial Intelligence, the American Society of Mechanical Engineers, etc. His current research interests include applications of computational intelligence in automation, wind and combustion energy, manufacturing, product development, and healthcare. Prof. Kusiak is an Institute of Industrial Engineers fellow and the Editor-in-Chief of the Journal of Intelligent Manufacturing.
