INTELLIGENT PLANNING OF MANUFACTURING SYSTEMS WITH HYBRID DYNAMICS ¹
Hao Ding, Matthias Rungger, Olaf Stursberg
Institute of Automatic Control Engineering, Technische Universität München, D-80290 Munich, Germany. Email: {hao.ding, matthias.rungger, stursberg}@tum.de
Abstract: This contribution describes a computational and model-based scheme which (locally) optimizes the behavior of manufacturing systems by making use of learned knowledge. Parameterized hybrid dynamic models of the manufacturing system and its environment together with variantly derived goal and safety specifications serve as constraints of an optimization problem. The latter is solved iteratively using the principles of model predictive control, in order to compute control strategies that are adapted to changing situations. Past evolutions computed by the optimizer are stored in a knowledge-base and are employed as learned behavior to make decisions for new situations more efficient and reliable, as shown for an example system. Copyright © 2007 IFAC
Keywords: Planning, Hybrid Systems, Model Predictive Control, Learning.
Footnote 1: This work was partially supported by the cluster of excellence Cognition for Technical Systems (CoTeSys), funded by the German Research Foundation (DFG).

1. INTRODUCTION

While most existing manufacturing processes still follow a strictly fixed sequence of operations and permit only adaptation to small variations in the materials processed or the product specification, academia has proposed a variety of approaches to enhance the flexibility of production. Most prominently under the label of flexible and intelligent manufacturing systems, methods were developed that maintain production (possibly with degraded performance) if certain system components cease operation due to malfunction, or if the production scheme has to be changed for reasons like profitability, schedulability, etc. Common to many approaches in this context (see e.g. (Moore and Gupta, 1996; Lawley, 1999; Seidl and Schmidt, 2000)) is that the mechanism of reconfiguration is computed offline during design time or, if executed online, that a decision between very simple and pre-computed options is taken.

However, in order to increase the autonomy and the ability to react flexibly to unforeseen disturbances or variations online, recent approaches have aimed at including principles of learning into the automation scheme. A prominent class of such principles is that of machine learning, which aims at deducing and extrapolating rules from given data sets (e.g. (Russell and Norvig, 2003; Barto and Mahadevan, 2003)). Examples of corresponding intelligent control techniques applied to manufacturing systems can be found in (McFarlane et al., 2005; Beccuti et al., 2006; Perusich and McNeese, 2006; Watanabe, 1989).

This paper aims at adding to this development by proposing a scheme that computes locally optimal control strategies (or plans) online under consideration of all relevant and time-varying constraints imposed on the manufacturing system. In particular, if the production comprises joint action of human operators and (partly) autonomously operating technical equipment, it is of utmost importance that the system can react quickly to the behavior of the humans: human behavior cannot be determined completely at design time, but efficient operation requires online adaptation of the control strategies of the technical equipment.

The scheme proposed here combines machine learning with model predictive control (MPC) and hybrid dynamics for the representation of behavior. Hybrid dynamic systems, which establish the interaction of continuous and discrete dynamics (i.e. differential equations and state transition systems, see e.g. (Henzinger, 1996)), are suitable to model that the manufacturing system evolves differently in distinct qualitative modes (e.g. steps of production). MPC is a well-established online control strategy that iteratively computes locally optimal control inputs by solving an optimization problem over a moving time horizon, e.g. (Carlos et al., 1989; Mayne et al., 2000). In recent years, different approaches for MPC of hybrid systems have been developed and successfully employed in different application domains, see e.g. (Beccuti et al., 2006; Cairano et al., 2006; Tometzki et al., 2006). This paper shows how MPC for hybrid systems, extended by learning from past evolutions, can be used to flexibly plan the operation of manufacturing systems such that time-variant constraints for safety and goal attainment are observed. The approach details the planning component which has been named strategic controller in an architecture for cognitive safety controllers in (Kain et al., 2007).

2. THE PLANNING ARCHITECTURE

According to Fig.
1, the control unit for planning the behavior for the manufacturing system consists of the following parts: dynamic models of the system to be controlled and the environment (not manipulable by the controller), a block for generating constraints for safety and goal specifications, the optimization unit, and a learning algorithm including a knowledge-base for storing past behaviors including assessments. The scheme is operated in a discrete-time setting, i.e. in each sampling time the dynamic models, the safety constraints, and a goal region are updated based on measured data received from the manufacturing system and its environment. The updated information is passed to the optimizer and the learning block. If the knowledge-base of the latter already contains an appropriate control strategy for the momentary situation, this is passed to the optimizer for initialization. The optimizer then computes an optimal control strategy over a limited prediction horizon. This strategy is stored in the knowledge-base together with the current situation of system and environment, and the first control input of the strategy is applied to the system.

[Figure 1: block diagram of the control unit, comprising the dynamic models of the system and the environment (model execution), the safety constraints and goal generation, the optimization (MPC), and the learning block with knowledge-base, connected to the system (to be controlled) and the environment through $y_s(t_k)$, $y_e(t_k)$, and $u^*(t_k)$.]
Fig. 1. Control unit for strategic planning (k: time index, i: iteration index for the optimization).

In detail, the following notation is used in Fig. 1: the prediction horizon is denoted by a set of discrete time points $T_p = \{t_k, t_{k+1}, \ldots, t_{k+p}\}$ starting from the current time $t_k \in \mathbb{R}_{\ge 0}$; $y_s(t_k) \in \mathbb{R}^{n_{ys}}$ and $y_e(t_k) \in \mathbb{R}^{n_{ye}}$ are the current measured variables from the system and the environment; $s(t_k)$ and $e(t_k)$ are the hybrid states of the system and the environment (further explained in the following section); $\phi_{s,k}$ and $\phi_{e,k}$ are the predicted state trajectories of the system and the environment models, while $\phi_{u,k}$ is the control trajectory computed by the optimizer; $\phi_{s,i}$ and $\phi_{u,i}$ are intermediate results computed during the course of the optimization carried out at time $t_k$; finally, $\phi^L_{u,k}$ denotes the predicted control inputs derived from the learning block. A sequence of state sets to be avoided, $\phi_{F,k}$, and a goal region at the current time, $G_k$, are derived from $\phi_{s,k}$, $\phi_{e,k}$, $y_s(t_k)$, $y_e(t_k)$ as well as from a set of user-specified safety and goal specifications $(F^*, G^*)$. Finally, $u^*(t_k)$ is the control input to the system at time $t_k$.

3. HYBRID MODELING

As mentioned in the introductory section, the use of hybrid models is favored here for representing the behavior of system and environment, since manufacturing processes often show the combination of continuous and discrete-event behavior. For the system model, the dynamics can be assumed to be well-known such that specifying the hybrid model from first principles is possible. For the environment, the dynamics will, in
most cases, not be known during the design of the controller, such that an identification based on measured data is the proper means to account for variations. Techniques for identifying hybrid models were reported in (Paoletti et al., 2007) and (Juloski et al., 2006); for brevity of presentation, it is assumed here that the environment model can be obtained from such techniques.

The dynamics of both components (system and environment) are specified by hybrid automata, using a version with inputs as introduced in (Stursberg, 2006). A hybrid automaton for modeling the system is $HA_s = (X_s, U_s, Z_s, inv_s, \Theta_s, g_s, r_s, f_s)$ with:
• $X_s \subseteq \mathbb{R}^{n_{sx}}$ specifies the system state space of dimension $n_{sx}$ with a continuous state vector $x_s \in X_s$;
• $U_s \subseteq \mathbb{R}^{n_{su}}$ is the input space;
• the finite set of system locations is denoted by $Z_s = \{z_{s1}, \ldots, z_{sn_z}\}$;
• a mapping $inv_s : Z_s \to 2^{X_s}$ assigns an invariant set to each location $z_{sj} \in Z_s$;
• the set of transitions is given by $\Theta_s \subseteq Z_s \times Z_s$, where a transition from $z_{s1} \in Z_s$ into $z_{s2} \in Z_s$ is denoted by $(z_{s1}, z_{s2})$;
• a mapping $g_s : \Theta_s \to 2^{X_s}$ associates a guard $g_s((z_{s1}, z_{s2})) \subseteq X_s$ with each $(z_{s1}, z_{s2}) \in \Theta_s$;
• a reset function $r_s : \Theta_s \times X_s \to X_s$ assigns an updated system state $\bar{x}_s \in X_s$ to each $(z_{s1}, z_{s2}) \in \Theta_s$ and $x_s \in g_s((z_{s1}, z_{s2}))$;
• $f_s : Z_s \times X_s \times U_s \to \mathbb{R}^{n_{sx}}$ defines a discrete-time state equation $x_s(t_{j+1}) = f_s(z_s(t_j), x_s(t_j), u_s(t_j))$, in which $z_s(t_j)$, $x_s(t_j)$, and $u_s(t_j)$ denote the values of the location, the continuous state, and the input of the system at time $t_j \in \mathbb{R}_{\ge 0}$, respectively.

$T = \{t_0, t_1, t_2, \ldots\}$ is an ordered set of increasing time points $t_j \in \mathbb{R}_{\ge 0}$, and $x_s(t_j)$, $z_s(t_j)$, and $u_s(t_j)$ are defined on this set. Let $S$ denote the set of all valid hybrid states $s(t_j) = (z_s(t_j), x_s(t_j))$ with $z_s(t_j) \in Z_s$ and $x_s(t_j) \in inv_s(z_s(t_j))$.
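As an illustration only, the tuple $HA_s$ can be held in a small data structure. The sketch below is not the authors' implementation: the class name `HybridAutomaton`, the representation of invariants and guards as membership tests, and the one-dimensional toy automaton with locations `"fast"` and `"slow"` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Vec = Tuple[float, ...]
Edge = Tuple[str, str]

@dataclass
class HybridAutomaton:
    """Hypothetical container mirroring (Z_s, inv_s, Theta_s, g_s, r_s, f_s)."""
    locations: FrozenSet[str]                     # Z_s
    inv: Dict[str, Callable[[Vec], bool]]         # inv_s as membership tests
    transitions: FrozenSet[Edge]                  # Theta_s
    guard: Dict[Edge, Callable[[Vec], bool]]      # g_s as membership tests
    reset: Dict[Edge, Callable[[Vec], Vec]]       # r_s
    f: Callable[[str, Vec, Vec], Vec]             # f_s(z, x, u)

    def is_hybrid_state(self, z: str, x: Vec) -> bool:
        """True if (z, x) lies in S, i.e. z in Z_s and x in inv_s(z)."""
        return z in self.locations and self.inv[z](x)

    def enabled(self, z: str, x: Vec) -> List[Edge]:
        """Transitions leaving z whose guard contains x."""
        return [e for e in self.transitions if e[0] == z and self.guard[e](x)]

# Toy example (assumed values): the input acts with half gain beyond x = 5.
DT = 0.5
BOUNDARY = 5.0

def toy_dynamics(z: str, x: Vec, u: Vec) -> Vec:
    gain = 1.0 if z == "fast" else 0.5
    return tuple(xi + gain * ui * DT for xi, ui in zip(x, u))

toy = HybridAutomaton(
    locations=frozenset({"fast", "slow"}),
    inv={"fast": lambda x: x[0] < BOUNDARY, "slow": lambda x: x[0] >= BOUNDARY},
    transitions=frozenset({("fast", "slow"), ("slow", "fast")}),
    guard={("fast", "slow"): lambda x: x[0] >= BOUNDARY,
           ("slow", "fast"): lambda x: x[0] < BOUNDARY},
    reset={("fast", "slow"): lambda x: x, ("slow", "fast"): lambda x: x},
    f=toy_dynamics,
)
```

Representing the sets $inv_s(z)$ and $g_s(\cdot)$ by predicates keeps the sketch independent of any particular set representation (polyhedra, boxes, and so on).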
For given $u_s(t_j)$, a feasible run $\phi_s$ of $HA_s$ is defined as a sequence $\phi_s = (s(t_0), s(t_1), \ldots)$ with $s(t_j) \in S$:
• Initialization: $s(t_0)$ is initialized to $z_s(t_0) \in Z_s$ and $x_s(t_0) \in inv_s(z_s(t_0))$;
• Progress: $s(t_{j+1})$ results from $s(t_j)$ by (1) dynamic evolution: $x_s(t_{j+1}) = f_s(z_s(t_j), x_s(t_j), u_s(t_j))$; (2) followed by a transition: $(z_s(t_j), z_s(t_{j+1})) \in \Theta_s$, $x_s(t_j) \in g_s((z_s(t_j), z_s(t_{j+1})))$, and $x_s(t_{j+1}) = r_s((z_s(t_j), z_s(t_{j+1})), x_s(t_j)) \in inv_s(z_s(t_{j+1}))$.

$\Phi_s$ denotes the set of all feasible runs for the system.

A hybrid automaton without inputs (since the environment cannot be manipulated by the controller) is set up for the environment: $HA_e = (X_e, Z_e, inv_e, \Theta_e, g_e, r_e, f_e)$, where
• $X_e \subseteq \mathbb{R}^{n_{ex}}$ specifies the state space of the environment;
• the finite set of environment locations is $Z_e = \{z_{e1}, \ldots, z_{en_z}\}$;
• a mapping $inv_e : Z_e \to 2^{X_e}$ specifies the invariants;
• the set of transitions is $\Theta_e \subseteq Z_e \times Z_e$;
• a mapping $g_e : \Theta_e \to 2^{X_e}$ defines a guard $g_e((z_{e1}, z_{e2})) \subseteq X_e$ for each $(z_{e1}, z_{e2}) \in \Theta_e$;
• a reset function $r_e : \Theta_e \times X_e \to X_e$ assigns a state update $\bar{x}_e \in X_e$ to each transition and to each $x_e \in g_e((z_{e1}, z_{e2}))$;
• $f_e : Z_e \times X_e \to \mathbb{R}^{n_{ex}}$ is the state equation of the environment: $x_e(t_{j+1}) = f_e(z_e(t_j), x_e(t_j))$.

For this automaton, a feasible run $\phi_e = (e(t_0), e(t_1), \ldots)$, $t_j \in T$, is defined equivalently to $\phi_s$ as a sequence of hybrid states $e(t_j) = (z_e(t_j), x_e(t_j))$ with $z_e(t_j) \in Z_e$ and $x_e(t_j) \in inv_e(z_e(t_j))$. $\Phi_e$ denotes the set of all feasible runs of the environment dynamics. (Interaction between $HA_s$ and $HA_e$ can be introduced by using synchronization or additional input/output variables, but is omitted here.)

4. THE PLANNING PROCEDURE

4.1 Specification of Goals and Safety

As important constraints of the optimization procedure, the goal and safety specifications are defined in terms of the models $HA_s$ and $HA_e$: for each $t_k \in T$, a sequence of forbidden regions $\phi_{F,k} = \{F_k, F_{k+1}, \ldots, F_{k+p}\}$, $F_j \subset S$, is specified for $HA_s$ along the prediction horizon. Any $F_k \subset Z_s \times X_s$ is determined as a projection of $e(t_k)$ (surrounded by a safety margin) onto the coordinates of the system. This construction is particularly suited for the case that system and environment operate in the same space and must not collide (e.g. a manufacturing robot must not collide with a human operator modeled by $HA_e$). Furthermore, the goal region $G_k \subset S$ defines, for time $t_k$, the set into which the system state $s(t)$ should be driven by the controller. The function computing the forbidden and goal regions according to $(\phi_{F,k}, G_k) = \Gamma(F^*, G^*, \phi_{s,k}, \phi_{e,k}, y_s(t_k), y_e(t_k))$ takes the global safety specification $F^*$ (e.g. no collision between objects), the global goal region $G^*$ (e.g. accomplishing production), the state trajectories $\phi_{s,k}$ and $\phi_{e,k}$ obtained from simulating the system and environment models, and the current measured data $y_s(t_k)$ and $y_e(t_k)$ into account.

4.2 Optimization on Moving Horizons

The MPC scheme produces a control strategy to drive the system state $s(t_k)$ from an initial point into the goal region while preventing $s(t_k)$ from entering any forbidden region. The models of
the system and the environment are utilized to predict the future behavior of the system (under the effect of a control strategy) and the environment. The control strategy is computed by optimization in which the future behavior, a possibly varying range of available control inputs $u(t_k)$, and the forbidden and goal regions serve as constraints. The solution is a sequence $\phi_{u,k} = (u(t_k), \ldots, u(t_{k+p-1}))$, $u(t_j) \in U$, $j \in \{k, \ldots, k+p-1\}$, which minimizes a suitable cost function $J$. The constrained optimization problem can be formulated as:

$$
\begin{aligned}
\min_{\phi_{u,k}} \quad & J(\phi_{s,k}, \phi_{e,k}, \phi_{u,k}, p, G_k) \\
\text{s.t.} \quad & s(t_j) \notin F_j \quad \forall j \in \{k, \ldots, k+p\}, \\
& u_{min}(t_k) \le u(t_j) \le u_{max}(t_k), \\
& \phi_{s,k} \in \Phi_s \ \text{and} \ \phi_{e,k} \in \Phi_e,
\end{aligned}
$$

where $\phi_{s,k}$ and $\phi_{e,k}$ are the predicted state trajectories of system and environment, which must be contained in the sets of feasible runs $\Phi_s$ and $\Phi_e$. No state $s(t_j)$ contained in $\phi_{s,k}$ may lie in the forbidden region $F_j$. $u_{min}(t_k)$ and $u_{max}(t_k)$ are the limits for the control inputs $u(t_j)$ and thus for the control trajectory $\phi_{u,k}$.

A possible solution technique for the above optimization problem is to use the principle of control vector parametrization (executed iteratively): (1) the optimizer suggests a trajectory $\phi_{u,k}$, (2) the models of system and environment are simulated for this choice, leading (possibly) to feasible $\phi_{s,k}$ and $\phi_{e,k}$, (3) the cost function $J$ is evaluated for these trajectories, and (4) the result reveals whether $\phi_{u,k}$ should be further modified to improve the costs or whether the optimization has sufficiently converged (see footnote 2). Among many alternatives, one possible choice for the cost function is to minimize the time for reaching $G_k$, i.e. $J = t_f$, where $t_f$ is the time in the prediction horizon at which $G_k$ can be reached. However, when the goal cannot be reached within $p$ steps, the distance between the last hybrid state within the prediction horizon, $s(t_{k+p})$, and the current goal set $G_k$ may be a suitable alternative.
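The four steps of the iterative scheme can be made concrete with a deliberately simple sketch. Everything specific in it is an assumption for illustration: the optimizer is replaced by exhaustive enumeration over a coarse input grid, the model is a single-location integrator, the forbidden region is a static circle (rather than a time-varying sequence $F_j$), and the cost sums the distance to the goal over all predicted states instead of using only the last one, which avoids ties between sequences that idle first. The function names `simulate`, `forbidden`, `plan`, and `run_mpc` are hypothetical.

```python
import itertools
import math

DT = 0.3                                   # sampling time (assumed here)
U_VALUES = (-5.0, 0.0, 5.0)                # coarse grid over [-5, 5] (assumption)
INPUTS = list(itertools.product(U_VALUES, repeat=2))

def simulate(x0, u_seq):
    """Step (2): predict states for x(t_{j+1}) = x(t_j) + u(t_j) * DT."""
    states = [x0]
    for u in u_seq:
        x = states[-1]
        states.append((x[0] + u[0] * DT, x[1] + u[1] * DT))
    return states

def forbidden(x, centre, radius):
    """Membership test for a circular forbidden region (assumed shape)."""
    return math.dist(x, centre) <= radius

def plan(x0, goal, centre, radius, p):
    """Steps (1)-(4) by enumeration: keep the cheapest feasible input sequence."""
    best_u, best_cost = None, math.inf
    for u_seq in itertools.product(INPUTS, repeat=p):      # step (1): candidate
        states = simulate(x0, u_seq)
        if any(forbidden(x, centre, radius) for x in states):
            continue                                        # violates s(t_j) not in F_j
        cost = sum(math.dist(x, goal) for x in states[1:])  # step (3): evaluate J
        if cost < best_cost:                                # step (4): improve
            best_u, best_cost = u_seq, cost
    return best_u, best_cost

def run_mpc(x0, goal, centre, radius, p=3, max_steps=40, tol=0.5):
    """Moving horizon: re-plan at every step and apply only the first input."""
    path = [x0]
    for _ in range(max_steps):
        if math.dist(path[-1], goal) <= tol:
            break
        u_seq, _ = plan(path[-1], goal, centre, radius, p)
        if u_seq is None:                                   # no feasible sequence
            break
        path.append(simulate(path[-1], u_seq[:1])[-1])
    return path

path = run_mpc((0.0, 0.0), (9.0, 9.0), centre=(4.5, 4.5), radius=1.0)
```

Enumeration grows exponentially with the horizon $p$, so it only serves to show the structure of the loop; a gradient-based or mixed-integer solver would take the place of `plan` in practice.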
Footnote 2: Alternatively, the model dynamics can be considered as algebraic constraints and the optimization be solved by mixed-integer programming.

4.3 Learning of Preferable Behavior

The objective of the learning unit is to find a proper strategy by comparing the current situation of the system and the environment with similar situations stored in the knowledge-base. For a given situation at the current time $t_k$, i.e. a combination of $s(t_k)$, $e(t_k)$, $G_k$, and $\phi_{F,k}$, the learning unit infers a proper control strategy $\phi^L_{u,k}$ from the knowledge-base (if contained) and passes it to the optimizer as initialization. The strategies in the knowledge-base are formulated as tuples $\delta = (s(t_k), \phi_{s,k}, \phi_{e,k}, G_k, \phi_{F,k}, \phi_{u,k}, \alpha, J)$. $\alpha \in [0, 1]$ denotes a safety indicator describing the suitability of the control strategy $\phi_{u,k}$ for the given situation. This indicator is computed as a scaled minimum distance between $\phi_{s,k}$ and $\phi_{F,k}$. Based on the associated value of the cost function $J$ and on $\alpha$, $\phi^L_{u,k}$ is selected for a given situation by a function $\lambda$. The latter computes $\phi^L_{u,k} = \lambda(s(t_k), e(t_k), G_k, \phi_{F,k})$ by comparing the current situation with entries in the knowledge-base and selecting the most appropriate control strategy stored for previously encountered similar situations. Similarity is here defined by small distances of the quantities specifying a situation in the underlying hybrid state space.

5. APPLICATION TO A MANUFACTURING SCENARIO

The presented method was applied to a simple manufacturing scenario, in which a mobile robot operates in a car assembling factory and has to observe human operators and other moving obstacles in its range of operation. Fig. 2 sketches the scenario (in 2-D) in which the transportation robot TR picks up a workpiece from the supply area, moves it to the assembling region G, fixes it in cooperation with the human operator H, and then drives back to the supply area to repeat the procedure. In addition, a collision with a moving obstacle MO (i.e. a transportation portal) must be avoided in any case. Around the workspace of the human operator, a safety region is defined as the invariant of a different location $z_{loc2}$, in which the robot is restricted to a lower velocity than outside (invariant of the location $z_{loc1}$). The following subsections describe the hybrid modeling of this process, the formulation of the control task (safe and efficient operation of TR), and the computation results.

5.1 Modeling
The setting is modeled by three hybrid automata, one for the transportation robot (the system) $HA_s$ and two for the moving objects $HA_{e1}$ and $HA_{e2}$. An area around the states of the latter two automata accounts for safety margins and determines no-go zones for TR, i.e. for the state of $HA_s$.

[Figure 2: 2-D sketch of the scenario in the $(x_1, x_2)$ plane, showing the supply area with the workpiece, the transportation robot TR, the moving obstacle MO, the car with the assembling region G, the human operator H, and the locations $z_{loc1}$ and $z_{loc2}$ (working space).]
Fig. 2. Sketch of the car assembling process.

5.1.1. Robot Model: The position of the transportation robot is denoted by a state vector $x_s := (x_1, x_2)^T$, and the input is the velocity in these coordinates, $u := (v_{x1}, v_{x2})^T$ with $U = [-5, 5] \times [-5, 5]$. The safety region around the assembling area is specified as a polyhedron such that the invariants of the two locations $Z_s = \{z_{loc1}, z_{loc2}\}$ result in $inv_s(z_{loc1}) = \{x_s \in \mathbb{R}^2 \mid [0, 0]^T \le x_s \le [100, 100]^T \wedge C \cdot x_s > c\}$ and $inv_s(z_{loc2}) = \{x_s \in \mathbb{R}^2 \mid C \cdot x_s \le c\}$ with an appropriate matrix $C$ and a vector $c$ (and scaled dimensionless units). The set of transitions is $\Theta_s = \{(z_{loc1}, z_{loc2}), (z_{loc2}, z_{loc1})\}$ with $g_s((z_{loc1}, z_{loc2})) = \{x_s \mid C \cdot x_s \le c\}$ and $g_s((z_{loc2}, z_{loc1})) = \{x_s \mid C \cdot x_s \ge c\}$. The reset functions are identity mappings in this case. Finally, the simple state equations of the two locations are as follows (with $\Delta_{j+1} = t_{j+1} - t_j$):
$$f_s(z_{loc1}, x_s(t_j), u_s(t_j), t_j) = x_s(t_j) + u_s(t_j) \cdot \Delta_{j+1}$$
$$f_s(z_{loc2}, x_s(t_j), u_s(t_j), t_j) = x_s(t_j) + 0.5\, u_s(t_j) \cdot \Delta_{j+1}$$

5.1.2. Environment Model: It is assumed that the dynamics of the moving obstacle and the operator are known and can be represented by one hybrid automaton each, as follows: both automata contain two locations which represent the two directions in which the obstacle and the operator move. For each automaton, the state vector defines the position $x_e := (x_{e1}, x_{e2})^T$. The obstacle moves up and down in the direction of $x_{e2}$ with velocity $|v_{xe2}| = 8$ between $20 \le x_2 \le 80$ (determining invariants and guard sets). The operator moves diagonally within the safety region with velocities $|v_{xe1}| = 5$ and $|v_{xe2}| = 2$. The safety margins considered around the states of the two objects are assumed to account also for the uncertainty in the motion of obstacle and operator; they define the time-varying forbidden regions $F_{e1}$ and $F_{e2}$.

5.2 Planning and Performance

The control task is to drive the robot from an initial position $s(t_0)$ in the supply area into a desired goal position $s_g$ contained in G without entering the forbidden regions $F_{e1}$ and $F_{e2}$. The appropriate control input sequence $\phi_{u,k}$ is calculated by solving the optimization problem described in Sec. 4.2. The cost function is formulated as the distance between the goal position $s_g$ and the position of the robot at the last predicted state $s(t_{k+p})$ within the prediction horizon:

$$J = \| x_s(t_{k+p}) - x_g \|_2$$

with $x_g$ the continuous part of $s_g$. For the computation, a step size of $\Delta_k = t_{k+1} - t_k = 0.3$ is chosen. The similarity between current and past situations is defined in terms of the Euclidean distances of the continuous parts of the current state of the system, the moving obstacle, the operator, and the goal state.

[Figure 3: 2-D plot in the $(x_1, x_2)$ plane of the robot trajectories between the initial positions of TR and the goal region G (with and without workpiece), together with the moving obstacle MO, the operator H, and the locations $z_{loc1}$ and $z_{loc2}$.]
Fig. 3. System trajectories with learning for p = 3, 10 runs.

In order to examine the generation of strategies for the cases with and without learning, different sets of runs between the supply area and the goal region were carried out and investigated. Fig. 3 shows, as an example, the first 10 runs of a test series in which the initial point in the supply area and the goal point in the set G are chosen randomly each time. The paths of the transportation robot computed by the MPC scheme are shown by dotted lines. The crosses on the trajectory mark the cases in which the learning block could make use of a previous similar run, and the corresponding control strategy from the knowledge-base was used for initialization of the corresponding optimization. As required, none of the runs leads to a collision of the transportation robot with the obstacle or the human operator.

In order to assess the performance of the scheme, Tables 1 and 2 summarize the results for two test series. For two different prediction horizons (p = 3 and p = 5) and for 50 and 500 runs, respectively, they contain the total computation time and the average cost per run (measured as real time required to reach the goal region, or the supply area on the way backwards, respectively) for the two cases of using or not using the learning step. The tables demonstrate that the use of learning reduces the computation time by 30 to 46%, while the average cost per run was only very moderately affected. In all cases, the optimization was carried out quickly enough to enable online computation.

Table 1. Overall computation time (OCT) and average cost per run (CPR) for a test with 50 runs.

Horizon length p    OCT, no learning    OCT, with learning    CPR, no learning    CPR, with learning
3                   82                  57                    11.772              11.772
5                   202                 109                   12.138              12.276

Table 2. Overall computation time (OCT) and average cost per run (CPR) for a test with 500 runs.

Horizon length p    OCT, no learning    OCT, with learning    CPR, no learning    CPR, with learning
3                   829                 549                   11.656              11.686
5                   1846                1049                  12.018              12.624
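The similarity-based selection used by the learning block in this scenario can be sketched as a nearest-neighbour lookup. This is a minimal illustration under stated assumptions, not the authors' function $\lambda$: the `Entry` record, the thresholds `max_dist` and `min_alpha`, the ordering "closest situation first, lower cost as tie-breaker", and all numerical values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Entry:
    """Hypothetical knowledge-base record (a reduced form of the tuple delta)."""
    situation: Tuple[float, ...]          # continuous parts: robot, obstacle, operator, goal
    controls: List[Tuple[float, float]]   # stored control sequence phi_{u,k}
    alpha: float                          # safety indicator in [0, 1]
    cost: float                           # value of the cost function J

def select_strategy(kb: Sequence[Entry],
                    situation: Tuple[float, ...],
                    max_dist: float = 2.0,
                    min_alpha: float = 0.5) -> Optional[List[Tuple[float, float]]]:
    """Return the stored controls of the most similar safe entry, or None."""
    best, best_key = None, None
    for entry in kb:
        d = math.dist(entry.situation, situation)     # Euclidean similarity measure
        if d > max_dist or entry.alpha < min_alpha:
            continue                                   # too dissimilar or not safe enough
        key = (d, entry.cost)                          # closest first, then cheapest
        if best_key is None or key < best_key:
            best, best_key = entry, key
    return best.controls if best is not None else None

# Illustrative entries (made-up numbers).
kb = [
    Entry((0.0, 0.0, 50.0, 20.0, 60.0, 60.0, 70.0, 70.0),
          [(5.0, 5.0), (5.0, 5.0), (5.0, 0.0)], alpha=0.9, cost=11.7),
    Entry((10.0, 10.0, 50.0, 80.0, 62.0, 61.0, 70.0, 70.0),
          [(5.0, 0.0), (5.0, 5.0), (5.0, 5.0)], alpha=0.8, cost=12.1),
    Entry((0.0, 0.0, 50.0, 20.0, 60.0, 60.0, 70.0, 70.5),
          [(0.0, 5.0), (5.0, 5.0), (5.0, 5.0)], alpha=0.1, cost=5.0),
]
now = (0.5, 0.0, 50.0, 20.0, 60.0, 60.0, 70.0, 70.0)
warm_start = select_strategy(kb, now)
```

When the lookup returns `None`, the optimizer would simply start without an initial guess; otherwise the returned sequence serves as the initialization of the optimization for the current step.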
6. CONCLUSIONS AND FUTURE WORK

In this paper, a flexible planning scheme for manufacturing systems was presented which computes suitable and locally optimal control strategies online. The strategies take into account time-varying safety restrictions from the environment and goal specifications, thus allowing production to be flexibly adapted to new situations. The storage of data on past behavior in a knowledge-base is an efficient means to quickly adapt a strategy to varying conditions, either by substituting the optimization in the MPC scheme or by properly initializing it (leading to faster convergence). The extension of control strategies to dynamic evolutions that are similar to previously encountered ones corresponds to the principle of reinforcement learning. Current work focuses on extending the scheme to action sequences (i.e. concatenations of different runs of a system), on considering additional mechanisms known from machine learning, and on embedding identification techniques and stochastic control for hybrid systems into the approach.
7. REFERENCES

Barto, A.G. and S. Mahadevan (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Systems 13(4), 341–379.
Beccuti, A.G., G. Papafotiou and M. Morari (2006). Explicit model predictive control of the boost DC-DC converter. In: 2nd IFAC Conf. on Analysis and Design of Hybrid Systems. 315–320.
Böse, F., K. Windt and M. Teucke (2006). Modelling of Autonomously Controlled Logistic Processes in Production Systems. In: 8th Int. Conf. on Modern Inf. Techn. in the Innovation Processes of Ind. Enterprises. 341–346.
Cairano, S.D., A. Bemporad, I. Kolmanovsky and D. Hrovat (2006). Model predictive control of nonlinear mechatronic systems: An application to a magnetically actuated mass spring damper. In: 2nd IFAC Conf. on Analysis and Design of Hybrid Systems. 241–246.
Carlos, D.M.P., E. Garcia and M. Morari (1989). Model predictive control, theory and practice - a survey. Automatica 25, 335–348.
Henzinger, T.A. (1996). The theory of hybrid automata. 11th Annual IEEE Sym. on Logic in Computer Science 170, 278–292.
Juloski, A.L., S. Paoletti and J. Roll (2006). Recent techniques for the identification of piecewise affine and hybrid systems. In: Current Trends in Nonlinear Systems and Control, Birkhäuser.
Kain, S., H. Ding, F. Schiller and O. Stursberg (2007). Controller architecture for safe cognitive technical systems. Accepted for: 26th Int. Conf. on Computer Safety, Reliability and Security.
Lawley, M.A. (1999). Deadlock avoidance for production systems with flexible routing. IEEE Trans. on Robotics and Automation 15(3), 497–509.
Mayne, D.Q., J.B. Rawlings, C.V. Rao and P.O.M. Scokaert (2000). Constrained model predictive control: Stability and optimality. Automatica 36, 789–814.
McFarlane, D., V. Marik and P. Valckenaers (2005). Intelligent Control in the Manufacturing Supply Chain. IEEE Intelligent Systems 20(1), 24–26.
Moore, K.E. and S.M. Gupta (1996). Petri net models of flexible and automated manufacturing systems: a survey. Int. J. on Production Research 34(11), 3001–3035.
Paoletti, S., A. Juloski, G. Ferrari-Trecate and R. Vidal (2007). Identification of Hybrid Systems: A Tutorial. European Journal of Control 13(2-3), 242–260.
Perusich, K. and M.D. McNeese (2006). Using Fuzzy Cognitive Maps for Knowledge Management in a Conflict Environment. IEEE Trans. on Systems, Man, and Cybernetics 36, 810–821.
Russell, S. and P. Norvig (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.
Seidl, M. and G. Schmidt (2000). Avoiding deadlocks in flexible manufacturing systems. In: Discrete Event Systems: Analysis and Control, 149–158.
Stursberg, O. (2006). Supervisory control of hybrid systems based on model abstraction and refinement. Journal on Nonlinear Analysis 65(6), 1168–1187.
Tometzki, T., O. Stursberg, C. Sonntag and S. Engell (2006). Optimizing hybrid dynamic processes by embedding genetic algorithms into MPC. In: IFAC Symp. Advanced Control of Chemical Processes, 977–982.
Watanabe, T. (1989). Intelligent control in the hierarchy of automatic manufacturing systems. Journal of Intelligent and Robotic Systems 2(2-3), 171–186.