Submitted to Applied Mathematical Modelling (Paper No. **** dated April 10, 2003)
On-line Identification of Language Measure Parameters for Discrete Event Supervisory Control 1 Xi Wang Asok Ray Amol M Khatkhate
[email protected] [email protected] [email protected] Department of Mechanical Engineering The Pennsylvania State University University Park, PA 16802 Corresponding Author: Asok Ray Keywords:
Discrete-Event Supervisory Control; Formal Languages; Parameter Identification; Performance Measure
Abstract The discrete-event system (DES) behavior of physical plants is often modelled as regular languages that can be realized by deterministic finite state automata (DFSA). The concept and construction of quantitative measures of regular languages have been recently reported in literature, and major applications of the language measure are: quantitative evaluation of the discrete-event dynamic behavior of unsupervised and supervised plants; and analysis and synthesis of optimal supervisory control algorithms in the discreteevent setting. This paper formulates and experimentally validates an on-line parameter identification procedure for the language measure based on a DFSA model of the physical plant. The recursive algorithm of this identification procedure relies on observed simulation and/or experimental data. Efficacy of the language parameter identification procedure is demonstrated for real-time supervisory control by experiments on a mobile robotic system. 1 Introduction The discrete-event system (DES) behavior of physical plants is often modelled as regular languages that can be realized by deterministic finite state automata (DFSA) [2] [6]. The (regular) sublanguage of a controlled plant could be different under different supervisors that are constrained to satisfy different specifications [7]. Such a partially ordered set of sublanguages requires a quantitative measure for total ordering of their respective performance. Two techniques for language measure computation have been recently reported. While the first technique [10] leads to a system 1 This work has been supported in part by Army Research Office (ARO) under Grant No. DAAD19-01-1-0646; and NASA Glenn Research Center under Grant No. NAG3-2448.
of linear equations whose (closed form) solution yields the language measure vector, the second technique [8] is a recursive procedure with finite iterations. A sufficient condition for finiteness of the signed real measure has been established in both cases. The language measure serves as a common quantitative tool to compare the performance of different su˜ pervisors and is assigned an event cost Π-matrix and a state characteristic X-vector. Event costs (i.e., ele˜ ments of the Π-matrix) based on plant states, where they are generated, are physical phenomena dependent on the plant behavior, and are similar to the conditional probabilities of the respective events. On the other hand, the X-vector is chosen based on the designer’s perception of the individual state’s impact on the system performance. In the performance evalation of both the unsupervised and supervised plant behav˜ ior, the critical parameter is the event cost Π-matrix. For example, Fu et al [3] [4] have proposed state-based optimal control policies by selectively disabling controllable events to maximize the language measure of the controlled plant language. Furthermore, since the plant behavior is often slowly time-varying, there is a need for on-line parameter identification to generate up-to˜ date values of the Π-matrix within allowable bounds of errors. This paper complements our earlier work [10] that reported theoretical formulation of a language measure without specifically stating how to obtain the underly˜ ing (e.g., event cost matrix Π-matrix) parameters. The main contribution of the present paper is development ˜ of a recursive procedure to identify the Π-matrix parameters of the language measure based on real-time experimentation and/or simulation. The recursive parameter estimation scheme permits on-line synthesis of optimal and/or robust control [3] [4] of discrete-event systems (DES) based on the language measure. The
p. 1
parameter identification procedure is supported by a stopping rule that determines the number of experiments to be conducted for a given bound of allowable estimation error. The paper is organized in five sections including the present section. The language measure is briefly reviewed in Section 2 including introduction of the notations. Section 3 presents the main result - formulation ˜ of the Π-matrix parameter identification algorithm and the associated stopping rules. Section 4 validates the algorithm through learning by simulation as well as by experimentation on a mobile robotic system. The paper is concluded in Section 5 with a brief discussion on robustness of the identification procedure. 2 Review of the Language Measure Concept Let Gi = (Q, Σ, δ, qi , Qm ) be a DFSA model that represents the discrete-event dynamic behavior of a physical plant. Let n denote the cardinality of the state set Q, i.e., |Q| = n, and I ≡ {1, . . . , n} the index of Q; Σ the (finite) alphabet of events; Σ∗ is the set of all finitelength strings of events including the empty string ²; δ : Q × Σ → Q is a (possibly partial) function of state transitions and δ ∗ : Q × Σ∗ → Q is an extension of δ; the state qi is the initial state; and Qm is the set of marked states, ∅ ⊂ Qm ⊆ Q. Definition 2.1 The language L(Gi ) generated by a DFSA G initialized at the state qi ∈ Q is defined as: L(Gi ) = {s ∈ Σ∗ | δ ∗ (qi , s) ∈ Q}
(1)
The language Lm (Gi ) marked by the DFSA G initialized at the state qi ∈ Q is defined as: Lm (Gi ) = {s ∈ Σ∗ | δ ∗ (qi , s) ∈ Qm }
(2)
Definition 2.2 For every qj ∈ Q, let L(qi , qj ) denote the set of all strings that, starting from the state qi , terminate at the state qj , i.e., L(qi , qj ) = {s ∈ Σ∗ | δ ∗ (qi , s) = qj ∈ Q}
(3)
The set Qm of marked states is partitioned into Q+ m and + − + − Q− , i.e., Q = Q ∪ Q and Q ∩ Q = ∅, where Q+ m m m m m m m contains all good marked states that we desire to reach, and Q− m contains all bad marked states that we want to avoid, although it may not always be possible to completely avoid the bad states while attempting to reach the good states. To characterize this, each marked state is assigned a real value based on the designer’s perception of its impact on the system performance. Definition 2.3 The characteristic function χ : Q → [−1, 1] that assigns a signed real weight to state-based sublanguages L(qi , q) is defined as: [−1, 0), q ∈ Q− m {0}, q∈ / Qm ∀q ∈ Q, χ(q) ∈ (4) + (0, 1], q ∈ Qm
The state weighting vector, denoted by X = [χ1 χ2 · · · χn ]T , where χj ≡ χ(qj ) ∀k, is called the Xvector. The j-th element χj of X-vector is the weight assigned to the corresponding terminal state qj . In general, the marked language Lm (Gi ) consists of both good and bad event strings that, starting from the − initial state qi , lead to Q+ m and Qm respectively. Any event string belonging to the language L0 = L(Gi ) − Lm (Gi ) leads to one of the non-marked states belonging to Q − Qm and L0 does not contain any one of the good or bad strings. Based on the equivalence classes defined in the Myhill-Nerode Theorem [2], the regular languages L(Gi ) and Lm (Gi ) can be expressed as: [
L(Gi ) =
L(qi , qk ) =
qk ∈Q
Lm (Gi ) =
[
n [
L(qi , qk )
(5)
− L(qi , qk ) = L+ m ∪ Lm
(6)
k=1
qk ∈Qm
where the sublanguage L(qi , qk ) ⊆ Gi having the initial state qi is uniquely labeled by the terminal state qk , k ∈ I and L(qi , qj ) ∩ L(qi , qk )S = ∅ ∀j 6= k; and L+ m ≡ S − + L(qi , q) and L − L(qi , q) are good and ≡ m q∈Qm q∈Qm bad sublanguages of L (G ), respectively. Then, L0 = m i S 0 + − q ∈Q / m L(qi , q) and L(Gi ) = L ∪ Lm ∪ Lm . Now we construct a signed real measure µ : 2L(Gi ) → R ≡ (−∞, +∞) on the σ-algebra K = 2L(Gi ) . Interested readers are referred to [10] for the details of measure-theoretic definitions and results. With the choice of this σ-algebra, every singleton set made of an event string ω ∈ L(Gi ) is a measurable set, which qualifies itself to have a numerical quantity based on the above state-based decomposition of L(Gi ) into L0 (null), L+ (positive), and L− (negative) sublanguages. Conceptually similar to the conditional transition probability, each event is assigned a state-dependent cost. Definition 2.4 The event cost of the DFSA Gi is defined as a (possibly partial) function π ˜ : Σ∗ ×Q → [0, 1) ∗ such that ∀qi ∈ Q, ∀σj ∈ Σ, ∀ω ∈ Σ , P ˜ij < 1; (1) π ˜ [σj , qi ] ≡ π ˜ij ∈ [0, 1); jπ (2) π ˜ [σj , qi ] = 0 if δ(qi , σj ) is undefined; π ˜ [², qi ] = 1; (3) π ˜ [σj ω, qi ] = π ˜ [σj , qi ] π ˜ [ω, δ(qi , σj )]. Definition 2.5 The state transition cost of the DFSA Gi is defined as a function Pπ : Q×Q → [0, 1) such that ∀qi , qj ∈ Q, π[qi , qj ] = σ∈Σ:δ(qi ,σ)=qj π ˜ [σ, qi ] ≡ πij and πij = 0 if {σ ∈ Σ : δ(qi , σ) = qj }= ∅. The n × n state transition cost Π-matrix is defined as: π11 π12 . . . π1n π21 π22 . . . π2n Π= . .. .. .. .. . . . πn1 πn2 . . . πnn p. 2
Definition 2.6 The signed real measure µ of every singleton string set Ω = {ω} ∈ 2L(Gi ) where ω ∈ L(qi , q) is defined as µ(Ω) ≡ π ˜ (ω, qi )χ(q). It follows that the signed real measure of the sublanguage L(qi , q) ⊆ L(Gi ) is defined as X (7) µ(L(qi , q)) = π ˜ [ω, qi ] χ(q) ω∈L(qi ,q)
And the signed real measure of the language of a DFSA Gi initialized at a state qi ∈ Q, is defined as: X µi ≡ µ(L(Gi )) = µ(L(q, qi )) (8) q∈Q
The language measure vector, denoted as µ = [µ1 µ2 · · · µn ], is called the µ-vector. It is shown in [10] that the signed real measure µi can be written as X µi = πij µj + χi (9) j
In vector form, µ = Πµ + X whose solution is given by µ = (I − Π)−1 X (10) 3 Estimation of Language Measure Parameters This section focuses on identification of the parame˜ (see ters (i.e., elements) of the event cost matrix Π Definition 2.4) which, in turn, allows computation of the state transition cost matrix Π (see Definition 2.5) and the language measure µ-vector (see Definition 2.6). We propose a recursive algorithm for identification of ˜ the Π-matrix parameters. It is assumed that the underlying physical process evolves at two different time scales. In the fast-time scale, i.e., over a short time period, the system is assumed to be an ergodic, discrete Markov process. In the slowly-varying time scale, i.e., over a long period, the system (possibly) behaves as a non-stationary stochastic process. For a slowlyvarying non-stationary process, it might be necessary to redesign the DES supervisory control policy in real ˜ time. In that case, the Π-matrix parameters should be periodically updated. 3.1 Probabilistic Interpretation Since the plant model is an inexact representation of the physical plant, there exist ununmodelled dynamics to account for. This can manifest itself either as ununmodelled events that may occur at each state or as unaccounted states in the model. Let Σuk denote the set of all ununmodelled events at state k of the DFSA Gi ≡ hQ, Σ, δ, qi , Qm i. Let us create a new unmarked absorbing state qn+1 called the dump state [7] and extend the transition function δ to
δe : (Q ∪ {qn+1 }) × (Σ ∪ (∪k Σuk )) → (Q ∪ {qn+1 }) as follows: δ(qk , σ) if qk ∈ Q and σ ∈ Σ qn+1 if qk ∈ Q and σ ∈ Σuk δe (qk , σ) = qn+1 if k = n + 1 and σ ∈ Σ ∪ Σuk P Therefore the residue θk = 1 − j π ˜kj denotes the probability of the set of unmodelled events Σuk conditioned on the state qk . The Π-matrix is accordingly augmented to obtain a stochastic matrix Πaug as follows: π11 π12 . . . π1n θ1 π21 π22 . . . π2n θ2 .. .. .. .. Πaug = ... . . . . πn1 πn2 . . . πnn θn 0 0 ... 0 1 Since the dump state qn+1 is not marked, its characteristic value χ(qn+1 ) = 0. The characteristic vector then augments to Xaug ≡ [X 0]T . With these extensions the language measure vector µaug = [µ1 µ2 · · · µn µn+1 ]T =[µ µn+1 ]T of the augmented DFSA Gaug ≡ (Q ∪ i {qn+1 }, Σ ∪ (∪k Σuk ), δe , qi , Qm ) can be expressed as: µ ¶ µ µ ¶ T¶ µ Πµ + µn+1 [θ1 · · · θn ] X = + (11) µn+1 µn+1 0 Since χ(qn+1 ) = 0 and all transitions from the absorbing state qn+1 lead to itself, µn+1 = µ(Lm (Gn+1 )) = 0. Hence, Equation (11) reduces to that for the original plant DFSA Gi . Thus, the event cost can be interpreted as P the conditional probability, where the residue ˜kj accounts for the probability of all unθk = 1 − j π modelled events emanating from the state qk . 3.2 A Recursive Parameter Estimation Scheme Let pij be defined as the transition probability of event σj on the state qi , i.e., ( P [σj |qi ], if ∃q ∈ Q, s.t. q = δ(qi , σj ) pij = 0, otherwise and its estimate be denoted as: pˆij that is to be determined from the ensemble of experiment and/or simulation data. We introduce the indicator function It (i, j) to represent the occurrence of event σj at time t if the system was in state qi at time t − 1 where t represents a generalized time epoch, for example, the experiment number. Formally, It (i, j) is expressed as: ( 1, if σj is observed at state qi It (i, j) = 0, otherwise
Let Nt (i), denoting the number of incidents of reaching the state qi up to the time instant t, be a random process mapping the time interval up to the instant t into p. 3
the set of nonnegative integers. Similarly, let nt (i, j) denote the number of occurrences of the event Pσj at the state qi up to the time instant t and Jt (i) ≡ j It (i, j). We formulate the recursive algorithm of learning pij as a stochastic approximation scheme at the starting time instant t = 0 with the initial conditions: pˆ0 (i, j) = 0 for all qi ∈ Q, σj ∈ Σ; and n0 (i) = 0, J0 (i) = 0 for all qi ∈ Q. The algorithm runs for t ≥ 1. Upon occurrence of the event σj at the state qi , the recursive algorithm is incremented as:
Based on the above steps, pij is the asymptotic value of the estimated probabilities pˆNt (i) (i, j) as if the event σj occurs infinitely many times at the state qi . However, dealing with finite amount of data, the objective is to obtain a good estimate pˆij of pij from independent Bernoulli trials of generating events. Let Σu = ∪i Σui , ∀i ∈ I ≡ {1,P . . . , n}, at each state qi , P[Σu |qi ]= θi ∈ (0, 1) and ˜ij = 1 − θi . Thereiπ ˜ fore, an estimate of the (i, j)th element in Π-matrix, ˆ˜ij , is obtained by denoted π
Nt (i) = Jt (i) = pˆNt (i) (i, j) =
Further experiments on a more detailed model is necessary to identify the parameters θi ∀i ∈ I. Since θi ¿ 1, an alternative approximation approach is taken for ease of implementation. We set θi = θ ∀i ∈ I where the parameter 0 < θ ¿ 1 is selected from the numerical perspective based on the fact that the sup-norm kµk∞ ≤ θ−1 [10]. Moreover, the fact that each row ˜ sum P in the Π-matrix being strictly less than 1, i.e., ˜ij < 1, is a sufficient condition for finiteness of jπ the language measure [10].
nt (i, j) =
Nt−1 (i) + It (i) Jt−1 (i) + It (i, j) pˆNt (i)−1 (i, j) +Jt (i)Nt (i)−1 (It (i, j) − pˆNt (i)−1 (i, j)) nt−1 (i, j) + It (i, j)
The main result of this section is summarized below. The recursive stochastic approximation scheme of pˆNt (i) (i, j) is the frequency estimator pˆij (t) of independent Bernoulli trials at time t, i.e., pˆNt (i) (i, j) =
nt (i, j) Nt (i)
and limNt (i)→∞ pˆNt (i) (i, j) = pij . These results can be derived in the following steps. Step 1: Jt (i) = 0.
⇒ Nt (i) = Nt−1 (i); nt (i, j) = nt−1 (i, j); pˆNt (i) (i, j) = pˆNt (i)−1 (i, j).
Step 2: Jt (i) ≥ 1, It (i, j) = 0
⇒
nt (i, j) = nt−1 (i, j); Nt (i) = Nt−1 (i) + 1.
and usage of the above identities yields pˆNt (i) (i, j) = =
nt−1 (i, j) 1 nt−1 (i, j) − Nt−1 (i) Nt (i) Nt−1 (i) nt (i, j) Nt (i)
Step 3: Jt (i) ≥ 1, It (i, j) = 1
⇒
nt (i, j) = nt−1 (i, j) + 1; Nt (i) = Nt−1 (i) + 1.
and usage of the above identities yields pˆNt (i) (i, j) = =
ˆ˜ij = pˆij (1 − θi ) π
3.3 A Stopping Rule for Recursive Learning We propose a stopping rule to find a lower bound of ˜ the number of experiments to be conducted for the Πmatrix parameter identification. This section presents such a stopping rule for an inference approximation having a specified absolute error bound ε with a probability δ. The objective is to achieve a trade-off between the number of experimental observations and the estimation accuracy. A robust stopping rule [9] is presented below. An upper bound on the required number of samples is estimated using the Gaussian structure for the binomial distribution that is an approximation of the sum of a large number of independent Bernoulli trials. Given ˆ˜ij has the mean π that π ˜ij and variance π ˜ij (1 − π ˜ij )/n for n independent Bernoulli trials, ˆ˜ij − π π ˜ij π ˜ij (1 − π ˜ij )/n
zn ≡ p
has zero mean and unit variance, and is approximately Gaussian for n sufficiently large. The probability of estimation error is ( ) √ n o ε n ˆ˜ij − π P |π ˜ij | ≥ ε ' P |zn | ≥ p π ˜ij (1 − π ˜ij ) ˆ˜ij is significantly less conservative than The estimate π that obtained from the Chebyshev inequality. 4 Language Parameter Identification in a Robotic System
nt−1 (i, j) 1 1 nt−1 (i, j) + − ˜ Nt−1 (i) Nt−1 (i) + 1 Nt (i) Nt−1 (i) This section presents on-line identification of the Πmatrix in a behavior-based mobile robotic system, Piont (i, j) neer 2 AT, controlled by a discrete-event system (DES) Nt (i) p. 4
supervisor. The scenario for the robot mmovement is as follows. 1. The robot explores an unknown environment by random search; 2. The robot approaches a recognizable object upon detecting it by its camera; 3. The robot grabs object and searches for a recognizable goal; 4. The robot approaches the destination upon detecting it and releases the object at the destination; 5. The robot keeps on repeating the same procedure unless commanded to stop. Figure 1 shows the deterministic finite state automaton (DFSA) model of the plant (i.e., the robot sys˜ tem). The task is to identify the Π-matrix for regular language generated by the DFSA plant model. The corresponding alphabet of discrete events are listed in Table 1. To demonstrate how the on-line learning of ˜ the Π-matrix works, we start with the Matlab simulation of a typical scenario of the robot system operation in the next subsection. s s
1
C
2
a
s
s
15
a l
14
o
X
4
s
T l
5
v
W
A
s
o
6
s
3
d
a
s 7
a
x
8
c
13
d w
y
p
12
Table 1: Events for DES plant model of the robot Symbol a A b c C d g i K L n o O p q T s S v w W x X y Y
Description approach avoid success return to home reach goal with target find target but gripper full drop grab idle (not moving but observing) distance to target is in good range lost faulty robot stopped obstacle ahead out of the range of faulty robot drop success grab success find goal with an target search faulty robot, target, or goal stop avoid obstacle reach target without target find target 1 lost goal find target 2 lost target find faulty robot
Controllabe √ √
√ √ √
√ √ √
are linear state space model of robot motion dynamics, obstacle avoidance using a simple sonar model, object detection by a simplified laser scanner model, and object recognition by a simple camera model. The simulation is initiated with robot’s random walk. The robot dynamic model was obtained by system identification of experimental data collected from the robot system. The experiments for system identifica-
P
10
q g
11
b
9
g Q
16
Figure 1: The DES plant model for the robotic system
4.1 Language Measure Parameter Identification by Robotic System Simulation This subsection presents a graphical design and development environment for hybrid system simulation using the Matlab simulink and stateflow that is eventdriven and is based on the finite state machine theory. In this setting, the robot dynamics remains unaffected when the DES supervisory controller changes its internal states. In other words, the continuous dynamics and discrete state space are decoupled. The test facility allows on-line synthesis and verification of DES supervisory control policies using the language measure. Figure 2 shows the top level functions of the simulation block diagram, in which the major blocks
Figure 2: Matlab robot simulation block diagram tion of the robot dynamics were conducted with pseudorandom inputs in both time and speed under the following constraints: (i) 2 sec ≤ td ≤ 10 sec; (ii) −300 mm/s ≤ v ≤ 300 mm/s; (3) −80 deg/s ≤ ω ≤ 80 deg/s, where td is the time duration in which the robot runs at constant speed; v and ω are the linear and angular velocities of the robot, respectively. A multivariable ARX model was found to be inaccurate to capture the nonlinear dynamics. Therefore, the subspace technique for system identification of MIMO systems was applied using the n4sid routine in Matlab IDENT toolbox. The resulting state space model, having the state vector p. 5
x = [v ω v˙ ω] ˙ T , is given as: xn+1 yn
= Axn + Bun = Cxn
0.8579 −0.0012 = 0.1415 −0.0893 −0.3081 −0.0032 = 0.4281 −0.1809 · −0.4808 = 0.0138
A
B
C
0.0958 −0.3488 0.6882 −0.1532 0.3104 0.6644 0.2350 0.3603 −0.0037 0.6524 −0.2781 −0.4719 0.2368 −0.5129 0.4664 0.0547
(12) (13) −0.0118 0.0540 −0.5791 0.6374
object home
motion w/ DES control
robot
obstacle
goal
Figure 4: A typical trajectory of the simulated robot
−0.2716 0.2910
¸
Figure 3 shows that the results of the random walk experiment on the robot system are in good agreement with those predicted by the model. Note that, in Figure 3, the variations of the model predictions from the actual measured data are largely due to environmental disturbances (e.g., floor friction).
tured environment. We have not yet attempted to simulate environment disturbances. For example, as the lighting condition in the actual scene may vary over time and space, it is possible that robot loses the detected object during approaching. The next section investigates the impact of noise and disturbances ef˜ fects on the convergence of Π-matrix learning based on experiments.
500 input signal simulated model response actual measured response
linear velocity in mm/s
400 300 200 100 0 −100
0
50
100
angular velocity in deg/s
60
150
200 250 time in sec
300
350
400
450
300
350
400
450
input signal simulated model response actual measured response
40 20 0 −20 −40 −60 −80
0
50
100
150
200 250 time in sec
˜ Figure 5: Some elements of the Π-matrix
Figure 3: System identification of the mobile robot Figure 4 shows a typical trajectory of the supervised robot in a single mission cycle. A mission cycle is defined as the completion of the first four steps of the scenario. Four positions of the robot are shown: mission start point(top left corner), the place where the robot grabs an object, the place where the robot drops the object, and the place where the robot returns to home. Note that when the robot sequentially detects an object, a goal, and its home, then its motion is further regulated by its supervisor to avoid potential collisions and to improve its performance. Figure 5 presents the results of on-line identification ˜ pertinent to some of the non-zero elements of the Πmatrix. It is seen that the transition probabilities indeed converge, possibly at different rates, in this struc-
4.2 On-Line Identification of Language Measure Parameters by Robotic Experiments This section describes experiments on language measure identification for the robot automaton model. Figure 6 shows the supervisory control system of a Pioneer 2 AT robot. We have adopted a network device server, called Player, to manage the robot’s sensors and actuators. The main Pioneer 2 AT sensors include a Sony EVI D30 camera, a SICK LMS200 laser scanner, and Polariod ultrasonic sonar sensors. The camera recognizes different targets, including the object, destination, and home base. The laser scanner is used to measure its distance from the target. The sonar and laser scanners together provide robust obstacle avoidance. The continuous controller, provided in p. 6
to/from Coordinator
Process Noise
D/C
e(t)
+
Continuous−time
+
Controller
LADAR Camera P2 AT Base
Player
y(t)
Sensor Server
z(t)
Player Measurement Module
C/D
n m (t)
+ + Measurement Noise
Figure 6: Robot DES control structure
search at q1, π2, 1
probability
1
0.5
0
obstacle ahead at q1, π2, 3
1
0.5
0
200 400 600 800 find a target at q1, π2, 6
probability
1
0.5
0
5 Summary, Conclusions, and Recommendations for Future Work
− Controller
Discrete Event Supervisory
r(t)
+ n (t) p
0
probability
1
200 400 600 800 lost target at q4, π5, 12
0.5
0
find goal w/ a target at q1, π2, 5 1
0.5
0
1
200 400 600 800 reach goal at q4, π5, 8
0
0.5
0.5
0
0
0 200 400 600 800 avoid successfully at q9, π10,15 1
0.5
0
1
1
200 400 600 800 ready to grab at q4, π5, 9
0 200 400 600 800 avoid obstacle at q9, π10,16
This paper presents a recursive procedure for on-line identification of the language measure parameters in ˜ that is critical for supervithe event cost matrix Π sory control system analysis and synthesis. Given a bound on parameter error within a specified confidence ˜ are shown to level, the elements of event cost matrix Π converge for a stationary process, based on a stopping rule. It is demonstrated by both simulation studies and experimentation on a mobile robotic system that ˜ elements of the Π-matrix converge in a structured environment. For slowly varying non-stationary processes, ˜ the Π-matrix should be periodically updated for online synthesis of optimal control policies [3] [4]. This is recommended as future research. ˜ The Π-matrix learning depends on the event generation mechanism. Different event generators may cause ˜ the elements of Π-matrix converge to different transition probabilities. Ongoing efforts in this direction include formulation of event generation mechanisms using particle filters [1] [5] that could potentially estimate conditional probability density distributions propagating over time.
0.5
6 Acknowledgements 0
0
200 400 600 800 Number of events
0
0
200 400 600 800 Number of events
0
0
200 400 600 800 Number of events
˜ Figure 7: Some non-zero elements of Π-matrix
The authors would like to thank Peter Lee for his assistance on the robot experiment. References
the Pioneer 2 AT software system, sends desired speed signals to the robot through the Player while the sensor readings are relayed by Player’s measurement module. Conceptually, the continuous-to-discrete (C/D) interface acts as the discrete-event generator and the discrete -to-continuous (D/C) interface converts controllable discrete events sent by the DES supervisory controller into continuous signals. The details of the DES robotic control system that are not presented here to space limitations will be reported in a forthcoming publication. Convergence of some of the non-zero elements of the ˜ Π-matrix is shown in Figure 7. The convergence rates in the experimental identification is slower than that obtained in simulation due to the presence of environmental noise (e.g., floor friction and light intensity). Moreover, these elements may occasionally converge to different values because of the presence of spurious biased disturbances. Due to the noisy nature of the actual sensor readings, some of the events are generated more often than those in the simulation. While the simulation efforts serve to prove the concept, it is the real-time experimental data that provides the information to the DES supervisory control system.
[1] A. Doucet, S. Godsill, and C. Andrieu, On sequential monte carlo sampling methods for bayesian filtering, Statistics and Computing 10 (2000), no. 3, 197–208. [2] J. E. Hopcroft, R. Motwani and J. D. Ullman, Introduction to automata theory, languages, and computation, 2nd ed., Addison-Wesley, 2001. [3] J. Fu, A. Ray and C.M. Lagoa, Unconstrained optimal control of regular languages, IEEE Conference on Decision and Control, Las Vegas, Nevada (2002). [4] , Optimal control of regular languages with event disabling cost, Preprints American Control Conference, Boulder, Colorado (2003). [5] Jun S. Liu, Monte carlo strategies in scientific computing, Springer-Verlag, New York, 2001. [6] J. C. Martin, Introduction to languages and the theory of computation, 2nd ed., McGraw-Hill, 2001. [7] P. J. Ramadge and W. M. Wonham, Supervisory control of a class of discrete event processes, SIAM J. Control and Optimization 25 (1987), no. 1, 206–230. [8] A. Ray and S. Phoha, Signed real measure of regular languages, Proceedings of the International Federation of Automatic Control (IFAC) World Congress b’02, Barcelona, Spain, 2002 (also an expanded version to appear in Applied Mathematics Letters). [9] A.N. Shiryayev, Optimal stopping rules, Springer-Verlag, New York, 1978. [10] X. Wang and A. Ray, Signed real measure of regular languages, Proceedings of the American Control Conference (also an expanded version under review in Applied Mathematical Modelling) 5 (2002), 3937–3942.
p. 7