TRITA-MMK 2000:11 ISSN 1400-1179 ISRN KTH/MMK--00/11--SE
Timing Problems in Distributed Real-Time Computer Control Systems
by Martin Sanfridson
DAMEK
Stockholm 2000
Technical Report, Mechatronics Lab, Department of Machine Design, Royal Institute of Technology (KTH), S-100 44 Stockholm, Sweden
TRITA-MMK 2000:11 ISSN 1400-1179 ISRN KTH/MMK/R--00/11--SE

Document type: Technical Report
Date: 2000-05-09
Author(s): Martin Sanfridson ([email protected])
Supervisor(s): Jan Wikander, Martin Törngren
Sponsor(s): NUTEK through the DICOSMOS project.
Title: Timing Problems in Distributed Real-Time Computer Control Systems

Abstract
The timing problems are in this report defined as: control delay, choice of control period, jitter and transient error. The timing problems define a temporal interface between control engineering and computer engineering in the design of distributed real-time computer control for safety-critical motion systems. The control delay changes the dynamics of the closed loop and should be taken into account in the design process. The control period is one of the primary design parameters when implementing a digital controller. Jitter is time variation (often small compared to the control period) of a nominally constant control period or a nominally constant delay. A transient error is a bounded time interval during which the controller ceases to output the intended control signal. The timing problems can be attacked both from a control perspective and from a computer engineering perspective. The additional requirements on a distributed real-time computer control system are fault tolerance and flexibility from an end-user's perspective. During the lifetime of a system, it should enable redesign, upgrade and enhancement with small effort and without jeopardizing its predictability and robustness; this is defined as scalability. The goal for a system designer should be to build such a fault tolerant and flexible system. This report is a survey of research related to the timing problems. The selection of a sampling period should be governed by the dynamics of the closed loop system. There is no exact rule for this selection; thus the designer has some freedom when scheduling the controller together with other tasks. When a set of control tasks is to be scheduled, this can be done in a way that optimizes the use of a scarce resource with regard to the dynamics of all control loops in the system. Multirate and non-uniform sampling are two ways to differentiate control periods and thereby reduce the network traffic.
The impact of control period jitter and delay jitter is analysed together with some countermeasures. The delay in a control loop should always be as short as possible. The end-to-end scheduling issue is highly relevant in distributed control and a few approaches to this area are treated. The analysis of the effects of a transient error is not extensively treated in the literature. The maximum deadline for a dynamic failure from a control point of view, as well as fault tolerant scheduling and services, are treated.

Keywords: real-time, distributed, control, feedback, automatic control, scheduling, end-to-end, control performance, control delay, sampling period, jitter, safety-critical, fault tolerance, transient error
Language: English
Contents

1 Introduction 5
  1.1 Background 6
  1.2 Aim and scope 7
  1.3 Notation and terminology 8
2 Problem description 9
  2.1 Basic models of a distributed computer control system 9
    2.1.1 Computer and control architecture 9
    2.1.2 Process model and linear feedback controller 10
  2.2 Characteristics of timing problems 12
    2.2.1 Control delay 13
    2.2.2 Control period 13
    2.2.3 Jitter 14
    2.2.4 Transient error 15
  2.3 Task model and triggering strategy 15
  2.4 End-to-end scheduling 18
  2.5 Fault tolerance 20
3 Timing problems — analyses and solutions 21
  3.1 Approaches 21
  3.2 Control delay 22
    3.2.1 Analysis of delay and synthesis of the control algorithm 22
    3.2.2 Latency in the loop 23
  3.3 Control period 25
    3.3.1 Loss of reachability and observability through sampling 26
    3.3.2 Choice of sampling period 26
    3.3.3 Optimal choice of control period for several loops 29
    3.3.4 Multirate, asynchronous and event triggered sampling schemes 31
  3.4 Jitter 34
    3.4.1 Delay jitter 34
    3.4.2 Control period jitter 41
  3.5 Transient errors 43
    3.5.1 Analysis of the impact of transient errors 43
    3.5.2 Reconfiguration of control algorithm 44
    3.5.3 Fault tolerant scheduling 46
4 Conclusions 47
5 Acknowledgement 49
6 References 50
“A process control computer performs one or more control and monitoring functions. ... Each of the tasks must be completed before some fixed time has elapsed following the request for it. Service within this span of time must be guaranteed, categorizing the environment as ‘hard-realtime’...” [Liu and Layland, 1973]
“Traditional realtime systems are informally dichotomized as ‘hard’ versus ‘soft’ primarily according to their ‘determinism’ or ‘predictability’... Because the traditional realtime viewpoint and its terminology is imprecise, oversimplified and unrealistic, it can — and does — limit the kinds of realtime systems that can be built, and the cost-effectiveness of those that are built.” [Jensen, 1992]
1 Introduction

The timing problems are here defined as: control delay, choice of control period, jitter and transient error. This report is a description of published material, in the research fields of control and computer engineering, with relevance to the timing problems. The timing problems serve as the common denominator between the two research disciplines as regards the temporal properties of a control application implemented on a real-time computer control system.

The report is outlined as follows. The background to the research on timing problems and the aim and delimitations of the report are given in this introduction. In the following section the timing problems are described in more detail. A simplified model of a distributed real-time computer control system is presented in section 2.1 and will often be referred to in the sequel. The purpose of the model is to serve as a basic architecture in the study of time-related problems. The timing problems are closely related to computer hardware and software, and the section therefore also contains a short treatment of the most relevant characteristics of computer systems and control systems. Fault tolerance is often a vital requirement of a control system and is given a subsection of its own. Section 3 is the main chapter and describes various solutions and approaches found in the literature. The section is divided into one subsection for each timing problem: control delay, control period, jitter and transient error. The timing problems can be approached either from a control perspective or from a computer engineering perspective, or from both perspectives at the same time. The report is concluded with a summary. The reader is assumed to have a basic knowledge of linear control, sampled data systems, data communication and real-time computing.
1.1 Background

A mechatronic system is a computer controlled mechanical device. Industrial robots, aircraft, road vehicles, autonomous walking robots and laser printers can all serve as examples of mechatronic systems. In the synthesis of mechanics and computers, control engineering plays an important role. All modern control applications are implemented on computers. It is therefore necessary to develop a good understanding of the possibilities and problems introduced by the real-time computer system. In this report the focus is on problems related to the timely behaviour of the computer system. Modelling and analysis of real-time processing and communication will be covered, as will modelling and analysis of control systems with respect to the timing problems. Ideas and proposals found in the literature, aimed at improving the processing and communication of control applications, will be described.

It is appropriate here to highlight the gap between the two fields. This gap serves as a useful interface between the two disciplines, because it isolates the problems and complexity of one field from the other. In the analysis and design of controllers, control delay and jitter are often assumed to be negligible and it is further assumed that there is no transient error. The interface is rather simple: the sampling period is constant, the delays are small, jitter is negligible, deadlines are hard, and there is never any transient error, see e.g. [Wittenmark et al., 1995]. Digital control theory has in fact evolved from these premises. It seems likely that a change of this agreed cut between the disciplines can lead to an improvement of the application as a whole, hopefully without making the design process too complex. The "timing problems" is another expression for this cut.

A distributed system consists of nodes interconnected by links to form a network. The nodes are loosely coupled to each other.
They do not share any memory but communicate over channels by passing messages to each other. The topology of the network is arbitrary; there is a vast number of ways to interconnect nodes. There are different levels of distribution, from fully centralized to fully distributed [Wikander and Törngren, 1994]. A common and straightforward topology is a bus, to which all the nodes in the system are directly connected. A much more complicated structure can be composed if e.g. multiple networks, replication and gateways are used. The choice of topology or architecture for a distributed system affects e.g. the end-to-end transmission delay, the minimum period which can be assigned to a task chain, the characteristics of jitter and the behaviour in case of a hardware error.

An embedded computer control system is here defined as a system which interacts with its environment through sensors and actuators and is dedicated to performing certain tasks. At least one of the tasks is to control a dynamic system. It is not always obvious to the user that the application contains a processing element: the computer system and the other parts are seen as a single unit. The aim of an implementation does not, as a general rule, change over the lifetime of an application. The dedication to specific tasks leads in the design process to a computer system which is more tailored to fulfil a mission, compared to e.g. an office local area network.

A real-time system is popularly defined in the real-time community as a system where the correctness of an outcome not only relies on the correctness of a computation, but also on the point in time at which it is delivered. A real-time system should in other words be able to interact with its environment in a predictable temporal fashion. A real-time system does not necessarily have to react quickly to a stimulus, it only has to be sufficiently fast; CPU speed and bit rate are not everything. A lot of the research in the real-time community deals with the problem of limited bandwidth on a processing element. The ever-rising ceiling of processing capacity is a paradox: the "vacuum" is soon filled with functionalities resulting from renewed requirements on the system. A definition of the scheduling activity is: to assign tasks to limited resources in order to meet deadlines. With unlimited resources, scheduling of course becomes unnecessary.

A safety-critical system is defined as a system in which a failure can lead to a catastrophe, e.g. loss of human life or severe economic damage. Almost any vehicle today is an example of a safety-critical mechatronic embedded computer control system. For a safety-critical application, the computer system must maintain timely and dependable services in spite of failures, i.e. it must be fault tolerant. Throughout the design of safety-critical motion systems, dependability must be the guiding star. From a control engineering viewpoint, analytical redundancy can be applied to increase fault tolerance. Thus, control engineering also plays a role in obtaining a higher degree of fault tolerance.
1.2 Aim and scope

The aim of the report is to investigate the timing problems. The method is to summarize work in the areas of computer and control engineering with relevance to the timing problems. One of the aims of the author's research so far has been to search for predictable, yet flexible computer system solutions for distributed control applications. The problem can be approached both from a computer engineering and a control engineering perspective. It is the author's belief that a synthesis which more strongly involves both disciplines will lead to improvements in the design and performance of computer controlled systems. Both on-line and off-line issues are considered. An off-line issue is the specification of temporal properties when moving from a control design to the implementation. In the development of a real-time system a main goal is to find an implementation which satisfies the temporal requirements on the control application. The first step is to identify and quantify the temporal requirements. The aim for a system designer is to build a fault tolerant and predictable, yet flexible distributed computer control system. The flexibility is primarily meant for the end-user, not for the designer. Scalability is defined as the ability of a computer system to support changes to the system during its lifetime. In [Stankovic, 1996] two research challenges in building scalable open systems are mentioned:

• To develop scheduling algorithms which give predictable functionality, timeliness and fault tolerance.
• To develop schemes to compose large predictable systems out of smaller predictable subsystems. The reuse of real-time components is a similar problem.

To set the scope of this report, some delimitations should be pointed out. Although distributed control is in focus, local control loops on a uniprocessor exhibit similar kinds of timing problems and are thus not excluded from this survey.
The relation between the timing problems and architecture will not be discussed in depth in this report. It is however an intention here that the creation of an architecture should be guided by intimate knowledge of the timing problems. Architectural work involves a lot more than satisfying the temporal requirements of an application. Modelling of applications or computer systems with the intention of putting forward a system-wide design methodology is also beyond the scope of this report. For work on modelling, consult [Törngren, 1995], [Redell, 1998] and [Sandström, 1999]. It is an intention here that any such modelling effort will benefit from knowledge of the timing problems. Further, clock synchronization is not treated at all, and numerical problems and the discretization of analog signals are not of interest either.
1.3 Notation and terminology

To simplify for the reader, the original notation in the referenced material has in many cases been adjusted to be consistent with table 1.

Table 1. Notation used throughout the report.

  h                                Sampling or control period.
  τ, τc, τca, τsc                  Any delay; computational delay; controller-to-actuator delay; sensor-to-controller delay.
  A, B, C, D                       State space representation of a continuous time dynamic system. The matrices are assumed to have appropriate dimensions.
  Φ, Γ, C                          Ditto, discrete time.
  Φs                               Closed loop system matrix in discrete time.
  Q1c, Q2c, Q12c                   Weight matrices of appropriate dimensions and properties (non-negative definite, i.e. positive semi-definite) for setting up a continuous time quadratic cost function.
  Q1, Q2, Q12                      Ditto, for discrete time.
  J                                Cost index.
  E                                Expected value operator.
  λ                                Eigenvalue.
  L                                Linear feedback gain.
  P                                Variance.
  A/D, D/A                         Analog-to-digital and digital-to-analog converters.
  u(t), u(k), uk                   Control output, continuous or discrete time.
  y(t), y(k), yk                   Process output (= measured value).
  x(t), x(k), xk                   The state of a system.
  v(t), v(k), vk, w(t), w(k), wk   State noise and measurement noise.
  XA, UA                           Allowable state space and allowable control output space.
  T, P, c, a, d, p                 Task T consisting of period, worst case execution time (WCET), arrival time, deadline and priority.
  U                                Utilization of a processing element.
  ET, TT                           Event triggered, time triggered.
The terminology of computer systems, especially scheduling, and that of automatic control share many terms, but the terms mean different things: system, process, control, period, state, response time and bandwidth, to mention a few. The confusion is hopefully dispelled when the words are read in context.
In some places, where the difference between sampling period and control period is immaterial, the two terms are used interchangeably.
2 Problem description This section starts by setting up a fairly basic model of a distributed control system. In section 2.2 the four timing problems are defined in detail. Thereafter some scheduling principles are treated in section 2.3. End-to-end scheduling is dealt with in section 2.4. Building blocks for fault tolerant systems are described in section 2.5.
2.1 Basic models of a distributed computer control system 2.1.1 Computer and control architecture A distributed embedded control system has at least one control loop closed over a set of communication channels. A typical distributed architecture for a control loop will here be drawn to serve as a basic model, see figure 1.
[Figure 1 here: block diagram of a control loop: sample (period hs) → link (delay τsc) → controller Gcontroller (period hc) → link (delay τca) → hold/actuator → plant Gprocess → sensor.]
Figure 1. A model of a control loop seen from a control perspective. hs is the sampling period and hc is the control period.
The basic model consists of five processing elements, see figure 2:
• Sensor node (task T1). The sensor node is time triggered. Its period is denoted the sampling period, hs. An analog sensor is usually sampled by a zero-order-hold circuit. Signal processing, anti-aliasing filtering, sensor monitoring and similar services can be hosted at the sensor node. It is rare that the model of a distributed system includes delay and jitter in the sensor node.
• Link to the controller (task T2). The link can be time triggered or event triggered. It can be a dedicated link or a bus. Low level fault detection and error correction are usually the services provided by the communication layer. The link is often modelled as the source of delay and jitter τsc, created by e.g. contention in the communication or by transient faults. This is a kind of disturbance model.
• Controller node (task T3). The controller can be time triggered or event triggered. The control law is computed once every control period, where in many cases hc = hs. The controller node often runs other tasks, which can be modelled as sources of delay and jitter, i.e. a disturbance model.
• Link to the actuator (task T4). This is similar to the link from the sensor to the controller. The difference is that information on the time delay and jitter τca lies in the future, seen from the perspective of calculating the control law.
• Actuator node (task T5). The actuator can be time triggered or event triggered. It is rare that the model of a distributed system adds delay and jitter in the actuator node, but delay and jitter are often inherited from a preceding task.
[Figure 2 here: the task chain T1 (sensor) → T2 (link) → T3 (controller) → T4 (link) → T5 (actuator).]
Figure 2. A model of a control loop seen from a scheduling perspective. The data flow from task T1 to T5 makes up a trivial task chain.
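The task chain in figure 2 can be sketched in code. The attribute values below are illustrative assumptions, and the end-to-end figure is the coarse worst case in which every unsynchronized hop waits almost a full period for its input (one period plus execution time per task); this is a common rule of thumb, not a result from this report.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str      # role in the chain
    period: float  # task period P, seconds
    wcet: float    # worst case execution time c, seconds

# Illustrative task chain T1..T5: sensor -> link -> controller -> link -> actuator
chain = [
    Task("T1 sensor",     0.010, 0.001),
    Task("T2 link",       0.010, 0.002),
    Task("T3 controller", 0.010, 0.003),
    Task("T4 link",       0.010, 0.002),
    Task("T5 actuator",   0.010, 0.001),
]

def coarse_end_to_end_bound(tasks):
    """Pessimistic end-to-end latency: each unsynchronized hop may wait
    almost a full period before its input is picked up, then executes."""
    return sum(t.period + t.wcet for t in tasks)

print(f"coarse end-to-end bound: {coarse_end_to_end_bound(chain) * 1e3:.1f} ms")
```

A tighter analysis would account for the actual release offsets and response times of each task; the sum above only shows why an unsynchronized chain can be several periods long end to end.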
This basic model of a distributed control system is easily made more complex, e.g. by extending it with more nodes or adding dependencies across control loops. The complexity of this model is however sufficient for studying the timing problems. The basic reference model will be used on several occasions later in this report. Figure 1 and figure 2 visualize a reference model for studying temporal behaviour from both a computer and a control engineering perspective:

• Computer engineering perspective: A set of tasks {T1..T5} forms a task chain, which is to be scheduled together with other tasks. The execution of the task chain is usually periodic and usually has a subset of the following task attributes: period, relative release time, relative deadline, worst case execution time and priority. The period of each task is usually the same for the whole chain. The performance of the resulting system is measured in temporal properties, e.g. the number of missed deadlines or the maximum jitter for any task.
• Control engineering perspective: A controller Gcontroller for the process Gprocess, with the delays τsc and τca associated with the communication links, is to be designed or analysed. In this basic model the dynamics of the actuator and sensor can be incorporated into Gprocess, and the computational delay τc of the controller can be included in τsc or τca. The performance of the control system is ultimately measured relative to the specifications given in continuous time.

2.1.2 Process model and linear feedback controller

A process model and a brief description of optimal control are covered here. They will be referred to on several occasions in the sequel. The state space description of a continuous system with a discrete controller can be written:
    ẋ(t) = A x(t) + B u(t) + v(t)
    y(t) = C x(t) + w(t)                                           (1)
    u(kh) = −L x̂(kh)

v(t) and w(t) are noise processes, L is a linear gain vector, x̂(kh) is the estimated state and u(kh) is the sampled version of the control signal u(t). A, B, C and L are in most cases not time varying. The direct term for y(t) has been omitted here. Sampling the open loop system gives (see table 1 for the notation):

    x(kh + h) = Φ(h) x(kh) + Γ u(kh)                               (2)
    y(kh) = C x(kh)
where the noise terms have been omitted, and with

    Φ(t_{k+1}, t_k) = e^{A(t_{k+1} − t_k)}
    Γ(t_{k+1}, t_k) = ∫₀^{t_{k+1} − t_k} e^{As} ds B               (3)

The sampled system in (2) is not strictly time-invariant and does not describe the continuous time system (1) between the sampling instants, see also section 3.3.1. A closed loop system matrix Φs can be written

    Φs(t_{k+1}, t_k) = Φ(t_{k+1}, t_k) − Γ(t_{k+1}, t_k) L
    or                                                             (4)
    Φs(h) = Φ(h) − Γ(h) L,   with h = t_{k+1} − t_k
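The sampling formulas in (3) and the closed loop matrix in (4) can be checked numerically. The sketch below computes the matrix exponential and its integral with a truncated power series (exact for the nilpotent double integrator chosen here); the plant and the gain L are illustrative assumptions for the example.

```python
import numpy as np

def discretize(A, B, h, terms=20):
    """Zero-order-hold sampling as in (3):
    Phi(h) = exp(A h), Gamma(h) = (integral_0^h exp(A s) ds) B,
    both computed with a truncated power series."""
    n = A.shape[0]
    Phi = np.zeros((n, n))
    S = np.zeros((n, n))   # integral_0^h exp(A s) ds
    term = np.eye(n)       # running term A^k h^k / k!
    for k in range(terms):
        Phi += term
        S += term * h / (k + 1)
        term = term @ A * h / (k + 1)
    return Phi, S @ B

# Double integrator, x = [position, velocity]; the closed forms are
# Phi = [[1, h], [0, 1]] and Gamma = [[h^2/2], [h]]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
h = 0.1
Phi, Gamma = discretize(A, B, h)

# Closed loop matrix as in (4), with an assumed illustrative gain L
L = np.array([[1.0, 1.5]])
Phi_s = Phi - Gamma @ L
print(np.abs(np.linalg.eigvals(Phi_s)))  # all magnitudes < 1: stable sampled loop
```

The same Φ and Γ are what standard tools produce for zero-order-hold sampling; the series is used here only to keep the sketch self-contained.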
Optimal control is popular in the control community for deriving the linear gain vector L. L is found by minimizing a loss function J, which is a performance index for a controller: the lower the better, and J → ∞ means that the closed loop system becomes unstable. A commonly used and general loss function is found in e.g. [Åström and Wittenmark, 1997] (see table 1 for the notation):

    J_C = E{ ∫₀^{Nh} ( xᵀ(t) Q1c x(t) + 2 xᵀ(t) Q12c u(t) + uᵀ(t) Q2c u(t) ) dt
             + xᵀ(Nh) Q0c x(Nh) }                                  (5)

for continuous time, and with a unique conversion:

    J_D = E{ Σ_{k=0}^{N−1} ( xᵀ(kh) Q1 x(kh) + 2 xᵀ(kh) Q12 u(kh) + uᵀ(kh) Q2 u(kh) )
             + xᵀ(Nh) Q0 x(Nh) }
        = E{ x_Nᵀ Q_N x_N } + E{ Σ_{k=0}^{N−1} [x_k; u_k]ᵀ Q [x_k; u_k] }    (6)

for discrete time control design, where Q_N and Q are defined by the other weight matrices. The minimum value of the loss function is given by:

    min J = xᵀ(0) P(0) x(0)                                        (7)

These equations are general and are sometimes modified. The statement of the loss function defines what is optimal. The steady state variance P(k) is found by solving a Riccati equation. This gives both the minimum variance and the optimal feedback gain L.

The use of linear feedback gain control design is more general than it appears at first. By constructing the state space equation appropriately, it can be shown that a feedback gain is similar to e.g. a PID controller, common in commercial applications. Optimal control is often used in the academic world, hence it will be encountered several times in the following. A typical compound loss function to optimize, found in the literature, is a sum of n loss functions Ji with weights wi:

    J_tot = Σ_{i=1}^{n} w_i J_i                                    (8)

where the minimum of J_tot is searched for. This search is generally more complex than the calculation of the individual Ji. It can be wise to avoid a weight vector for three reasons: it is subjective, it is difficult to relate weights to cost, and the components of the weight vector might not be mutually independent.
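The gain L that minimizes the discrete loss (6) can be obtained by iterating the Riccati recursion backwards over the horizon. The sketch below does this for an illustrative plant, weights and horizon (all assumed for the example, not taken from the report):

```python
import numpy as np

def lqr_gain(Phi, Gamma, Q1, Q2, Q12, N=500):
    """Backward Riccati recursion for the discrete loss (6); returns the
    (near steady state) gain L for u = -L x and the matrix P in (7)."""
    P = Q1.copy()
    for _ in range(N):
        G = Q2 + Gamma.T @ P @ Gamma
        L = np.linalg.solve(G, Gamma.T @ P @ Phi + Q12.T)
        P = Q1 + Phi.T @ P @ Phi - (Phi.T @ P @ Gamma + Q12) @ L
    return L, P

# Illustrative plant: sampled double integrator with h = 0.1, cf. (2)
h = 0.1
Phi = np.array([[1.0, h], [0.0, 1.0]])
Gamma = np.array([[h * h / 2], [h]])
Q1, Q2, Q12 = np.eye(2), np.array([[1.0]]), np.zeros((2, 1))

L, P = lqr_gain(Phi, Gamma, Q1, Q2, Q12)
Phi_s = Phi - Gamma @ L                 # closed loop matrix, cf. (4)
x0 = np.array([[1.0], [0.0]])
print("min J =", float(x0.T @ P @ x0))  # cf. (7)
print("closed loop |eig| =", np.abs(np.linalg.eigvals(Phi_s)))
```

With a long enough horizon the recursion converges to the steady state solution of the Riccati equation, so L is the stationary optimal gain and xᵀ(0) P x(0) the minimum cost as in (7).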
2.2 Characteristics of timing problems

As mentioned before, the timing problems can be divided into control delay, jitter and transient error as in [Wittenmark et al., 1995]. This enumeration of time-related properties is completed by adding the choice of control period. The definition of timing problems used here thus incorporates four temporal properties of a computer system, as seen from an automatic control point of view. The report is structured according to this division, but other ways to group the timing problems are possible, e.g. into three groups: control delay and its jitter, control period and its jitter, and transient error. Timing problems in distributed control are characterized in e.g. [Ray, 1989], [Törngren, 1995] and [Törngren, 1998].

[Figure 3 here: a loop of set point → Controller → D/A → Process → process value → A/D → Controller, with the timing problems (control period, control delay, transient error, jitter) acting as disturbances on the controller.]
Figure 3. The timing problems seen as disturbances acting on the controller affecting the temporal properties of the closed loop.
Timing problems can be seen as disturbances acting on the computer system, which in turn affect the whole system, see figure 3. This viewpoint is however not particularly fruitful and will not be developed any further here. Performance, robustness, controllability, observability and stability of the closed loop depend, among other things, on the control delay, the control period, jitter and transient errors. This motivates the study of the timing problems. The delay in a loop can also originate in the process; a typical example is dead time caused by mass transport. Each of the four timing problems is defined in sections 2.2.1 to 2.2.4, and the impact of each on the controlled system is also indicated. In section 3, analyses of and approaches to the timing problems found in the literature are reviewed.
2.2.1 Control delay

Control delay is the time delay between a sampling instant and its corresponding actuator response. The control delay τc ≥ 0 comes from e.g. the execution delay of the control algorithm, from the A/D and D/A conversions and from the communication delay in the network, see figure 4. The sampling of y(t) and the subsequent output of u(t) to the process can be written:

    y(kh) = y(t_k),   t_k = k hs                                   (9)
    u(t) = u(t_m),    t_m = m hc + τc
where k, m = 0, 1, 2, … and τc ≥ 0. The sampling is uniform if k = m and the periods are equal and constant, hs = hc = h. It must hold that t_m ≥ t_k for all related instances (k, m). The control delay can vary from one sample to the next, i.e. there is jitter in the delay, see further section 2.2.3.

[Figure 4 here: the control system samples y(t) through the A/D with period hs and outputs u(t) through the D/A with period hc, after the control delay τc.]
Figure 4. Control delay τc in a sampled data system.
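A pure delay τ leaves the loop gain unchanged but subtracts a phase of ω·τ radians at frequency ω, which is why even a modest control delay eats into the stability margin. A quick calculation, with an assumed gain crossover frequency of 10 rad/s (illustrative, not a value from this report):

```python
import math

def phase_loss_deg(tau, omega_c):
    """Phase lost to a pure delay tau (s) at crossover frequency omega_c (rad/s)."""
    return math.degrees(omega_c * tau)

omega_c = 10.0  # assumed gain crossover frequency, rad/s
for tau_ms in (1, 5, 20):
    loss = phase_loss_deg(tau_ms / 1000, omega_c)
    print(f"tau = {tau_ms:2d} ms -> phase margin reduced by {loss:.1f} deg")
```

The faster the loop (the higher ωc), the more phase a given delay costs, so the tolerable delay scales inversely with loop bandwidth.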
The constant delay should be as small as possible because of the decrease in phase associated with a delay. A decrease in phase in turn deteriorates the stability margin and should thus always be avoided. This is a rather trivial statement, but not so readily handled from a computer point of view, see further section 3.2.

2.2.2 Control period

The control period, the rate of actions, is from a control engineering perspective determined by the dynamics of the closed loop. In the control design of the closed loop, the open loop dynamics and the disturbance dynamics are taken into account. The choice of control period can e.g. be guided by a rule of thumb: for a first order process the number of sampling instances per rise time should be between 4 and 10, [Åström and Wittenmark, 1997]. When moving from control system design to implementation in a computer system, the choice of control period hc becomes the choice of task period Pi. With a liberal choice of hc, exemplified by the rule of thumb above, the trivial upper limit on the utilization U of a processing unit, given the execution times ci,

    U = Σ_i c_i / P_i ≤ U_lim = 1                                  (10)

becomes easier to satisfy.
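The utilization test in (10) is necessary for any scheduler, and is also a sufficient test for preemptive earliest-deadline-first scheduling with deadlines equal to periods; for fixed priority rate-monotonic scheduling, the classic sufficient bound from [Liu and Layland, 1973], quoted at the start of this report, applies instead. A sketch with an assumed task set:

```python
def utilization(tasks):
    """U = sum(c_i / P_i) over tasks given as (P_i, c_i) pairs, as in (10)."""
    return sum(c / p for p, c in tasks)

def rm_bound(n):
    """Liu-Layland sufficient utilization bound for rate-monotonic scheduling."""
    return n * (2 ** (1.0 / n) - 1)

tasks = [(0.005, 0.001), (0.010, 0.002), (0.020, 0.006)]  # assumed (P_i, c_i)
U = utilization(tasks)
print(f"U = {U:.2f}")
print(f"schedulable under RM (sufficient test): {U <= rm_bound(len(tasks))}")
print(f"schedulable under EDF (exact test):     {U <= 1.0}")
```

For this task set U = 0.70, which is below the rate-monotonic bound of about 0.78 for three tasks, so both tests pass; a heavier set could pass the EDF test while failing the RM bound.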
In a multirate system the sampling period hs and the control period hc in a single control loop (as in figure 4) do not have to be equal, i.e. uniform. The case with more than one sampling rate in a system is usually divided into two categories: 1) the quotient between the longest and the shortest period is a strictly positive natural number, and 2) the quotient between the longest and the shortest period is a strictly positive rational number. For a system with many sampled signals this definition is more useful than in the case of dual rate sampling, which is a special case of multirate sampling. The differentiation of periods can facilitate a computer implementation initially hampered by the limit in (10). The control period should always be as short as possible to improve control performance. This rule is the common belief, but it is not perfectly true, as will be seen in section 3.3.

2.2.3 Jitter

Jitter is defined in [IEEE dictionary, 1992] as "time-related, abrupt, spurious variations in the duration of any specified related interval", and arises e.g. due to clock drift, branching in the code, the scheduling algorithm or the use of specific hardware (e.g. cache memory). Jitter is often an unintentional result, for example when scheduling a set of tasks. It is possible to talk about different kinds of jitter depending on the context; in scheduling there are e.g. input jitter, output jitter and queuing jitter. Here the types of jitter are related to control:

• control delay jitter
• control period jitter

Jitter on a microscopic time scale (i.e. variations considerably smaller than the sampling period) is not of interest here. In a properly designed computer control system, time variations are not by default stochastic in nature on a macroscopic level, [Törngren, 1998]. Jitter is nevertheless often modelled as a stochastic process. The control delay τc ≥ 0 influenced by jitter can e.g. be written as:

    τc = τf + τk

where the varying part of the time delay, τk, k = 1, 2, …, is typically modelled as a stochastic process, and τf ≥ 0 is the constant part of the delay. The period P = t_k − t_{k−1} > 0 influenced by jitter can be written in two ways, clock synchronized,

    t_k = k h + τ_k                                                (11)

or drifting,

    t_k = k (h + τ_k') = t_{k−1} + h + τ_k''                       (12)
where tk is a sampling instant and tk-1 is the previous one, and τ k , τ k' , τ k'' are three different stochastic processes. The magnitude of τ k'' is typically small compared to the nominal period h. The drift of P is easily seen in the recursive formulation in (12), which looks like a convenient way to implement the advance of time in a piece of code. In the synchronization of periodic tasks a producer which delivers a little late because of jitter can cause a time triggered consumer to miss the delivery of the data. This missynchronization is called vacant sampling and amplifies the negative impact of jitter 14
even if it has a small magnitude compared to the period. If the consumer does not fetch a message before the producer delivers a new message, the old data is usually overwritten in case there is no buffering. This is called sample rejection. Vacant sampling and sample rejection are harmful to all computer systems, not only control systems. From a control point of view, jitter degrades performance and can cause instability even when vacant sampling or sample rejection does not occur. The jitter distorts a signal in the control system. The degradation depends very much on the dynamics of the process and on the type of jitter. Jitter should be kept as small as possible so as not to decrease control performance, but as will be seen in section 3.4 there are conditions under which jitter has a stabilizing effect.

2.2.4 Transient error

A timing error caused by a fault with a transient duration is here called a transient error. A transient error can occur if a message on a network is lost or corrupted, e.g. due to an electromagnetic disturbance. As a result of a transient error, information can get lost, the control delay can increase, the global state can become inconsistent, or a calculated output can differ from the correct value due to the resulting vacant sampling or sample rejection. A transient error can of course persist for more than one control period. If the (sub)system does not recover quickly from the transient failure, sequential samples become vacant. The control loop is then effectively run in open loop, and how long a system failure can be avoided depends on e.g. the dynamics, the current state, the control actions and the disturbances in the near future. A system failure can be defined as instability due to loss of control output, or as the state of the process leaving the allowable state space so that the final target cannot be reached. Redundancy in space or time is applied to increase the fault tolerance of a system.
Much effort has been devoted to constructing fault tolerant real-time computer systems for safety-critical applications. Less effort has been spent on the control side, see section 3.5. Any transient error should of course be avoided if possible. However, if the drift of the state during a failure is small, the loss of control signal might not even be noticed from the perspective of the process being controlled.
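As an illustration of how a transient error turns the loop into an open one, consider the following sketch (the author's own; the double integrator plant, the feedback gain and the error burst are hypothetical). The last control output is held during a burst of vacant samples, and the largest position excursion is recorded:

```python
import numpy as np

h = 0.01                                  # control period [s]
Phi = np.array([[1.0, h], [0.0, 1.0]])    # sampled double integrator
Gam = np.array([h * h / 2.0, h])
K = np.array([8.0, 4.0])                  # hypothetical feedback gain

def max_excursion(drop_from, drop_to, steps=300):
    """Largest |position| when samples in [drop_from, drop_to) are
    vacant and the previous control output is simply held."""
    x = np.array([1.0, 0.0])              # initial position offset
    u, worst = 0.0, 0.0
    for k in range(steps):
        worst = max(worst, abs(x[0]))
        if not (drop_from <= k < drop_to):
            u = -K @ x                    # normal closed-loop update
        x = Phi @ x + Gam * u             # the plant advances either way
    return worst

no_error = max_excursion(0, 0)            # nominal run
with_error = max_excursion(2, 102)        # 100 vacant samples, 1 s open loop
```

With the stale control action integrated by the plant for a full second, the position drifts well past its nominal excursion before the resumed feedback pulls it back.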
2.3 Task model and triggering strategy

Following this introductory description of task models and their attributes, the scheduling of distributed systems will be treated in the next section (section 2.4). See e.g. [Krishna and Shin, 1997] for an extensive treatment of many kinds of scheduling algorithms for real-time systems. A task model is always the starting point for a scheduling algorithm. The task model should match the characteristics and requirements of an application. It is convenient to have more than one task model in a system to reflect the nature of the application better, e.g. one periodic and one sporadic model. The similarity between messages and tasks, as far as scheduling is concerned, allows the term task model to encompass a model for messages too. A task model for a typical control system is periodic:
Ti = { P, c, r, d, p } (13)
where P is the period, c is the worst case execution time, r is the relative release time, d is the relative deadline and p is the priority. An aperiodic task model has attributes similar to the periodic one, but the meaning of P changes to a lower bound on the interarrival time, and the relative release time is not defined. The use of a lower bounded invocation interval transforms an aperiodic task into a sporadic one. The task model for static (off-line) scheduling is also similar to (13), but the priority is not used. What kinds of tasks and messages are there in a distributed control system? Measurement, control output and setpoint are inevitable. The measurement is almost always periodic; very few sensors for control are event driven. The control output is usually periodic, but event based sampling, described in section 3.3.4, is an exception to this. The setpoint is periodic in servo control and trajectory following but aperiodic otherwise. There are several other tasks and messages in a control system which are vital for operation, maintenance and dependability, e.g. mode changes (aperiodic), logging of data (periodic), events (aperiodic) and information to be presented to the operator (both periodic and aperiodic). What kinds of data are there? A controller can be implemented with absolute values or incremental variables. This has some implications if a message is lost: an incremental value will be lost forever, but an absolute value will be corrected at the next period. Measurement, control output and setpoint are often real-valued. Jitter or vacant sampling of a real-valued periodic message can be seen as a distortion and will generally not cause any errors in logic computation. On the other hand, a vacant sampling of, for example, a (periodic) mode change message, whose data is a binary value, is more likely to lead to a failure.
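The task model (13) and its sporadic counterpart can be written down directly; the class names and the consistency check in this sketch are the author's illustration, encoding the practical constraints r < P, d < P, r + c < d used in this section:

```python
from dataclasses import dataclass

@dataclass
class PeriodicTask:
    """Task model (13): all times are relative to the period start."""
    P: float   # period
    c: float   # worst case execution time (WCET)
    r: float   # relative release time
    d: float   # relative deadline
    p: int     # priority (unused under static, off-line scheduling)

@dataclass
class SporadicTask:
    """Aperiodic task with a lower-bounded interarrival time."""
    P: float   # minimum interarrival time (no release time is defined)
    c: float
    d: float
    p: int

def is_plausible(t: PeriodicTask) -> bool:
    # the usual practical constraints: r < P, d < P and r + c < d
    return t.r < t.P and t.d < t.P and t.r + t.c < t.d

ctrl = PeriodicTask(P=10e-3, c=1e-3, r=0.0, d=5e-3, p=1)
```

A 10 ms control task with a 1 ms WCET and a 5 ms deadline passes the check; inflating the WCET past the deadline makes it fail.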
The most inevitable tasks from a control point of view (measurement, control output and setpoint) are not necessarily the most time critical or the ones most sensitive to vacant sampling. The worst case execution time, c or WCET, is necessary in the task model to ensure schedulability. c is hard to estimate because it depends e.g. on branching in the code and on hardware, such as cache memory. The execution time e of a control algorithm is generally invariable from one execution instance to the next (e ≈ constant) for the most common control algorithms, e.g. PID or linear feedback gain. The application is not well behaved if c » e, because this generally makes scheduling with hard deadlines resource inefficient. For some control algorithms the execution time can vary. Algorithms for observers and adaptive controllers involve updating the parameters of a process model. In case the process is changing slowly, a single update does not add any significant new information; hence an update can be skipped. A model predictive controller, MPC, has the characteristic that the more computation time it gets, the more accurate the result will be. The release time can be defined as the point in time when the data necessary for the execution of a task is ready. The deadline is defined as the point in time when the execution should (soft deadline) or must (hard deadline) be finished. Usually it holds that r < P, d < P, r + c < d for a practical control application. The attributes release time and deadline can be used to control jitter, see [Klein et al., 1993] and section 3.4.1. The more the interval between release time and deadline is narrowed, the less jitter in the finishing of a task (output jitter). The drawback is that it becomes harder to find a feasible schedule when the allowable time window is narrowed. The use of a hard deadline in the task model can be questioned: if a vacant sample does not lead to a catastrophe
(possibly due to the well-behaved nature of a dynamic system), why make scheduling less feasible by adopting hard deadlines? A task in the task chain, see figure 2, can be time triggered (TT) or event triggered (ET). Assume here for simplicity that there is a global clock, i.e. all nodes have the same perception of the current time. If two successive tasks are TT, then a skew is typically inserted so that the result or data is delivered by the producer before the successive consumer task is released. The skew is a delay which renders a phase lag. An example of a synchronization policy for the task chain is TT-ET-TT-ET-TT, meaning that the sensor is time triggered, the controller is time triggered, the actuator is time triggered and the links are ET. The sequence TT-ET-ET-ET-ET does not have two consecutive TT tasks and needs no skew; hence this could (depending on other scheduled tasks) represent the triggering policy with the shortest possible end-to-end delay. The TT sensor generates all subsequent activities in the task chain. A system with time triggered nodes interconnected by a TDMA network has the synchronization policy TT-TT-TT-TT-TT. The notion of priority is used in scheduling algorithms based on event triggering, for the scheduler to select which task to send to the dispatcher. The argument between the event triggered (ET) and time triggered (TT) scheduling paradigms will here be represented by two fundamentally disparate papers. In [Jensen, 1992] the deadline attribute is questioned; the terminology is alleged to be imprecise and oversimplified, and not to model the needs of an asynchronous decentralized computer system for mission management adequately. A hard deadline does not equal deterministic computation. Modern decentralized mission critical systems are too complex to be programmed and analysed in a deterministic way.
In a priority based scheduling strategy the use of priority makes the operating system kernel simpler, but a priority conveys less information than timing constraints do. A so called task value function or benefit function, figure 5, can be used instead.

Figure 5. Left: A general benefit function (benefit versus time). Right: A hard deadline expressed as a benefit function. The dashed vertical line is a traditional deadline.
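The two benefit functions of figure 5 can be sketched in code; the shapes, values and the greedy selection rule below are the author's illustration, not Jensen's algorithm:

```python
def hard_deadline_benefit(deadline, value):
    """A hard deadline as a benefit function: full value before the
    deadline, nothing after it (the right panel of figure 5)."""
    def benefit(t):
        return value if t <= deadline else 0.0
    return benefit

def soft_benefit(deadline, value, decay):
    """A softer shape: the value fades linearly after the deadline.
    Importance is the vertical scale (value); urgency is the shape."""
    def benefit(t):
        if t <= deadline:
            return value
        return max(0.0, value - decay * (t - deadline))
    return benefit

def pick_next(tasks, now):
    """Greedy on-line choice: run the task whose completion yields the
    highest benefit (completion time approximated as now + WCET)."""
    return max(tasks, key=lambda t: t["benefit"](now + t["c"]))

tasks = [
    {"name": "log",  "c": 2.0, "benefit": soft_benefit(10.0, 1.0, 0.1)},
    {"name": "ctrl", "c": 1.0, "benefit": hard_deadline_benefit(3.0, 5.0)},
]
```

With these numbers the greedy rule selects the important, urgent control task first, while the logging task retains most of its modest benefit even past its soft deadline.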
The shape of the benefit function is left to the designer’s discretion. The on-line scheduler considers all benefit functions in the near future. In case of dynamic tasks, real-time performance guarantees are impossible, but some work has been devoted to achieving a higher so called competitive factor when some task parameters are known to the scheduler. It is argued that urgency is orthogonal to importance, i.e. the need for timeliness is independent of the functional criticality. Importance is modelled as a vertical shift of the benefit function, without changing its shape (which is a model of its urgency). The time triggered approach in embedded distributed control is represented by the time triggered architecture, TTA, [Kopetz and Grünsteidl, 1992] and [Kopetz, Nossal et al., 1997]. TTA represents a very strict and detailed approach to all the timing problems; the requirements from the application are mapped into hard deadlines. The primary goal of TTA is fault tolerance in embedded systems. The distributed synchronization
protocol is called TTP, the time triggered protocol. The time triggered protocol is used to solve the following issues: predictable message transmission, atomic broadcast, fast crash detection, clock synchronization, membership service, blackout handling and replication of nodes. Static schedules (TADL, task description list) execute on every node, and static schedules (MEDL, message description list) drive the interconnecting bus from each node in a TDMA fashion. The physical border between local (node) and global (network) is implemented using dual-port memory. Fault tolerant scheduling is further described in section 3.5.3. The connection between the two viewpoints is to some extent clarified in [Audsley et al., 1993], which analyses fixed priority scheduling extended by offset constraints. The introduction of offsets between e.g. a producer and a consumer means that the critical time instant (which is a basic assumption in the analysis of FPS) may never occur. The response time for an FPS algorithm extended with offsets is derived, and priorities are assigned by what is subsequently called the optimal priority ordering algorithm. With offsets, rate monotonic (RM) and deadline monotonic (DM) priority assignment are not optimal and schedulability tests become pessimistic. A task is further divided into subtasks, all with the same period and each with an offset less than the period. This forms a task chain. It is argued that when many tasks are scheduled using offset constraints, the fixed priority schedule changes into a static schedule. The introduction of offsets in a task chain is not uncommon in scheduling papers.
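The standard fixed priority response time iteration (without offsets), which the analysis in [Audsley et al., 1993] extends, can be sketched as follows; the task parameters are hypothetical:

```python
from math import ceil

def response_time(task, higher_prio):
    """Classic fixed priority response time iteration (no offsets):
    R = c + sum over higher priority tasks j of ceil(R / P_j) * c_j.
    With offsets this critical-instant bound becomes pessimistic,
    since the worst case release pattern may never occur."""
    c = task["c"]
    R = c                                  # start from the task's own WCET
    while True:
        R_new = c + sum(ceil(R / t["P"]) * t["c"] for t in higher_prio)
        if R_new == R:
            return R                       # fixed point: worst case response
        if R_new > task["d"]:
            return None                    # deadline exceeded: unschedulable
        R = R_new

hp = [{"P": 10, "c": 2}, {"P": 15, "c": 3}]
r = response_time({"P": 30, "c": 5, "d": 30}, hp)
```

Here the lowest priority task suffers one preemption from each higher priority task, converging to a worst case response of 10 time units.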
2.4 End-to-end scheduling

When scheduling a distributed system, the scheduling of tasks on nodes and the scheduling of messages on the interconnecting network links must be considered simultaneously. This holistic view is usually adopted in the end-to-end scheduling problem, where one concern is the response time of a task chain from T1 to T5, see figure 2. To only look at the end-to-end delay from T1 to T5 via T3 is however not really adequate for the basic model in section 2.1. If e.g. the transmission delays in tasks T2 and T4 vary, the calculation of the control output will in general not be correct. A countermeasure is at least to use information about τsc in the control algorithm. There is a multitude of global scheduling algorithms, e.g. heuristic methods, branch-and-bound, simulated annealing, genetic algorithms, tabu search and constraint programming; see [Redell, 1998] for an overview from the perspective of distributed control. End-to-end scheduling in distributed systems is treated from different perspectives in [Bettati and Liu, 1992]. A typical flow-shop job, or task chain, has five subtasks as in figure 2: the sensor, the link to the controller, the controller, the link to the actuator and finally the actuator. The task model is as follows: a set of task chains is to be scheduled on m processors. Every task in a chain has the same period. It is stated that as long as the end-to-end constraints are met, the intermediate deadlines for the subtasks do not matter. The complexity is very high for this kind of scheduling problem: NP-completeness characterizes almost all flow-shop scheduling problems, but there exist polynomial time algorithms for scheduling tasks with identical execution times on one and two processors. If the constraint of identical execution times is relaxed, the complexity increases.
The dependency constraints between periodic task chains make the complexity high as well, and there is no known schedulability test in this case. It is thus argued that the processors must be scheduled independently in order to arrive at a solution, and a scheduling method for periodic task chains is proposed. The processors are scheduled by a rate monotonic algorithm. A subtask waits until its predecessor (on
another processor) has surely finished. The precedence dependency is, in other words, substituted by delays introduced into the system. An approach to the problem of distributed real-time design which avoids intermediate deadlines is found in [Kang et al., 1997]. A set of task chains with requirements only on the end-to-end performance is used to model the system. A task in a task chain can use a processor or a network connection. Every task in a chain has the same period. The real-time requirement of a task chain is that the maximum end-to-end propagation delay must not be violated, while at the same time the average throughput should be sufficient. If the end-to-end delay requirement is not met, the execution is terminated and the result is not used (buffering is not used). The end-to-end failure rate is bounded by a minimum allowed number of successfully executed instances of a task chain per time unit. The execution time of a task is modelled by a probability distribution. The scheduling of the distributed system is performed in three steps: 1) partitioning, 2) selecting a period for each chain and 3) verifying the behaviour by simulation. Some details of the heuristic approach are worth mentioning. The solution is limited to discrete time domain analysis for the throughput. The blocking time of a chain is modelled by a Markov chain where the states correspond to discrete time delays and where the processing time is the random variable. In the schedulability analysis, the success probability of a task is of interest to estimate. In [Choi, 1998] the scheduling of end-to-end deadlines is addressed by linear discrete programming. A set of task chains, where tasks and messages are partitioned on processors and network resources, is given. End-to-end deadlines with jitter constraints are specified for every task chain. The problem is nonlinear, but is transformed, with some additional assumptions, into a linear programming problem, for which e.g.
the simplex algorithm can be applied. The technique is called separate programming. A task chain is characterized by a period and an end-to-end deadline with a jitter constraint. A task within a task chain is characterized by a maximum execution time and a local deadline. Jitter is coped with by buffering. The objective of the proposed method is to find feasible, optimal local deadlines for each task. In an example, the nodes are scheduled by an FPS algorithm and the token ring network is scheduled by EDF. A value for the TTRT for the token ring is derived. It is shown that if strict convexity can be assumed for the linearized functions, linear programming can be applied. The complexity is related to the coarseness and the gradual narrowing of the so called grid space (resolution) for the linear programming. The simultaneous allocation of tasks and messages by the well known simulated annealing technique to find a feasible schedule is treated in [Tindell et al., 1992]. The nodes are scheduled by the rate monotonic algorithm and the network is a token ring for which the worst case response time can be calculated. The allocation of tasks to processors is randomized in each step of the algorithm. Infeasible allocations must be reflected in a higher energy value, e.g. for tasks allocated to wrong processors, for task replicas allocated to the same processor, for processor utilization U > 1, and for tasks which do not meet their deadlines. The total energy is a linear combination of these penalties. As the temperature decreases, the solutions rendering lower energy values will survive. It might happen that an infeasible solution renders a lower energy than a feasible solution; exception rules are then applied to prevent an infeasible solution from being selected.
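A minimal sketch of the simulated annealing idea, with processor over-utilization as the only energy term (the full method in [Tindell et al., 1992] also penalizes misplaced tasks, co-located replicas and missed deadlines); all task parameters are hypothetical:

```python
import math
import random

random.seed(1)                      # reproducible illustration

def energy(alloc, tasks, n_proc):
    """Penalty ('energy') of an allocation. Only processor
    over-utilization is penalized here; further infeasibilities would
    appear as additional linear penalty terms in the same spirit."""
    util = [0.0] * n_proc
    for task, proc in zip(tasks, alloc):
        util[proc] += task["c"] / task["P"]
    return sum(max(0.0, u - 1.0) for u in util)

def anneal(tasks, n_proc, temp=1.0, cooling=0.995, steps=4000):
    """Randomly reassign one task per step; accept improvements always
    and degradations with a probability that falls with temperature."""
    alloc = [random.randrange(n_proc) for _ in tasks]
    e = energy(alloc, tasks, n_proc)
    for _ in range(steps):
        cand = list(alloc)
        cand[random.randrange(len(tasks))] = random.randrange(n_proc)
        e_cand = energy(cand, tasks, n_proc)
        if e_cand < e or random.random() < math.exp(-(e_cand - e) / temp):
            alloc, e = cand, e_cand
        temp *= cooling
    return alloc, e

# five hypothetical tasks with total utilization 1.6 on two processors
tasks = [{"P": 10, "c": 4}, {"P": 10, "c": 4}, {"P": 20, "c": 8},
         {"P": 20, "c": 4}, {"P": 5, "c": 1}]
alloc, final_energy = anneal(tasks, n_proc=2)
```

As the temperature falls the search degenerates into pure descent, and since some balanced allocation always exists here, the final energy reaches zero.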
2.5 Fault tolerance

Fault tolerance must be provided by a safety-critical distributed system. A safety-critical computer system is here defined as one not relying on any mechanical, pneumatic or similar backup; if a safety-critical application can fall back on e.g. a mechanical backup when the computer fails, the computer does not need to be fail-operational. This can also be expressed by stating that the application is not allowed to be fail-safe for the first hardware failure. To accomplish fault tolerance means to continue operating on a limited subset of hardware; thus redundancy must be provided. There are three types of redundancy: hardware, time and information. A hardware fault is a physical defect which causes the component to malfunction, [Krishna and Shin, 1997]. An important property is graceful degradation, i.e. the performance of the application does not drop suddenly, but gently, with an increasing number of subsystem failures. When a fault occurs it can be manifested as an error after a fault latency time. The error can in its turn cause a failure after an error latency time. The time from the detection of an error or a failure until the recovery has been completed is important for a real-time system. The interesting questions are: 1) how quickly must the computer system recover the application to an operational, consistent state and 2) what services are needed for this purpose? An error should be detected as early as possible. Error detection is a vital function in a fault tolerant system: if an error can be detected before the fault causes a failure, it is possible that the failure can be avoided or the damage of the failure decreased. In the backward error recovery method the system is rolled back to a consistent state and the result is re-computed from this point. The state must be saved to stable memory periodically; this is called checkpointing. Backward recovery takes time to perform and there is no guarantee that the saved state is free from errors.
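The backward recovery scheme can be sketched as follows; the class, the in-memory list standing in for stable storage, and the checkpoint interval are the author's illustration:

```python
import copy

class CheckpointedTask:
    """Sketch of backward error recovery: the state is saved to 'stable
    storage' (here just a list) every checkpoint_every steps; on a
    detected error the task rolls back and recomputes from there."""
    def __init__(self, state, checkpoint_every=10):
        self.state = state
        self.every = checkpoint_every
        self.stable = [(0, copy.deepcopy(state))]   # initial checkpoint
        self.step_no = 0

    def step(self, update):
        self.state = update(self.state)
        self.step_no += 1
        if self.step_no % self.every == 0:
            self.stable.append((self.step_no, copy.deepcopy(self.state)))

    def rollback(self):
        # restore the most recent checkpoint; recomputation restarts here
        self.step_no, self.state = self.stable[-1]
        return self.step_no

t = CheckpointedTask({"x": 0}, checkpoint_every=5)
for _ in range(7):
    t.step(lambda s: {"x": s["x"] + 1})
resume_at = t.rollback()       # the two steps after the checkpoint are lost
```

The two steps executed after the last checkpoint must be recomputed, which is exactly the recovery time cost noted above, and an error corrupting the state before a checkpoint would be saved along with it.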
In forward error recovery, knowledge of the application and the type of error is used to correct the error. It is necessary to model possible failures, because the assumptions behind a recovery will be based on such models. One part of the model is the temporal classification: a fault is either transient, intermittent or permanent. Another dimension is the nature of the errors, i.e. Byzantine or non-malicious behaviour. A third classification is the correlation between failures. Correlated failures are more difficult to deal with than independent failures. A commonly imposed limitation in the model is to allow only single faults. A further limitation is to consider only hardware failures, assuming the hardware is correctly designed and that the software cannot fail. This assumption of not having any design errors is, however, not very realistic.
Fault tolerant services (continued service under design faults, continued service under node failures, consistency under node failures) are built on fault tolerant software, process resiliency, data resiliency, atomic actions and consistent state recovery, which in turn rest on the basic building blocks: reliable, atomic and causal broadcast, fail-stop processors, stable storage, Byzantine agreement, synchronized clocks, failure detection, fault diagnosis and reliable message delivery.

Figure 6. Levels in a fault tolerant distributed system according to [Jalote, 1994]
Basic building blocks for fault tolerant systems are identified in [Jalote, 1994] as: Byzantine agreement, synchronized clocks, stable storage, fail-stop processors, failure detection and fault diagnosis, reliable message delivery and reliable, atomic and causal broadcast, see figure 6. Upon these building blocks a higher level hierarchy of services is built for recovering a consistent state, for atomic actions, for data replication and for resilience. These techniques are discussed primarily for database systems and are not all automatically applicable to a real-time control application. A time triggered system has some advantages over an event triggered system when it comes to fault tolerance, [Bridal, 1997]. A more deterministic system leads to a higher error detection coverage (the probability that an error is found) and can use simpler detection mechanisms. Replica determinism in real time is much simpler to achieve in a time triggered, off-line scheduled system. The number of possible states of the system is smaller, and verification and testing can thus more successfully reduce the number of latent faults. The static scheduling enables temporal encapsulation, in that tasks are isolated from each other by execution slots.
3 Timing problems — analyses and solutions

3.1 Approaches

The timing problems can be addressed from a computer engineering or a control engineering point of view. A few papers in the following treatment draw not only on knowledge of the problems in the other field, but also on the possibilities of both fields. A large system tends to be labelled a complex system. There are basically two conceptual directions here: the distributed system can be regarded as stochastic, i.e. too complex to be treated as deterministic, [Jensen, 1992] [Stankovic, 1996], or it can be treated as being decomposable into smaller, simpler and deterministic units, [Kopetz and Grünsteidl, 1992] [Kopetz, Nossal et al., 1997] [Rostamzadeh, 1995]. This is coupled to the view of how static the environment is: the starting point, i.e. the requirements of the application, is addressed differently. Many scheduling papers focus on how to obtain maximum utilization of limited resources (and on ensuring stability of the scheduling algorithm in overload situations). It is
questionable whether this focus on utilization is always relevant. Scheduling papers relevant and interesting for control purposes include: in [Liu et al., 1991] a number of schemes for imprecise computation are investigated, in [Audsley et al., 1993] fixed priority scheduling with offsets to fulfil constraints is analysed, the slack stealing algorithm [Thuel and Lehoczky, 1994] runs an acceptance test on each arriving aperiodic task, and benefit functions (IRIS) are scheduled in [Krishna and Shin, 1997]. A popular control performance index is the quadratic loss function. It can be derived from physical properties [Seto et al., 1996], or it can be a weighted sum of squared signals [Kim, 1998], or a sum of squared control errors [Cervin, 1999]. In [Ryu et al., 1997] control performance is specified by rise time, maximum overshoot, settling time and steady state error, and any period and latency that satisfy these requirements are acceptable for a feasible schedule. The deviation of poles from their nominal locations has also been tried as a performance index, [Shin and Cui, 1995].
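As an illustration of a quadratic loss index as a function of the control period, consider a first-order plant x' = -x + u driven by white noise under sampled proportional control; the plant, gain and the use of the stationary state variance as the loss are the author's assumptions:

```python
import math

def stationary_loss(h, gain=1.0):
    """Stationary variance of x for x' = -x + u + white noise, with the
    sampled proportional control u = -gain * x held (ZOH) over the
    period h. A simple stand-in for the quadratic loss indices above."""
    phi = math.exp(-h)
    gam = 1.0 - phi                          # ZOH input gain for a = -1
    r = phi - gam * gain                     # closed-loop pole
    assert abs(r) < 1.0, "unstable for this period/gain"
    q = (1.0 - math.exp(-2.0 * h)) / 2.0     # discretized noise variance
    return q / (1.0 - r * r)                 # solves P = r^2 P + q

losses = [stationary_loss(h) for h in (0.05, 0.2, 0.8)]
```

For this plant and gain the loss grows monotonically with the period, since a longer period lets more disturbance accumulate between corrections; such a loss-versus-period curve is exactly what the period-selection schemes cited above trade against the schedulability of the task set.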
3.2 Control delay

If the delays τsc and τca in the basic model in figure 1 are constant, they can be added together in the control design and analysis. If the delays vary, it is however not correct to add them. From a control point of view, performance and stability are adversely affected by a long (constant) delay. This is due to the phase lag introduced by a delay. When implementing the control law, the designer should not introduce unnecessary delay. As a rule of thumb the actuation should follow immediately after a sampling instant; there should not be a delay, e.g. equal to the sampling period h, in between. There are two reasons for purposely adding a time delay:

• A delay can be introduced to decrease the amount of jitter; the jitter is cancelled by the buffering. The introduction of this time delay also decreases the probability of vacant sampling.

• A skew between a producer and a consumer task is often seen in scheduling theory. Delays are added to substitute the precedence and exclusion constraints to simplify scheduling algorithms, cf. the introduction of offsets in section 2.3.

In the following subsections the compensation of a constant delay from a control viewpoint and the scheduling to decrease the latency in the task chain are described.

3.2.1 Analysis of delay and synthesis of the control algorithm

A time delay can be taken into account in the control design and analysis. In e.g. [Åström and Wittenmark, 1997] it is shown how the state of a sampled system (2) with a constant delay is augmented with old control signals. The augmented system can be analysed and control synthesis can be performed exactly as for a non-delayed system. From an analytical point of view, a drawback is that the size of the system matrices (Φ, Γ) increases with the time delay. In [Lennartson, 1985] it is pointed out that zeros in the left half z-plane originating from a fractional time delay should not be compensated for (see also section 3.3).
It is found that the level of the average output variance is increased when the constant time delay is increased. It is also found that the state variance is less sensitive to the control period when the time delay is longer; a time delay gives a more open loop behaviour.
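The augmentation can be sketched for a delay shorter than one period: with x(k+1) = Φx(k) + Γ0 u(k) + Γ1 u(k−1), the old control signal is appended to the state. The construction below is the standard one; the scalar example numbers are hypothetical:

```python
import numpy as np

def augment_for_delay(Phi, Gam0, Gam1):
    """Augment a sampled system x+ = Phi x + Gam0 u_k + Gam1 u_{k-1}
    (a constant delay tau < h splits the input matrix into Gam0, Gam1)
    with the old control signal:
        z = [x; u_{k-1}],  z+ = Phi_a z + Gam_a u_k."""
    n = Phi.shape[0]
    Phi_a = np.block([[Phi, Gam1.reshape(n, 1)],
                      [np.zeros((1, n)), np.zeros((1, 1))]])
    Gam_a = np.vstack([Gam0.reshape(n, 1), np.ones((1, 1))])
    return Phi_a, Gam_a

# scalar example: an integrator x' = u with h = 1 and delay tau = 0.3
Phi = np.array([[1.0]])
Gam0 = np.array([0.7])    # the new input acts over [tau, h]
Gam1 = np.array([0.3])    # the old input is still active over [0, tau]
Phi_a, Gam_a = augment_for_delay(Phi, Gam0, Gam1)
```

The augmented pair (Phi_a, Gam_a) is delay-free and can be fed to any standard design method, at the price of one extra state per stored control signal, which is the growth in matrix size noted above.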
3.2.2 Latency in the loop

An approach to investigating the influence of a constant control delay and vacant sampling in the continuous time domain is made in [Shin and Cui, 1995]. The control delay is represented by a time delay in series with the controlled plant. The closed loop system is represented by:

ẋ(t, τ) = f(t, x(t, τ), τ)
The cost function J is a suitable continuous non-decreasing function of the response time, cf. the idea of a benefit function exemplified in figure 5. The trade-off between output quality and computation time is discussed in [Liu et al., 1991], which calls this imprecise computation. There are three approaches to constructing an imprecise computation, and the choice depends on the application. If the quality of a task is monotonically non-decreasing as a function of the execution time, the result can be recorded at appropriate intervals and finally accepted when a desired quality is achieved. This is called the milestone method. Another approach is to skip computation steps to save time. The third approach, which is claimed to work almost always when the two former do not, is the multiple version method: a primary version computes the precise result but is allowed to have a longer execution time than the alternate version. In case of overload the alternate version is executed. A generalised imprecise model is then described which encompasses the imprecise computation models. The task model has the following attributes: release time, deadline, execution time and weight (tasks are not assumed to be periodic). A task is divided into two subtasks: one mandatory, which is responsible for producing a result, and one optional, which refines the result obtained by the mandatory task. The optional task can be terminated at any time. If both subtasks complete, this is called precise computation, but if the optional task is terminated prematurely it is called imprecise computation. If all tasks are mandatory, this results in a task model with a hard deadline. Release times and deadlines are modified to handle precedence constraints. A mandatory task has a higher priority than any optional task. A natural approach to improving a schedule, i.e. to minimizing response time and jitter, is to use a quadratic loss function for the control loop as a performance index and iterate to a solution.
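The mandatory/optional split can be sketched as an anytime-style loop; the milestone costs, quality numbers and the dictionary result format are the author's illustration:

```python
def run_imprecise(budget, mandatory_cost, refine_steps):
    """Sketch of the mandatory/optional split: the mandatory part must
    fit in the budget; the optional part refines the result milestone
    by milestone until time runs out. Terminating the optional part
    early yields an imprecise but usable result."""
    if budget < mandatory_cost:
        return None                        # hard failure: no usable result
    result = {"quality": 1.0, "precise": False}   # mandatory baseline
    budget -= mandatory_cost
    for cost, gain in refine_steps:        # optional refinement milestones
        if budget < cost:
            return result                  # imprecise computation
        budget -= cost
        result["quality"] += gain
    result["precise"] = True               # everything completed: precise
    return result

steps = [(1.0, 0.5), (2.0, 0.3), (4.0, 0.1)]
full = run_imprecise(10.0, 2.0, steps)     # budget covers everything
partial = run_imprecise(4.0, 2.0, steps)   # only the first refinement fits
```

Note the diminishing returns built into the milestones, which is what makes terminating the optional part cheap in terms of quality.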
In [Kim, 1998] a performance index as a function of control period and feedback latency is calculated. An exponential function, which depends on control period and feedback latency, is then fitted to the index. A weighted sum of feedback latencies is used as a system-wide performance index to schedule a set of tasks. An iterative search approach, which extends the period calibration method [Gerber et al., 1995] of section 3.3.3, is outlined in [Ryu et al., 1997]. The control performance is specified with the characteristic parameters of a step response: steady state error, rise time, overshoot and settling time. A control loop is divided into a suitable number of subtasks, e.g. sampling, computation and actuation. Given a model of the dynamic process, effective characteristic parameters can be calculated as functions of period and latency. With an initial guess of the period, the latency is computed for a task chain, see section 3.3.3. The next step is to apply the period calibration: the relative deadline is computed and a schedule is computed with some suitable optimization algorithm. If no solution is found, the iteration continues with a new guess of the control period. Fixed priority scheduling is compared with static cyclic scheduling regarding the response times of time and event triggered execution in [Lönn and Axelsson, 1999]. The control delay and its jitter are given algebraically for a number of possible combinations of node and network configurations. Worst and best case delays are the concern. The tasks are assumed to be independent of each other, and a task switch is assumed to take negligible time. A control loop is closed over the network. A time triggered sensor node sends measurement values to a time triggered combined control and actuator node, which in turn calculates the control output. The control period is the same for the whole task chain. The actuation task is scheduled independently of the computation
task with the aid of an offset; this cancels jitter introduced by the computation task. The sampling and actuation tasks are assumed to have constant execution times. The response time depends on release jitter, worst case execution time, blocking (by lower priority tasks) and interference (by higher priority tasks). The control delay and its jitter are derived for the following choices: the communication network is either fixed priority scheduled or statically scheduled, the processors are either fixed priority scheduled or statically scheduled, and there is either a global clock or not. In case of a global clock, the offset is counted from the start of the sampling task and does not inherit jitter, as is the case when a global time triggered schedule cannot be constructed. It is concluded that if the timing requirements are strict, a static cyclic schedule is preferable. In [Kim and Shin, 1997] integrated real-time control and scheduling is governed by a performance index which is a weighted sum of control latency, command latency and monitoring latency. A branch-and-bound algorithm is applied to solve the NP-hard problem of allocating and scheduling a multi-processor system. It is argued that a graph-theoretic approach is not well suited for real-time constraints, and that integer programming cannot handle precedence constraints between tasks. The slack stealing algorithm [Thuel and Lehoczky, 1994] schedules two sets of preemptive tasks: 1) a periodic task model represented by a period, a worst case execution time, an arrival time, a priority and a hard deadline, and 2) an aperiodic task model represented by a processing time, a priority level and a hard deadline. The set of periodic tasks is scheduled off-line. When an aperiodic task request arrives, the scheduler decides if the request will be accepted. A rejected task is not queued.
Slack comes from two sources: 1) what is left over when the periodic task set is scheduled, and 2) reclaimed unused worst case execution time. The slack value associated with an accepted task shows how much extra time can be used by the other tasks while still meeting the deadlines. Slack values can be computed off-line and stored in a table. It is argued that there is no optimal choice of priority level. Based on the slack stealing algorithm, an incremental rate monotonic scheduling algorithm is built in [Binns, 1997]. Instead of using the slack to accept aperiodic tasks, either design-to-time processes (a longer version of a task) or incremental processes (more execution time appended) are executed. This fits with the notion of imprecise scheduling by [Liu et al., 1991].
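The fixed priority response time formulation referred to above (release jitter, worst case execution time, blocking and interference) can be written as the standard recurrence. A sketch under the usual assumptions (independent periodic tasks, deadlines within the period), not tied to any particular paper's formulation:

```python
import math

def response_time(C, B, hp, J=0.0, limit=1000):
    """Worst case response time under fixed priority preemptive
    scheduling, from the recurrence R = C + B + sum over higher
    priority tasks j of ceil((R + Jj)/Tj)*Cj.
    C: the task's WCET, B: blocking by lower priority tasks,
    hp: list of (Cj, Tj, Jj) for higher priority tasks,
    J: release jitter of the task itself, added to the final value."""
    R = C + B
    for _ in range(limit):
        interference = sum(cj * math.ceil((R + jj) / tj) for cj, tj, jj in hp)
        nxt = C + B + interference
        if nxt == R:          # fixed point reached
            return R + J
        R = nxt
    raise RuntimeError("no convergence (task set likely unschedulable)")
```

For example, a task with C = 2 and B = 1 under two higher priority tasks (C, T) = (1, 5) and (2, 10) converges to a response time of 7.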
3.3 Control period

The control period, or sampling period, is the most fundamental design parameter to choose when moving from a continuous time description to a computer implementation. A shorter control period in general, but not always, means improved control performance. A trivial lower limit (10) on the choice of period is set by the limiting CPU speed or bit rate. A problem with short periods and high utilization is the increase in response time and jitter in priority driven scheduling. This section starts by explaining which specific control periods should be avoided due to implications of the sampling process itself (section 3.3.1), then describes the choice of control period as depending on the dynamics of the system (section 3.3.2), followed by ideas on how to choose the control period when several tasks are running simultaneously (section 3.3.3). Non-uniform and event triggered sampling are alternatives to time triggered equidistant sampling, and multirate sampling also has its benefits (section 3.3.4).
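As a preview of the rules of thumb quoted in section 3.3.2, a helper can turn a closed loop rise time (or natural frequency) into a candidate interval for h. The function is only an illustration of those rules, not a design procedure:

```python
def sampling_period_range(t_rise=None, w0=None):
    """Candidate range (h_min, h_max) for the sampling period from the
    rules of thumb in [Åström and Wittenmark, 1997], section 3.3.2:
    Trise/h in the range 4..10 (4 to 10 samples per rise time), or
    equivalently w0*h in the range 0.2..0.6 for a second order loop
    with damping 0.7."""
    if t_rise is not None:
        return t_rise / 10.0, t_rise / 4.0
    if w0 is not None:
        return 0.2 / w0, 0.6 / w0
    raise ValueError("give a rise time t_rise or a natural frequency w0")
```

For a loop with a rise time of 0.5 s this suggests sampling roughly every 50 to 125 ms.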
3.3.1 Loss of reachability and observability through sampling

The intersample behaviour is lost in the discrete time model for certain sampling periods. Observability or reachability can be lost if the pulse transfer function has common poles and zeros. The poles and zeros are determined by the sampling period, and a change in the period can make the system observable or reachable again, [Åström and Wittenmark, 1997]. If the rules of thumb in section 3.3.2 are used for selecting the sampling period, the cancelling of zeros, which leads to hidden oscillation, is avoided. The effect of the sampling period on time-optimal control is treated in [Karbassi et al., 1996]. The objective of time-optimal control is to drive the state of a process below a limit in the shortest possible time. Time-optimal control often leads to a control output which is a sequence of pulses with the same amplitude but alternating sign. One example of an application is the control of flexible mechanical links. The sampling period is a critical design parameter for a time-optimal controller (it is also very critical for the similar deadbeat controller). The effect of the sampling period on controllability was investigated by Kalman and is also found in [Chen, 1984]. A sufficient condition for controllability is that if Re(λi − λj) = 0 then

Im(λi − λj) ≠ 2πk/h,   k = ±1, ±2, ±3, …   (15)
must hold, where λi and λj are two different eigenvalues of a controllable continuous open loop system matrix, A. This makes it clear that certain sampling periods, h, should be avoided. It is, however, easier to determine the controllability after a discretization by checking that the controllability matrix has full rank than by using (15).

3.3.2 Choice of sampling period

According to a rule of thumb in [Åström and Wittenmark, 1997], the sampling period h should be related to the rise time of the closed loop, Trise:

Trise/h ≈ 4–10   (16)
For a first order system Trise is equal to the time constant. For a second order system Trise is inversely proportional to the natural frequency of the closed loop and (16) can be written as:

ω0 h ≈ 0.2–0.6   (17)
for a damping coefficient of ζ = 0.7. This can also be expressed in the frequency domain: a reasonable choice of sampling frequency is 10 to 30 times the bandwidth. It could be argued that the choice of sampling period should depend on the dynamics and on the nature of the disturbances acting on the process, [MacGregor, 1976]. Consider a transient disturbance: the longer the sampling period, the more time the disturbance gets to propagate and increase the control error. The instant at which the disturbance enters between two sampling instants is arbitrary. The dynamics of the disturbance the controller is designed to reject is of course considered in the control design. If the disturbance is measurable, its sampling period must also be chosen. The specified dynamics of the closed loop should be related to the known dynamics of the disturbances. Thus, it is more general to state that the choice of sampling period should be governed by the closed loop dynamics, possibly found in the continuous time specifications. Linear control design for a process (A, B) gives the closed loop system in (4), Φs = Φ(h) − LΓ(h), [Karbassi et al., 1996]. When h is decreased it follows that

lim(h→0) Φ(h) = I   and   lim(h→0) Γ(h) = 0

and that the feedback gain matrix L has an infinite norm for asymptotically stable, marginally stable and unstable processes. If h on the other hand is increased, then for an asymptotically stable open-loop continuous time system

lim(h→∞) Φ(h) = 0
which means that the feedback gain must be a null matrix. In a marginally stable system with eigenvalues on the imaginary axis, Φ(h) and Γ(h) are bounded. For an unstable process with all eigenvalues in the right half plane, Φ(h) and Γ(h) are unbounded and the norm of L depends on the relative norms of Φ(h) and Γ(h) even for a stabilized closed loop system. The norm of the closed loop matrix Φs can be used to investigate the size of the control signal when L is given by some control design. It is concluded that, for an asymptotically stable open-loop system, there is a trade-off between control amplitude and sampling period: for a long sampling period, the system is slowly driven e.g. to the zero state, but with a small control output amplitude. The opposite is true for a short sampling period. For unstable systems a relation between sampling period and amplitude could not be established. The choice of sampling period is treated in [Zeng-Qi, 1981] from the outset of investigating the effect of long sampling periods on disturbance rejection. The idea is that a long sampling period could do as well as a short one, with the benefit that the implementation is facilitated if the computational resource is scarce. The process is described by a continuous time state space equation (1). To the discrete time optimal controller with continuous time weight matrices, an optional Kalman estimator is added. As a measure of control performance, the average expected loss during a period of stationarity is chosen to be:

JC = (1/h) E ∫[kh, (k+1)h] (xᵀ Q1c x + uᵀ Q2c u) dt

where E is the expected value operator. The weight matrices Q1c and Q2c let the user choose parameters for what is optimal. The intersample behaviour of the system is taken into account when deriving a corresponding discrete time loss function for the control performance, cf.
(6):

JD = (1/h) tr(QP) + JV

where Q is a polynomial combination of the discrete time weight matrices corresponding to Q1c and Q2c, P is the stationary loss which is found from a Riccati equation, tr is the trace operator and JV is the impact of the intersample behaviour. In general, when the
sampling period is decreased, the magnitude of the control output increases. A maximum value for the control output is very reasonable to have in a practical system. A method for bounding the control output by changing the weight matrices according to the sampling period is developed. By changing the weights and the sampling period, the dynamics of the system is changed somewhat. In the derivation of the method, the worst case disturbance and the maximum control output value are modelled as an initial disturbance. It is claimed that this is a good approximation for most processes, except those which are badly damped. The choice of sampling period for a number of discrete time controllers is compared in [Lennartson, 1987] and more exhaustively in [Lennartson, 1985]. In order to get a fair comparison, every controller is tuned such that the variance of the control output is held fixed, Pu = E u² = constant, independent of the sampling period, see figure 7. This is done by changing an appropriate tuning parameter: either the weight matrix of the control output, the dominating time constant or the natural frequency, depending on the control algorithm in question. As a measure of performance, a continuous time average output variance is used. A discrete time performance index cannot be set up directly because the behaviour between the sampling instants must be taken into account too. The continuous performance measure Py = E y² can now be converted to a discrete one:

Py(kh) = (1/h) tr(Q1 Px(kh) + 2Q12ᵀ Pxu(kh) + Q2 Pu(kh) + QPv)

where Px is the variance of the state x, Pxu is the covariance of x and u, and Pv is the variance of the disturbance on the state equation. The integrals for obtaining the discrete versions Q1, Q12, Q2 are computed numerically. The equation is independent of the control strategy.
Figure 7. A SISO process with time delay Td, control output U(s), measurement signal Y(s), and measurement noise W(s).
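The limiting behaviour of Φ(h) and Γ(h) discussed above is easy to check numerically for a scalar process, where the zero order hold discretization has a closed form. A sketch; the first order system is chosen for illustration only:

```python
import math

def discretize_first_order(a, b, h):
    """Zero order hold discretization of dx/dt = a*x + b*u for a
    scalar process: Phi(h) = exp(a*h), Gamma(h) = (exp(a*h) - 1)*b/a.
    Illustrates the limits cited in the text: Phi -> 1 (the identity)
    and Gamma -> 0 as h -> 0, and Phi -> 0 as h -> infinity for an
    asymptotically stable process (a < 0)."""
    phi = math.exp(a * h)
    gamma = (math.exp(a * h) - 1.0) * b / a
    return phi, gamma
```

For a = −1, b = 1 and a very short period, Φ is close to 1 and Γ close to 0; for a long period Φ vanishes, consistent with the feedback gain tending to a null matrix.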
The SISO systems studied are extended by allowing a time delay in the process and integral action in the controller. A time delay in the process is handled by a prediction, which preserves the order of the state matrix. Three optimal controllers and six pole placement controllers proposed in the literature are investigated. The controllers are tested on four processes: a damped 2nd order, a resonant 3rd order, a 4th order and a 2nd order non-minimum phase system. Process and measurement disturbances are added. For the optimal controllers the performance measure is monotonically decreasing for decreasing sampling period. Pole placement controllers which have a continuous time correspondence are close to the optimal controllers. However, for a controller that does not have a continuous time counterpart, e.g. one with a pole in the origin of the z-plane, the output variance has a minimum for a specific value of the sampling period but
increases for shorter sampling periods. Zeros in the right half z-plane can be compensated and poles are preferably placed in the right half plane. Compensating (cancelling) zeros in the left half z-plane should be avoided. The general recommendation for the choice of sampling period follows [Åström and Wittenmark, 1997], with the remark that it is often beneficial to sample a little faster. If the level of measurement noise is increased, the level of the output variance becomes higher for shorter sampling periods, because more disturbed measurements are used. Sampling period sensitivity around a small period (h close to zero), with respect to the solution of the discrete Riccati equation, P(h), is investigated in [Melzer, 1971]. A system is given in continuous time (1) and an optimal control law is investigated. A quadratic performance index is applied (5). The first order term of the Taylor expansion of P(h) is shown to equal zero. The second order term is positive semidefinite and is determined by solving a discrete time algebraic Lyapunov equation. By inspection of the maximum eigenvalue λmax(h) of P(h), a measure of the cost of sampling for a small sampling period is defined as:

(λmax(h) − λmax(0)) / λmax(0) ≈ ((h²/2) ∂²λmax(h)/∂h²) / λmax(0)   (18)
where the numerator of the right hand side is given by the eigenvector of P(h). The drawback of this method is that the convergence of the Taylor expansion must always be investigated. It can be seen on the right hand side of (18) that the cost is quadratic in the sampling period, h. The choice of measure is justified by knowing that J(h) always satisfies:

J(h) ≤ λmax(h) xᵀ(0) x(0)

which can be compared to the cost of the quadratic performance index in (7).

3.3.3 Optimal choice of control period for several loops

When several controllers are sharing the same resource, the partial utilization of each controller is adjusted with the period (or the execution time) in order to optimize the behaviour of the whole system, under the constraint that the total utilization is upper bounded. A number of papers describe optimization methods where the tasks do not interfere with each other, i.e. the scheduling is disregarded. In [Seto et al., 1996] a model to select control periods for a set of control loops is proposed. The periods can be altered within a specified range to optimize the use of a limited computational resource. The optimization is based on a monotonically decreasing convex time value function, which is the difference between the performance index for the digital controller (which depends on the sampling period) and the performance index for a corresponding continuous time controller. A weighted sum as in (8) is formed to relate the importance of the tasks to each other. A quadratic loss function based on physical properties of the process is used to transform a mathematical description of the process behaviour to an exponential decay function. This need not be in a strict control theoretical sense as e.g. (5), because the optimal control law is not searched for. The transformation is in most cases non-trivial and involves several approximations. Tasks for non-control purposes, which do not have performance indices or cannot change periods, cannot take part in the optimization. A schedule (earliest deadline first) is then assigned
and it is proposed that the schedule can be changed in case of a partial failure. A feasible schedule is found by altering the periods while optimizing the weighted sum. A scheduler running n optimal control tasks is proposed in [Eker, 1999]. It handles varying execution times Ci = {c1, c2, ..., cn}, i.e. a set of n execution times for each task i, and has a reference utilization load Uref. The scheduler uses the criterion in (8) with Ji = Ji(h) to find sampling periods that maximize the performance of the whole system, keeping the utilization at a specified limit. The first and second derivatives of the cost function J(h) are derived and an optimization criterion is set up. The approach is modified to run recursively at every scheduling decision point, and Ci and Uref are allowed to be time varying. The cost for each optimal controller is found to be convex, or more precisely quadratic, with respect to the control period. As the optimization is computationally intensive, a simpler quadratic model is tried:

Ji(h) = ai + bi h²

where ai and bi can be calculated and do not depend on h. In the subsequent result the parameter ai is not used, which is in line with the results from [Melzer, 1971]. An explicit expression is given as an input to the optimization algorithm. bi can be interpreted as the relative importance of a task. An algorithm with the objective to handle overload, called the elastic task model, is outlined in [Buttazzo et al., 1998]. If a task needs a shorter or longer period, or if a new task arrives, the other tasks can accommodate this execution request to guarantee schedulability. The task model consists of a computation time, a nominal period, a limit on minimum period, a limit on maximum period, and a deadline which equals the nominal period. A schedulability condition in the case of the earliest deadline first algorithm is proved for the model, but other fixed or dynamic scheduling algorithms are also claimed to work. The accommodation of load is done in analogy with a linear spring system. Every task is a spring and the springs are connected in series. The length xi of a spring i is equivalent to the task's utilization factor, and the stiffness ki to the inverse of a coefficient of elasticity. At every scheduling instant the periods are calculated in analogy with a spring system as:

xi = xi0 − (L0 − Ld) K/ki  for all i,  where K = 1 / Σ(1/ki)

and where Ld is the desired total length, L0 the initial total length at rest, and xi0 the initial length of spring i at rest. When the maximum period limit is reached, the task becomes rigid, i.e. it cannot contract further, because the utilization cannot be compressed any more. The decompression algorithm, applied to handle under-load when a task is removed, is similar to the compression algorithm. The designer has to specify the elasticity coefficients for every task, but no guidelines for this are given. This simple linear weighting scheme does not demand any computationally intensive search algorithm.
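The compression step can be sketched directly from the spring analogy above, with utilizations as spring lengths and elasticity coefficients ei = 1/ki. The iteration that turns fully compressed tasks rigid follows the description in [Buttazzo et al., 1998], while the data layout is an assumption:

```python
def elastic_compress(u_nom, u_min, elast, u_des):
    """Compress task utilizations Ui = Ci/Ti to a desired total u_des,
    in analogy with springs connected in series. u_nom: nominal
    utilizations, u_min: minimum utilizations (set by the maximum
    periods), elast: elasticity coefficients (0 means rigid). A task
    squeezed to its minimum utilization becomes rigid and the excess is
    redistributed over the remaining elastic tasks. Returns the new
    utilizations, or None if the fully compressed set still overloads."""
    n = len(u_nom)
    floor = sum(un if e == 0 else um
                for un, um, e in zip(u_nom, u_min, elast))
    if floor > u_des:
        return None                       # infeasible even at maximum periods
    u = list(u_nom)
    rigid = [e == 0 for e in elast]
    while sum(u) > u_des + 1e-12:
        e_sum = sum(elast[i] for i in range(n) if not rigid[i])
        u_fixed = sum(u[i] for i in range(n) if rigid[i])
        excess = sum(u_nom[i] for i in range(n) if not rigid[i]) \
            + u_fixed - u_des
        clamped = False
        for i in range(n):
            if not rigid[i]:
                u[i] = u_nom[i] - excess * elast[i] / e_sum
                if u[i] < u_min[i]:       # spring fully compressed
                    u[i] = u_min[i]
                    rigid[i] = True
                    clamped = True
        if not clamped:
            break                         # all remaining tasks fitted
    return u
```

For example, three tasks with nominal utilizations 0.3, 0.4 and 0.5 (total 1.2) and elasticities 1, 1 and 2 compress to 0.25, 0.35 and 0.40 for a desired total of 1.0: the most elastic task gives up the most utilization.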
Algorithms for adaptation of task periods for autonomous robots are studied in [Beccari et al., 1999]. The reason for the proposed rate modulation policy is the dynamic environment a robot is exposed to: variations in the load must be coped with. The original task set is divided into hard and soft tasks. A hard task is characterized by period, deadline and worst case execution time. A soft task, on the other hand, has a range of allowable periods. As the soft task set is scheduled by the rate monotonic algorithm, the priority order must be preserved and the actual period of one task is thus influenced by that of other tasks. The change of period of a task is assumed to be performed only at well defined time instants when the situation admits it. Six algorithms for adaptation are compared: three are based on the idea of re-scaling the periods of all or some tasks, and three are based on linear programming. The interdependence of tasks is addressed in [Shin and Meissner, 1999], where the scheduling of a multi-processor system is considered. A task period is changed up or down stepwise. A performance index, which is convex with respect to the period, is created for each task as in [Seto et al., 1996] and is weighted into a total performance index for the system, see (8). As the problem is NP-complete, two heuristic algorithms are proposed: one for dependent and one for independent tasks. For dependent tasks, the change of period of one task forces a change of period of another task. The algorithms are based on heuristic methods for solving the well-known knapsack problem. A simple periodic scheduling strategy is used for each processor. The performance index is used for finding task periods and for determining if backup tasks are necessary to cover a transient processor failure. The average penalty for not updating the control signal, the estimated recovery delay and the failure rate are used to decide if it pays off to have a backup task. The so-called period calibration method is developed in [Gerber et al., 1995]. The end-to-end requirements are the constraints usually specified for a real-time control system; hence constraints for intermediate tasks can be chosen in a way which fully utilizes the computational resource. The objective of this off-line method is actually to minimize resource utilization.
From an asynchronous task graph which defines the dependencies, such as precedence constraints, the original task graph is transformed to a new task graph using the timing constraints: freshness (latency), correlation (synchronization) and separation (jitter tolerance). An optimization algorithm is then applied to solve the intermediate constraints. With the assumption that the control performance increases for shorter periods, optimal periods in rate monotonic scheduling are studied in [Seto, Lehoczky and Sha, 1998]. The cost function is assumed to increase monotonically with the period. A maximum allowable period gives an upper constraint on the period of each control task. Two algorithms are described: one with a given priority order solved by a branch and bound method, and one without this constraint, also solved with a branch and bound scheme to limit the complexity of the number of permutations of periods. The complexity of the algorithms depends on the constraints on the periods, and not so much on the number of tasks.

3.3.4 Multirate, asynchronous and event triggered sampling schemes

The introduction of non-uniform sampling is primarily based on the idea that the scheduling effort can be eased if the sampling periods are better adjusted to the needs of the closed loop system. Multirate sampling is natural for a distributed system since subsystems typically have different dynamics, see e.g. [Törngren and Wikander, 1996]. Computer control systems with multiple data sampling devices with different periods are called multirate sampled data control systems. When the ratios between the sampling periods are positive rational numbers, the discrete time model of the entire system becomes periodic. The length of the so-called basic time period is the least common multiple of the sampling periods. The so-called shortest time period (or greatest common divisor) defines a tick length. The period length of a sampling scheme is
the basic time period divided by the tick length. Because of the multiple rates, a z-transform for the entire system cannot be set up directly. First, the discrete systems must be transformed to the common tick length. This is done by adding switching logic to the state matrices, so that a state or input signal can be held constant for longer than one tick. This technique is called lifting. A z-transform can now be set up, which allows the multi-input multi-output control synthesis and analysis to become similar to that of single-input single-output systems. Stability in multirate systems is the concern in [Araki and Yamamoto, 1986]. A discrete time state space equation and a multirate transfer function are set up from a continuous time representation. The Nyquist criterion for a multirate controller is then derived. In [Berg et al., 1988] three design methods for multirate systems are laid out:
• An ad hoc method for designing a multirate system is to design the single rate control loops one by one. Two simplifications are then made: a) a fast loop, i.e. a high bandwidth and a corresponding short sampling period, is assumed to respond instantaneously (i.e. G(s) ≈ 1) and b) the decoupling or feedforward actions between controllers are simplified or neglected.
• Another multirate design method is to formulate the entire plant as an LQ optimal control problem, which can then be worked out with the same methods as for a single rate LQ problem. This is a lifting technique. The linear feedback matrix and the Kalman estimator will be time-varying with the basic time period, because the steady-state solution to the Riccati equation is time-varying. The advantage of the LQ design is that it accounts for the coupling between control loops.
• The periodic nature of the solution can be tackled by reformulating the control law with binary switching functions to obtain a stationary gain vector; the disadvantage is that the parameters must then be searched for.
In the paper the three design methods are compared, with the objective that the computational load should be the same for the three controllers. For a system with physically independent control loops, the first method, with its successive closing of control loops, is preferable. However, for a robot arm, which has physically coupled control loops, the other two design methods are shown to outperform the first approach, which is very difficult to apply due to the complex coupled structure. In [Voulgaris, 1994] asynchronous sampling is investigated, where the ratio between the sampling period, hs, and the control period, hc, is an irrational number. In this case the lifting technique of [Araki and Yamamoto, 1986] cannot be applied. An optimal controller is derived using a Kalman filter to estimate the state at the times when the controller is to calculate an output. The cost function is the usual continuous time one (5) and the system is given in continuous time on state space form (1). Measurements are given at:

ym = C x(mhs) + vm,   m = 0, 1, …
and the optimal control law L is calculated the usual way by solving a discrete Riccati equation, in this case using the past information Ik:

uk = −L x̂k,   x̂k ≡ E{xk | Ik},   with Ik ≡ {ym : mhs ≤ khc} ∪ {u0, u1, …, uk−1}

and the estimate of the state, x̂, is given by a Kalman filter up to the time index:

mk = max{m : mhs ≤ khc}

The optimal cost J and the variance P for this dual sampling scheme are then derived for the control and filter combination. In [Salt et al., 1993], the dual rate problem with a longer sampling period hs than control period hc is the reason to set up a discrete model of the process with a shorter actuation period. The longer period has to be an integer multiple of the shorter. The control output is constant over its sampling period. A continuous time transfer function is converted to a discrete time multirate transfer function and compared to the transfer functions based on the shorter periods. Stability in a distributed computer control system without buffers is studied in [Walsh et al., 1999], where sample rejection is allowed. Scheduling algorithms for both on-line and off-line scheduling are proposed. The goal is to find the longest period for which the performance specifications still can be satisfied. The dynamics of the process and of the controller are described with continuous time state space equations. The originally given controller is assumed to be designed without considering the computer system. The general global exponential stability analysis used is fairly mathematical and is based on Lyapunov theory. An event driven communication strategy is adopted based on the control error, which can be computed at every node. The error for any node is bounded by the error growth rate and the number of nodes. The node hosting the controller with the greatest weighted error wins the contention for the bus. A maximum possible sampling period for a static off-line schedule is calculated.
The idea is based on a token passing protocol, with a round robin time division of the bus. A dynamic scheduler is also proposed. This is also based on token passing, but the time between two tokens received by a node is unknown. The paper develops a reasoning about the delay that does not use the theory for maximum delay developed for the token passing protocol. Event based sampling is an alternative to time triggered sampling if the sensor is of an event triggered type, if the actuator is of an on-off type, or if the execution or communication time is limited, [Åström and Bernhardsson, 1999]. The control strategy is to drive the system to a desired state whenever it exceeds a specified limit of drift. The event triggered sampling is here called Lebesgue sampling, as opposed to the time triggered which is called Riemann sampling. The time it takes for the process to drift to the limit is called the mean exit time, D. The variance P of the Lebesgue sampling is the integral:

P = ∫[−D, D] x² f(x) dx
where x is the state of the system and the stationary distribution f(x) is given by a so-called Kolmogorov forward equation for a Markov process. It is shown that Lebesgue sampling improves the control both for stable and unstable stochastic first order systems. It can be noted in the simulated example that the magnitude of the control signal is approximately five times higher, but with 12% fewer sampling instances. An event triggered PID controller is proposed in [Årzén, 1999]. An event is triggered for three reasons: a measurement signal exceeds a limit, a change in a signal exceeds a threshold, or the setpoint is changed. It is concluded that the change in the error signal, ecurrent − eold, should be used to trigger the event which drives the state to the desired setpoint. The code for an event based PID algorithm is given. The calculation of the change in error is time triggered and the output calculation is event triggered. The parameters of the PID control law are adjusted based on the resulting sampling period, i.e. the time since the last actuation. An example is given where the number of actuations is reduced to half of the number used for an equivalent time triggered periodic controller. In [Choi et al., 1997] the cost of computation is incorporated in the loss function, J:

J = Σ(k=0..M−1) [xᵀ(k) Q x(k) + uᵀ(k) R u(k)] + xᵀ(M) Q x(M) + Σ(k=0..M−1) d(k) m
where d(k) is a binary value, equal to one if there is a control action and zero otherwise, m is the cost associated with a control action and M is the time horizon. This is called temporal control by the authors. The idea is to save bandwidth for higher level control functions by actuating only at time instants when it is appropriate. The schedule is periodic, but any instance can be skipped. The loss function is not solved by a Riccati equation, but with a search method such as steepest descent or simulated annealing. The feedback gain matrices depend on the initial state, i.e. the applicability of the control algorithm is very limited.
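The flavour of event based actuation described in this section can be illustrated with a small simulation: a time triggered measurement loop in which the control output is only recomputed when the error has moved more than a threshold since the last actuation, cf. [Årzén, 1999]. The first order process, the P controller and all numbers are illustrative assumptions:

```python
def event_triggered_p_control(a=-1.0, b=1.0, setpoint=1.0, h=0.01,
                              steps=1000, kp=2.0, threshold=0.02):
    """Forward Euler simulation of dx/dt = a*x + b*u under event based
    P control: the measurement is time triggered with period h, but the
    output u is only updated when the error has changed by more than
    the threshold since the last actuation. Returns the final state and
    the number of control updates (far fewer than the number of
    samples)."""
    x, u, e_old, updates = 0.0, 0.0, float('inf'), 0
    for _ in range(steps):
        e = setpoint - x
        if abs(e - e_old) > threshold:   # event: the error moved enough
            u = kp * e
            e_old = e
            updates += 1
        x += h * (a * x + b * u)         # process integration step
    return x, updates
```

In this setting the state settles near the closed loop equilibrium 2/3 while the output is recomputed only a few dozen times over a thousand samples.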
3.4 Jitter

Delay jitter and control period jitter both affect control performance and stability. Jitter is often thought of as being considerably shorter than the sampling period, τ « h, and the severity of jitter depends on the closed loop dynamics and on whether vacant sampling will occur or not. Jitter is popularly modelled as a stochastic variable, e.g. uniformly distributed. In other words, the computer system is seen as a source of stochastic variations; this is a way to isolate the control design from computer engineering. The following treatment of jitter is divided into two subsections: control delay jitter in section 3.4.1 and control period jitter in section 3.4.2.

3.4.1 Delay jitter

An early study of stability under delay jitter, caused by e.g. mis-synchronization, interrupts and buffering, is found in [Belle Isle, 1975]. The basic setup in figure 1 is used. It is argued that assuming a worst case delay of one sampling period to avoid jitter is a too costly control design solution. A solution is proposed where non-linearities and delays are allowed. The approach is to set up a stochastic Lyapunov candidate function and show that it is valid. Further, due to the “well behaved nature” of the process, formed by augmenting the parameters of a continuous time process as in (1) plus a delay, the existence of a Lyapunov function will guarantee almost-sure stability. The
parameters of this new process are random processes, but they may, according to the paper, not be governed by white noise. Almost sure means that almost all realisations of a so called completely deterministic random process are predictable, see [Åström and Wittenmark, 1997]. The paper is fairly mathematical and the theory is hard to apply. Time varying delay is studied in [Halevi and Ray, 1988] and [Ray and Halevi, 1988] for the case where the delay is deterministic and periodic. Both the process and the controller are time invariant and a discrete time state matrix with varying network delay is proposed. The system has a time triggered sensor and a time triggered controller which communicate over a bus. It is proposed that in order to balance vacant sampling against sample rejection, the sampling period could be slightly different from the control period (asynchronous loops). Another item discussed is the importance of the time skew between sensor and controller when the sampling period equals the control period, cf. section 2.3. For both cases, loss functions are constructed to find the optimal skew relative to the difference in periods. The assumptions and simplifications for the modelled system are: 1) the transmission and queuing delay is deterministic, 2) the sampling period equals the control period, 3) a time skew is used between the sensor and the controller, 4) the control processing delay is constant, 5) there are no transient errors and no overload, and 6) the input buffer is overwritten by a new message. It is argued that this approach is simpler to use for analysis than the continuous time approaches, e.g. [Belle Isle, 1975]. The time varying delays are represented in the augmented state vector which includes the controller, the process and all delayed input signals.
The magnitudes of the time delays are thus assumed to be known or to have been discretized from a continuous valued variable. The two time delays, τ_sc and τ_ca, can be treated separately in the model. Vacant sampling and sample rejection are defined in the paper, cf. section 2.3. The augmented time varying discrete time state space vector can be written as a closed loop state description, which then consists of: 1) the closed loop system state (which is time varying), 2) the current and past process outputs, 3) the current state of the controller and 4) the past control outputs. The size of the state is finite, but it can become very large. An LQG-controller with an optimal state estimator is proposed in [Ray, 1994], but the separation theorem does not hold because of “multiplicative uncertainties”. The assumptions are: 1) the sensor and controller have the same sampling period but shifted a constant skew which is less than a sampling period, 2) the actuator is event triggered, 3) the computational delay of the controller is constant, 4) the network delays (τ_sc and τ_ca) are independent but with identical and known probabilities, 5) τ_sc + τ_ca < h, and 6) the sensor and equation noise are both Gaussian. The model with a TT sensor, a TT controller and an ET actuator implies that during a sensor sampling period exactly one sensor signal arrives at the controller, but 0, 1 or 2 actuator signals can arrive at the actuator. An ordinary discrete time loss function as in (6) is used with the linear control law

u_k = –L [x̂_k; u_{k–1}]

where x̂ is the estimated state, u_{k–1} is the previous output, and the feedback gain L depends on the state and penalty matrices.
Random delays in a network, stochastically independent or governed by a Markov chain, are studied in [Nilsson, 1998] and a controller with a minimum variance state estimator is derived for each of the two cases. The distributed computer setup is the same as in figure 1; a TT sensor, a TT controller and an ET actuator. The total delay has to be less than a sampling period, i.e. τ_sc + τ_ca < h.

Case 1. The two stochastically independent network delays (τ_sc and τ_ca) have known probability distributions. All old time delays are assumed to be known for the state estimation. The time delays of the messages are calculated by time stamping with a global clock. The LQG-controller is written as:

u_k = –L(τ_sc) [x_k; u_{k–1}]

where the feedback gain L depends on the delay from the sensor to the controller, and is preferably pre-calculated and stored in a table.

Case 2. Discrete time jump linear systems have parameter changes modelled at discrete instances, e.g.

x_{k+1} = A(r_k) x_k + B(r_k) u_k

where r_k is a Markov state with a delay determined by a certain frequency distribution, e.g. rectangular distributed. The random jumps can be modelled by a Markov chain, see figure 8.

Figure 8. p is the probability of jumping from a state with a low delay (state r1) to a state with a high delay (state r2). q is the probability of a transition in the opposite direction.
The Markov chain makes a transition from one state to another between the two time instants k and k+1. Varying network load and queues are preferably modelled by a Markov chain; the delays are different at each Markov state. The current state is known to the controller. The controller is found by minimizing the loss function in (6), which renders a feedback gain L and the control law:

u_k = –L(τ_sc, r_k) [x_k; u_{k–1}]

where τ_sc is measured and r_k is the Markov state. To handle many different delays τ_sc and τ_ca, the number of states can be increased. If instead two state transitions are used between successive samples, the number of states only increases by a factor of two.
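The two-state delay chain of figure 8 is easy to simulate; the transition probabilities, delay values and gain table below are invented for illustration:

```python
import random

P = 0.1   # p in figure 8: probability of jumping r1 -> r2 (assumed value)
Q = 0.4   # q in figure 8: probability of jumping r2 -> r1 (assumed value)
DELAY = {"r1": 0.02, "r2": 0.15}   # tau_sc per Markov state, seconds (assumed)
GAIN = {"r1": 1.8, "r2": 1.2}      # pre-computed gain table L(tau_sc, r_k) (assumed)

def simulate_delays(n, state="r1", seed=0):
    """Generate n sensor-to-controller delays governed by the Markov chain."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n):
        delays.append(DELAY[state])
        if state == "r1":
            state = "r2" if rng.random() < P else "r1"
        else:
            state = "r1" if rng.random() < Q else "r2"
    return delays
```

Over a long run the chain spends a fraction p/(p+q) of the time in the high delay state, and the controller simply looks up the gain for the current state in the pre-computed table instead of solving a Riccati equation on-line.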
Optimal control over a network with queues is studied in [Chan and Özgüner, 1995]. A FIFO queue is modelled at the sampler, see figure 9.

Figure 9. The network delay is τ_sc. The process output y_k enters a FIFO queue and reaches the controller as w_k. The control signal u_k is not sent over the network, thus τ_ca = 0.
The network traffic makes the transmission delay τ_sc time varying and non-deterministic. The sampling and control periods are the same for the producer and the consumer of the queue. There is a time skew between the sampling and control instances. The delayed input to the controller is w_k. The queue has a maximum length of m positions, see figure 10.

Figure 10. Queue with positions i = 1, 2, 3, ..., m holding samples y_k on their way to the controller input w_k. If i > m then vacant sampling occurs and w_k = w_{k–1}.
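The queue behaviour of figures 9 and 10, including vacant sampling, can be sketched as follows; the interpretation that an empty queue at the controller's read instant reuses the previous value is an assumption of this sketch:

```python
from collections import deque

class SamplerQueue:
    """FIFO queue of at most m positions between sampler and controller."""
    def __init__(self, m):
        self.q = deque(maxlen=m)   # when full, the oldest sample is dropped
        self.last = 0.0            # w_{k-1}; initial value assumed zero

    def put(self, y_k):
        self.q.append(y_k)         # sampler side

    def get(self):
        """Controller side: w_k = w_{k-1} on vacant sampling (empty queue)."""
        if self.q:
            self.last = self.q.popleft()
        return self.last
```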
The multiplexing of the network is described with a switching function and the switching is governed by a Markov chain, to which a probability matrix of state transitions must be supplied. The modelled process is time discrete and linear. An optimal controller for this communication link and jump system is derived from a standard discrete time performance index as in (6). A non-switching linear control law is derived, i.e. only one gain is used and no information about old time delays. Random delay of the control output in optimal control is studied in [Davidson, 1973], see also section 3.4.2 for control period jitter. The random delays are of two types: 1) integer delay, where the control output can be delayed k = {0,1,2...} sampling periods, and 2) non-integer delay, where the control output is delayed a fraction of h. A random delay is described by a frequency distribution f(τ). In the case of integer delay, the state must be augmented by old control outputs. The system matrix, the input matrix, the loss matrix and the noise matrix are all augmented. A rather complicated cost function to be minimized is given by:

J(x̂, t_k) = E ∫_0^h f(τ) [ ∫_0^τ (x̂^T(t_k+σ) Q̂_1c x̂(t_k+σ) + u^T(t_k) Q̂_2c u(t_k)) dσ + ∫_τ^h (x^T(t_k+σ) Q_1c x(t_k+σ) + u^T(t_k) Q_2c u(t_k)) dσ ] dτ + J(x̂, t_{k+1})

where x̂ and Q̂ are the augmented versions of x and Q respectively and J(x̂, t_{k+1}) is a prediction. Minimization gives the optimal control law and the variance P, but the equations are omitted here. A similar but even longer cost function is given for the case
of impulse control output. The change in variance is defined as the difference between the variance with jitter and the variance in the jitter free case:

ΔP = P_jitter – P_deterministic

The changes of the variance P and of the noise variance p are given by:

ΔP – Φ_s^T ΔP Φ_s = [ ∫_0^τmax τ f(τ) dτ ] [ (Φ_s – I)^T (Φ_s^T P Φ_s + ∫_0^h Φ^T Q_1 Φ dτ) Γ L + L^T Γ^T (Φ_s^T P Φ_s + ∫_0^h Φ^T Q_1 Φ dτ) (Φ_s – I) ]    (19)

where Φ_s is the closed loop system matrix and τ_max is the maximum delay, and the change of noise variance is Δp = tr(W ΔP), where W represents an integral which expresses the magnitude of the noise. ΔP is proportional to the first integral on the right hand side of (19). The result of jitter in (non-optimal) linear feedback control is visualized in figure 11 by plotting the closed loop against the open loop for a scalar system (without the noise term v(t) below):

dx/dt = a x(t) + u(t) + v(t)    (20)
Figure 11. Φ (with A = a) is the system matrix and Φ_s is the corresponding closed loop system matrix, see (4). In the shaded zones delay jitter decreases p and P for a noise free 1st order system. The dashed line represents the optimal control law.
The magnitude of the impact of delay jitter is calculated with a scalar system as in (20) as an example. The delay is rectangular distributed ± 10 % around a nominal delay and the period is h = 1.0 s. For a = 1, Δp/p is only approximately 0.6 %, which is fairly small. An example with sample rejection is also given in [Davidson, 1973] but is omitted here. The distribution of time delays in distributed computer systems is investigated in [Wittenmark et al., 1998] and [Andreff, 1994]. The model of the computer system is a set of
layers (shells) that a measurement signal has to pass to reach the controller in the centre. The layers correspond to the tasks in the basic model in section 2.1.

Figure 12. The controller is in the centre and the sensor and actuator are off the centre. Messages have to pass through the other nodes i = 1, ..., L (i.e. layers) to get from sensor to controller and further to actuator.
The control output in turn has to pass every layer again on its way out to the actuator. If the layers are badly synchronized, irregular time delays will result; an example shows a worst case delay three times longer than the sampling period. Three types of synchronization are defined: definite synchronous - two layers have the same period and no skew, partly synchronous - two layers have the same period but a skew, and asynchronous - two layers have different sampling periods. A time delay on each layer makes the model more realistic. Delays and their probabilities are found by inspection of plots from schedule simulation of the synchronous models. The maximum possible number of time delays is 2L for a partly synchronous model and 2L – 1 for a synchronous model, where L is the number of layers. The maximum possible delay in the loop is twice the sum of the delays of all layers plus the controller delay. In [Andreff, 1994] jitter is investigated by simulation. Two polynomial controllers (called RST in [Åström and Wittenmark, 1997]) are designed for a DC-motor: one designed for a computational delay and one assuming no delay. In case of delay, the degree of the desired denominator must be increased, as well as the degree of the observer polynomial in the polynomial control design. The stochastic delay is modelled as rectangular distributed, either as a variation around a mean value or from zero up to a maximum value. The observer polynomial initially has its poles in the origin. For the controller without delay compensation, it is concluded that for random delay, the average delay determines the behaviour of the stable system and the maximum delay must be sufficiently low not to cause instability. The stability margin decreases for increasing constant delay when a delay compensating controller is used. A static delay compensating controller compensates for static or random delays which are close to the expected delay.
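The waiting time a message experiences between two time triggered layers can be sketched for the three synchronization types; the periods and skew below are arbitrary example values, and zero execution and transmission time within each layer is assumed:

```python
import math

def layer_delays(h_prod, h_cons, skew, n):
    """Delay from the producing layer's k-th sample to the consuming layer's
    next activation (consumer activations occur at j*h_cons + skew)."""
    delays = []
    for k in range(n):
        t = k * h_prod                        # production instant
        j = math.ceil((t - skew) / h_cons)    # index of next consumer slot
        next_cons = j * h_cons + skew
        if next_cons < t:                     # guard against float rounding
            next_cons += h_cons
        delays.append(next_cons - t)
    return delays
```

With equal periods and zero skew (definite synchronous) the delay is constantly zero; with equal periods and a skew (partly synchronous) it is constantly the skew; with different periods (asynchronous) it varies from sample to sample.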
If a long delay is expected in the design, the decreased stability margin can however make the system unstable for short delays. Real double observer poles are shown to increase the stability margin considerably, compared to having the poles in the origin, cf. [Lennartson, 1987]. On the other hand, load disturbance rejection will be slowed down. The stability limit (measured as a maximum constant delay) as a function of the control period is calculated: the stability margin increases with increasing control period (which is trivial for a stable process, which then approaches open loop operation), but the behaviour for small periods is not linear; within a range of periods, the stability limit does not change. To handle delay jitter, a number of implementations for fixed priority scheduling are proposed in [Klein et al., 1993]. A task chain consists of an input function to read data (from a sensor), a function to perform computations with the data, and an output function (for actuation). The jitter requirement is placed on the output function. There is an additional latency requirement between input and output. The output function has a constant execution time, but the computation function has a variable execution time. Four sources of output jitter variability are:
• Jitter in the interrupt that starts the job has a large influence. The input jitter is assumed to be small due to the fact that a high priority interrupt starts the job.
• Tasks with higher priority, e.g. timer interrupts and tasks at higher priority levels.
• Temporary overload, i.e. an instance of a task completes after a succeeding instance of the same task has started.
• Variability in the execution time of the job itself.
Two of the implementations proposed are based on the idea of executing the output function at a constant offset from the input interrupt, i.e. to add a skew. In one implementation the whole job is executed at the highest priority, and in the other only the output function is executed at the highest priority.
Left (whole chain at the highest priority):

    loop (highest priority)
        Do_Input_Action;
        Do_Computation_Action;
        Sleep_Until(Next_Output);
        Do_Output_Action;
        Next_Start := Next_Start + Period;
        Next_Output := Next_Start + Offset;
        Sleep_Until(Next_Start);
    end loop;

Right (only the output function at the highest priority):

    loop (low priority)
        Do_Input_Action;
        Do_Computation_Action;
        Set_Priority(highest);
        Sleep_Until(Next_Output);
        Do_Output_Action;
        Next_Start := Next_Start + Period;
        Next_Output := Next_Start + Offset;
        Set_Priority(medium);
        Sleep_Until(Next_Start);
    end loop;

Figure 13. Left: execution of the whole task chain with the highest priority. Right: execution of only the task chain output function with the highest priority. Remark: note how the value for “sleep until” is incremented to save the loop from period jitter: Next_Start := Next_Start + Period. Cf. (11) and (12).
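The remark in figure 13, i.e. incrementing the nominal start time rather than the actual wake-up time, can be sketched in Python; the jitter values fed in are arbitrary and stand for preemption and interrupt delays:

```python
def release_times(period, jitter, drift_free=True):
    """Actual wake-up instants of a periodic task (a sketch).

    drift_free=True:  next_start := next_start + period    (as in figure 13)
    drift_free=False: next_start := actual_wakeup + period (accumulates jitter)
    """
    next_start, releases = 0.0, []
    for j in jitter:
        actual_wakeup = next_start + j   # scheduling delay of this instance
        releases.append(actual_wakeup)
        next_start = (next_start if drift_free else actual_wakeup) + period
    return releases
```

With a constant 0.5 time-unit delay the drift-free variant stays within 0.5 of the ideal grid, while the naive variant drifts by a further 0.5 every period.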
There is no difference in response time or jitter between the two implementations, but executing the whole task chain at the highest priority becomes a question of what the rest of the system can tolerate. Note that Set_Priority(medium) could cause a timing failure in the next loop; one improvement is to put Set_Priority(medium) after Sleep_Until(Next_Start). In [Cervin, 1999] it is shown by a simulated example that it is possible to improve performance by using a more suitable scheduling approach which reduces scheduling jitter and latency in the control loop. The influence of delays originating from fixed priority scheduling of a control loop is studied. The task model consists of period, worst case execution time and deadline. Tasks are scheduled using a heuristic algorithm which minimizes the allowable execution window for the task that calculates the control output. The performance index is the integrated sum of the square of the control error.
3.4.2 Control period jitter

Using the same approach as for delay jitter in section 3.4.1, [Nilsson, 1998] extends the analysis and the optimal control design to sampling period jitter, and it is shown that the separation principle holds for this kind of jitter too. h, τ_sc and τ_ca are all assumed to be stochastically independent processes with known probability distributions. Time-out and vacant sampling are also treated. In [Davidson, 1973] the effect of period jitter in optimal control is studied, see also section 3.4.1. The delay acts on the control output, i.e. τ_ca. The random sampling is thought of as occurring when several tasks share a computational resource. Two types of sampling jitter are studied: free running sampling jitter and clock synchronized sampling jitter. The free running sampling jitter is defined by, cf. (12):

t_{k+1} = t_k + τ_k

where τ_k is a random delay period described by a frequency distribution f(θ) with an expected mean value equal to the nominal sampling period, i.e. Ef(θ) = h. The clock synchronized sampling is defined by the non-recursive relation, cf. (11):

t_k = kh + τ_k

where τ_k is the random deviation from the nominal sampling instant kh, described by a frequency distribution f(θ) with zero mean, Ef(θ) = 0. The free running sampling can thus drift relative to the clock synchronized sampling. Two different types of control outputs are considered: step (the output is held constant until the next sample) and impulse (the output is reset to zero after an initial change). The process is described by the ordinary continuous time state space differential equation with state noise modelled as white noise with zero mean and constant variance (1). The different types of jitter and control outputs are studied in the case where the jitter is small (called parasitic jitter in [Davidson, 1973]). The relative change in the variance of the Riccati steady-state solution and in the noise is given for each case.
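The two definitions can be sketched directly; a uniform (rectangular) distribution for τ_k is assumed:

```python
import random

def sampling_instants(h, eps, n, seed=1):
    """Free-running (t_{k+1} = t_k + tau_k, E tau = h) versus clock
    synchronized (t_k = k*h + tau_k, E tau = 0) sampling instants.
    tau is uniform within +/- eps around its mean (an assumption)."""
    rng = random.Random(seed)
    free, t = [], 0.0
    for _ in range(n):
        free.append(t)
        t += h + rng.uniform(-eps, eps)   # deviations accumulate
    synced = [k * h + rng.uniform(-eps, eps) for k in range(n)]
    return free, synced
```

Every clock synchronized instant stays within eps of its nominal grid point k*h, whereas the free running instants perform a random walk around that grid, which is exactly the drift noted above.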
The cost index functions differ depending on the problem studied. For free running sampling a somewhat elaborate cost function, taken over the time T, is:

J(x, t) = E ∫_0^{T–t} [ F̄(τ) { x^T(t+τ) Q_1c x(t+τ) + u^T(t+τ) Q_2c u(t+τ) } + f(τ) E ∫_t^T ( x^T(σ+τ) Q_1c x(σ+τ) + u^T(σ+τ) Q_2c u(σ+τ) ) dσ ] dτ

where the first term is the cost up to the point when the sampling has not yet occurred and the second term is the cost for the remaining time up to T, with σ bounded by t < σ ≤ T. In the stationary case T goes to infinity. F̄(τ) is the probability that the sampling has not occurred at time τ, according to the frequency distribution f(θ):

F̄(τ) = 1 – ∫_0^τ f(θ) dθ
J(x,t) can be written in the usual quadratic form (7) and P(t) is solved by a Riccati equation, which in turn gives the stationary linear feedback gain. The intersample behaviour is captured in a noise term whose derivative is assumed to be constant when T goes to infinity. The resulting equations for the linear feedback gain and the noise term contain both f(τ) and F̄(τ). A similar equation is also derived for the case of
impulse control. In order to compute P(t) numerically and efficiently for a reasonable quantization level, a number of differential equations are set up. These are based on the assumption that the random sampling period is governed by a Markov process with a finite number of states. At one of the states sampling occurs. By defining the transitions from the other states to the absorbing sampling state, and by defining the other transition probabilities, the frequency distribution f(τ) is constructed. The corresponding control law is derived for the case of clock synchronized jitter. The cost function differs in the way the integration interval and arguments are defined. (This is also done for the case with impulse control output.) For parasitic (i.e. small) jitter, the change ΔP(t) in P(t) in stationarity is derived for the general non-optimal expression. The jitter also affects the noise term, hence the change in the noise variance p(t) is derived too. For a step control output and for a certain discrete frequency distribution (not recited here), the equations become:
ΔP – Φ_s^T ΔP Φ_s = ε Φ_s^T [ ∫_0^h Φ_s^T(τ, 0) Q_1 Φ_s(τ, 0) dτ + Φ_s^T P Φ_s – P ] Φ_s + o(ε)

Δp = tr[ W(h, 0) ΔP + ε W(h, 0) P + ε ∫_0^h W(τ, 0) Q_1 dτ ] + o(ε)

where ε is the relative “size” of the frequency distribution, Φ_s is the closed loop system matrix (4) and:

W(t, 0) = ∫_0^t Φ(t, σ) X Φ^T(t, σ) dσ
in which Φ is the discretized system matrix A (3) and X is the variance of the white noise. With these equations an investigation of how the cost function depends on the jitter is done. The noise free scalar process in (20) is sampled with h = 1.0 s and is plotted in figure 14.
Figure 14. Φ (with A = a) is the system matrix. The feedback gain L is chosen (non-optimal control) to obtain Φs, which is the closed loop system matrix. In shaded areas free running sampling jitter has a stabilizing effect. The dotted line denotes the optimal control law.
In the shaded areas of figure 14, jitter renders a decrease in the cost function; thus jitter can have a stabilizing effect. The magnitude of the impact of jitter in free running sampling is exemplified with a scalar system. The jitter is rectangular distributed ± 10 % around the nominal sampling period h = 1.0 s. For a = 1, Δp/p is only approximately 0.6 %, cf. the similar result for delay jitter. Impulse control output is also investigated and it is stated that a scalar system with step control output (assuming an optimal control law) is more sensitive to jitter than a system with impulse control. Similar equations are derived for clock synchronized sampling and it is concluded that clock synchronized sampling has a wider stability area than free running sampling, cf. figure 14.
Control jitter in time triggered scheduling is treated in [Lin and Herkert, 1996]. The time triggered tasks are scheduled in a harmonic rate fashion, i.e. the original periods are modified to be exponentially related with base 2 (the base can be any positive integer). The goal is to avoid jitter in the time triggered tasks by the so called distance constrained task model. The distance between two tasks of a task chain must be less than a specified time interval. The tasks have specified execution times and are preemptable. Schedulability tests are stated in the paper, but there is no discussion of how to find an optimal schedule. The aim is instead to introduce more realism: 1) support for resource sharing, 2) non-distance constrained tasks, and 3) aperiodic tasks. A non-jitter solution to resource sharing is to always expect the worst case blocking and incorporate it in the execution time. A solution which of course leads to jitter is to assign priorities to tasks and use the priority inheritance protocol. In the non-distance constrained problem, a task has a fixed period which cannot be subordinated to the harmonic rate execution scheme. The solution here is to guarantee jitter free execution of the other tasks, since there is no way jitter can be avoided for all tasks. The last extension is to allow for aperiodic tasks. For this, several approaches are suggested: the sporadic server, the priority exchange protocol and the deferrable server, with a preference for the last one.
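The harmonic rate idea, i.e. modifying the periods so that each one is the base period times a power of two, can be sketched as follows; the rounding-down policy is an assumption of this sketch:

```python
def harmonize(periods):
    """Shrink every period to base * 2**k, where base is the smallest
    requested period, so that all resulting periods divide each other."""
    base = min(periods)
    out = []
    for p in periods:
        k = 0
        while base * 2 ** (k + 1) <= p:
            k += 1
        out.append(base * 2 ** k)
    return out
```

Shrinking the periods never violates the original rate requirements, at the price of some extra processor utilization.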
3.5 Transient errors

The impact on dynamic systems when the task chain in figure 1 and figure 2 breaks for a moment has not received much interest in the literature from a control theoretic point of view. The probability of failure depends, among other things, on the state of the dynamic system at the moment when the error occurs. The question is: how long does it take before the system becomes unstable? In this case performance is not interesting, only stability. If the process allows the computer system to take some time to recover from a partial failure, the requirements on and cost of the services in fault tolerant systems can become less demanding. First, analysis of the maximum delay from a control point of view is recited (section 3.5.1), then two types of reconfiguration of control algorithms are described (section 3.5.2) and finally fault tolerant scheduling is treated (section 3.5.3).

3.5.1 Analysis of the impact of transient errors

A distinction is made between a static failure, e.g. a hardware failure which renders a utilization factor greater than one, and a dynamic failure, [Shin and Krishna, 1985], see also section 3.2.2. A dynamic failure means that a hard deadline is missed, which is the case if the controller does not respond fast enough for the environment. Note that the notion of deadline is used differently here: instead of being an attribute in the task model, it denotes the maximum delay from the application's point of view. The allowed state space is divided into two sets: one set where the terminal constraints must be fulfilled to accomplish the mission of the system, and one set where the “immediate” constraints must be fulfilled not to cause a catastrophic failure. In the latter set, the constraints of the first set are not taken into account unless they come prior to a mission termination.
It is argued that a performance measure can be used for task scheduling, task allocation, specification and evaluation of controllers, for finding the optimal number of checkpoints for rollback or for finding the optimal management of redundancy. The drawbacks of a performance index are three [Shin and Krishna, 1985]: exhaustive computations might result if the process is complex; the choice of index is vital — it is
difficult to set up a performance index which is natural to the controlled object; and the sets for mission critical and immediate constraints must be defined. A hard deadline, defined as the maximum duration of a failure before the system becomes unstable, is used in [Kim et al., 1994], see also section 3.2.2. The deadline is derived from the dynamic properties of the closed loop, and not from the computer system, i.e. the deadline is not originally an attribute of the task model (13) in section 2.3. The fault is assumed to be transient, the error detectable, and the computer can recover from the failure and resume control of the process. The failure can also be the result of an erroneous calculation of the control output. The process model is allowed to be nonlinear, in which case the system is linearized. The time line is: a failure occurs at k = k_0, it is detected at k = k_0 + n_1 and the system is recovered at k = k_0 + n_1 + n_2. During the period n_1 the control signal is modelled by a random sequence and after detection, during n_2, it is held constant, u(k) = u(k_0 + n_1 + 1). For a static failure (as opposed to the one-shot failure described later on) the deadline can be quantified in terms of drifts of the poles. The deadline D can be stated as a function of N = n_1 + n_2 and the initial state:

D(N, x(k_0)) = inf_{u_a ∈ U_A} sup { N ; φ(k, k_0, x(k_0), u_a(k)) ∈ X_A(k), k_0 ≤ k ≤ k_f }

where φ is the state transition map, X_A is the allowable state space, u_a the control output and U_A the allowable control output space:

x(k) = φ(k, k_0, x(k_0), u_a(k))

u_a(k) = u(k) P_1 + u(k + n_1) P_2,   k_0 ≤ k ≤ k_0 + N
where P_1 and P_2 model the control output before and after the error detection respectively. Assuming 1) a probability of failure, 2) a conditional probability of a successful detection of the error, 3) a conditional discrete probability of the recovery duration, 4) a conditional discrete probability of input disturbance and 5) a probability density function of the control signal, the evolution of the state (x(k+1) = ...) is elaborated to depend on all these assumed probabilities. The state must not go outside its allowable space, X_A. The state evolution can be analysed in two ways: a deterministic approach which looks at the movements of the poles, and a stochastic approach which is a statement of almost-sure stability based on Lyapunov theory. A one-shot model is described in [Kim and Shin, 1994] and [Shin and Kim, 1992], cf. section 3.2.2. The hard deadline is defined by the case when the system leaves the allowable state space, eventually causing a catastrophe. The hard deadline is not constant over time. Stationary delays for LQ-control are studied. The allowable state space is divided into two sets: 1) a set X_A1 in which the system must stay to avoid immediate dynamic failure and 2) a set X_A2, which is necessary (but not sufficient) to meet the terminal constraints (the final setpoint). It is shown in an example how the set X_A2 can be derived from the initial state, but it is in practice hard to obtain X_A2. The size of X_A2 depends e.g. on the time to recovery.

3.5.2 Reconfiguration of control algorithm

Control reconfiguration in fault tolerant control systems is a method based on analytical redundancy (information redundancy) to cope with subsystem failures. The issue
will only be exemplified with a frequency domain approach to this problem proposed in [Zhenyu et al., 1999], in which the loss of an actuator is studied.

Figure 15. The control mixer changes the output u(t) from the nominal control law into u_f(t), which is added to the reference signal u_r(t) and fed to the actuators; the actuators produce x_u(t), which drives the process; the process output y(t) is measured by the sensors.
The process in figure 15 can be written as:

ẋ(t) = f(x, t) + x_u(t)
y(t) = g(x, t)
x_u(t) = B(t) u(t)

where B is the actuator matrix for normal operation. The closed loop becomes:

ẋ(t) = f(x, t) + B(t) K(t) (u_f(t) + u_r(t))
y(t) = g(x, t)

with K being the control mixer matrix, f and g piecewise continuous and Lipschitz functions, and u_r the reference signal. A control mixer matrix K_f which satisfies

B_f(t) K_f(t) = B(t) K_0(t)

is to be used after the failure. K_0 denotes the control mixer matrix during fault free operation and is defined as the unit matrix. K_f denotes the control mixer matrix to be used after a failure and B_f denotes the actuator dynamics after the failure of an actuator. A method for finding K_f, involving pseudo inverse matrices, Fourier transforms and numerical solutions to a constrained optimization problem, is laid out. It is claimed that the state space or transfer function descriptions of the nominal and failed systems are not needed, only the frequency response. The non-linear problems arising when switching controllers are tackled by the so called heterogeneous control method, where a weighted sum of control laws is used to make the transition smoother. The proposed method has the objective to follow the nominal design closely. This might not be necessary for all control systems. The purpose of the so called simplex architecture, [Sha et al., 1995] and [Seto, Krogh and Sha, 1998], is primarily to support on-line modification of a computer control system during its life time. Both software and hardware units should be possible to remove and insert on-line. An evolving system will lower the cost and risk of adding new, improved functionality. The operating system, or a fault tolerant “middleware”, should provide a range of services: on-line upgrade, fault tolerance to hardware and software failures, scheduling (preferably generalized rate monotonic), system configuration and status monitoring.
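A static, time-domain toy version of the mixer recomputation B_f K_f = B K_0 can be sketched with a pseudo inverse; the actuator matrices below are invented, and the actual method in [Zhenyu et al., 1999] works on frequency responses rather than constant matrices:

```python
import numpy as np

def control_mixer(B, B_f):
    """Least-squares solution K_f of B_f @ K_f = B @ K0, with K0 = I."""
    K0 = np.eye(B.shape[1])          # fault free mixer is the unit matrix
    return np.linalg.pinv(B_f) @ B @ K0

# Two states, three redundant actuators; actuator 2 fails (column zeroed).
B = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
B_f = B.copy()
B_f[:, 1] = 0.0
K_f = control_mixer(B, B_f)
```

Here the remaining actuators span the input space, so B_f K_f reproduces B K_0 exactly and the failure is masked from the nominal control law.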
There should be no need to modify application code when non-functional requirements, such as dependability and performance, change. A voting scheme in a redundant fault-tolerant system suffers from the upgrade paradox: if a minority of the components are changed the change will take no effect, and if a majority of the components are changed, a fault will make the system fail. To overcome the upgrade paradox,
the concept of analytic redundancy is introduced. Replication and functional redundancy both deliver the same output from the same input. Analytical redundancy is allowed to deliver non-identical results, provided the requirements of the application are still satisfied. Analytical redundancy admits model based voting. The state of the process is divided into three regions: a large safety region, an experimental region with unknown limits and a narrow region for normal operation. Three controllers with different purposes are designed, see figure 16. The baseline controller is a well functioning controller intended for the normal operation mode. The experimental controller is to be inserted or modified on-line. Finally, the safety controller must be able to stabilize the system in case the other controllers fail to do so. The baseline controller renders a better performance during normal operation than does the safety controller. The transition from one controller to the next is governed by the state region the process is in. Decision logic determines which controller is active. The controllers can run in parallel, but exactly one controller will be used for the actuation. A supervisor management function closes the loop. It is alleged that the switching of controllers protects from both semantic and timing faults. By using separate addressing spaces, resource sharing faults can be tackled.
Figure 16. A typical scenario in the simplex architecture: an activated experimental controller fails to stabilize the process.
A typical scenario illustrating the state transitions depicted in figure 16 would be: 1) The state of the system is within the baseline region; the baseline controller is disconnected and the experimental controller is activated. 2) The experimental controller moves the state of the process towards the boundary of the safety region. 3) The experimental controller is disconnected and the safety controller takes over the control. 4) When the safety controller has driven the state of the process into the narrow region, the safety controller is disconnected and the baseline controller takes over.

3.5.3 Fault tolerant scheduling

The impact of transient errors can be reduced by selecting an appropriate scheduling strategy and by considering the architecture of the whole system. In the strictly time triggered approach, e.g. TTA in [Kopetz, Nossal et al., 1997] or DACAPO [Rostamzadeh et al., 1995], the static scheduling makes detection and fault tolerant services such as replica determinism easier to implement, see section 2.3 and section 2.5. These systems have a fault recovery delay of only one TDMA round. Static schedules must have spare capacity to deal with hardware failures, which means that they are used inefficiently under error-free circumstances; dynamic schedules are more flexible in this respect. In order to tolerate transient and intermittent faults and still meet hard deadlines, a slack reserving scheme is proposed in [Ghosh et al., 1995]. Starting with a queue of
tasks representing a non-fault tolerant schedule, a number of backups with the same execution times as the original tasks are inserted into the queue. A task is represented by its arrival time, worst case execution time and deadline. A task graph is constructed to find the possible positions of backups; the shortest weighted path from the root to any leaf renders the shortest schedule. Two ways to find the shortest schedule are described: one using dynamic programming, and one non-optimal greedy algorithm (which may lead to an infeasible schedule) with a lower complexity and hence better suited for on-line insertion of backups. The difference between the algorithms (the number of deadline misses the greedy algorithm causes) depends on the load; the difference is highest at medium load. Fault tolerant scheduling is based on ghost or backup copies of a task, see e.g. [Krishna and Shin, 1997]. The ghost copy need not be an exact copy of the primary task; an analytically redundant version is enough. The ghost task is always scheduled before the primary task. The schedule can be divided into two: a ghost schedule and a primary task schedule. If a processing unit fails, processing must continue on other processing units. Each primary task must have n ghost tasks on n different processing units (not including the processing unit used for the primary task) to sustain n permanent failures. The overlap in execution time between ghost tasks and primary tasks must be resolved in a way that allows a feasible schedule for all applications when a processing unit fails.
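The slack-reserving idea of placing a backup directly after each primary can be sketched on a single processor. This is a deliberate simplification: the surveyed algorithms search a task graph for better backup positions instead of the fixed placement used here.

```python
def schedule_with_backups(tasks):
    """Reserve a backup slot after each primary task in a sequential schedule.

    tasks: list of (arrival, wcet, deadline), sorted by arrival.  A backup
    with the primary's execution time is placed directly after it, and the
    schedule is feasible only if every backup also meets the deadline.
    A simplified single-processor sketch of the slack-reserving idea.
    """
    t = 0.0
    finish, feasible = [], True
    for arrival, wcet, deadline in tasks:
        t = max(t, arrival) + wcet   # primary executes
        t += wcet                    # slack reserved for its backup
        finish.append(t)
        if t > deadline:
            feasible = False         # a fault here could not be masked in time
    return finish, feasible
```

For example, two unit-time tasks arriving at time 0 with deadlines 4 and 5 remain feasible with backups (finish times 2 and 4), while a single task of length 2 with deadline 3 does not.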
4 Conclusions

Scalability is defined as the ability of a system to endure and support changes as a result of the end-user's new demands. It can be decomposed into replaceability, extensibility, reducibility and configurability. It is not enough to look at the early design phase only: during the lifetime of an embedded computer control system, the computer system is likely to be redesigned. A computer control system has to be dependable and maintain a predictable execution in case of a failure. From a control point of view, a failure occurs when the loop is broken for an extended period of time, which endangers the stability of the system. Because of the cost of additional hardware for redundancy, time redundancy or information redundancy are preferable; in a control system it is possible to exploit all three. The number of missed samples or missed control outputs before a system in the worst case becomes unstable is an important piece of information for the system designer.

The first step in scheduling is to identify a suitable task model. It is natural for a control application to employ both a periodic and a sporadic task model. Tasks and messages can be categorized according to the information contents they convey and how the information is used. The most indispensable tasks and messages from a control synthesis point of view, i.e. those for measurement, control output and setpoint, are not necessarily the most time critical in the computer system. Precedence and exclusion constraints in the task chain from sensor to actuator via controller are typical for a control application; this can be exploited in end-to-end scheduling. The nature of pre-runtime scheduling gives the designer theoretically an infinite amount of time to minimize jitter and computational delay. Because of a more predictable execution, services for fault tolerance, e.g. fault detection and replica determinism, are easier to implement. However, the predictability of a time triggered system conflicts with the flexibility requirement; a combination of off-line and on-line scheduling is an alternative.

The control delay should of course be as small as possible; to this rule there is no exception. The process often has a delay in itself, and the classical analysis and control design of a system with constant delay is well known. Because the compensation of a constant delay is readily made in the synthesis, an additional time delay, a skew, between a producer and its consumer is frequently inserted between two time triggered tasks or messages. There are two reasons for inserting a skew: 1) it decreases jitter and minimizes the risk of vacant sampling, and 2) it is a substitute for precedence and exclusion constraints. A remaining question is how this skew should be selected in an optimal way. In global scheduling of a task chain, the end-to-end delay is of importance; guaranteeing a feasible or optimal global schedule is a problem of high complexity.

The choice of control period should be based on the specified dynamics of the closed loop. The closed loop is in turn designed to reject disturbances or follow a trajectory with specified dynamics. The common belief that faster sampling improves performance is only true for some systems. When it comes to implementation, the processing capacity limits how often the task chain can be invoked. A set of control tasks can be optimized with respect to their control periods; a sum of weighted performance indices is frequently used to form a global performance index. The problem is that other tasks, unsuitable for incorporation into the global performance index, co-exist with the controllers. One idea is to model these other tasks as disturbances.
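Optimizing a set of control periods against a shared resource can be illustrated with a small grid search. The linear cost model and all numbers below are hypothetical stand-ins for a real control performance index J(h); real formulations solve a constrained optimization over continuous periods.

```python
from itertools import product

def optimize_periods(tasks, candidates, u_max):
    """Exhaustively pick control periods minimizing a weighted global cost.

    tasks: list of (wcet, weight).  The per-loop cost w * h, growing
    linearly with the period h, is a hypothetical stand-in for a real
    performance index; utilization sum(wcet / h) must stay below u_max.
    """
    best, best_cost = None, float("inf")
    for periods in product(candidates, repeat=len(tasks)):
        util = sum(c / h for (c, _), h in zip(tasks, periods))
        if util > u_max:
            continue                 # this period set would overload the resource
        cost = sum(w * h for (_, w), h in zip(tasks, periods))
        if cost < best_cost:
            best, best_cost = periods, cost
    return best, best_cost
```

For two loops with unit execution times, weights 2 and 1, candidate periods {2, 4, 5} and a utilization ceiling of 0.8, the search assigns the short period to the heavily weighted loop: periods (2, 4).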
A different strategy when choosing control periods is to sample the system in a non-uniform way, i.e. multirate or event triggered control and actuation. The main reason is to decrease the total utilization of a common resource by assigning sampling periods according to the dynamics of the processes. The theory for multirate systems is mature, but the theory for event triggered sampling is still under development. Certain periods should be avoided in the control design, but if the rules of thumb in e.g. [Åström and Wittenmark, 1997] are followed, there should be no problem with observability or reachability. Discrete controllers without a continuous time counterpart, i.e. with discrete poles in the origin, should be avoided; one drawback of these is their sensitivity to small control periods.

From a control point of view there are two types of jitter: control period jitter and control delay jitter. The former type can be divided into clock synchronized and drifting jitter. Drifting jitter can be avoided by careful coding of the invocation of a new task instance. A zero-order-hold circuit is usually time triggered and exhibits a small amount of clock synchronized period jitter. The time triggered sensor propels the actions in the task chain where delay jitter arises. One result of scheduling is often jitter in the response time of a task; this can be seen as clock synchronized period jitter or, in some situations, also as delay jitter. Jitter is popularly modelled as a stochastic process governed by a frequency distribution, although jitter does not have to be unknown or unintentional. Even small jitter (a fraction of a period or delay) can cause vacant sampling and sample rejection, which is more severe for binary data and control logic than for real valued data. The distorting impact of jitter on a dynamic system seems to be very modest in general, but it depends of course on the dynamics of the system and the nature of the jitter.
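The mechanism by which even small period jitter causes vacant sampling and sample rejection can be simulated. The uniform jitter distribution and the parameters are illustrative assumptions; real jitter is rarely uniform or independent between samples.

```python
import random

def simulate_jitter(period, jitter, n, seed=0):
    """Count vacant samples and sample rejections under period jitter.

    A producer releases sample k at k*period + uniform(-jitter, jitter);
    a time triggered consumer reads once per period.  A consumer slot
    with no new sample is a vacant sample; with two or more, the older
    ones are rejected.
    """
    rng = random.Random(seed)
    arrivals = sorted(k * period + rng.uniform(-jitter, jitter)
                      for k in range(1, n + 1))
    vacant = rejected = 0
    i = 0
    for slot in range(1, n + 1):       # consumer reads at slot * period
        new = 0
        while i < len(arrivals) and arrivals[i] <= slot * period:
            new += 1
            i += 1
        if new == 0:
            vacant += 1
        elif new > 1:
            rejected += new - 1
    return vacant, rejected
```

With zero jitter no slot is ever empty or doubly filled; with jitter of a fraction of the period, vacant samples and rejections appear even though the average rates of producer and consumer are identical.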
Optimal controllers have been developed to stabilize systems with jitter. A special case of
jitter arises in loops with asynchronously related periods. In static scheduling, jitter can be completely avoided if the task periods can be altered. In priority based scheduling, jitter is decreased by increasing the priority or by decreasing the time window for execution. A problem with high utilization in priority based scheduling is the correspondingly long response times and severe jitter; much of the jitter in on-line scheduling can be avoided by setting the allowable maximum utilization ceiling to a low level.

A difficulty in analysing transient errors is the uncertainty in the assumptions about the current state of the process, the future disturbances and setpoints, the probability of a failure, etc. It is also difficult to set up limits in the state space which a process must stay within in order not to exhibit a dynamic failure; those limits are time varying. The notion of deadline is here used in a different way than in scheduling: it denotes the maximum delay before a dynamic failure, from the application's point of view, instead of being an attribute of the task model.

A strict time triggered scheduling approach is a good solution to practically all timing problems. One minor drawback is a possible increase in delay, but the major drawback is the loss of scalability; a scalable time triggered system would be a winning combination. Some on-line scheduling algorithms seem to be suitable for integration with control applications: 1) the approximate but elegant elastic task model in section 3.3.3 can prove effective in a practical application, where e.g. a quadratic performance index can be linearized for a limited range of control periods; 2) control algorithms with time varying execution time fit the notion of imprecise computation by [Liu et al., 1991] in section 3.2.2; 3) the slack stealing algorithm, [Thuel and Lehoczky, 1994] in section 3.2.2, seems suitable to tailor to control implementations.
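The elastic task model mentioned above compresses task utilizations like springs until the set is schedulable. The sketch below shows the basic compression step only, under the assumption that no task has hit its maximum period; the full algorithm of [Buttazzo et al., 1998] enforces per-task period bounds iteratively.

```python
def elastic_compress(tasks, u_des):
    """One compression step of the elastic task model (no period bounds).

    tasks: list of (wcet, nominal_period, elasticity).  Each task's
    utilization is reduced in proportion to its elasticity until the
    total equals u_des; the new periods are returned.
    """
    u0 = [c / t for c, t, _ in tasks]
    total = sum(u0)
    if total <= u_des:
        return [t for _, t, _ in tasks]   # already schedulable as given
    e_sum = sum(e for _, _, e in tasks)
    return [c / (u - (total - u_des) * e / e_sum)
            for (c, _, e), u in zip(tasks, u0)]
```

For example, two tasks with unit execution time, nominal period 2 and equal elasticity overload a ceiling of 0.8; the compression stretches both periods to 2.5, bringing the total utilization down to exactly 0.8.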
5 Acknowledgement

The author wishes to thank Martin Törngren and Jan Wikander for guiding the work and for providing useful comments on the manuscript. This work was carried out within the DICOSMOS project, see e.g. [Törngren and Sanfridson (Ed.), 1998], funded jointly by NUTEK, the Swedish national board for industrial and technological development, and VTD, Volvo Technological Development.
6 References

[Andreff, 1994] N. Andreff, "Robustness to jitter in real-time systems", ISSN 0280-5316, ISRN LUTFD2/TFRT--5507--SE, June 1994.
[Araki and Yamamoto, 1986] M. Araki and K. Yamamoto, "Multivariable Multirate Sampled-Data Systems: State-Space Description, Transfer Characteristics, and Nyquist Criterion", IEEE Trans. on Automatic Control, vol. 31, no. 2, February 1986.
[Audsley et al., 1993] N. Audsley, K. Tindell, A. Burns, "The End of The Line for Static Cyclic Scheduling?", IEEE, 1993.
[Beccari et al., 1999] G. Beccari, S. Caselli, M. Reggiani, F. Zanichelli, "Rate Modulation of Soft Real-Time Tasks in Autonomous Robot Control Systems", Proceedings of the 11th Euromicro conference on real-time systems, June 1999.
[Belle Isle, 1975] Belle Isle, "Stability of Systems with Nonlinear Feedback Through Randomly Time-Varying Delays", IEEE Trans. on Aut. Control, Vol. 20, No. 1, February 1975.
[Berg et al., 1988] M. Berg, N. Amit and J. D. Powell, "Multirate Digital Control System Design", IEEE Trans. on Automatic Control, vol. 33, no. 12, December 1988.
[Bettati and Liu, 1992] R. Bettati and Jane W.-S. Liu, "End-to-End Scheduling to Meet Deadlines in Distributed Control", IEEE, 1992.
[Binns, 1997] P. Binns, "Incremental Rate Monotonic Scheduling for Improved Control System Performance", IEEE Proceed. 3rd RT Techn. and Appl. Symp., p. 80-90, 1997.
[Bridal, 1997] O. Bridal, "Issues in the Design and Analysis of Dependable Distributed Real-Time Systems", PhD thesis, Computer Engineering, Chalmers, Technical report no. 297, ISBN 91-7197-433-4, January 1997.
[Buttazzo et al., 1998] G. C. Buttazzo, G. Lipari, L. Abeni, "Elastic Task Model For Adaptive Rate Control", IEEE, 1998.
[Cervin, 1999] A. Cervin, "Improved Scheduling of Control Tasks", Proceedings of the 11th Euromicro conference on real-time systems, June 1999.
[Chan and Özgüner, 1995] H. Chan, Ü. Özgüner, "Optimal Control of Systems over a Communication Network with Queues Via a Jump System Approach", IEEE, p. 1148-1153, 1995.
[Chen, 1984] C.-T. Chen, "Linear System - Theory and Design", ISBN 0-03-060289-0, 1984.
[Choi et al., 1997] S. Choi, A. K. Agrawala, L. Shi, "Intelligent Temporal Control", Proceedings of Intelligent Information Systems, ISS '97, p. 514-522, 1997.
[Choi, 1998] S. Choi, "End-to-end Optimization Technique in Heterogeneous Distributed Real-Time Systems", ACM SIGPLAN Workshop on languages, compilers and tools for embedded systems, Montreal, June 1998.
[Davidson, 1973] C. Davidson, "Random sampling and random delays in optimal control", PhD thesis, TRITA-MAT-1973-8, diss. 21429, May 1973.
[Eker, 1999] J. Eker, "Flexible Embedded Control Systems - Design and Implementation", PhD Thesis, Dep. of Automatic Control, Lund, ISSN 0280-5316, ISRN LUTFD2/TFRT--1055--SE, December 1999.
[Gerber et al., 1995] R. Gerber, S. Hong, M. Saksena, "Guaranteeing End-to-End Timing Constraints by Calibrating Intermediate Processes", IEEE Transactions on Software Eng., Vol. 21, p. 579-592, July 1995.
[Ghosh et al., 1995] S. Ghosh, R. Melhem, D. Mosse, "Enhancing Real-Time Schedules to Tolerate Transient Faults", IEEE Proceed. 16th RTSS, 1995.
[Halevi and Ray, 1988] Y. Halevi, A. Ray, "Integrated Communication and Control Systems - Part 1 and 2", Journal of Dynamic Systems, Measurement, and Control, Vol. 110, p. 367-381, December 1988.
[IEEE dictionary, 1992] "The new IEEE standard dictionary of electrical and electronics terms", IEEE standard 100-1992, 5th edition.
[Jalote, 1994] P. Jalote, "Fault tolerance in distributed systems", ISBN 0-13-301-367-7, Prentice-Hall, 1994.
[Jensen, 1992] D. Jensen, "Asynchronous Decentralized Realtime Computer Systems", NATO Advanced Study Institute on Real-Time Computing, Sint Maarten, 21 p., October 1992.
[Kang et al., 1997] D.-I. Kang, R. Gerber, M. Saksena, "Performance-based Design of Distributed Real-Time Systems", IEEE, 1997.
[Karbassi and Bell, 1996] S. M. Karbassi and D. J. Bell, "The effect of sampling period on the behaviour of systems incorporating state feedback", Int. J. of Control, Vol. 63, no. 2, p. 351-364, 1996.
[Kim and Shin, 1997] B. K. Kim, K. G. Shin, "Task Assignment and Scheduling for Open Real-Time Control Systems", Proc. ACC'97, p. 3664-3668, June 1997.
[Kim et al., 1994] H. Kim, K. G. Shin, "On the Maximum Feedback Delay in a Linear/Nonlinear Control System with Input Disturbances Caused by Controller-Computer Failures", Trans. on Control Syst. Techn., vol. 2, no. 2, June 1994.
[Kim, 1998] B. K. Kim, "Task Scheduling with Feedback Latency for Real-Time Control Systems", IEEE Proceed. 5th Intern. Conf. on RT Computing Systems and Appl., p. 37-41, 1998.
[Klein et al., 1993] M. Klein, T. Ralya, B. Pollak, R. Obenza, M. Harbour, "A Practitioner's Handbook for Real-Time Analysis - Guide to Rate Monotonic Analysis for Real-Time Systems", Kluwer Academic Publishers, ISBN 0-7923-9361-9, 1993.
[Kopetz and Grünsteidl, 1992] H. Kopetz, G. Grünsteidl, "TTP - A Time Triggered Protocol for Automotive Applications", Research Report nr 16/1992, Institut für Technische Informatik, Wien, October 1992 (paper with same title published 1993 in FTCS-23).
[Kopetz, Nossal et al., 1997] H. Kopetz, R. Nossal, R. Hexel, A. Krüger, D. Millinger, R. Pallierer, C. Temple, M. Krug, "Mode Handling in the Time Triggered Architecture", IFAC DCCS'97, Seoul, Korea, 1997.
[Krishna and Shin, 1997] C. M. Krishna, K. G. Shin, "Real-Time Systems", McGraw-Hill, ISBN 0-07-114243-6, 1997.
[Lennartson, 1985] B. Lennartson, "On the Choice of Controller and Sampling Period for Typical Process Models", Teknisk rapport 85:11, Inst. f. Reglerteknik, Chalmers, 1985.
[Lennartson, 1987] B. Lennartson, "On the Choice of Controller and Sampling Period for Linear Stochastic Control", Preprints 10th IFAC World Congress, Munich, 1987.
[Lin and Herkert, 1996] K.-J. Lin and A. Herkert, "Jitter Control in Time-Triggered Systems", Proc. of the 29th IEEE annual Hawaii international conference on system sciences, 1996.
[Liu and Layland, 1973] C. L. Liu, J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment", Journal of the ACM, 20(1):46-61, 1973.
[Liu et al., 1991] J. W.-S. Liu, K.-J. Lin, W.-K. Shih, A. Chuang-shi Yu, "Algorithms for Scheduling Imprecise Computations", IEEE Computer, p. 58-68, 1991.
[Lönn and Axelsson, 1999] H. Lönn, J. Axelsson, "A Comparison of Fixed-Priority and Static Cyclic Scheduling for Distributed Automotive Control Applications", Proceedings of the 11th Euromicro conference on real-time systems, June 1999.
[MacGregor, 1976] J. F. MacGregor, "Optimal Choice of the Sampling Interval for Discrete Process Control", Technometrics, vol. 18, no. 2, May 1976.
[Melzer and Kuo, 1971] S. M. Melzer and B. C. Kuo, "Sampling Period Sensitivity of the Optimal Sampled Data Linear Regulator", Automatica, vol. 7, p. 367-370, 1971.
[Nilsson, 1998] J. Nilsson, "Real-Time Control Systems with Delays", PhD Thesis, Dept. of Automatic Control, Lund, ISSN 0280-5316, ISRN LUTFD2/TFRT--1049--SE, 1998.
[Ray and Halevi, 1988] A. Ray, Y. Halevi, "Finite-dimensional modelling of network-induced delays for real-time control systems", Proc. of the American Control Conference, June 1988.
[Ray, 1989] A. Ray, "Introduction to Networking for Integrated Control Systems", IEEE, 1989.
[Ray, 1994] A. Ray, "Output Feedback Control Under Randomly Varying Distributed Delays", Journal of Guidance, Control and Dynamics, Vol. 17, No. 4, July-August 1994.
[Redell, 1998] O. Redell, "Modelling of Distributed Real-Time Control Systems - An Approach for Design and Early Analysis", Licentiate Thesis, Mechatronics Lab, TRITA-MMK 1998:9, ISSN 1400-1179, ISRN KTH/MMK--98/9--SE, 1998.
[Rostamzadeh et al., 1995] B. Rostamzadeh, H. Lönn, R. Snedsböl, J. Torin, "DACAPO: A Distributed Computer Architecture for Safety-Critical Control Applications", Technical Report No. 217L, ISBN 91-7197-227-7, 1995.
[Ryu et al., 1997] M. Ryu, S. Hong, M. Saksena, "Streamlining Real-Time Controller Design: From Performance Specifications to End-to-end Timing Constraints", IEEE Proceed. 3rd RT Techn. and Appl. Symp., p. 91-99, 1997.
[Salt et al., 1993] J. Salt, J. Tornero and P. Albertos, "Modelling of non-conventional sampled data systems", 2nd IEEE conf. on control applications, Vancouver, September 1993.
[Sandström, 1999] K. Sandström, "Modelling and Scheduling of Control Systems", Licentiate Thesis, TRITA-MMK 1999:5, ISSN 1400-1179, ISRN KTH/MMK/R--99/5--SE, 1999.
[Seto et al., 1996] D. Seto, J. P. Lehoczky, L. Sha, K. G. Shin, "On Task Schedulability in Real-Time Control Systems", Proceedings RTSS 96, p. 13-21, 1996.
[Seto, Krogh and Sha, 1998] D. Seto, B. H. Krogh, L. Sha, and A. Chutinan, "Dynamic Control System Upgrade Using the Simplex Architecture", IEEE Control Systems, August 1998.
[Seto, Lehoczky and Sha, 1998] D. Seto, J. P. Lehoczky, L. Sha, "Task Period Selection and Schedulability in Real-Time Systems", Proceedings of the 19th IEEE Real-Time Systems Symposium, p. 188-198, 1998.
[Sha et al., 1995] L. Sha, R. Rajkumar, M. Gagliardi, "A Software Architecture for Dependable and Evolvable Industrial Computing Systems", Technical Report CMU/SEI-95-TR-005, ESC-TR-95-005, July 1995.
[Shin and Cui, 1995] K. G. Shin, X. Cui, "Computing Time Delay and Its Effects on Real-Time Control Systems", IEEE Trans. on Contr. Syst. Techn., Vol. 3, no. 2, June 1995.
[Shin and Kim, 1992] K. G. Shin, H. Kim, "Derivation and Application of Hard Deadlines for Real-Time Control Systems", Trans. on systems, man and cybernetics, vol. 22, no. 6, November 1992.
[Shin and Krishna, 1985] K. G. Shin, C. M. Krishna, Y.-H. Lee, "A Unified Method for Evaluating Real-Time Computer Controllers and Its Application", IEEE Trans. on Autom. Contr., Vol. AC-30, No. 4, April 1985.
[Shin and Meissner, 1999] K. G. Shin, C. L. Meissner, "Adaptation and Graceful Degradation of Control System Performance by Task Reallocation and Period Adjustment", Proceedings of the 11th Euromicro conference on real-time systems, June 1999.
[Stankovic et al., 1996] J. Stankovic et al., "Strategic Directions in Real-Time and Embedded Systems", ACM Computing Surveys, Vol. 28, No. 4, p. 751-763, December 1996.
[Thuel and Lehoczky, 1994] S. R. Thuel, J. P. Lehoczky, "Algorithms for Scheduling Hard Aperiodic Tasks in Fixed-Priority Systems Using Slack Stealing", Proceedings of RTSS 1994, p. 22-33, 1994.
[Tindell et al., 1992] K. Tindell, A. Burns, A. J. Wellings, "Allocating Hard Real-Time Tasks: An NP-Hard Problem Made Easy", The Journal of Real-Time Systems, No. 4, p. 145-165, 1992.
[Törngren and Sanfridson (Ed.), 1998] M. Törngren, M. Sanfridson (Editors), "Research problem formulations in the DICOSMOS project", Technical Report, TRITA-MMK 1998:20, ISSN 1400-1179, ISRN KTH/MMK--98/20--SE, Dep. of Machine Design, KTH, 1998.
[Törngren and Wikander, 1996] M. Törngren, J. Wikander, "A Decentralization Methodology for Real-Time Control Applications", Contr. Eng. Practice, Vol. 4, No. 2, p. 219-228, 1996.
[Törngren, 1995] M. Törngren, "Modelling and Design of Distributed Real-Time Control Applications", Doctoral Thesis at DAMEK, KTH, TRITA-MMK 1995:7, ISSN 1400-1179, ISRN KTH/MMK--95/7--SE, 1995.
[Törngren, 1998] M. Törngren, "Fundamentals of Implementing Real-Time Control Applications in Distributed Computer Systems", Real-Time Systems 14, p. 219-250, Kluwer Academic Publishers, 1998.
[Voulgaris, 1994] P. Voulgaris, "Control of Asynchronous Sampled Data Systems", IEEE Trans. on Automatic Control, vol. 39, no. 7, July 1994.
[Walsh et al., 1999] G. C. Walsh, Y. Hong and L. Bushnell, "Stability analysis of networked control systems", Proceedings of the 1999 American Control Conference, Vol. 4, p. 2876-2880, 1999.
[Wikander and Törngren, 1994] J. Wikander, M. Törngren, "Decentralized control systems for modular machines", 20th Intern. Conf. on IECON, Vol. 3, p. 1490-1494, 1994.
[Wittenmark et al., 1995] B. Wittenmark, J. Nilsson, M. Törngren, "Timing Problems in Real-Time Control Systems", ACC 95, 1995.
[Wittenmark et al., 1998] B. Wittenmark, B. Bastian, J. Nilsson, "Analysis of Time Delays in Synchronous and Asynchronous Control Loops", 37th CDC, Tampa, December 1998.
[Zeng-Qi, 1981] S. Zeng-Qi, "Sampling Interval Selection and Feedback Reduction for the Control of Time Continuous Processes", PhD Thesis no. 379, Chalmers, 1981.
[Zhenyu et al., 1999] Y. Zhenyu, S. Huazhang and C. Zongji, "The Frequency-Domain Heterogeneous Control Mixer Module Method for Control Reconfiguration", Proc. of the 1999 IEEE Intern. Conf. on Control Applications, Hawaii, August 1999.
[Årzen, 1999] K.-E. Årzen, "A simple event-based PID controller", 14th IFAC congress, Beijing, 1999.
[Åström and Bernhardsson, 1999] K. J. Åström and B. Bernhardsson, "Comparison of periodic and event based sampling for first-order stochastic systems", 14th IFAC congress, Beijing, 1999.
[Åström and Wittenmark, 1997] K. J. Åström, B. Wittenmark, "Computer Controlled Systems - Theory and Design", 3rd edition, ISBN 0-13-314899-8, 1997.