
The 28th International Conference on Distributed Computing Systems Workshops

Design and Development Methodology for Resilient Cyber-Physical Systems

Honguk Woo1, Jianliang Yi1, James C. Browne1, Aloysius K. Mok1, Ella Atkins2, and Fei Xie3

1 Dept. of Computer Sciences, The University of Texas at Austin, {honguk, jlyi, browne, mok}@cs.utexas.edu

2 Dept. of Aerospace Engineering, University of Michigan, [email protected]

3 Dept. of Computer Science, Portland State University, [email protected]

1545-0678/08 $25.00 © 2008 IEEE DOI 10.1109/ICDCS.Workshops.2008.62

Abstract

Mission-critical cyber-physical systems must be resilient to all classes of failures, in both hardware and software components. Failures affecting a system’s ability to accurately control its physical actions are of special concern, requiring a meta-level monitoring and reaction ability to enable high-performance nominal and safe post-failure operation. This paper addresses these challenges by unifying formal software engineering with a suite of feedback control laws and efficient resource monitoring within a comprehensive design and development methodology.

1. Overview and Objectives

This paper combines principles of feedback control and stability with methods for formal software engineering to define a protocol for the design and implementation of reliable control software for cyber-physical systems which must be resilient in the presence of resource failures and/or variable resource configurations. A common structure for cyber-physical systems is a feedback control loop which uses sensors to monitor the state of the system and analysis of the system state to determine settings for actuators. Feedback control has proven to be a very successful tool for ensuring correct behavior of complex systems in the presence of uncertainty. Feedback control loops commonly assume a stable configuration of sensors and actuators. This research deals specifically with the case where the feedback loop admits the possibility of changes among and/or failures of resources, but implements appropriate reactions to ensure continued operation. Nonlinear control theory has an explicit concept of a stability envelope, informally defined as a subset of system states around the operating conditions within which the system can recover from errors. The goal of feedback control is to keep the system within its stability envelope. This paper formulates and instantiates a design and development methodology for enhancing the resilience of embedded control software for cyber-physical systems.

The conceptual foundations for the design methodology to be formulated, instantiated and demonstrated are:

a) A software architecture implementing feedback control which: (i) partitions core control from other system functionality, (ii) implements multiple controllers corresponding to different system resource states and (iii) incorporates a monitoring and analysis subsystem which monitors properties of the system to determine the state of the system and ensures that the system is in the operating mode appropriate for its state, including switching controllers when the system state has left normal mode.

b) A capability for verifying the correctness of the safe mode control system and, as far as is possible, the correctness of the normal mode control system and of the monitoring and analysis system.

This research is a collaborative effort which combines the expertise of researchers in real-time systems (Mok, Woo and Yi), software engineering and verification (Xie and Browne) and control systems (Atkins). The approach is to demonstrate the methodology on a simple control system which serves as a minimal but adequate testbed for the methodology. We have used the classroom demonstration feedback control system TableSat [6] as the basis for our demonstration experiment. The two control systems to be used are the normal mode control system, which is distributed for classroom use of TableSat, and a safe mode controller, which will enable TableSat to remain functional with a degraded sensor configuration.

2. Methodology

[Figure 1: flow diagram of the methodology. Labels recovered from the diagram: Specifications for Failure Modes, Controller Configuration and Monitor Configuration; System Architecture and Property Specification; Implementation with Objectbench; System Model in xUML; Testing with Objectbench, Verification with ObjectCheck, and Monitoring with ResCheck; Tested and Verified System Model; Code Generation with CodeGenesis; C/C++ Program for Multiple Controller TableSat Code; Test on TableSat with QNX; Verified, Tested Multiple Controller TableSat; Production Control System.]

Figure 1. Methodology as Instantiated for the Demonstration and Evaluation

Figure 1 illustrates the methodology as instantiated using TableSat as the testbed, xUML (eXecutable Unified Modeling Language [1]) as implemented by the Objectbench development environment [2] as the system modeling language, ObjectCheck [3] as the verification system and ResCheck [7] to implement the monitoring and analysis subsystem. The rectangles are artifacts. The arcs are labeled with processes or tools supporting the transitions among artifacts. Each artifact and step is sketched below.

Specifications for Failure Modes, Controller Configuration and Monitor Configuration – These specifications are manually derived through classical systems and control engineering.

Architecture and Property Derivation – The system is mapped to an architecture in which the software is partitioned into core functionalities and non-core functionalities. The goal is to design a core segment of software which is sufficiently compact to be amenable to formal analysis both by functional verifiers and by runtime monitoring methods. The model-based approach concentrates on the development of the core segment. The safety and liveness properties sufficient to guarantee resilient operation are manually derived in the context of the architecture.

System Architecture – For the TableSat case study there are two controllers: a normal mode controller, which is from the TableSat distribution, and a safe mode controller, which enables TableSat to continue operation with a degraded sensor configuration.

Implementation with Objectbench – The executable model in xUML for the core segment is manually developed using Objectbench. Objectbench is a software development tool that supports a UML-like Object-Oriented Analysis method (xUML) [1] and generates executable and compilable analysis models.

System Model in xUML – The system model is a complete implementation of the functionality of the system. The representation is in xUML except for the control algorithm numerics, which are implemented as C functions invoked from the xUML model. The monitoring and analysis subsystem is implemented following the ResCheck process. An environment (also in xUML) which simulates the inputs from the sensors and the failure modes closes the system. (It would also be possible to implement only the core functionality in the verifiable representation and embed it in the non-core software.)

Testing with Objectbench and Analysis with ResCheck – The system model is tested using the simulator which is provided with the Objectbench xUML system. The tests include simulation of sensor failures and transitions to safe mode controllers.

Property Verification with ObjectCheck – The properties stated informally in the architecture derivation step are mapped into temporal logic and verified on the model. In ObjectCheck, designers of the system use the Property Specification Interface and xUML Visual Modeler to specify the properties of the system model in xUML. An xUML-to-S/R translator converts a property and the system model into S/R [5], the input formal language of the COSPAN model checker [5]. COSPAN accepts them as inputs and checks whether the property is valid in the model.

Tested and Verified System Model – This system model is now ready to be translated to executable C/C++ code so that it can be installed and executed on the QNX operating system on the TableSat hardware.

Code Generation (with CodeGenesis [4]) – The CodeGenesis compiler of Objectbench translates the xUML model to C/C++, which can then be interfaced to the QNX operating system for execution on TableSat.

C/C++ Code for Multiple TableSat Controllers – The C/C++ code for the TableSat system generated by CodeGenesis is manually interfaced to the QNX operating system, the host OS for TableSat.

Verified, Tested Multiple Controller TableSat – The tested and verified code is installed on TableSat. Experiments are conducted by manually disabling all sensors except the gyro to simulate a failure and cause the transition to the safe mode controller.

3. TableSat

The University of Michigan’s TableSat platform [6] is a one degree-of-freedom “Tabletop” satellite that emulates the dynamics, sensing, and actuation capabilities required for spacecraft attitude control. TableSat is driven by two “computer fan” thrusters commanding clockwise and counter-clockwise torques, respectively, and experiences extremely low friction on its central pivot point. TableSat contains a high-precision rate gyro to measure angular velocity, along with a three-axis magnetometer and a set of four core sun sensors (CSS) to measure pointing direction. An onboard Diamond Systems Prometheus PC/104 computer running the QNX real-time operating system communicates with a ground station via a wireless 802.11b interface. The computer interfaces to sensors through 16-bit analog-to-digital converters and to actuators through amplified 16-bit digital-to-analog channels. TableSat exhibits nonlinear dynamics, including off-axis wobble given certain actuation magnitudes and switching profiles. TableSat provides a variety of practical challenges in hardware and software for control engineers with limited experience using real physical systems. In its current lab environment, magnetometer and CSS sensor calibrations are nonlinear and dependent on location and time of day (e.g., proximity to the ferrous building frame, lighting conditions). Control system designs have ranged from rate control based on gyro readings to pointing with single or multiple sensor measurements. TableSat onboard software is composed of the following four threads, each of which executes as a periodic real-time task of constant frequency:

• State Estimator thread that reads sensors and estimates the current state of TableSat;
• Controller thread that applies the specified control law to compute output torque;
• Actuator thread that outputs computer fan voltages to achieve the commanded control torque;
• Communication thread that supports data/command transmission from/to an external client program.
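The four-thread structure listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the gain, the voltage scaling, the task periods, and the tick-based loop that stands in for the QNX real-time scheduler. It only shows the data flow from estimator to controller to actuator through lock-protected shared data, with a slower communication task sampling the result.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class GlobalData:
    """Shared data guarded by one lock (hypothetical stand-in for the shared global data)."""
    lock: threading.Lock = field(default_factory=threading.Lock)
    gyro_rate: float = 0.0     # latest raw rate reading (rad/s)
    est_rate: float = 0.0      # state estimate written by the estimator
    torque_cmd: float = 0.0    # torque written by the controller
    fan_voltage: float = 0.0   # voltage written by the actuator
    log: list = field(default_factory=list)  # samples the communication task would transmit


def estimator(gd):
    with gd.lock:
        gd.est_rate = gd.gyro_rate  # trivial "estimate": pass the reading through


def controller(gd, rate_cmd=0.0, kp=2.0):
    with gd.lock:
        gd.torque_cmd = kp * (rate_cmd - gd.est_rate)  # assumed proportional law


def actuator(gd, volts_per_torque=0.5):
    with gd.lock:
        gd.fan_voltage = volts_per_torque * gd.torque_cmd  # assumed linear fan model


def communication(gd):
    with gd.lock:
        gd.log.append((gd.est_rate, gd.fan_voltage))


# (task, period in ticks); the periods are illustrative, not TableSat's real rates
TASKS = [(estimator, 1), (controller, 1), (actuator, 1), (communication, 5)]


def run(gd, ticks):
    """Release each task whenever the tick count is a multiple of its period."""
    for t in range(ticks):
        for task, period in TASKS:
            if t % period == 0:
                task(gd)


gd = GlobalData(gyro_rate=0.4)
run(gd, ticks=10)
```

In this sketch each task has a constant period, so the release pattern is fully determined by the tick count; a real deployment would rely on the operating system's periodic timers instead of a loop.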

“Safe Core” Control System - Due to the atmosphere-free spacecraft environment and the low friction experienced by TableSat, uncontrolled actuation can lead to undesirable high-speed rotations that impart dangerous forces to system components. For space operations, damping rotational motion to zero, or near-zero, is viewed as a capability all spacecraft must possess at all times, even after being “safed” due to other exception(s). We have implemented a core (safe) controller for TableSat that relies only on its rate gyro to stabilize TableSat motion should other sensors or controllers be compromised. Rate control is a straightforward algorithm. A simple proportional controller approximately achieves commanded rates; augmentation with an integral term drives steady-state rate error to zero and has been demonstrated effective given TableSat’s relatively high thread execution rates and small external disturbance magnitudes. Although subject to gyro bias, and thus to typically small but non-zero steady-state drift, rate control is not difficult to prove correct so long as the rate gyro and computer fans function properly, a simplifying assumption we make to constrain the scope of our core controller for this work. Because it uses only rate gyro data, the proportional-integral rate control law deployed as the TableSat safe core control law is impervious to changes in lighting conditions and magnetic field disturbances.

Augmented (Risky) Control System - Rate control is a safe backup but cannot accurately point TableSat in any particular direction. Under normal operating conditions, TableSat incorporates inertial attitude measurements in its feedback control law. Three inertial attitude sensing strategies are possible, all relying on the gyro for angular rate estimates: 1) Magnetometer-based (North-referenced) pointing, 2) Sun sensor (CSS) pointing, and 3) Pointing over a fused estimate of magnetic field and light sensor measurements. The initial implemented controllers [6] relied strictly on the magnetometer for pointing information, but CSS-based control is more capable in TableSat’s University of Michigan locale due to ferrous building structure disturbance of Earth’s magnetic field.

Failure of attitude sensors is the primary reason the risky controller will malfunction in our experiments. To conduct controlled, repeatable tests, we induce sensor failures. The CSS fails in two ways: by shutting down the single light source or by adding a second “competing” light source. Although we have identified an ideal lab location at which our magnetometer is calibrated, we can similarly induce magnetometer failures by carrying TableSat through the lab during testing. A magnetometer-based control law destabilizes when magnetic North and a building beam exert approximately equal magnetic influences. Pointing direction is marginally stable but incorrect when TableSat further approaches a ferrous building column, due to the dominance of the structure’s local magnetic field but with a less-localized directional signal.

Regardless of the pointing angle data source, TableSat applies baseline proportional-derivative and proportional-integral-derivative controllers over pointing direction and angular rate to achieve and maintain pointing commands. For this project, it is sufficient to command TableSat to achieve a single reference attitude or constant angular rate (likely zero for the safe controller) since we are studying resource (sensor) failure rather than logic failure. However, as an analogue to spacecraft science missions, TableSat will ultimately be extended to execute uploaded pointing command sequences of specified durations, enabling more complex analysis of the long-term interaction of safe core and risky control laws.

4. Software Architecture, Verification and Resource Analysis and Monitoring

The xUML model of TableSat was derived by refactoring the C-implemented software into logical components, extending the component set to include multiple controller components and implementing these components as classes in xUML. Because TableSat is a demonstration-scale system, almost all of the system is core functionality. The xUML model of TableSat developed in Objectbench specifically includes the class model and the state model. The class model contains:

• 4 classes for TableSat threads: TS_ESTIMATOR, TS_CONTROLLER, TS_ACTUATOR, TS_COMMUNICATION
• 7 classes for global data being shared by the above threads: GD_ESTIMATOR, GD_STATEESTIMATOR, GD_CONTROLLER, GD_ACTUATOR, GD_FANDATA, GD_SENSORREADINGS, GD_STATUS
• A scheduler class with periodic time intervals for TableSat threads: SCHEDULER
• A lock interface class to the global data being shared by the threads: GD_LOCK_INTERFACE

The behavior of a class over time is formalized in a state machine where a transition between states is triggered by an event. An event can be generated within either an inter-class or intra-class relation. Figure 2 is a class diagram of the xUML TableSat system.
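As a rough illustration of how an event-triggered state machine can hand control to the gyro-only safe core controller, the sketch below pairs a two-state mode machine with a proportional-integral rate law. It is not the xUML model or the TableSat code: the state and event names, the gains, the frictionless single-axis plant and the time step are all assumptions chosen for the example, and the normal mode pointing controller is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PIRateController:
    """Proportional-integral rate law that uses only the rate gyro (gains are assumed)."""
    kp: float = 2.0
    ki: float = 0.5
    integral: float = 0.0

    def step(self, rate_cmd, gyro_rate, dt):
        error = rate_cmd - gyro_rate
        self.integral += error * dt  # accumulated rate error
        return self.kp * error + self.ki * self.integral  # commanded torque


class ControllerModeMachine:
    """Event-triggered mode machine; state and event names are hypothetical."""
    TRANSITIONS = {("NORMAL", "SENSOR_FAILURE"): "SAFE"}

    def __init__(self):
        self.state = "NORMAL"

    def dispatch(self, event):
        # Events with no matching transition leave the state unchanged.
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state


def simulate(steps=3000, dt=0.01, inertia=1.0, initial_rate=1.0):
    """Spin-down of an assumed frictionless single-axis body after a sensor failure."""
    machine = ControllerModeMachine()
    safe = PIRateController()
    machine.dispatch("SENSOR_FAILURE")  # e.g. the monitor reports a failed attitude sensor
    rate = initial_rate
    for _ in range(steps):
        if machine.state == "SAFE":
            torque = safe.step(rate_cmd=0.0, gyro_rate=rate, dt=dt)
        else:
            torque = 0.0  # normal mode pointing controller omitted from this sketch
        rate += (torque / inertia) * dt  # explicit Euler integration of the plant
    return machine.state, rate


final_state, final_rate = simulate()
```

With these assumed gains the closed loop is overdamped, so the simulated rate decays toward the commanded zero rate without needing any pointing sensor.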

[Figure 2: class diagram. Classes shown: SCHEDULER_SEQ (SCSQ), TS_ESTIMATOR (TES), TS_CONTROLLER (TCN), TS_ACTUATOR (TAC), GD_LOCK_MANAGER (GDM), GD_STATEESTIMATE (GSE), GD_SENSORREADINGS (GSR), GD_ESTIMATOR (GES), GD_CONTROLLER (GCN), GD_ACTUATOR (GAC), GD_FANDATA (GFD), GD_STATUS (GST); relations R1 to R7. Diagram note: each data object has its manager for emulating lock/unlock requests from TS threads.]

Figure 2. TableSat Class Diagram

The following are properties which have been verified on an early version of the resilience-extended TableSat.

• Key control states are reachable. This type of property ensures that the key control states of the system are reachable. They are in the form of NEVER(Control_State), which asserts that the referred state never appears in any system execution. (Control_State can be generalized to any predicate over system states.) These properties should not hold on the system; their failure indicates that Control_State is reachable in some system execution.
• Bad states are unreachable. This type of property ensures that the control core stays in its predefined safety envelope by checking that certain bad states are not reachable in any system execution. They have the form Never(Unreachable_State). (Unreachable_State can be generalized to any predicate over system states.) These properties should hold on the system, indicating these states are unreachable.
• Scheduling fairness. This type of property ensures tasks of the core are fairly scheduled. They are in the form of Repeatedly(Task_Execution_Entry_Point). These properties should hold on the system.

In a cyber-physical system, resource bound properties are often considered as important as functional properties for system correctness. Traditionally, critical resource properties, e.g., WCET, can be statically analyzed. However, during the execution of a cyber-physical system, the environment settings might change and the system must be reconfigured. In general, capturing all the situations in a dynamic environment by a single static model would lead to insufficient accuracy. “Pure” runtime monitoring (that monitors executions without any help from static analysis) can be used in this case to enforce resource properties, but often at the price of significant overhead and last-moment detection of possible problems. The goal of ResCheck [7] is to establish a resource monitoring evaluation mechanism through static analysis of the system model, which contains the state machines and the related C code. The ResCheck system is also used as the framework for checking sensor and actuator states.

5. Conclusions and Future Work

A common requirement across cyber-physical systems is to recognize changes in physical system state during execution and configure the control software appropriately. The hybrid systems community commonly approaches this problem “bottom-up” from dynamic system models through discrete mode (automata) analysis, often leaving the executable code as the final step, making analysis and verification of the software difficult. This paper approaches the hybrid software-physical system problem “top-down”, formulating a software architecture in which to embed implementations of the feedback control laws and implementing and verifying software for a cyber-physical system that could experience failures in its sensors or actuators. One remaining task for future work is to formally unify mode-based (or ultimately adaptive) control laws with software model generation and analysis and verification tools such as those described in this paper.

6. Acknowledgement

This research was supported in part by NSF Grant 0613665 to the University of Texas at Austin, Grant 0650049 to the University of Michigan and Grant 0613930 to Portland State University. We acknowledge QNX Software Systems for support through open-source and educational licensing programs.

7. References

[1] S. J. Mellor and M. J. Balcer, Executable UML: A Foundation for Model-Driven Architecture, Addison-Wesley, New York, 2002.
[2] SES: Objectbench User Reference Manual, SES, 1996.
[3] F. Xie, V. Levin and J. C. Browne, ObjectCheck: A Model Checking Tool for Executable Object-Oriented Software System Designs, Proc. of FASE, 2002.
[4] SES: Code Genesis Manual, SES, 1996.
[5] R. H. Hardin, Z. Har'El and R. P. Kurshan, COSPAN, Proc. of Computer Aided Verification, 1996.
[6] M. F. Vess, System Modeling and Controller Design for a Single Degree of Freedom Spacecraft Simulator, Master’s Thesis, Aerospace Engineering, Univ. of Maryland, 2005.
[7] J. Yi, H. Woo, J. C. Browne, A. K. Mok, F. Xie, E. Atkins and C-G. Lee, Incorporating Resource Safety Verification to Executable Model-based Development for Embedded Systems, Proc. of RTAS, 2008.
