The Fifth Workshop on Digital Fluid Power, October 24 - 25, 2012, Tampere, Finland
SIMULATIONS WITH FAULT-TOLERANT CONTROLLER SOFTWARE OF A DIGITAL VALVE Mikko Huova, Miikka Ketonen, Petr Alexeev*, Pontus Boström*, Matti Linjama, Marina Waldén*, Kaisa Sere* Tampere University of Technology Department of Intelligent Hydraulics and Automation P.O.Box 589, 33101 Tampere, Finland E-mail:
[email protected] * Åbo Akademi University, Turku Centre for Computer Science Joukahaisenkatu 3-5, 20520, Turku, Finland E-mail:
[email protected]
ABSTRACT
A model-based design approach is widely used to optimize controllers of distributed digital valve systems, leading to controllers that simultaneously minimize power consumption and tracking error. In this approach an optimal controller (OC) is designed by comprehensive modelling of the relationship between parallel connected on/off-valves and the cylinder actuator in steady-state conditions. This relationship is highly non-linear, and the design complexity of the OC is high. For this reason existing verification tools cannot ensure the absence of design errors in this controller. Such errors can cause an incorrect valve control signal or the absence of this signal, leading to potentially hazardous physical processes in the hydraulic system. The controller therefore has to be designed as a fault-tolerant safety-critical hard real-time system. It is important to ensure that the system will work in a reasonable manner despite possible design errors in the OC. This paper presents a method to resolve the problem by introducing an acceptance test (AT) to verify the output signals of the OC. A safe controller (SC), a simplified version of the OC for which the design can be verified, is proposed. The control signal from the SC is submitted to the valves if the AT detects incorrect output signals of the OC. A simulation study shows that the SC gives good enough control performance even though its control resolution is not as good as that of the OC.
KEYWORDS:
Digital valve control, Real-time, Controller development, Safety critical control, Fault-tolerance, Preemptive scheduling
1. INTRODUCTION
Fault-tolerant operation of a digital valve system is studied by Siivonen et al. in [1] and [2]. However, the analysis of fault tolerance has so far been restricted to valve faults. This paper concentrates on the fault-tolerant architecture of the controller of a digital valve. The goal is to develop methods which allow fault-tolerant operation of the controller even if a runtime error occurs in the controller or it overruns its deadline. Fault tolerance is achieved by having two redundant controllers and by careful design of the real-time scheduling scheme of the controller software. The idea is to utilize an optimal controller and a separate safe controller. The optimal controller is complex and results in the best possible control accuracy, but may lead to erroneous functionality because of possible programming defects. The erroneous functionality might include hazardous valve command signals or even hang-ups of the algorithm. The other controller is simple and considered safe. It should always lead to at least suboptimal valve command signals, and its correct operation can be verified by utilizing formal methods.
1.1. Digital valve control
Figure 1 shows a typical distributed digital valve setup. It consists of four digital flow control units (DFCUs). A DFCU comprises several parallel connected on/off-valves, which are controlled to achieve a number of unique flow rates. There are different ways to control an actuator using the distributed digital valve. The following list presents examples of control methods that can be utilized:
1. Control of opening of the DFCUs proportional to the user input.
2. Control of flow rates of the DFCUs proportional to the required cylinder flow rates (calculated from velocity reference).
3. Control of steady-state velocity using model-based control (square root valve model and simultaneous opening of two DFCUs).
4. Control of steady-state velocity using model-based control (generalized valve model and simultaneous opening of 2-4 DFCUs).
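The way a DFCU produces a number of unique flow rates can be illustrated with a minimal Python sketch. It assumes a power-law valve model in which the flow is proportional to the pressure difference raised to an exponent (0.5 gives the square root model). The function names, the binary-coded flow coefficients and the numeric values are hypothetical and chosen only for illustration; they are not taken from the system studied in this paper.

```python
from itertools import product


def dfcu_flow(states, kv, dp, exponent=0.5):
    """Steady-state flow through parallel on/off-valves (illustrative model).

    states:   tuple of 0/1 valve states, one entry per valve
    kv:       flow coefficient of each valve (hypothetical values)
    dp:       pressure difference over the DFCU
    exponent: 0.5 gives the square root model; other values give a
              generalized power-law model
    """
    sign = 1.0 if dp >= 0 else -1.0
    open_kv = sum(k for s, k in zip(states, kv) if s)
    return sign * open_kv * abs(dp) ** exponent


def dfcu_flow_rates(kv, dp, exponent=0.5):
    """Sorted unique flow rates over all 2**n opening combinations."""
    combos = product((0, 1), repeat=len(kv))
    return sorted({dfcu_flow(s, kv, dp, exponent) for s in combos})


if __name__ == "__main__":
    kv = [1.0, 2.0, 4.0]  # assumed binary-coded valve sizes
    print(dfcu_flow_rates(kv, dp=4.0))
```

With three binary-coded valves the sketch yields eight distinct flow rates, which is the sense in which a DFCU approximates a proportional valve with a finite resolution.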
Figure 1. Distributed digital valve system
The first method is the most simplistic and does not allow, for example, active pressure compensation. Although simple, the method could be suited to some applications and still allows, for example, the pre-programming of the spool to match the area ratio of different cylinders. The second control method allows software-based pressure compensation by utilizing pressure measurement. The openings of the DFCUs are set according to the calculated flow demand of both cylinder chambers to achieve the desired velocity. The control accuracy can be improved by calculating the resulting steady-state velocity for different DFCU opening combinations and by selecting the optimal one. The third method relies on the square root model of the on/off-valves to allow a symbolic solution for the steady-state model of the system [3]. In order to improve the modelling accuracy, a generalized valve model may be utilized as in the fourth method. The generalized valve model allows the setting of the exponent (typically 0.4…0.6) according to measured pQ-data. As there is no symbolic solution to this valve-cylinder model, iterative methods are utilized to solve the steady-state velocity of the actuator for a number of different DFCU opening combinations. As all four DFCUs are under simultaneous control, the resolution is significantly improved compared with the three more simplistic methods [4], [5].
1.2. Issues in development of safety critical controllers
Controllers of digital hydraulic systems are implemented by software (firmware) of specialised computing devices, e.g., an embedded PC. Since the controller handles potentially dangerous physical processes in hydraulics, it has to be considered a safety-critical hard real-time system. For this reason, and because of the high design complexity of the controller, modern design tools are used for the development of the controller's software. Unfortunately, the verification process of the complex controllers used in digital hydraulics becomes very time consuming, if it is possible at all. State-of-the-art verification tools can effectively be used to check the correctness of small and relatively simple components and to ensure their reliability and safety. In practice it is therefore more efficient to ensure the safety of digital hydraulic systems by establishing fault tolerance to possible design faults in their software. Different software faults specific to controllers, and their influence on the hydraulics, are described below.
1.2.1. Incorrect execution results
If the control software terminates with a wrong result, the result is said to be incorrect or inconsistent. In particular:
- Plus/minus infinities and not-a-number values (NaNs). These results cannot be converted into valid output values of DFCUs (on/off valves).
- Imprecise results. These are correct numbers of the correct type, but applying them to the actuators of the digital hydraulic system is unsafe. For example, imprecise results may go beyond the permitted range of output values.
1.2.2. Deadline misses
A deadline miss is the situation in which the execution time of a software module exceeds the specified time limit (deadline). The controller has to update the output values of all DFCUs periodically with minimal timing jitter. The maximum allowed jitter can be defined in the requirements by the deadline value. The behaviour of the hydraulic system can be unpredictable if the controller does not keep this deadline.
1.2.3. Run-time exceptions
An exception is a run-time error (e.g., division by zero, overflow or underflow) that occurs during the execution of software and forces its abnormal termination. The obvious consequence of an exception in the controller is the inability to update the output signals of the DFCUs.
2. SAFE ARCHITECTURE OF A MULTI-THREAD CONTROL ALGORITHM
2.1. Overall description of the architecture
A controller for a digital hydraulic system can be considered a non-linear digital control system, which consists of the plant (hydraulics), sensors, actuators and a computing device with software. A generalized architecture of the control system used as a case study in this paper is presented in Figure 2.
Figure 2. Generalized architecture of the proposed control system. The sampling frequency of ADC is higher than the sampling frequency of on/off valves (DFCUs) output.
2.2. The approach of the safe architecture
As mentioned in Subsection 1.2, the software of the optimal controller cannot be guaranteed to be completely reliable (error-free) because of its high complexity. In order to ensure the safety of the system, software design errors should be mitigated or masked. An effective way of masking such errors is the recovery blocks (RcB) structure [6, 7]. RcB provides fault tolerance based on redundancy of software design. In its simplest form RcB consists of:
- A primary version of the software function, which provides accurate results but can contain errors. For the case of digital hydraulics this primary version is represented by the optimal controller (OC), an optimized but potentially unsafe version of the main control algorithm.
- One or more alternative versions of the same software function, which provide results for the same purpose but possibly with less accuracy. For the case of digital hydraulics this alternative version is represented by the safe controller (SC), a simplified version of the OC.
- An acceptance test (AT), which analyses the results of the primary or alternative versions of the software function to decide whether these results are correct. The output of the AT is therefore binary: “correct” or “not correct”.
The general algorithm of RcB for the case of just one SC is the following:
1. Execute the OC and perform the AT on its results. Return the results of the OC if the result of the AT is “correct”, otherwise proceed to step 2.
2. Execute the SC and perform the AT on its results. Return the results of the SC if the result of the AT is “correct”, otherwise RcB fails.
For the case of digital hydraulics, the simplicity of the SC allows its reliability to be confirmed by various verification tools suitable for Simulink models, e.g. [9]. In particular, we can assume that none of the faults described in Subsection 1.2 can occur during the execution of the SC. For this reason there is no need to execute the AT on the results of the SC. Note also that it is assumed that the RcB never fails in this approach. The disadvantage of this structure is the low precision of the SC, which sometimes cannot prevent unwanted processes in the hydraulics, e.g., cavitation. However, the SC ensures the safety of the system. The execution results of the OC and the SC are comparable, so the basic algorithm for the AT can be constructed as a comparison of the absolute value of the difference between the two results with a threshold. The implementation of the RcB approach for the controller is presented in Figure 3. The AT therefore requires the results of both the OC and the SC as input parameters.
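The adapted recovery block described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the controller results are reduced to a single scalar (for example a calculated velocity), the names `recovery_block` and `threshold_test` are hypothetical, and the finiteness check is an added guard against the NaN and infinity results of Subsection 1.2.1. Deadline misses are not covered here; they are handled by the scheduling scheme.

```python
import math


def threshold_test(oc_result, sc_result, threshold):
    """Acceptance test: OC result must be finite and close to the SC result."""
    if not math.isfinite(oc_result):
        return False  # NaN or +/-inf cannot be converted to a valve command
    return abs(oc_result - sc_result) <= threshold


def recovery_block(run_oc, run_sc, threshold):
    """Return (result, source), where source is 'OC' or 'SC'.

    The SC is assumed to be verified, so its result is not tested.
    """
    sc_result = run_sc()
    try:
        oc_result = run_oc()
    except Exception:
        # A run-time exception in the OC is masked by the SC result.
        return sc_result, "SC"
    if threshold_test(oc_result, sc_result, threshold):
        return oc_result, "OC"
    return sc_result, "SC"


if __name__ == "__main__":
    print(recovery_block(lambda: 10.2, lambda: 10.0, threshold=1.0))
    print(recovery_block(lambda: 1 / 0, lambda: 10.0, threshold=1.0))
```

Because the SC is always executed, its result is available as a fallback whenever the OC raises an exception or fails the test.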
Figure 3. Implementation of the recovery blocks approach for the controller.
The algorithm of RcB requires the OC to be executed, so the worst-case execution time (WCET) [8] of RcB strongly depends on the WCET of the OC. In practice a deadline miss of the OC may force RcB to miss its deadline too. For this reason the RcB approach should be adapted to meet real-time requirements.
Modern real-time software is implemented by utilising pre-emptive scheduling schemes: the whole software is decomposed into several tasks with different priorities. Each task execution is called a task instance or job. The event of the start of a job is called a release. If a high-priority job is released during the execution of a low-priority job, the low-priority job is suspended while the high-priority job starts and executes until its termination. The low-priority job can continue its execution afterwards. This process is called pre-emption. If the priorities of tasks do not change during execution, the scheduling scheme is called static. The scheduling scheme is called feasible if all tasks terminate before their deadlines.
The pre-emptive scheduling scheme can be applied to the RcB in order to prevent possible deadline misses of the OC. The task decomposition is the following: a low-priority task for the OC and a high-priority task for the SC and AT. This approach allows the SC and AT to pre-empt the OC in case of its deadline miss. For this reason the controller is able to submit correct output data to the valves without missing the deadline even in case of a fault of the OC. This task decomposition implies a special precedence constraint: both the OC and the SC should be terminated before the corresponding AT can be started. This constraint can be met by different solutions, for example by additional decomposition of the high-priority task into two tasks: the first task for the SC and the second task for the AT. The second task needs to be released with some offset, providing a time slot for the SC and OC, to ensure the precedence constraint.
For a periodical task set the offset for the second task is not required, because this task can perform the AT on the execution results of the OC and SC from the previous period. For tasks with the same period and deadline, e.g. AT, OC and SC, precedence constraints can be ensured by arranging priorities by the following rule: higher priority means earlier execution. The proposed task decomposition is feasible for both periodical and non-periodical task sets.
The controller has to acquire signals from sensors and to submit signals to valves periodically. This implies that its software should be decomposed into several periodical tasks with different but fixed priorities (static scheduling). In this case the whole software of the controller was modelled as a multi-rate Simulink diagram where each rate corresponds to a specific task.
There are different static scheduling schemes for periodical task sets. Most of these schemes do not take into account data transfers between tasks (they define tasks as independent). The effect of data transfers between tasks can be mitigated by using the non-pre-emptive protocol (NPP) for data transfers. The detailed description of this protocol and its properties is presented in Section 7.10 of [12]. Here two widely used scheduling schemes for periodical task sets are considered: rate-monotonic (RM) [10] and deadline-monotonic (DM) [11]. The RM algorithm arranges priorities according to the following rule: higher priorities are assigned to tasks with shorter periods. Let $n$ be the total number of tasks, and let $C_i$ and $T_i$ be the WCET and period of task number $i$, respectively. According to [10] RM scheduling is feasible if task deadlines are equal to their periods and the condition
$$\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)$$
holds.
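The utilization-based feasibility condition for RM scheduling from [10] is easy to evaluate numerically. The following Python sketch checks it for a task set given as (WCET, period) pairs. The function names and the example task sets are hypothetical and are not the task set of the controller studied here. The test is sufficient but not necessary: a task set that fails it may still be schedulable.

```python
def rm_utilization_bound(n):
    """Liu and Layland utilization bound n * (2**(1/n) - 1) for n tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_feasible(tasks):
    """Sufficient RM feasibility test.

    tasks: list of (wcet, period) pairs in the same time unit, with
           deadlines assumed equal to periods.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    # Hypothetical task sets (times in ms).
    print(rm_feasible([(1, 4), (2, 12)]))  # low utilization
    print(rm_feasible([(3, 4), (3, 12)]))  # utilization of 1.0
```

For two tasks the bound is about 0.83, so the first hypothetical task set passes the test and the second does not.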
The acceptance test compares the velocities calculated from the outputs of the optimal controller and the safe controller using the following six conditions:
1. The average calculated velocities of the two cylinder sides differ by more than the user-set threshold value.
2. The calculated velocity of the A-side differs by more than two times the threshold value.
3. The calculated velocity of the B-side differs by more than two times the threshold value.
4. The safe controller outputs a stop command while the optimal controller does not.
5. The output of the optimal controller results in a significantly negative calculated velocity for either side while the safe controller results in a positive velocity.
6. The same as condition five in the opposite direction.
The first condition is the basis of the acceptance test. Conditions two and three take care of situations where the average velocity of the optimal controller is correct while the calculated velocities of both cylinder sides are far from the correct value. Condition four takes care of the situation where the optimal controller would continue to drive the system with a small velocity when the system should be stopped. Conditions five and six take care of situations where the calculated velocity of the optimal controller has the wrong direction. It is worth noting that a small velocity in the wrong direction is considered acceptable, as this occurs from time to time close to zero velocity.
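The six conditions can be collected into one function, as in the following Python sketch. It is an illustration under several assumptions rather than the test used in the simulations: the names `Candidate` and `acceptance_test` are hypothetical, the output is rejected if any condition holds, "significantly negative" is interpreted as below the negative threshold, the sign of the safe controller velocity is taken from its average, and the finiteness check is an extra guard that is not one of the six conditions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Calculated velocities (mm/s) resulting from one controller's output."""
    v_a: float   # velocity calculated for the A-side
    v_b: float   # velocity calculated for the B-side
    stop: bool   # True if the controller outputs a stop command


def acceptance_test(oc, sc, v_th):
    """Return True if the OC output is accepted, False otherwise."""
    # Extra guard (assumption): NaN or infinite values are always rejected,
    # because comparisons with NaN would otherwise evaluate to False.
    if not (math.isfinite(oc.v_a) and math.isfinite(oc.v_b)):
        return False
    avg_oc = 0.5 * (oc.v_a + oc.v_b)
    avg_sc = 0.5 * (sc.v_a + sc.v_b)
    conditions = (
        abs(avg_oc - avg_sc) > v_th,                  # condition 1
        abs(oc.v_a - sc.v_a) > 2 * v_th,              # condition 2
        abs(oc.v_b - sc.v_b) > 2 * v_th,              # condition 3
        sc.stop and not oc.stop,                      # condition 4
        min(oc.v_a, oc.v_b) < -v_th and avg_sc > 0,   # condition 5
        max(oc.v_a, oc.v_b) > v_th and avg_sc < 0,    # condition 6
    )
    return not any(conditions)


if __name__ == "__main__":
    healthy = Candidate(v_a=40.0, v_b=42.0, stop=False)
    safe = Candidate(v_a=45.0, v_b=45.0, stop=False)
    print(acceptance_test(healthy, safe, v_th=20.0))
```

Keeping the conditions in a single tuple makes it straightforward to log which condition rejected the output when tuning the threshold.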
4. SIMULATION STUDY ON DIGITAL VALVE CONTROL
4.1. Simulation model
The system studied is presented in Figure 8. The simulation model is based on a boom mock-up, which is sized to model the dynamics of a typical mobile boom. The large inertia of the system results in a low natural frequency of 3-5 Hz when fully loaded.
Figure 8. Simulated system and its parameters
4.2. Simulations with optimal controller
The optimal controller is presented thoroughly in [5] and it is utilized in this study with slightly altered controller parameters, which are listed in Table 1.
Table 1. Controller parameters
Parameter                                         Value
Controller sample time                            12 ms
Input filter sample time                          2 ms
Position controller gain                          3
Velocity feed forward gain                        0.74
Start threshold                                   4 mm/s
Stop threshold                                    3 mm/s
Minimum pressure                                  3 MPa
Maximum pressure                                  20 MPa
Nominal pressure difference                       1.5 MPa
Minimum pressure difference                       0.5 MPa
Corner frequency of load force filter             8 rad/s
Corner frequency of supply pressure filter        40 rad/s
Force hysteresis                                  5 kN
Power threshold                                   100 W
Earn time                                         500 ms
Mode switching delay                              120 ms
Supply pressure rate                              50 MPa/s
Weight for pressure error                         2e-16
Weight for switching                              0.03
Weight for power loss                             4e-7
Acceptance test velocity threshold                20 mm/s
Minimum time period of safe controller            200 ms
Most of the controller parameters remain the same as in [5]. The valve block simulated is different from the valve block utilized in [5]. Therefore some parameters have been altered to achieve optimal control performance. The controller sample time is set according to the valve response time. The start and stop thresholds are increased slightly to avoid repetitive starting and stopping, especially with the safe controller algorithm. The supply pressure rate is increased to match the rate of the current supply system in the actual boom mock-up. The cost function weight for power loss is decreased to improve low velocity control resolution. The only parameter that is utilized by the safe controller (apart from the common mode selection parameters) is the weight term for switching, which is slightly reduced to achieve better control resolution. Judging by the simulations, the switching weight of the safe controller needs to be multiplied by a factor of four to reach similar activity of the valves as with the optimal controller.
The velocity threshold related to the acceptance test is set to 20 mm/s. It is defined as the smallest value which does not result in discarding of the optimal controller output when there is no actual fault in the operation. It is beneficial to require a certain time period of correct operation of the optimal controller before its output is utilized again after a malfunction. In this case the parameter is set to 200 ms to avoid repetitive switching between the optimal and the safe controller.
Three different position trajectories are tested. The references include a single extending movement and a single retracting movement, which last 1.25 s each. The trajectory is a fifth order polynomial and the amplitudes of the movements are 12 mm, 30 mm and 70 mm, resulting in maximum velocity references of 18 mm/s, 45 mm/s and 105 mm/s.
4.2.1. Control performance of the optimal and the safe controller
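The maximum velocity references of the three test trajectories can be reproduced with a short Python sketch. It assumes the common rest-to-rest fifth order polynomial with zero velocity and acceleration at both ends; the paper only states that the trajectory is a fifth order polynomial, so this particular form and the function names are assumptions. For this form the peak velocity is 1.875 times the amplitude divided by the duration, which matches the stated values for movements of 1.25 s.

```python
def quintic_position(t, amplitude, duration):
    """Rest-to-rest fifth order polynomial position reference."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    return amplitude * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)


def quintic_peak_velocity(amplitude, duration):
    """Peak velocity, reached at the midpoint of the movement.

    The velocity is (amplitude / duration) * 30 * tau**2 * (1 - tau)**2,
    which equals 1.875 * amplitude / duration at tau = 0.5.
    """
    return 1.875 * amplitude / duration


if __name__ == "__main__":
    for amplitude_mm in (12.0, 30.0, 70.0):
        v_max = quintic_peak_velocity(amplitude_mm, duration=1.25)
        print(f"{amplitude_mm:.0f} mm -> {v_max:.0f} mm/s")
```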
Figures 9, 11 and 13 present simulated trajectories driven with the optimal controller. The medium-size trajectory is presented, whose maximum velocity is roughly half of the maximum velocity of the valve-cylinder system studied. The system is simulated with variable supply pressure and full mode selection logic. Therefore, the cylinder is driven in inflow-outflow mode in both moving directions, extending (IOe) and retracting (IOr), as well as in differential connection (De, Dr). The optimal controller utilizes crossflow in order to improve the control resolution, which can be seen as simultaneously active supply side and tank side DFCUs. Figures 10, 12 and 14 present the controllability utilizing the safe controller for comparison. Exactly the same trajectory and loadings are driven as with the optimal controller. Figures 9 and 10 present a comparison of the performance of the two controllers driving the system with loading A. The most significant difference can be noted at slow velocities, where the control resolution of the optimal controller is considerably more accurate. The coarse resolution of the safe controller yields oscillations when the motion is stopped. Even though the control performance of the safe controller is not as good as that of the optimal controller, much of the functionality is still achieved by utilizing such a simple control scheme: velocity tracking is acceptable and chamber pressure tracking is as good as with the optimal controller.
Figure 9. Optimal controller, medium size trajectory, loading A.
Figure 10. Safe controller, medium size trajectory, loading A
Figures 11 and 12 compare the two control schemes when driving restricting loading B. Although both controller algorithms generate small oscillations during stopping of the motion, the velocity control of the optimal controller is slightly smoother than the operation of the safe controller. The optimal controller utilizes crossflow for a short time period only, as the high supply pressure would lead to relatively big energy losses.
Figure 11. Optimal controller, medium size trajectory, loading B
Figure 12. Safe controller, medium size trajectory, loading B
Figures 13 and 14 present the results with overrunning loading C. The difference between the performance of the two controllers is significant: the slow velocity resolution of the optimal controller is better because of crossflow, and the velocity trajectory is also smoother as a result of the optimization of the steady-state velocity of the system.
Figure 13. Optimal controller, medium size trajectory, loading C
Figure 14. Safe controller, medium size trajectory, loading C
4.2.2. Simulations with fault tolerant control architecture
Figure 15 presents a large trajectory driven with the optimal controller. A runtime error of the controller is simulated by virtually hanging up the operation of the controller at 1 s. The output of the controller is held from that point on. Both the output of the optimal controller and the output of the safe controller are presented. Fault detection is inactivated, and the hazardous operation of the optimal controller results in a fast extending motion until the cylinder end is reached.
Figure 15. Large trajectory, loading A. Fault detection inactivated.
Figure 16 presents the same trajectory with the defect in the optimal controller. The operation of the optimal controller hangs up at 1 s and revives at 3 s. The output of the optimal controller is fed to the valves until the calculated velocities resulting from the outputs of the two controllers differ sufficiently, after which the output of the safe controller is fed to the valves. The calculated velocities of the two controllers are presented in the figure, as well as the time period during which the faulty action of the optimal controller is detected and the output of the safe controller is utilized. Although the optimal controller is again fully functioning at 3 s, the output of the safe controller is utilized until 3.2 s because of the user-set time period parameter. Figure 17 presents the small trajectory with the identical fault. The fault is detected when the output of the safe controller is zero while the output of the optimal controller is nonzero. Even though the faulty action of the optimal controller is not detected before the stopping of the motion, there is no big position error or other hazardous action. The only deterioration of the control performance can be seen as a more abrupt stopping of the motion. The optimal controller hangs up with DFCU state 2 for both cylinder sides, and the motion is stopped stepwise by closing the DFCUs, resulting in some oscillations.
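The delayed return from the safe controller to the optimal controller can be sketched as a small state machine in Python. This is a hedged illustration, not the logic used in the simulations: the class name `OutputSelector` is hypothetical, and it is assumed that the acceptance test is evaluated once per controller sample (12 ms in Table 1), so that the 200 ms minimum time period corresponds to 17 consecutive accepted samples.

```python
class OutputSelector:
    """Select the OC or SC output with a minimum period of correct operation."""

    def __init__(self, hold_time_ms=200, sample_time_ms=12):
        # Number of consecutive accepted samples required before the OC
        # output is used again (ceiling division on integers).
        self.hold_samples = -(-hold_time_ms // sample_time_ms)
        # Assumption: the OC is trusted at start-up.
        self.ok_count = self.hold_samples

    def step(self, at_ok):
        """Update with one acceptance test result; return 'OC' or 'SC'."""
        if at_ok:
            self.ok_count = min(self.ok_count + 1, self.hold_samples)
        else:
            self.ok_count = 0
        return "OC" if self.ok_count >= self.hold_samples else "SC"


if __name__ == "__main__":
    selector = OutputSelector()
    results = [True, False] + [True] * 17
    print([selector.step(ok) for ok in results])
```

A single rejected sample switches the output to the safe controller immediately, while the return to the optimal controller is delayed, which avoids repetitive switching between the two controllers.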
Figure 16. Large trajectory, loading A. Fault detection active.
Figure 17. Small trajectory, loading A. Fault detection active.
5. CONCLUSION AND FUTURE WORK
The simulations of the safe controller show acceptable control performance. Thus the safe controller seems to enable relatively good controllability despite its simplicity. The biggest difference in comparison to the optimal controller can be seen in the resolution at small velocities. The proposed architecture of the controller, based on the recovery blocks approach, leads to safe operation of the whole system despite possible design faults in the model of the optimal controller. The acceptance test is relatively simple, but ensures that the physical valves are always driven with an acceptable control signal. The proposed architecture can mask runtime errors, erroneous results and deadline misses of the optimal controller. It should be noted that on platforms in which all tasks run in the same memory space, memory corruption in the optimal controller can affect the safe controller. Switching between the safe controller and the optimal controller during motion does not result in big stepwise changes to the valve command signals. Therefore, the system performs smoothly even if the optimal controller misses its deadline or suggests a hazardous valve control signal. The fault-tolerant control system presented will be tested on a real experimental system.
6. ACKNOWLEDGEMENTS
This research is funded by the Academy of Finland (Grant No. 139540) and by the DIGIHYBRID-project, which is part of the EFFIMA-program of the Finnish Metals and Engineering Competence Cluster, FIMECC Ltd.
REFERENCES
[1] Siivonen, L., Linjama, M. & Vilenius, M. 2005. Analysis of Fault Tolerance of Digital Hydraulic Valve System. Bath Workshop on Power Transmission and Motion Control, September 7-9, Bath, UK, pp. 133-146.
[2] Siivonen, L., Linjama, M., Huova, M. & Vilenius, M. 2009. Jammed on/off valve fault compensation with distributed digital valve system. International Journal of Fluid Power, 2, pp. 73-82.
[3] Linjama, M., Koskinen, K.T. & Vilenius, M. 2003. Accurate trajectory tracking control of water hydraulic cylinder with non-ideal on/off valves. International Journal of Fluid Power, 1, pp. 7-16.
[4] Linjama, M. & Vilenius, M. 2005. Improved digital hydraulic tracking control of water hydraulic cylinder drive. International Journal of Fluid Power, 1, pp. 29-39.
[5] Linjama, M., Huova, M., Boström, P., Laamanen, A., Siivonen, L., Morel, L., Walden, M. & Vilenius, M. 2007. Design and implementation of energy saving digital hydraulic control system. The Tenth Scandinavian International Conference on Fluid Power, May 21-23, Tampere, Finland, pp. 341-359.
[6] K.H. Kim and H.O. Welch. Distributed execution of recovery blocks: an approach for uniform treatment of hardware and software faults in real-time applications. IEEE Transactions on Computers, 38(5):626-636, May 1989.
[7] Nguyen, D. and Dar-Biau Liu. Recovery blocks in real-time distributed systems. Reliability and Maintainability Symposium, 1998. Proceedings, pp. 149-154, 19-22 Jan 1998.
[8] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. The worst-case execution time problem - overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., 7(3):36:1-36:53, May 2008.
[9] Pontus Boström. Contract-based verification of Simulink models. In Proceedings of the 13th international conference on Formal methods and software engineering, ICFEM'11, pages 291-306, Berlin, Heidelberg, 2011. Springer-Verlag.
[10] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46-61, January 1973.
[11] N C Audsley, A Burns, M F Richardson, and A J Wellings. Hard real-time scheduling: The deadline-monotonic approach. Proc IEEE Workshop on Real-Time Operating Systems and Software, pages 1-6, 1991.
[12] Giorgio C. Buttazzo. Hard Real-Time Computing Systems. Springer US, 2011.