UML Based Reverse Engineering for the Verification of Railway ...

2 downloads 0 Views 257KB Size Report
engineering is performed using UML as the modelling language used to achieve ... Of course, a reverse engineering based on UML well suits to object oriented.
UML Based Reverse Engineering for the Verification of Railway Control Logics Chiara Abbaneo1, Francesco Flammini1 2, Armando Lazzaro1, Pietro Marmo1, Nicola Mazzocca2, Angela Sanseviero1 1 ANSALDO SIGNAL - Ansaldo Segnalamento Ferroviario S.p.A., RAMS Division Via Nuova delle Brecce 260, Naples, Italy {abbaneo.chiara, flammini.francesco, lazzaro.armando, marmo.pietro, sanseviero.angela}@asf.ansaldo.it 2 Università “Federico II” di Napoli, Dipartimento di Informatica e Sistemistica Via Claudio 21, Naples, Italy {frflammi, nicola.mazzocca}@unina.it Abstract The Unified Modeling Language (UML) is widely used as a high level object oriented specification language. In this paper we present a novel approach in which reverse engineering is performed using UML as the modelling language used to achieve a representation of the implemented system. The target is the core logic of a complex critical railway control system, which was written in an application specific legacy language. UML perfectly suited to represent the nature of the core logic, made up by concurrent and interacting processes, using a bottom-up approach and proper modeling rules. Each process, in fact, was strictly related to the management of a physically (resp. logically) well distinguished railway device (resp. functionality). The obtained model deeply facilitated the static analysis of the logic code, allowing for at a glance verification of correctness and compliance with higher-level specifications, and opened the way to refactoring and other formal analyses.

1. Introduction Safety-critical railway control systems require a number of thorough verification and validation activities, regulated by international standards (see [12] and [4]). In this paper we present an approach used to perform a static analysis and improvement of a part of the software used as the core logic of the Radio Block Centre (RBC), a complex safety-critical computer which is the most important subsystem of ERTMS/ETCS (European Railway Traffic Managements System / European Train Control System) [11]. To give an idea of the complexity of the problem, the core logic of the RBC is made up by more than 200.000 lines of code. RBC software was mainly written using a legacy language highly “proven in use” in past projects (above all in interlocking applications, see [2]). The choice of using such language is justified by several factors: the reliability of the development environment, the ease of use of the natural language-like syntax, the available debugging tools, etc. Being a proprietary language, no existing methodology or tool could be directly applied in order to accomplish the following objectives: •

to obtain an extensive documentation describing in detail how the core logic works, which is necessary for any kind of testing and maintenance activities;



to trace the architecture and behavior of the core logic into the higher level software specification, allowing for an easy verification of compliance;

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE



to optimize code reliability and performance by means of a sort of refactoring, that is modification without impact on high level system behavior.

We found out that the best way to achieve the aforementioned objectives was a bottom-up modeling aimed at building a comprehensive UML model of the software, including both structural and behavioral diagrams. UML is widely accepted as a de facto standard in software engineering; its syntax and semantics have been formalized by the Object Management Group (OMG) [9] and used by several commercial automated modeling and development suites (see for instance [10]). Several applications of UML, apart from pure software design and development, are referenced in research literature (see [1]). The use we make of UML is not as formal as a model checker (see [2]), but it easily allows to manage complex systems and can be the basis of a series of analyses and refinements, as explained in the following of this paper. One of the most widespread refinement techniques is known with the name of refactoring [8], which involves code manipulation and optimization in order to achieve software reliability and/or performance improvements in a transparent way. In this paper we describe the use of UML in a reverse engineering approach. Reverse engineering is the process of analyzing a system to identify its components and their interrelationships and to create a representation of the system in another form or at a higher level of abstraction (see [6] and [5]). Of course, a reverse engineering based on UML well suits to object oriented programming paradigms. In our case, the legacy logic language is not specifically object oriented, but “process oriented”, with each process associated with a well distinguished entity (either physical or logical). Therefore, modeling was quite straightforward and UML revealed to be a powerful documentation and analysis tool. This paper is organized as follows: Section 2 contains a brief description of ERTMS/ETCS and of the Radio Block Center, concentrating on the organization of the specification and on the software architecture of the core logic; Section 3 introduces the UML based modeling approach used to build a bottom-up representation of the software and explains how verification, traceability and refactoring have been performed, giving some examples; Section 4 summarizes the results obtained and presents some future applications and works in progress.

2. Brief description of the case-study ERTMS/ETCS is the specification of a standard aiming at improving safety, performance and interoperability of European railways [11]. In Italy, the so called Level 2 specification of ERTMS/ETCS is used on high-speed railway lines. ERTMS/ETCS specifies three main subsystems: trackside, on-board and lineside. The trackside subsystem is the ground part of the overall system, and manages signalling and train routing. The most important trackside subsystem is the Radio Block Center, which has the aim of collecting track and train information in order to feed the trains with their needed data (i.e. the distance they are allowed to cover, the static speed profiles allowed by the track and possible emergency information). The on-board control system performs train protection by controlling train speed against the elaborated dynamic speed profiles. The lineside is constituted by the so called balises (or Eurobalises, as defined in ERTMS), that are devices positioned between the track lines, which have the aim to transmit data telegrams to the trains passing over and energizing them. Such data is stored and used by the on-board control system; in ERTMS Level 2 balises are used like milestones to allow trains detect their position and communicate it to the Radio Block Center, which will use such information to provide train separation. The Radio Block Center is mainly responsible of detecting train position and delivering the correct Movement Authorities (MA) and Static Speed Profiles (SSP) to the trains. In order to

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE

detect the necessary route and Track Circuit (TC) occupation status, the RBC must be connected to the national Interlocking system (IXL), which is not standardized in ERTMS. Moreover, the RBC is usually able to manage a limited number of track sections, thus it must be connected to other adjacent RBCs in order to allow for the so called train hand-over. The hand-over is the complex procedure that manages the passage of a train from the area supervised by a RBC to the area of another RBC. In ERTMS L2 the on-board and trackside communicate by the GSM-R radio network (requiring Base Transceiver Stations, BTS, to be installed along the railway track), using the Euroradio protocol. The overall architecture of an ERTMS/ETCS level 2 system implementation is shown in the left part of Figure 1. The software architecture of the RBC is depicted in the right part of Figure 1. There are some “application” processes (written in a safe subset of the C programming language), which manage the interaction of the RBC with the external world, and some “logic” processes, which instead manage the core logic of the RBC and are implemented using an application specific logic language, as already mentioned. The logic manager has to interpret logic language, translating it into executable code, and to schedule the logic processes. The shaded rectangles in the figure represent the logic processes which implement the RBC high-level logics and which were the target of our reverse engineering and refactoring work. The logic processes of the RBC have to implement, together with the application processes, its high level functional specification. RBC features a number of logic processes, each one linked to a particular physical entity or logical concept. For instance, the “MA” process, shown in Figure 1, verifies the integrity of the Movement Authority assigned to the train against all the significant track and route conditions. Such conditions are received from the interlocking system by means of the “IXL interaction” application process and are managed by the “TC” logic process, wich stores Track Circuit physical conditions in its internal variables. If the MA integrity verification fails (for instance a track circuit is not clear or involved in an emergency condition), the MA process has to command the sending of an emergency message to the “Train interaction” application process, which will manage the sending of the proper radio messages to the train. In this simple example, the MA and TC processes interact with the application processes to manage data from and to the external world; moreover they interact with each other, because the MA process must ask the TC process the status of track circuits in order to verify the integrity of the Movement Authority. There are many other logic processes which behave in a similar way. They all feature: •

a set of data variables and a single “process status” variable; Figure 1. The implementation of ERTMS Level 2 (left) and of the RBC software (right). TRAIN TRACK CIRCUITS GSM-R

Balise Group Interlocking

BTS

BTS

RBC Operator Interface

Adjacent RBCs Radio Block Center

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE



a set of “operations” which can be activated by: - application processes or the RBC operator by means of “manual commands”, - other logic processes using “automatic commands” - directly when the process status variable is assigned a certain value.

Typical process operations are: •

Verify conditions on the variables of the same process or on variables of different processes;



Assign values to variables of the same process, including process status variable, or of other processes (there are no restrictions on the visibility and modifiability of variables);



Issue automatic commands to other processes; issue “application commands” to application processes (e.g. “send MA”).

3. Description of the modeling, verification and refactoring approach Our main objective was to build an UML model from the logic code, using proper diagrams, and compare such specification (obtained “bottom-up”) with the RBC high level logic specification, in order to verify the compliance of the logic implementation. As described in the previous section, the structure of logic processes well suits to be modeled using an object oriented specification language. In fact, logic processes can be easily thought of as classes, with their internal “variables” constituting the attributes, and their “commands” being the operations. This similarity allowed us to build from the logic code the most important UML structural view, class diagrams, statically showing the relationships between the logic processes. As already mentioned, the object oriented (OO) programming paradigm is not directly supported by the logic language, nor it has always been respected by developers the data encapsulation rule of OO programming. This was one of the first results we were able to achieve: even using a non OO language, the basic programming rules of the OO paradigm were to be respected. This implied to add operations, whenever necessary, in order to make processes modify variables of other processes only by using the methods of the latter, and not by directly accessing external variables. Another Figure 2. Modeling and verification at a glance. important result was that such a LOGIC 1st LEVELSPECIFICATION Req. xx.yy: When the MA verification process is activated, the RBC Logic modification allowed us to add shall verify the status of the track circuits assigned to the MA and then […] defensive programming controls in the ... verification of compliance new added methods, and not in the UML MODEL (2nd level spec.) 1) CLASS DIAGRAMS 2) SEQUENCE DIAGRAMS 3) STATE CHARTS calling methods, gaining in reliability, MA readability and code length. TC -variables MA MA_state1 Using similar considerations, we +operations() 1 send_MA were able to make a first level verify_cond() TC restructuring of the code, together with MA_state2 -variables op() model building. Once completed, the +operations() * model diagrams have been used for a reverse refactoring engineering traceability study, necessary for the LOGIC CODE PROCESS MA; verification of compliance. Of the UML VARIABLES process_status, control, … OPERATIONS send_MA, … behavioral views, sequence diagrams OPERATION send_MA: IF cond ASSIGN “ok” TO VARIABLE “control” best suited to represent the behavior of AND SEND AUTOMATIC COMMAND “op” TO PROCESS “TC” the logic processes in order to compare ... it with the one requested by the high-

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE

level specification, which are written in natural language in the form of input-output relations. Theoretically, a formal traceability could be obtained by a hierarchical superposition of sequence diagrams, matching the ones obtained top-down from the 1 st level logic specification and those, more detailed, obtained bottom-up by modeling the logic code. This solution, requiring to model the natural language requirements by use case and sequence diagrams, was excluded in our experience because not really advantageous with respect to a direct comparison with the natural language requirements, and too much time consuming (thousands of requirements would have to be modeled). Nevertheless, partial sequence diagrams had to be linked to make up a complete operating scenario for direct comparison with higher level specification. Finally, state diagrams were added to the model with the main aim of checking the correctness of transitions for the “process status” variable, which was of a fundamental importance as it triggered most of process behavior. Therefore, of all the diagrams standardized in UML we found out we only needed the basic subset of use case, class, sequence and state diagrams. Such a subset is functionally complete and is also generally used for software development [10]. Other types of diagrams (e.g. activity diagrams) were only used as complementary views to simplify the understanding of complex operations. The modeling tool we used to build UML diagrams was the Rational Rose Modeler Edition [10]. Figure 2 gives an at a glance representation of the described methodology. In the following of this section, the modeling technique from logic language to UML diagrams is described referring to the simple “Change of Traction Power” (CTP) logic process (whose code length is more than 20 times shorter than the one of the “MA” process). The CTP process has to manage manual commands of activation/deactivation of a change of traction power line section, coming from the RBC operator. The activation/deactivation command must be accepted only if the track circuit in which there is a change of power is free and not included in a MA assigned to a train. In the following we present the modelling rules in the building of UML views for such example process. - Use case diagrams: they simply represent the ways a logic process can be called by application processes or by the operator using manual commands; therefore, they contain all the operations activated by manual commands (it is an high level representation of the ways the external world interacts with the RBC logic processes); the simple use case diagram for the CTP process is shown in Figure 3. CTP setting RBC Operator (from Use Case View)

CTP CTP_state Control CTP_Activation() CTP_Deactivation(

ActivateCTP

TC list

CTP [1..3]

RBC Operator DeactivateCTP

TC (from TC)

MA (from MA)

Figure 3. CTP use case diagram (left) and class diagram (right).

- Class diagrams: they are built from a static software architecture view (data structure with internal/public attributes, operations), but they also show the relationships and interactions between processes (variable read/write, procedure call). They are very important as they represent the backbone of the entire model. In fact, the

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE

first step of the modeling approach is the definition of a class for each of the logic processes. The relationships between classes are determined by the “access” or “assign” statements (as aforementioned, all process variables are public) and by the issuing of commands, triggering process operations. Given the high number of associations (some processes can share up to 20 distinct links) a comprehensive class diagram would be very complex and almost unreadable. We preferred to build up partial class diagrams, each one focusing on a single process and showing all the incoming and outgoing links to/from that process. In Figure 3 we also show the class diagram for the CTP process. The CTP process features two local variables, a single “outlinked” process and two operations. In case a process only accesses (or “reads”) data of another process (usually to check some conditions on the state of the other process), the link is modelled by a dependency relation (dashed line). In case, instead, the process also activates operations of the other process (by automatic commands), the link is modelled as a real association (full line). For each relation between processes, a list of accessed variables and operations is defined. This allows to define constraints useful when building sequence diagrams. From the diagram of Figure 3 it is evident that the CTP process is used by the MA process. In fact, the MA process has to manage the presence of a CTP line section as an additional information to be added to the outgoing MA message in order to inform the on-board system of the CTP procedure. - Sequence diagrams: they are specified for each operation, showing its implementation and detailed interactions between logic processes. A full arrow has been used for check and assignment statements, specifying the nature of the operation as a comment; a normal arrow is used, instead, for the activation of an operation (this is a non standard notation). For better clarity, each complex diagram has been detailed using linked notes. In Figure 4 we report an example of sequence diagram for the CTP process, showing the activation operation (the deactivation is very similar): when an activation command is received, the state of the associated TC (belonging to a properly declared linked processes list) must be checked, in order to verify it is in the “free” and “not requested” (for a MA) states. If the condition is satisfied, the automatic command CTP_Activation() is issued to the TC process. : CTP

: RBC Operator

TC List : TC

NoTrain

CTP_Activation( )

Handover Activation

Read TC status

Preannouncement

TC control

Handover Cancellation MA Computation Deactivation CTP_Activation( ) [control=TRUE]

MA Computation Activation

Waiting for Acceptation

Liberation

Cancellation

Acceptation

Figure 4. The CTP activation sequence diagram (left) and the Hand-Over statechart (right).

- Statecharts: they show the transitions of the state variables of the logic processes against the triggering events. “Entry” type actions are defined for each state, which model the activation of private methods (i.e. the ones activated by process status). The CTP process, being very simple, does not need an associated state machine (in fact, no “process status” special variable is specified in the class diagram of Figure 3; note that “CTP_state” is not the “process status” attribute ). Therefore, we decided to show in Figure 4 a little more complex statechart example, taken from the hand-over process.

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE

Finally, Activity diagrams were used as a complementary view to obtain a detailed representation of the elaboration flow, showing both data exchange between processes and the ordered execution of the statements. While building the model we exploited all the syntactic controls and automations of the Rational Rose Modeler application, which constituted a first congruity check on the model under construction. For instance, as the model grows, it is very difficult to forget to add a link on a partial class diagram, as already defined links are automatically added by the application; in sequence diagrams it is not possible to call an operation which has not been already defined in the class diagram; the accessibility of attributes and operations according to how they are defined (public, protected, private) is automatically checked; etc. Different kinds of verifications were possible on the UML diagrams: for instance, the verification that a functionality specified in the 1st level logic requirements is present in the code, or on the contrary that no functionality not required by the specification is present in the code (this is a sort of coverage and traceability analysis for functions). In practice, we discovered some pieces of code related to never activated functions or useless statements, which were inherited from an early specification. Moreover, at a glance verifications on UML models were straightforward: a sequence diagram should contain only the processes that are involved in that function, as specified by high level requirements; a statechart should guarantee the reachability of all process states and prevent from the occurrence of deadlocks, etc. Such kind of analyses were performed informally, exploiting the know-how and skill of system experts. For the traceability analysis, process operations, which were modeled singularly in sequence diagrams, were linked together in order to form a complete high level scenario, traceable on the high level specification. To perform this, the execution process was statically analyzed from the external activation of an operation (by a “manual command”) to the reaching of a stable process state, e.g. the one following the sending of a message to an application process. A proper package named “Scenarios” containing all the so built sequence diagrams was created for the traceability analysis. Each “composed” or “high-level” sequence diagram represented for instance a scenario of Start of Mission, of Hand-Over, of Emergency, and so on. When a behavioral problem was suspected, we exploited the logic trace function of the simulation environment in order to perform a step by step execution and accurately detect causes and consequences of the error. If the suspect was confirmed, a proper Software Problem Report was issued to the engineering/development division of the company. Finally, as aforementioned, a considerable amount of time was invested in order to optimize code quality and performance, using refactoring techniques and exploiting model views as much as possible. In particular, defensive programming controls were inserted to check and react on inputs that are illegal or incompatible with process status. Moreover, refactoring involved the grouping of condition checks, the re-ordering of checks (weighted by occurrence probability and/or by the number of necessary steps to perform the check of the condition, as obtained from sequence diagrams) and other specific performance optimizations. In fact, most of the logic code was made up by condition verification statements. By properly grouping and shifting conditions we were able to predict and minimize the number of controls performed in each elaboration cycle, thus improving system performance. For instance, if a TC is interested by an emergency, there is no need to check its occupation status: immediately after the emergency is detected, an emergency message must be sent to the train. Thus, it would be natural to choose such control as

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE

the first to be performed. However, the probability of an emergency is very low, and such a check would be false most of the times, wasting elaboration time. From this example it should be clear that the order of condition checks had to be fine tuned according to each case. The behavioral impact of such modifications could be easily checked by means of UML sequence diagrams. In case of verified compliance with high-level requirements, the logic behavior had to be preserved. In some cases, the refactoring was deeper and involved analysis of interactions of more logic processes, counting the number of needed elaboration cycles and trying to minimize it while performing the same functionality. This was a sort of design review, whose description is out of the scope of this work.

4. Conclusions The approach presented in this paper allowed us to manage the complexity of the software under test and to understand in detail its behavior. Once the UML model has been built, traceability, verification, refactoring and other analyses which initially seemed nearly impossible to achieve, became quite straightforward. The building of the UML model, which is still in refinement, constituted most of the work, but, once defined the reverse engineering criteria, it became a mechanical and easy task. We are now evaluating the possibility of automate the building of the model by formalizing the translation/abstraction rules. The verification of model properties is also partly automatable. We think that the approach presented in this paper is quite general and effective to be used for analogous applications in different fields; for sure, it will be refined and employed in future projects managed by our company.

5. Acknowledgements The authors wish to thank Arturo Amendola and Leonardo Impagliazzo for their invaluable collaboration and support, and the anonymous reviewers for the suggestions they gave in order to improve this paper. The work of Francesco Flammini was partially supported by I.P.E. (Istituto per ricerche e attività educative).

6. References [1] A. Bondavalli, A. Fantechi, D. Latella, L. Simoncini, “Design Validation of Embedded Dependable Systems”, IEEE Micro 21(5), 2001, pp. 52-62 [2] A. Cimatti et al., “Formal Verification of a Railway Interlocking System using Model Checking”, Formal Aspects of Computing 10(4), 1998, pp. 361-380 [3] A. Egyed, N. Medvidovic, “Consistent Architectural Refinement and Evolution using the Unified Modeling Language”, Proceedings of ICSE 2001, Toronto, Canada, May 2001, pp. 83-87 [4] CENELEC: EN 50126 Railways Applications – The specification and demonstration of Reliability, Maintainability and Safety (RAMS) [5] Cung A., Lee Y.S., “Reverse Software Engineering with UML for website maintenance”, IEEE Proceedings of Working Conference in Reverse Engineering ’00, 2000, pp. 100-111 [6] E. J. Chikofsky, J. H. Cross, “Reverse Engineering and Design Recovery: A Taxonomy”, IEEE Software, vol. 7, no. 1, January 1990 [7] G. De Nicola, P. di Tommaso, R. Esposito, F. Flammini, P. Marmo, A. Orazzo, “A Grey Box Approach to the Functional Testing of Complex Automatic Train Protection Systems”, LNCS Vol. 3463, Springer-Verlag Heidelberg, 2005, pp. 305-317 [8] Martin Fowler et al., Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional, 1st edition, 1999 [9] OMG website: http://www.omg.org/uml [10] Rational Corporation website: http://www.rational.com [11] UNISIG ERTMS/ETCS – Class1 Issue 2.2.2 Subset 026 [12] W. S. Heath, Real-Time Software Techniques, Van Nostrand Reinhold, New York, 1991

Proceedings of the International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 0-7695-2565-2/06 $20.00 © 2006

IEEE