RELIABILITY OF DATA TRANSFER AND HANDLING IN RAILWAY TELEMONITORING SYSTEMS

Tamás Bécsi, Szilárd Aradi
Budapest University of Technology and Economics, Department of Control and Transport Automation
Address: Bertalan 2., Budapest, Hungary, H-1111
Phone: (+36-1) 463-1044, Fax: (+36-1) 463-3087
E-mail: {becsi.tamas, aradi.szilard}@mail.bme.hu

Abstract: The paper describes the structure of a telemonitoring system for fleet management. The system is based on a locomotive on-board computer with a built-in GPS module. The on-board computer measures the operating characteristics and the geographical position of the locomotive, and logs the activity of the engine-driver. The collected data are sent to a central database server via the GSM network. After a short review of the schematic structure, the bottlenecks and threats of such a system are outlined with special regard to software reliability aspects, such as reliability models, complexity aspects and software testing, through a real-world application of the Hungarian State Railways. Finally, the advantages of the system and further development possibilities are summarized.

Keywords: telemonitoring, communication safety, software reliability

1. INTRODUCTION

Thanks to the rapid development of microelectronics and mobile telecommunications, by the end of the 1990s a wider range of fleet diagnostics and satellite tracking became possible. These innovations provided the technological background for the creation of fleet management systems. Economic demand for fleet management systems strengthened as a result of increased competition in passenger and freight transport, especially in road traffic. The spread of online fleet management systems was greatly aided by the constant decrease of communication charges and the increase of data transmission speeds. For all these reasons, online fleet management systems have spread rapidly in transportation in recent years. These systems offer numerous advantages, which justify their use in railway operations:
- greater safety in delivery,
- aiding dynamic freight arrangement,
- constant tracking of the mechanical condition of the vehicles,
- easier documentation,
- aiding performance-based wages,
- improving traffic safety,
- improving freight safety,
- increased environmental protection.

In this paper, after a short review of the schematic structure, the bottlenecks and threats of such a system are outlined with special regard to software reliability aspects, such as reliability models, complexity aspects and software testing, through a real-world application of the Hungarian State Railways.

2. SOFTWARE RELIABILITY

Software reliability is an important attribute of software quality, together with functionality, usability, performance, serviceability, capability, installability, maintainability and documentation. Software reliability is hard to achieve because the complexity of software tends to be high. While it is difficult for any system with a high degree of complexity, including software, to reach a given level of reliability, system developers tend to push complexity into the software layer, owing to the rapid growth of system size and the ease of doing so by upgrading the software. While the complexity of software is inversely related to software reliability, it is directly related to other important factors in software quality, especially functionality, capability, etc. Emphasizing these features tends to add more complexity to software.

2.1. Definition

According to ANSI, software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment (ANSI/IEEE, 1991). Although software reliability is defined as a probabilistic function and comes with the notion of time, we must note that, unlike traditional hardware reliability, software reliability is not a direct function of time. Electronic and mechanical parts may become "old" and wear out with time and usage, but software will not rust or wear out during its life cycle. Software will not change over time unless intentionally changed or upgraded.
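Stated formally, the definition can be written as a reliability function. The following is a standard probabilistic restatement, not a formula taken from the cited standard itself:

```latex
% T: time to the next failure under a fixed operational profile
% (the "specified environment" of the ANSI definition).
% R(t): probability of failure-free operation over [0, t].
R(t) = \Pr(T > t)
```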

2.2. Software failure mechanisms

Software failures may be due to errors, ambiguities, oversights or misinterpretation of the specification that the software is supposed to satisfy, carelessness or incompetence in writing code, inadequate testing, incorrect or unexpected usage of the software, or other unforeseen problems. While it is tempting to draw an analogy between software reliability and hardware reliability, software and hardware have basic differences that make their failure mechanisms different. Hardware faults are mostly physical faults, while software faults are design faults, which are harder to visualize, classify, detect and correct. Design faults are closely related to fuzzy human factors and the design process, of which we do not have a solid understanding. In hardware, design faults may also exist, but physical faults usually dominate. In software, we can hardly find a strict counterpart of the hardware manufacturing process, if the simple action of uploading software modules into place does not count. Therefore, the quality of software will not change once it is uploaded into storage and starts running. Trying to achieve higher reliability by simply duplicating the same software modules will not work, because design faults cannot be masked off by voting (Pan, 1999).

2.3. The bathtub curve

Over time, hardware exhibits the failure characteristics shown in Figure 1, known as the bathtub curve. Its periods are the burn-in phase, the useful-life phase and the end-of-life phase.

Fig. 1. Bathtub curve for hardware reliability

Software reliability, however, does not show the same characteristics as hardware. A possible curve, obtained if we project software reliability onto the same axes, is shown in Figure 2 (Pan, 1999). There are two major differences between the hardware and software curves. One difference is that in the last phase, software does not have an increasing failure rate as hardware does. In this phase, software is approaching obsolescence and there is no motivation for any upgrades or changes to the software; therefore, the failure rate will not change. The second difference is that in the useful-life phase, software experiences a drastic increase in failure rate each time an upgrade is made. The failure rate then levels off gradually, partly because of the defects found and fixed after the upgrades.

Fig. 2. Revised bathtub curve for software reliability

The upgrades in Figure 2 imply feature upgrades, not upgrades for reliability. For feature upgrades, the complexity of software is likely to increase, since the functionality of the software is enhanced. Even bug fixes may be a reason for more software failures, if the bug fix induces other defects into the software. For reliability upgrades, it is possible to achieve a drop in the software failure rate, if the goal of the upgrade is enhancing software reliability, such as a redesign or reimplementation of some modules using better engineering approaches, such as the clean-room method.

2.4. Software Reliability Models

A proliferation of software reliability models has emerged as people try to understand the characteristics of how and why software fails, and try to quantify software reliability. Over 200 models have been developed since the early 1970s, but how to quantify software reliability still remains largely unsolved. Despite the many existing models and the many more emerging, none of them can capture a satisfying amount of the complexity of software; constraints and assumptions have to be made for the quantifying process. Therefore, there is no single model that can be used in all situations. No model is complete or even representative. One model may work well for a certain set of software, but may be completely off track for other kinds of problems. Most software reliability models contain the following parts: assumptions, factors, and a mathematical function that relates reliability to the factors. The mathematical function is usually a higher-order exponential or logarithmic one. Software modeling techniques can be divided into two subcategories: prediction modeling and estimation modeling. Both kinds of modeling techniques are based on observing and accumulating failure data and analyzing them with statistical inference. The major differences between the two kinds of models are shown in Table 1.

ISSUES | PREDICTION MODELS | ESTIMATION MODELS
Data reference | Uses historical data | Uses data from the current software development effort
When used in development cycle | Usually made prior to development phases; can be used as early as concept phase | Usually made later in life cycle (after some data have been collected); not typically used in concept or development phases
Time frame | Predicts reliability at some future time | Estimates reliability at either present or some future time

Table 1. Difference between software reliability prediction models and software reliability estimation models

Representative prediction models include Musa's Execution Time Model, Putnam's Model, etc. Using prediction models, software reliability can be predicted early in the development phase and enhancements can be initiated to improve the reliability (Lyu, 1996). Representative estimation models include exponential distribution models, the Weibull distribution model, Thompson and Chelson's model, etc. The exponential models and the Weibull distribution model are usually regarded as classical fault count/fault rate estimation models, while Thompson and Chelson's model belongs to the Bayesian fault rate estimation models. The field has matured to the point that software reliability models can be applied in practical situations and give meaningful results, but no single model is best in all situations. Because of the complexity of software, any model has to make extra assumptions.
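The cited sources do not single out one concrete model, but as an illustration of the exponential class mentioned above, the well-known Goel-Okumoto model expresses the expected cumulative number of failures observed by time t as:

```latex
% Goel-Okumoto NHPP model:
%   a = expected total number of faults,
%   b = per-fault detection rate.
% The failure intensity decays as faults are found and fixed.
\mu(t) = a\left(1 - e^{-bt}\right), \qquad
\lambda(t) = \mu'(t) = a\,b\,e^{-bt}
```

Fitting a and b to the observed failure data yields an estimate of the current failure intensity λ(t) and of the residual fault content a − μ(t).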

Only a limited number of factors can be taken into consideration. Most software reliability models ignore the software development process and focus on the results: the observed faults and/or failures. By doing so, complexity is reduced and abstraction is achieved; however, the models tend to specialize and apply to only a portion of situations and a certain class of problems.

2.5. Software Reliability Metrics

Measurement is commonplace in other engineering fields, but not in software engineering. Though frustrating, the quest to quantify software reliability has never ceased. Measuring software reliability remains a difficult problem because we do not have a good understanding of the nature of software.

Software size is thought to be reflective of complexity, development effort and reliability. Lines Of Code (LOC), or LOC in thousands (KLOC), is an intuitive initial approach to measuring software size, but there is no standard way of counting. Typically, source code is used, and comments and other non-executable statements are not counted. This method cannot faithfully compare software not written in the same language. The advent of new technologies of code reuse and code generation also casts doubt on this simple method. The function point metric is a method of measuring the functionality of a proposed software development based upon a count of inputs, outputs, master files, inquiries and interfaces. The method can be used to estimate the size of a software system as soon as these functions can be identified. It is a measure of the functional complexity of the program. It measures the functionality delivered to the user and is independent of the programming language. It is used primarily for business systems; it is not proven in scientific or real-time applications (Pan, 1999). Complexity is directly related to software reliability, so representing complexity is important. Complexity-oriented metrics determine the complexity of a program's control structure by simplifying the code into a graphical representation. A representative metric is Cyclomatic Complexity (VanDoren, 2000). The Cyclomatic Complexity of a software module is calculated from a connected graph of the module (which shows the topology of control flow within the program):

CC = E - N + p    (1)

where E is the number of edges of the graph, N is the number of nodes of the graph and p is the number of connected components. Studies show a correlation between a program's Cyclomatic Complexity and its error frequency (Table 2). A low Cyclomatic Complexity contributes to a program's understandability and indicates that it is amenable to modification at lower risk than a more complex program. A module's Cyclomatic Complexity is also a strong indicator of its testability. Test coverage metrics estimate faults and reliability by performing tests on software products, based on the assumption that software reliability is a function of the portion of the software that has been successfully verified or tested.

Cyclomatic Complexity | Risk Evaluation
1-10 | Low risk
11-20 | Moderate risk
21-50 | High risk
>50 | Very high risk, unstable program

Table 2. Cyclomatic Complexity and risk evaluation
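As a worked illustration (our own example, not taken from the cited sources): for a single structured module, formula (1) is commonly evaluated through the equivalent shortcut "number of binary decision points plus one". The following hypothetical Java method contains three decision points, so its Cyclomatic Complexity is 4, well inside the low-risk band of Table 2:

```java
// Hypothetical module used only to illustrate formula (1).
// Decision points: the null check, the loop condition and the
// threshold check, so CC = 3 + 1 = 4 (low risk per Table 2).
public final class FuelCheck {
    /** Counts samples below the alarm threshold. */
    public static int countLowFuelSamples(int[] samples, int threshold) {
        if (samples == null) {          // decision 1
            return 0;
        }
        int count = 0;
        for (int s : samples) {         // decision 2 (loop condition)
            if (s < threshold) {        // decision 3
                count++;
            }
        }
        return count;
    }
}
```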

2.6. Reliability testing

Software reliability refers to the probability of failure-free operation of a system. It is related to many aspects of software, including the testing process. Directly estimating software reliability by quantifying its related factors can be difficult. Testing is an effective sampling method for measuring software reliability. Guided by the operational profile, software testing (usually black-box testing) can be used to obtain failure data, and an estimation model can then be used to analyze the data to estimate the present reliability and predict future reliability. Based on the estimation, the developers can decide whether to release the software, and the users can decide whether to adopt and use it. The risk of using the software can also be assessed based on reliability information. The primary goal of testing should be to measure the dependability of the tested software. There is agreement on the intuitive meaning of dependable software: it does not fail in unexpected or catastrophic ways. Robustness testing and stress testing are variants of reliability testing based on this simple criterion. The robustness of a software component is the degree to which it can function correctly in the presence of exceptional inputs or stressful environmental conditions. Robustness testing differs from correctness testing in the sense that the functional correctness of the software is not of concern; it only watches for robustness problems such as machine crashes, process hangs or abnormal termination. The assumption is relatively simple; therefore robustness testing can be made more portable and scalable than correctness testing (Pan, 1999). Stress testing, or load testing, is often used to test the whole system rather than the software alone. In such tests the software or system is exercised with loads at or beyond the specified limits. Typical stress includes resource exhaustion, bursts of activities and sustained high loads.
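A minimal sketch of what robustness testing can look like in practice, assuming a hypothetical record parser (the stub below is invented; the real system's interfaces are not published in the paper). Only crash-type outcomes are checked, not functional correctness:

```java
import java.nio.charset.StandardCharsets;

// Robustness-test sketch for Section 2.6: feed exceptional inputs to a
// parser and check only for graceful failure, not functional correctness.
// RecordParser is a stand-in stub; the real system's parser differs.
public final class RobustnessTest {

    static final class RecordParser {
        static void parseRecord(byte[] data) {
            if (data == null || data.length == 0) {
                throw new IllegalArgumentException("empty record");
            }
            String text = new String(data, StandardCharsets.UTF_8);
            if (!text.startsWith("<record>") || !text.endsWith("</record>")) {
                throw new IllegalArgumentException("malformed record");
            }
            // ... real field extraction would happen here ...
        }
    }

    public static void main(String[] args) {
        byte[][] exceptionalInputs = {
            null,                                                 // missing input
            new byte[0],                                          // empty input
            "<record><speed>".getBytes(StandardCharsets.UTF_8),   // truncated XML
            { (byte) 0xFF, 0x00, 0x13, 0x37 }                     // binary garbage
        };
        for (byte[] input : exceptionalInputs) {
            try {
                RecordParser.parseRecord(input);
                System.out.println("accepted");
            } catch (RuntimeException e) {
                // A clean exception is a graceful outcome; a crash or hang
                // would count as a robustness failure.
                System.out.println("rejected gracefully: " + e.getMessage());
            }
        }
    }
}
```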

3. SYSTEM STRUCTURE

The general construction of online fleet management systems is shown in Figure 3. The three main components are:
- on-board computer,
- central server,
- user computers.
The on-board units (OBUs) on the locomotive measure the operational parameters of the locomotive (state of the switches, energy consumption, motor parameters, etc.) and its position (aided by GPS-based location), and they store the data entered by the engine-driver (the name of the current activity, etc.). These parameters are sent to a central server when previously defined events occur (alarm signal, sudden decrease in fuel level, etc.) and at previously defined intervals. The on-board computers communicate with the central server through the GSM network. The incoming data are evaluated and stored in a database. If necessary, the central server can send an alarm to a given e-mail address or even to a mobile phone. In this structure, communication from the server towards the locomotive is possible as well. This way the incoming data packages can be confirmed, a written message can be sent to the engine-driver, and the parameters of the on-board unit can be set.

Fig. 3. System structure

Locomotives are detectable and observable almost constantly (online), and the operating parameters (running performance of vehicles, energy consumption, activities and work time of drivers, delivery performance) can be followed through a later evaluation of the data stored in the centre (offline).

4. DATA TRANSFER

The communication system can be described using the OSI model (ISO/IEC 7498-1, 1994), as shown in Table 3. The connection point between the OBUs and the server is the session layer (TCP socket). The first step is the determination of the session layer's protocol. There are a few key features that set TCP apart from the User Datagram Protocol:
- ordered data transfer,
- retransmission of lost packets,
- discarding of duplicate packets,
- error-free data transfer,
- congestion/flow control.
In the data block of the TCP packets a record structure was built up, which contains the data of the locomotive and the train. For the declaration of the structure and the data types a standard XML schema (XML Schema, 2004) was created; a sketch of such a record follows.
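The concrete element names of the schema are not published in the paper, so the following is only an illustrative sketch with invented field names (vehicleId, position, fuelLevel) and an invented host/port, showing how an OBU-side client might frame such an XML record and send it over a TCP socket:

```java
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative OBU-side sender: wraps measured values into an XML
// record and writes it to the server's TCP port. All element names
// and the host/port are placeholders; the real schema differs.
public final class ObuSender {
    public static void main(String[] args) throws Exception {
        String record =
            "<record>"
          +   "<vehicleId>V63-049</vehicleId>"               // hypothetical ID
          +   "<position lat=\"47.4811\" lon=\"19.0565\"/>"  // GPS fix
          +   "<fuelLevel unit=\"l\">2450</fuelLevel>"
          + "</record>";

        // One TCP connection per transmission keeps the OBU logic simple;
        // TCP itself provides ordering, retransmission and flow control.
        try (Socket socket = new Socket("server.example", 5000);
             OutputStream out = socket.getOutputStream()) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```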

OSI model | Used protocol or service | Implementation in OBU | Implementation in central server
Physical layer | GSM, 100BASE-TX | GSM modem | Network card
Data link layer | GPRS, Ethernet | GSM modem | Network card
Network layer | Internet Protocol (IP) | TCP stack in GSM modem | Operating system
Transport layer | Transmission Control Protocol (TCP) | TCP stack in GSM modem | TCP server class (Server software)
Session layer | TCP socket | Microcontroller's software | Client thread (Server software)
Presentation layer | Data exchange with XML | Microcontroller's software | Client and SQL thread (Server software)
Application layer | SQL server | Microcontroller's software | SQL server (Oracle)

Table 3. Architecture of communication system

5. BOTTLENECKS AND THREATS

Such a system has to be available 24/7, since the processing of the data acquired from the OBUs must be real-time for online organization or delivery plan change purposes. Data accessibility depends on the availability of all system parts:
- OBU availability (electric power supply, reliability, accessibility),
- network availability (GSM network, gateways and firewalls),
- communication server (accessibility, reliability),
- database accessibility.
The networking of the investigated system of the Hungarian State Railways uses the public GSM network with GPRS transfer, which is not a safety-critical service and does not yet have 100% coverage of the whole railway network. The forthcoming installation of the GSM-R system should improve several availability and reliability aspects of the system. An easy way to ensure higher database accessibility is partitioning the database into a transactional part for communication tasks and a storage part for performance-critical queries, as sketched below. Another necessary element of the data storage is the duplication of both parts of the system; beyond the fact that this is the simplest solution for redundancy, the duplication makes database application maintenance and upgrades possible without loss of service. Therefore, the narrowest point in the path of the data flow (excluding the problems of the public GSM network) is the communication server itself. Its reliability and robustness is one of the key points for building a well-working telemonitoring system.
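A minimal sketch of the database partitioning idea, assuming hypothetical JDBC connection strings and table names (the actual Oracle schema and instance names are not given in the paper): inserts from the communication path go to the transactional instance, while reporting queries hit the replicated storage instance.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the transaction/storage split described in Section 5.
// Connection URLs, credentials, table and column names are invented.
public final class PartitionedDb {
    // Small transactional instance: receives OBU inserts only.
    static Connection txDb() throws Exception {
        return DriverManager.getConnection(
            "jdbc:oracle:thin:@txhost:1521:TXDB", "comm", "secret");
    }

    // Replicated storage instance: serves performance-critical queries,
    // so heavy reports never block the communication path.
    static Connection storageDb() throws Exception {
        return DriverManager.getConnection(
            "jdbc:oracle:thin:@storagehost:1521:STORDB", "report", "secret");
    }

    public static void insertPosition(String vehicleId, double lat, double lon)
            throws Exception {
        try (Connection c = txDb();
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO positions (vehicle_id, lat, lon) VALUES (?, ?, ?)")) {
            ps.setString(1, vehicleId);
            ps.setDouble(2, lat);
            ps.setDouble(3, lon);
            ps.executeUpdate();
        }
    }

    public static int dailyRecordCount(String vehicleId) throws Exception {
        try (Connection c = storageDb();
             PreparedStatement ps = c.prepareStatement(
                 "SELECT COUNT(*) FROM positions "
               + "WHERE vehicle_id = ? AND ts >= TRUNC(SYSDATE)")) {
            ps.setString(1, vehicleId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}
```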

6. RELATED WORK

Considering the above-mentioned viewpoints, a server application was developed with the following main tasks:
- receiving data from the locomotives,
- piggybacking to the locomotives,
- identification of engine-drivers,
- inserting data into the SQL database,
- sending alerts, if necessary,
- setting the parameters of the OBUs,
- remote diagnostics handling,
- software update.
The server listens on a specified TCP port, waiting for the clients. The application works with multiple simultaneous threads, because it needs to serve a large number of clients in parallel and it has several parallel jobs. The main thread, the ServerThread, listens continuously on the TCP port. If an incoming connection occurs, the ServerThread creates a ClientThread, which takes over the handling of the connection. The ClientThread is the most complex part of the software. It has to deal with the following tasks:
- data receiving,
- data checks (syntactic, semantic, checksum),
- data conversion and passing to the SQLThread,
- piggybacking to the clients,
- identification of the drivers,
- sending the parameters of the OBU.
The SQLThread waits for the data from the ClientThreads. Its tasks are keeping the database connection alive, inserting the data into a table, and querying the engine-driver's ID and the OBU's parameters. In software engineering, reducing complexity by reducing software intelligence often results in reduced capabilities or increased performance needs; partitioning software tasks, on the other hand, is an obvious solution. These conditions indicated that this highly critical application needs adequate sub-processes and a fast yet reliable process control. For this task, the most important part of the software (the ClientThread) received a simple, and thus error-resistant, state machine for flow control; a sketch of this structure is given below. The state machine approach yielded several benefits:
- the number of code branches and loops decreased,
- reading and understanding the code became easier,
- code and functional expansion became independent of other software parts,
- the level of software integration still remained high.
During the development phase of such an application it is important to identify the potential coding risks. For this task a software verification tool is necessary; the development of the present application was aided by Codehealer, a highly efficient, powerful source code analysis and verification tool. The use of such a tool enables the early recognition of unused or unreachable code, identifiers hiding others, uninitialized or unreferenced identifiers, dangerous type casts, and undefined or unused function values, all of which hold potential coding errors and thus application errors. The use of the software verification tool also helps the computation of software reliability metrics; according to its results, the server module dealing with the clients has a Cyclomatic Complexity of 5, meaning the module has acceptably low risk. For testing, an OBU simulator was developed. This tool can simulate a large number of OBUs automatically. Functional tests, load tests, performance tests and stress tests were performed during development. The results show that the software is robust with low resource demand. Long-term reliability testing is in progress.
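A condensed sketch of the thread and state-machine structure described above. The state names, the port and the simplified protocol are invented; the actual server implementation (and its language) is not published in the paper, so this only mirrors the described design:

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the described design: an accept loop (ServerThread) hands
// each connection to a ClientThread, whose flow is controlled by a
// simple explicit state machine. States and protocol are invented.
public final class ServerThread {

    enum State { RECEIVE, CHECK, CONVERT, ACKNOWLEDGE, DONE }

    static final class ClientThread extends Thread {
        private final Socket socket;
        ClientThread(Socket socket) { this.socket = socket; }

        @Override public void run() {
            State state = State.RECEIVE;
            byte[] record = null;
            try (Socket s = socket) {
                while (state != State.DONE) {
                    switch (state) {
                        case RECEIVE:      // read one record from the OBU
                            record = readRecord(s.getInputStream());
                            state = State.CHECK;
                            break;
                        case CHECK:        // syntactic/semantic/checksum checks
                            state = isValid(record) ? State.CONVERT : State.DONE;
                            break;
                        case CONVERT:      // convert and pass to the SQLThread
                            SqlQueue.offer(record);
                            state = State.ACKNOWLEDGE;
                            break;
                        case ACKNOWLEDGE:  // piggyback confirmation to the client
                            s.getOutputStream().write('A');
                            state = State.DONE;
                            break;
                    }
                }
            } catch (Exception e) {
                // a failed session must never take down the server
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(5000)) {
            while (true) {                       // ServerThread accept loop
                new ClientThread(server.accept()).start();
            }
        }
    }

    // --- simplified placeholders for the real processing steps ---
    static byte[] readRecord(InputStream in) throws Exception {
        return in.readNBytes(256);               // fixed-size frame for the sketch
    }
    static boolean isValid(byte[] r) { return r != null && r.length > 0; }
    static final class SqlQueue {
        static void offer(byte[] r) { /* hand over to the SQLThread */ }
    }
}
```

The explicit state enumeration is what keeps the ClientThread's branching low: each state has exactly one job and one or two successors, which is consistent with the reported Cyclomatic Complexity of 5 for this module.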

7. SUMMARY

The paper described the schematic structure of a railway-centric telemonitoring system and evaluated its bottlenecks and threats. Through a real-world application of the Hungarian State Railways, the reliability of such a system was outlined with special regard to Cyclomatic Complexity, which is a highly important metric for software systems under development because of its strong connection to the probability of failures induced by software upgrades. The testing of the present system proved that it can be built into a robust and reliable application, although a high safety integrity level cannot be reached using the public GSM network.

REFERENCES

Pan, J. (1999). Software Reliability. 18-849b Dependable Embedded Systems, Carnegie Mellon University, Spring 1999.
Pan, J. (1999). Software Testing. 18-849b Dependable Embedded Systems, Carnegie Mellon University, Spring 1999.
VanDoren, E. (2000). Cyclomatic Complexity. http://www.sei.cmu.edu/str/descriptions/cyclomatic_body.html
Lyu, M. R. (1996). Handbook of Software Reliability Engineering. McGraw-Hill, ISBN 0-07-039400-8.
Scientific Association for Infocommunications: Telecommunication Networks and Informatics Services. On-line book, http://www.hte.hu
Dicső, K., Marcsinák, L.: Today and future of on-board systems. Hungarian Rail Technology Journal, 2006/3, pp. 11-15.
Szemkeő, M.: Purposes, application areas, services and installation phases of the GSM-R system. Hungarian Rail Technology Journal, 2005/1, pp. 12-16.
Aradi, Sz.: Telemonitoring System with Locomotive On-Board Computer. Hungarian Rail Technology Journal, 2007/1, pp. 27-28.
ANSI/IEEE (1991). Standard Glossary of Software Engineering Terminology. STD-729-1991.
ISO/IEC 7498-1 (1994). Information Technology – Open Systems Interconnection – Basic Reference Model: The Basic Model.
XML Schema. http://www.w3.org/XML/Schema, 2004.
