Software Reliability

Progressive RAMS Assurance & Management for Railway Projects An EN 50126 Approach

Dr. Ajeet Kumar Pandey

Myself in Brief

Education
• Ph.D. (Reliability Engineering): IIT Kharagpur, Kharagpur, WB, India.
• M.Tech (Software Engineering): MNNIT, Allahabad, UP, India.

Current Affiliations
• RAMS Director
• Consultant

Earlier Affiliations
• Member: IET (UK), IEEE (USA)

How to reach me ? [email protected]

ajeet.mnnit

https://in.linkedin.com/pub/dr-ajeetkumar-pandey/20/1b7/a59

+91 8886411889


Contributions towards System Assurance & RAMS 

Text Book

Early Software Reliability Prediction: A Fuzzy Logic Approach, Springer-Verlag, New York, USA, 2013.



Research papers & articles

    

        

May 2015 - RAM Apportionment Model for Mass Rapid Transit Systems, IEEE Xplore.
Jun 2015 - Software Safety Assurance for Metro Railways, Business Magazine, Traffic Infra Tech, India.
Feb 2015 - Survey of Algorithms on Maximum Clique Problem, International Journal, India.
Dec 2014 - Opinion Mining & Sentiment Analysis for Social Media using Fuzzy Logic, International Conference, India.
Nov 2014 - RAMS Management for a Complex Railway System: A Case Study, International Symposium, India.
Jan 2013 - Safety Analysis of Automatic Door Operation for Metro Train: A Case, Springer, International Conference, India.
Nov 2012 - Cost Effective Reliability Centric Validation Model for Automotive ECUs, IEEE Xplore.
Oct 2012 - Successive Software Reliability Growth Model: A Practical Approach, International Symposium, India.
Nov 2012 - A Fuzzy Model for Early Software Quality Prediction and Module Ranking, International Journal, India.
Jun 2011 - Early fault detection model using integrated and cost-effective test case prioritization, International Journal, India.
Dec 2010 - Test Effort Optimization by Prediction and Ranking of Fault-prone Software Module, IEEE Xplore.
Oct 2010 - Fault Prediction Model by Fuzzy Profile Development of Reliability Relevant Software Metrics, International Journal, USA.
Sep 2010 - Predicting Fault-prone Software Module Using Data Mining Technique and Fuzzy Logic, International Journal, India.
Jun 2010 - Modified BUSTRAP: An Optimal BUS TRAvel Planner for Commuters using Mobile, International Journal, India.
Jun 2010 - Multistage Fault Prediction Model Using Process Level Software Metrics, DQM Research Center, Serbia.
Jan 2010 - An Early Software Fault Prediction Model using Process Maturity and Software Metrics, International Journal, India.
Jun 2009 - A Fuzzy Model for Early Software Fault Prediction using Process Maturity and Software Metrics, International Journal, India.
Sep 2007 - Digitally Signed SMS for Business Transaction, National Conference, India.


Agenda
• System Assurance RAMS: What, Why & How?
• Challenges and Benefits: RAMS Assurance
• Associated Standards: EN 50126, EN 50128 and EN 50129
• Railway RAMS: Quality of Service and Influencing Factors
• RAM & Safety Requirement Specifications
• Progressive RAMS Assurance & Management Approach
• Progressive Deliverables Lists: RAM & Safety

System Assurance: What ? 

System –> System Engineering –> System Assurance –> System Integration 

Systems assurance can be viewed as the process of analysis and validation that ensures a system meets all of its reliability, availability, maintainability and safety (RAMS) requirements, i.e. it provides evidence (assurance) that the systems engineering has been performed correctly and completely.

In many contexts, systems assurance is referred to as RAMS assurance.

A System Assurance Plan sets out a management approach to ensure that systems are designed and developed to meet the specified RAMS targets, including proven electromagnetic compatibility.

New Approach to System Assurance: Progressive System Assurance

Progressive system assurance ensures that RAM and safety targets are delivered progressively as an intrinsic part of the design and delivery process, reducing costly delays and overruns and significantly improving engineering efficiency.

Elements of System Assurance

System Assurance Management: Key Elements
• RAM
• Safety
• EMI/EMC
• Interface
• Configuration

Standards: EN 50126, EN 50128, EN 50129, EN 50121. The standards address:
• Process for specifying requirements
• Organization, roles and responsibilities
• Recommended techniques & measures
• Lifecycle issues and documentation

This session focuses on RAMS assurance and management in line with EN 50126.

Project RAMS: RAM and Safety
Product RAMS: RAM and Safety

RAM and safety analyses are handled separately because they differ in:
• Failure behaviour (all failures vs. dangerous failures)
• Nature of analysis (what the system is supposed to do vs. what it is not supposed to do)
• Nature of goal (failure free vs. mishap free)
• Nature of requirements (performance related vs. safety related)
• Many more…

RAMS Assurance: Why? 

• Rising performance requirements: Indian Railways is rapidly moving towards executing its high speed train and has recently approved "Talgo" trial runs at 160 and 200 km/h on the Delhi-Mumbai route.
• Rising customer expectations: punctuality, comfort level.
• Balancing life cycle cost: asset management, optimizing maintenance cost, spare part analysis, RCM, etc.

Source: Report of High Level Safety Review Committee; 2012; http://www.indianrailways.gov.in/hlsrc/index.html

RAMS Assurance: Why?

Ensuring safety for passengers, public, staff and environment

Table: Five-year accident data of Indian Railways

Year      Collision   Derailment   Level Crossing Acc.   Fire   Misc.   Total
2009-10   9           80           70                    2      4       165
2010-11   5           78           53                    2      1       139
2011-12   9           55           61                    4      2       131
2012-13   6           48           58                    8      0       120
2013-14   4           52           51                    7      3       117

Derailments constitute the largest share, about 50% of total accidents, followed by accidents at unmanned level crossing gates (36%), collisions (5%), miscellaneous (3%) and fire (2%).

Source: Report of High Level Safety Review Committee; 2012; http://www.indianrailways.gov.in/hlsrc/index.html

Why System Fails? 

System failures are of two kinds: random and systematic.

Random faults are due to physical causes (corrosion, thermal stressing, wear-out, etc.), while systematic faults are produced by human error during system development and operation.

Random failures can be predicted from statistical information (MIL-HDBK-217, etc.).
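As a minimal sketch of what predicting random failures from statistical data can look like, the snippet below assumes a constant failure rate (the exponential model commonly used together with handbook failure-rate data). The function names and the failure rate of 20 failures per million hours are hypothetical and are not taken from the slides.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of surviving `hours` without failure at a constant failure rate."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """For a constant failure rate, MTBF is its reciprocal."""
    if failure_rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate_per_hour


# Hypothetical unit: 20 failures per million operating hours.
lam = 20e-6
one_year = 24 * 365  # hours of continuous operation

print(f"MTBF: {mtbf_hours(lam):,.0f} h")
print(f"R(1 year): {reliability(lam, one_year):.3f}")
```

This model only covers random failures; systematic failures do not follow such a statistical law and have to be controlled through process.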

Systematic failure

Systematic faults are produced by human error during development & operation. Once a systematic fault is created, it will certainly appear in future when circumstances favor it.

Moreover, it is difficult to predict the occurrence of systematic failures.

RAMS Management: How? 

How to address Systematic Error (Human Error)? 

Systematically follow the process (EN 50126 guidelines):
• EN 50126-1 addresses system issues on the widest scale.
• EN 50126-2 addresses the application of EN 50126-1 for safety.
• EN 50126-3 addresses the application of EN 50126-1 for rolling stock RAMS.

Railway authorities around the world use CENELEC standards and guidelines (EN 50126, EN 50128 & EN 50129) to implement RAMS assurance in their railway projects.

CENELEC standards provide a management framework for Railway Authorities (RA) and Railway Support Industries (RSIs) to ensure that systems are designed, constructed and operated considering all critical factors related to RAM and safety.

CENELEC Standards and its Applicability

CENELEC Standards
• EN 50126 addresses system issues on the widest scale.
• EN 50129 addresses the approval process for individual systems which can exist within the overall railway control and protection system.
• EN 50128 focuses on software development for railway control and protection systems.

RAMS Assurance: Challenges 

• At first impression, the project management/execution team feels that System Assurance (RAMS) activities are an obstruction because they bring no instant benefit. Moreover, the team is not able to visualize the future payoff.
• System Assurance is similar to life insurance: a small investment allows a bigger risk to be managed.
• System Assurance culture starts with leadership; leadership drives culture, which drives behavior. The attitude "System Assurance is everyone's responsibility" should be encouraged.
• Fire safety and health, safety & environment (HSE) are not a part of System Assurance.
• Document-centric approach.
• Degree of independence for implementing SA activities.
• Budget allocation: for larger-scale projects, the budget for Systems Assurance shall be between 0.4% and 1% of the total value of the project (rough estimate).

RAMS Assurance: Challenges

Issue: Inadequate RAMS resources made available late in the project
Possible solution: Commitment by the management team and client to SA activities

Issue: Safety personnel not integrated into the design review process
Possible solution: Management training on SA so that they can understand the benefits to be gained from SA

Issue: Engineering (design) personnel not involved in the SA process
Possible solution: Engineering personnel encouraged to participate in HAZOP/FMECA

Issue: Weak interface between systems integration and systems assurance
Possible solution: Provision of specific interface meetings between SA and SI personnel

Issue: Sub-contractors poorly controlled in terms of their delivery of RAMS studies
Possible solution: SA Plans must contain sections on the management of sub-contractors

Issue: Setting unrealistic and unachievable numerical RAMS targets
Possible solution: The client must consult with the supplier at the contract stage on whether the supplier can meet the targets or whether they are unrealistic

Issue: Arguments on what happens if "a RAMS target is not met but the design meets the engineering specification"
Possible solution: The client sets RAMS targets and associated conditions at tender stage. The supplier must state how he intends to meet the targets or why he requires a relaxation of the target

Progressive RAMS Management


Railway RAMS: Bathtub curve 

In the context of railway systems, RAMS is a long-term system characteristic. It is highly useful for railways because railway infrastructure and systems have a long useful life.

• Early failure period: the weaker units die off, leaving a population that is more rigorous.
• Useful life period: failures occur more in a random sequence during this time.
• Wear-out period: at this point, units become old and begin to fail at an increasing rate.
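The three regions of the bathtub curve can be illustrated numerically. The sketch below assumes the curve is the sum of three Weibull hazard rates: one decreasing (shape below 1, early failures), one constant (shape equal to 1, random failures) and one increasing (shape above 1, wear-out). All shape and scale values are hypothetical and chosen only to make the shape visible.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t: float) -> float:
    """Sum of three competing failure mechanisms (hypothetical parameters)."""
    early = weibull_hazard(t, shape=0.5, scale=20_000.0)     # decreasing rate
    random_ = weibull_hazard(t, shape=1.0, scale=50_000.0)   # constant rate
    wear_out = weibull_hazard(t, shape=4.0, scale=150_000.0)  # increasing rate
    return early + random_ + wear_out


for hours in (100, 1_000, 20_000, 100_000, 200_000):
    print(f"{hours:>8} h  failure rate = {bathtub_hazard(hours):.2e} per hour")
```

With these values the failure rate falls during early life, stays roughly flat through the useful life and rises again once wear-out dominates.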

Metro Project System Breakdown Structure

[Figure: system breakdown structure. L-0: the railway project within its system boundary, with external stakeholders (other rail network, State Govt., Min. of Defense, Min. of Railway, emergency services, other stakeholders). L-1 and L-2: lower breakdown levels, down to LRUs (hose pipe, safety valve, circuit board, ballcock, etc.) and operation & maintenance (O&M manual/procedure, SMS, etc.).]

Railway RAMS & Quality of Service



• RAMS is a characteristic of a system's long-term operation and can be achieved by applying established engineering concepts & techniques across the lifecycle.
• Its applicability varies across railway subsystems, as subsystems are neither equally critical nor do they contribute equally to service-affecting failures.
• Quality of service is also influenced by other characteristics concerning functionality and performance, such as frequency of service, regularity of service, fare structure, etc.

EN 50126-1:Railway RAMS & Quality of Service

Elements of Railway RAMS



• Safety and availability are interlinked in the sense that any conflict between them may prevent achievement of a dependable system.
• Safety and availability are higher-level RAMS targets and can only be achieved by meeting all reliability and maintainability requirements and by controlling the ongoing operation and maintenance activities.

EN 50126-1: Elements of Railway RAMS

Definition: Reliability, Availability, Maintainability & Safety



Reliability: All possible failure modes, probability of occurrence of each failure, effect of the failure on the functionality. [FMEA/FMECA/FTA etc.].



Maintainability: Time for the performance of the planned maintenance, time for detection, identification and location of the faults, and time for the restoration of the failed system.



Operation & Maintenance: All possible operation modes and required maintenance action across the life cycle.



Hazards: All possible hazards under all possible modes such as normal operation, maintenance and emergency operation. Characteristics of each hazard in terms of severity and consequences.

Safety Related Failures: All safety related failure modes (a subset of the reliability failure modes), probability of occurrence of each failure, and effect of the failure on the functionality as well as the environment.

EN 50126-1: Elements of Railway RAMS 

• Failures in a system will have some effect on the behavior of the system.
• All failures adversely affect system reliability, whereas only some specific failures have an adverse effect on safety within the particular application.
• The environment may also influence the functionality of the system and, in turn, the safety of the railway application.

EN 50126-1: Factor Influencing Railway RAMS



• Identification of the factors that can influence the RAMS parameters is vital to the specification of RAMS requirements.
• RAMS of a railway system is influenced in three ways:
  • By sources of failure introduced internally within the system at any phase of the system lifecycle (system conditions).
  • By sources of failure imposed on the system during system operation (operation conditions).
  • By sources of failure imposed on the system during maintenance activities (maintenance conditions).
• The means to achieve railway RAMS requirements relate to CONTROLLING the factors which influence RAMS throughout the life of the system.

EN 50126: RAMS Specification 

Specifying RAMS requirements is a complex process covering:
• Reliability Requirement
• Availability Requirement
• Maintainability Requirement
• Safety Requirement

It is important to note that the EN standards define only a process for specifying RAMS requirements and assist in achieving them.

They do not define RAMS targets, quantities, requirements or solutions for specific railway applications.

In most cases, RAMS targets are provided by the client at the pre-tender stage.

RAMS Assurance: Metric & Measures

   

• Failure
• MTBF, MTTF, MTTR
• Availability, Downtime
• Reliability, Design Life

• Inherent Reliability: MTBF
• Service Reliability: MTBSAF
• MTBSAF: Mean Time Between Service Affecting Failure
• MKBSAF: Mean Kilometer Between Service Affecting Failure
• MTBCF: Mean Kilometer Between Component Affecting Failure
• MTTSR: Mean Time to Service Restore
• Inherent Availability (Ai)
• Achieved Availability (Aa)
• Service Availability (As)
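To make the metrics above concrete, the sketch below derives a few of them from a small failure log. It assumes the usual textbook definitions: MTBF as operating hours per failure, MTBSAF and MKBSAF as operating hours and kilometres per service affecting failure, MTTR as mean repair time, and inherent availability Ai = MTBF / (MTBF + MTTR). The slide's own availability formulas are not reproduced in the text, so these definitions, the `Failure` record and all numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    repair_hours: float
    service_affecting: bool


def ram_metrics(operating_hours: float, fleet_km: float, failures: list) -> dict:
    """Basic RAM metrics from an observation period (hypothetical data model)."""
    if not failures:
        raise ValueError("at least one failure is needed to estimate the metrics")
    n = len(failures)
    n_saf = sum(f.service_affecting for f in failures)
    mtbf = operating_hours / n
    mttr = sum(f.repair_hours for f in failures) / n
    metrics = {
        "MTBF_h": mtbf,
        "MTTR_h": mttr,
        "Ai": mtbf / (mtbf + mttr),  # inherent availability
    }
    if n_saf:
        metrics["MTBSAF_h"] = operating_hours / n_saf
        metrics["MKBSAF_km"] = fleet_km / n_saf
    return metrics


log = [
    Failure(repair_hours=1.0, service_affecting=True),
    Failure(repair_hours=2.0, service_affecting=False),
    Failure(repair_hours=0.5, service_affecting=True),
    Failure(repair_hours=0.5, service_affecting=False),
]
for name, value in ram_metrics(10_000.0, 400_000.0, log).items():
    print(f"{name}: {value:,.4f}")
```

Achieved and service availability additionally account for preventive maintenance and operational delays; they are left out here because their exact definitions depend on the project.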

   

• Hazard, Risk
• Safety Integrity Level (SIL)
• SFF, Diagnostic Coverage
• PFD (Probability of Failure on Demand)

Definition and Allocation of Safety Requirements
• Risk classification (R1 to R4)
• Risk matrix and risk acceptance criteria (ALARP, GAMAB, SFAIRP), which are defined by the client and vary from project to project
• VPF (value of preventing a fatality), defined by the country/project

The 'value of preventing a fatality' (VPF) is the generally accepted metric by which the safety benefit of proposed safety improvements is assessed as an aid to effective decision-making.

Signalling RAM Requirements

Reliability Requirements

MTBAF (in Hrs.) by Failure Category:
• FC1: 1 to < 2 Min.
• FC2: 2 to < 5 Min.
• FC3: 5 to < 20 Min.
• FC4: 20 to [...]

Example (FC1): a delay to train arrival at the terminal station exceeding 1 minute but less than or equal to 2 minutes shall happen only after 100 hrs.

Availability: 99.5 %

• MTTR for train other trackside equipment excluding switches: < 30 minutes
• MTTR for equipment located in equipment rooms or control rooms: < 15 minutes
• MTTR for point machines: < 60 minutes

Availability shall be calculated using the following formula:

Percentage Availability = {1 - [DT (CM) / Total Time]} x 100
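The availability formula above can be sketched as a small function. The first downtime argument corresponds to DT (CM); accepting further downtime terms (for example preventive maintenance) is an assumed generalisation, and the 30-day period and downtime figure are hypothetical.

```python
def percentage_availability(total_hours: float, *downtime_hours: float) -> float:
    """Percentage availability = (1 - total downtime / total time) * 100."""
    if total_hours <= 0:
        raise ValueError("total time must be positive")
    if any(d < 0 for d in downtime_hours):
        raise ValueError("downtime cannot be negative")
    downtime = sum(downtime_hours)
    if downtime > total_hours:
        raise ValueError("downtime cannot exceed total time")
    return (1.0 - downtime / total_hours) * 100.0


# Hypothetical month: 30 days of operation, 2.5 h of corrective maintenance.
total_time = 30 * 24
dt_cm = 2.5
availability = percentage_availability(total_time, dt_cm)
print(f"Availability: {availability:.3f} %")
print("Meets 99.5 % target:", availability >= 99.5)
```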


Signalling RAM Requirements: RAM Apportionment



The RAM analysis of the signalling systems is carried out for the following subsystems:
• Train Borne System Reliability
• FEC / IO Rack
• Wayside Reliability
• Wayside Equipment Reliability
• Wayside ATS Reliability
• Trackside Equipment Reliability
• Wayside DCS Reliability
• Control Room Equipment Reliability
• ACE Rack
• Crew Control Room
• IM Rack
• Maintenance Control Room
• ZC
• Equipment Room peripherals

Wayside Equipment Room peripherals Reliability Prediction:
• Relay Racks
• Control CTF
• FODF
• ATS / DCS Backbone

The reliability parameters, Mean Time Between Failures (MTBF) and Failure Rate (FR), are obtained by a combination of the following methods:
• Using Line Replaceable Unit (LRU) and component field data from earlier or similar projects;
• Using vendor-supplied data; and
• Parts count analysis as per the guidelines of MIL-HDBK-217F.
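The parts count method mentioned above sums, over all part types in an LRU, the quantity times the generic failure rate times a quality factor, and takes the MTBF as the reciprocal of the total. The sketch below shows the arithmetic only; the part list, generic failure rates and quality factors are hypothetical placeholders and are not values from MIL-HDBK-217F.

```python
from dataclasses import dataclass


@dataclass
class PartType:
    name: str
    quantity: int
    generic_rate_fpmh: float      # failures per million hours (placeholder value)
    quality_factor: float = 1.0   # quality multiplier (placeholder value)


def lru_failure_rate_fpmh(parts: list) -> float:
    """Parts count estimate: sum of quantity * generic rate * quality factor."""
    return sum(p.quantity * p.generic_rate_fpmh * p.quality_factor for p in parts)


def mtbf_from_fpmh(rate_fpmh: float) -> float:
    """MTBF in hours for a failure rate given in failures per million hours."""
    if rate_fpmh <= 0:
        raise ValueError("failure rate must be positive")
    return 1e6 / rate_fpmh


# Hypothetical circuit board used as an example LRU.
board = [
    PartType("ceramic capacitor", 40, 0.002),
    PartType("resistor", 60, 0.001),
    PartType("microcontroller", 1, 0.15, quality_factor=2.0),
    PartType("connector", 4, 0.01),
]
rate = lru_failure_rate_fpmh(board)
print(f"Failure rate: {rate:.3f} failures per million hours")
print(f"MTBF: {mtbf_from_fpmh(rate):,.0f} h")
```

Field data from similar projects or vendor data, where available, would replace the generic rates part by part.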

Signalling Safety Requirements

Safety Requirements 

General Safety Requirement 

• Compliance with legislation, Acts and Standards.
• Proven in use: proven-in-use systems / components / services shall be used/procured with a known and high degree of safety & reliability.
• Fail-safe design principles shall be incorporated for safety critical features.
• Redundancy in design: in order to reduce the probability of occurrence of a failure.
• All hazards associated with the rolling stock will be mitigated to As Low As Reasonably Practicable (ALARP).
• All safety critical functions are implemented using 2oo2 or 2oo3 checked redundancy architecture.
• The Contractor shall provide minimum SIL 4 for the following safety functions, e.g. Automatic Train Protection (ATP) functions:
  • Wayside ATP functions
  • On-board ATP functions
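The effect of the 2oo2 and 2oo3 architectures named above on the probability that the function keeps running can be sketched with the standard k-out-of-n formula. The sketch assumes identical, independent channels and a perfect voter or comparator, which real systems only approximate; the channel reliability of 0.99 is hypothetical.

```python
from math import comb


def k_out_of_n(k: int, n: int, channel_reliability: float) -> float:
    """Probability that at least k of n independent, identical channels work."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0.0 <= channel_reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    r = channel_reliability
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r = 0.99  # hypothetical reliability of one channel over the mission time
print(f"single channel: {r:.6f}")
print(f"2oo2:           {k_out_of_n(2, 2, r):.6f}")
print(f"2oo3:           {k_out_of_n(2, 3, r):.6f}")
```

Under these assumptions 2oo2 trades continuity of service for safety, since any single channel failure stops the function, while 2oo3 tolerates one failed channel.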


Signalling Risk Acceptance Criteria

Frequency of Occurrence
• Frequent: Likely to occur frequently where the hazard will be almost continually experienced = > 10 times per year.
• Probable: Will occur several times yearly so that the hazard can be expected to occur often = > 1 time per year.
• Occasional: Likely to occur several times in the system operation so that the hazard can be expected several times = > 1 time in 10 years.
• Unlikely: Likely to occur in the system operation so that the hazard can be expected to occur a few times = 1 time in 100 years. The hazard is not expected to occur = > 1 time in 1,000 years.
• Remote: Very unlikely to occur within the Project duration = > 1 time in 10,000 years.
• Improbable: Extremely unlikely to occur within the Project duration = > 1 time in 100,000 years.
• Incredible: Not conceivably possible that the hazard may occur during the Project duration =

Severity Level
• Catastrophic
• Critical
• Marginal
• Insignificant

Risk Evaluation
• R1: Intolerable
• R2: Undesirable
• R3: Tolerable
• R4: Negligible

= 25 Min.

RS/Train-Km. 10,000 60,000 90,000 500,000

Availability Requirements

Where:
• DT (SC): down time due to service checks
• DT (OPM): down time due to other preventive maintenance
• DT (CM): down time due to corrective maintenance

Rolling Stock RAM Requirements

Maintainability Requirements

Maintainability requirements shall be specified both qualitatively and quantitatively.

Qualitative requirements may be as follows:
• Simplicity of maintenance, operation and emergency procedures, ease of repair of damaged cars and equipment, etc.
• Particular attention shall be paid during the design of the cars to ensure that scheduled maintenance tasks are achieved in minimum time and using minimum manpower.
• Those components, systems and assemblies which require routine maintenance, frequent attention or unit replacement shall be easily accessible for in situ maintenance.

Quantitative requirement: the rolling stock will achieve the following maintainability targets:
• LRU replacement takes less than 30 min.
• The train consist shall be designed and the maintenance process shall be developed so that the three-car train set can be fully overhauled (POH) over a 72-hour period, including testing.
• Component change-out requirements shall be as per the table below:

Item                                          Maximum Person-Hours
Motor Bogie (complete)                        6.00
Transmission (each)                           4.00
Windshield (excluding sealant curing time)    6.00
VAC unit                                      2.00
Propulsion inverter (complete)                4.50
Brake tread unit (each)                       1.00
Side window (excluding sealant curing time)   2.00
Auxiliary power supply                        1.00
Coupler (complete)                            3.00
All air filters                               3.00
Brake pad (pair)                              0.25
Battery filling                               0.25
Light ballast (each)                          0.25
Headlight (each)                              0.25

Rolling Stock: RAM Apportionment

Apportionment of the reliability target is done by the RS Contractor at subsystem level.

Sample data values are considered for the calculations; any match with actual data may be a coincidence.
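One simple way to apportion a train-level reliability target to subsystems is weighted apportionment. The sketch below assumes a series system with constant failure rates, so that subsystem failure rates per kilometre add up to the train-level rate. The subsystem names, weights and the 60,000 km target are hypothetical and are not the contractor's actual method or data.

```python
def apportion_target(system_target_km: float, weights: dict) -> dict:
    """Split a system-level mean distance between failures across subsystems.

    Each subsystem receives a share of the system failure rate proportional
    to its weight; its own target is the reciprocal of that share.
    """
    total_weight = sum(weights.values())
    if system_target_km <= 0 or any(w <= 0 for w in weights.values()):
        raise ValueError("target and weights must be positive")
    system_rate = 1.0 / system_target_km  # failures per km
    return {
        name: 1.0 / (system_rate * weight / total_weight)
        for name, weight in weights.items()
    }


# Hypothetical weights, e.g. derived from failure history of a similar fleet.
weights = {
    "doors": 0.30,
    "propulsion": 0.25,
    "brakes": 0.15,
    "air conditioning": 0.10,
    "auxiliary power": 0.10,
    "other": 0.10,
}
targets = apportion_target(60_000.0, weights)
for name, km in targets.items():
    print(f"{name:<18} {km:>12,.0f} km between failures")
```

A subsystem expected to fail more often receives a larger share of the failure budget and therefore a lower individual target.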


Rolling Stock Safety Requirements

General Safety Requirement
• Compliance with legislation, Acts and Standards.
• Proven in use: proven-in-use systems / components / services shall be used/procured with a known and high degree of safety & reliability.
• Fail-safe design principles shall be incorporated for safety critical features.
• Redundancy in design: in order to reduce the probability of occurrence of a failure.
• All hazards associated with the rolling stock will be mitigated to As Low As Reasonably Practicable (ALARP).
• The Contractor shall provide minimum SIL 2 for the following safety functions:
  • Wheel Slide Protection and Service Brake Function of BECU
  • Signal door closed and locked to train
  • Prevention of unwanted door opening
  • Emergency opening external/internal
  • Obstruction detection

Rolling Stock Risk Acceptance Criteria

Frequency of Occurrence
• Frequent: Likely to occur frequently where the hazard will be almost continually experienced = > 10 times per year.
• Probable: Will occur several times yearly so that the hazard can be expected to occur often = > 1 time per year.
• Occasional: Likely to occur several times in the system operation so that the hazard can be expected several times = > 1 time in 10 years.
• Unlikely: Likely to occur in the system operation so that the hazard can be expected to occur a few times = 1 time in 100 years. The hazard is not expected to occur = > 1 time in 1,000 years.
• Remote: Very unlikely to occur within the Project duration = > 1 time in 10,000 years.
• Improbable: Extremely unlikely to occur within the Project duration = > 1 time in 100,000 years.
• Incredible: Not conceivably possible that the hazard may occur during the Project duration =

Risk Evaluation and Risk Reduction/Control
• R1, Intolerable: Shall be eliminated or reduced.
• R2, Undesirable: Shall only be accepted when risk reduction is impractical, on agreement with Client.
• R3, Tolerable: Acceptable with adequate control and on agreement with Client.
• R4, Negligible: Acceptable without any agreement.
