Anomaly Detection in Onboard-Recorded Flight Data Using Cluster Analysis
Lishuai Li, PhD Candidate, MIT
Maxime Gariel, Postdoc, MIT
R. John Hansman, Professor, MIT
Rafael Palacios, Professor, Comillas Pontifical University, Spain
30th DASC October 18, 2011
Motivation
Massive data collected during routine flights
  Can be used to improve safety and operations
  Data recorded onboard by Flight Data Recorder (FDR) or Quick Access Recorder (QAR)
  No. of parameters: 88-2000
  Sample rate: 0.25-8 Hz
Aviation safety management
  Past: learn from accidents
  Now: proactively identify safety hazards during routine operations
[Figure: Fatal Accident Rate – Commercial Jet Fleet (1). Annual fatal accident rate (per million departures) vs. year, 1960 to 2010; series: Rest of the world, US & Canadian operators.]
[Figure: Onboard Recorded Flight Data Example (2). Pitch, Heading, Roll, Pressure Alt., Gear, Radio Alt., Calibrated Airspeed and Groundspeed vs. time (MM:SS).]
(1) Boeing, 2010 Statistical Summary, June 2011.
(2) NTSB, FDR Group Chairmen Factual Report, DCA09MA027, 2009.
Current Data Analysis
Current analysis method commonly used in Flight Operational Quality Assurance (FOQA) programs
  Exceedance analysis
    Detect anomalous events by pre-specified limits
    Examples of Exceedance Events (1)
      Event | Parameters and safety limits
      Takeoff Climb Speed High | HAT > x ft, HAA < x ft, CAS > V2 + x kts
      Excessive Power Increase | HAT < 500 ft, ΔN1 > x
      Operation Below Glideslope | Glide Slope Deviation Low > x dots, HAT < x ft
  Statistical analysis on specific queries
    [Figure: Altitude At Which Landing Flap Is Set (2). Histogram of count vs. altitude (feet AAL), 1000 to 4000 ft.]
Limitation: Only known issues are examined
(1) Federal Aviation Administration. 2004. Advisory Circular 120-82, Flight Operational Quality Assurance.
(2) Campbell, Neil. 2003. Flight Data Analysis – An Airline Perspective.
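To make the exceedance idea concrete, the following is a minimal Python sketch of a rule check modeled on the "Operation Below Glideslope" example. The slide leaves every limit as "x", so the two threshold values, the parameter names (`gs_dev_low_dots`, `hat_ft`) and the sample flight are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]  # one recorded sample: parameter name -> value


@dataclass
class ExceedanceRule:
    name: str
    condition: Callable[[Sample], bool]  # True when the sample violates the limit


# Hypothetical limits: the source gives every threshold only as "x".
GS_DEV_LIMIT_DOTS = 1.0
HAT_LIMIT_FT = 1000.0

RULES = [
    ExceedanceRule(
        "Operation Below Glideslope",
        lambda s: s["gs_dev_low_dots"] > GS_DEV_LIMIT_DOTS
        and s["hat_ft"] < HAT_LIMIT_FT,
    ),
]


def find_exceedances(samples: List[Sample], rules=RULES) -> List[Tuple[int, str]]:
    """Return (sample index, rule name) for every sample that violates a rule."""
    return [
        (i, rule.name)
        for i, sample in enumerate(samples)
        for rule in rules
        if rule.condition(sample)
    ]


# Made-up samples from one approach (illustrative values only).
flight = [
    {"gs_dev_low_dots": 0.2, "hat_ft": 1500.0},
    {"gs_dev_low_dots": 1.4, "hat_ft": 1200.0},  # low, but still above the HAT limit
    {"gs_dev_low_dots": 1.3, "hat_ft": 800.0},   # both conditions met
]
events = find_exceedances(flight)
print(events)
```

Only parameters that appear in a rule are ever inspected, so unusual behavior in any other parameter passes unnoticed. That is the limitation stated on this slide.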
Objective
Develop a method to detect anomalous flights
  From routine FDR/QAR data
  Without specification of what parameters to watch and their thresholds
Characterize data patterns of multivariate time series
  Identify nominal cases and abnormal cases
Identify anomalous flights, which are then referred to domain experts for further review
Approach
Multivariate Cluster Analysis
  Identify clusters of similar flights based on multiple flight parameters
  Perform cluster analysis in a hyperspace where each data point represents a flight
    Identify nominal clusters
    Detect outliers
Need to transform data recordings to a form applicable for cluster analysis
  Convert all time series to a vector for each flight
[Figure: Flight parameters of Flight X (time series of Altitude, N1, …) are converted to a vector V, which is one point in the hyperspace; the hyperspace plot shows Cluster 1, Cluster 2 and anomalous flights.]
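The idea of clustering flights as points and treating unclustered points as anomalies can be sketched as follows. This slide does not name a clustering algorithm, so the sketch assumes a simple density-based scheme in the style of DBSCAN. The `eps` and `min_pts` values and the two-dimensional toy vectors are hypothetical; real flight vectors would have far more components.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label used for outliers (candidate anomalous flights)


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Toy "flights" in a 2-D space: two tight groups and one isolated point.
flights = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(flights, eps=0.5, min_pts=3)
anomalous = [i for i, label in enumerate(labels) if label == NOISE]
print(labels, anomalous)
```

A density-based scheme is convenient here because it does not need the number of clusters in advance and marks isolated points as noise, which maps directly onto "identify nominal clusters" and "detect outliers".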
Three-Step Process
Time Series (flight parameters of Flight X: Altitude, N1, …)
  Step 1 - Time Series to Vectors
Vectors of High Dimensions
  Step 2 - Dimension Reduction
    Only when the number of data points is smaller than vector dimensionality
Vectors of Reduced Dimensions
  Step 3 - Cluster Analysis
Clusters and Anomalous Flights
[Figure: flow diagram with columns "General Approach" and "Detailed Steps"; example Altitude and N1 time series for Flight X, the resulting vector V, and a hyperspace plot showing Cluster 1, Cluster 2 and anomalous flights.]
Method Evaluation
A limited set of FDR data was available for this research
  2881 flights
  7 aircraft types (13 model variants)
Subsets by aircraft model were separately evaluated
  Available flight parameters vary by aircraft model
  Results of B777 subset were presented in this paper
    Number of flights: 365
    Number of flight parameters: 69
Applied the multivariate cluster analysis approach
  Three-step process
Evaluated results from cluster analysis
  Anomalous flights
  Nominal clusters
Step 1: Time-series to Vectors
Obtain samples referring to a specific event
  Takeoff Phase: track samples from applying takeoff power to 90 sec after takeoff
  Approach Phase: back track samples from touchdown to 6 nm before touchdown
[Figure: Pressure Alt. (ft) and N1 (%) vs. time from start (sec) for the takeoff phase, and vs. ground track distance (nm) for the approach phase, with sample points X0 … Xt … X90 and Y0 … Yt … Y90 marked.]
Form a vector for each flight: $\vec{V} = [X_0, X_1, \ldots, X_{90}, Y_0, Y_1, \ldots, Y_{90}, \ldots]$
  91 samples per parameter
  68 parameters for takeoff
  69 parameters for approach (“Radio Height” only available during approach)
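Forming the per-flight vector can be sketched in Python as below. The slide states only that each parameter contributes 91 samples anchored to an event. The evenly spaced sampling grid, the linear interpolation, the parameter names and the recorded values are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

N_SAMPLES = 91  # from the slide: 91 samples per parameter

# Dimensionality implied by the slide's parameter counts.
TAKEOFF_DIM = N_SAMPLES * 68   # 6188 components per takeoff vector
APPROACH_DIM = N_SAMPLES * 69  # 6279 components per approach vector


def interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Linear interpolation of y at x; xs strictly increasing, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)  # xs[k - 1] <= x < xs[k]
    frac = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + frac * (ys[k] - ys[k - 1])


def flight_to_vector(
    series: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    start: float,
    stop: float,
    n: int = N_SAMPLES,
) -> List[float]:
    """Resample every parameter on a common grid and concatenate the results."""
    grid = [start + (stop - start) * i / (n - 1) for i in range(n)]
    vector: List[float] = []
    for name in sorted(series):  # same parameter order for every flight
        xs, ys = series[name]
        vector.extend(interpolate(xs, ys, g) for g in grid)
    return vector


# Hypothetical takeoff recording: (seconds since takeoff power, value) per parameter.
takeoff = {
    "altitude_ft": ([0.0, 30.0, 60.0, 90.0], [0.0, 300.0, 1500.0, 3000.0]),
    "n1_pct": ([0.0, 45.0, 90.0], [95.0, 95.0, 90.0]),
}
vector = flight_to_vector(takeoff, start=0.0, stop=90.0)
print(len(vector), TAKEOFF_DIM, APPROACH_DIM)
```

For the approach phase the same function would apply with ground track distance before touchdown on the x-axis instead of time. With 68 parameters a takeoff vector has 6188 components, far more than the 365 flights in the B777 subset.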
Step 2: Dimension Reduction
Due to the sparseness of this dataset, need to reduce dimensions to increase data density for cluster analysis
  Number of data points: 365 flights