May 11, 2018 - Outliers in Functional Time Series – Challenges for Theory and Applications of Robust Statistics. Daniel Kosiorowski1. Dominik Mielczarek2.
Our proposal
Challenges
Summary
Outliers in Functional Time Series – Challenges for Theory and Applications of Robust Statistics
Daniel Kosiorowski1, Dominik Mielczarek2, Jerzy P. Rydlewski2
1 Cracow University of Economics, Poland
2 AGH University of Science and Technology, Krakow, Poland
XII International Conference in Honour of Prof. A. Zeliaś, 8–11 May 2018, Zakopane, Poland
References
MOTIVATIONS
Many economic phenomena may be described as time series of functions defined on a continuum, e.g., Internet traffic, air pollution intensity, or electricity consumption over consecutive days and nights, or the utility curves of a local community over half a century. These phenomena can be studied effectively within the statistical methodology called functional data analysis (FDA).
Functional outliers (FO) may adversely affect the performance of statistical procedures used in FDA, such as functional analysis of variance, functional regression, or discriminant analysis. FO may signal fraud in trade exchange or "systematic, not rapid" intrusions into transaction or information systems. Outliers may also signal the occurrence of a new phenomenon.
Time dependence decreases the quality of outlier detection tools. Well-established depth functionals fail to recognize obvious outliers.
FUNCTIONAL TIME SERIES (FTS)
FUNCTIONAL OBSERVATIONS
We treat observations as random curves $X = \{x(t),\ t \in [0, T]\}$, where $T$ is fixed, i.e., as random elements of the separable Hilbert space $L^2([0, T])$ with the usual inner product $\langle x, y \rangle = \int_0^T x(t)\,y(t)\,dt$ (see Horváth & Kokoszka, 2012). The probability space is equipped with the Borel σ-algebra. Furthermore, it has been proved that probability distributions exist for functional processes with values in a Hilbert space (see Bosq, 2000).
FUNCTIONAL TIME SERIES
A functional time series (FTS) is a series of functions indexed by time.
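In practice a curve is observed only on a finite grid, so the inner product has to be approximated numerically. The following is a minimal sketch, not taken from the slides: it assumes an equally spaced grid on [0, T] with T = 1, uses the trapezoidal rule, and the names `l2_inner`, `grid`, `x` and `y` are illustrative only.

```python
import math

def l2_inner(x, y, t):
    """Trapezoidal approximation of the integral of x(t) * y(t) over the grid t."""
    total = 0.0
    for k in range(len(t) - 1):
        left = x[k] * y[k]
        right = x[k + 1] * y[k + 1]
        total += 0.5 * (left + right) * (t[k + 1] - t[k])
    return total

# Assumed setting: T = 1 and 1001 equally spaced grid points (illustrative choice).
T = 1.0
m = 1001
grid = [T * k / (m - 1) for k in range(m)]

# Two example curves evaluated on the grid.
x = [math.sin(2 * math.pi * s) for s in grid]
y = [math.cos(2 * math.pi * s) for s in grid]

norm_sq = l2_inner(x, x, grid)  # approximates the squared L2 norm of x
cross = l2_inner(x, y, grid)    # approximates <x, y>
```

For these two curves the exact values are 1/2 for the squared norm of the sine curve and 0 for the inner product of sine and cosine, which gives a quick check of the approximation.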
A CONCEPT OF OUTLIER
Outliers are observations that behave differently from the majority of the data. They may be caused by errors, but they could also have been recorded under exceptional circumstances, or they may belong to another population. They may have a harmful effect on the conclusions drawn from the data, but may also contain valuable information.
Outlier detection procedure: find a fit that is close to the fit we would have found without the outliers, then identify outliers by their large deviation (distance or residual) from that fit. By choosing a robust fit we avoid the masking and swamping phenomena (known limitations of the three-sigma edit rule).
The classical method of outlier detection is based on the boxplot proposed by Tukey. It is known that this method is not appropriate for, e.g., skewed distributions (swamping: good points are flagged as outliers).
Outlyingness depends on the assumed statistical model (P. Huber).
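The masking effect mentioned above can be illustrated on a small univariate sample. This is a hedged sketch rather than the authors' procedure: the data are made up, the robust fit is assumed to be the median with the MAD scaled by 1.4826, the cutoff of 3 and the boxplot factor of 1.5 are conventional choices, and all function names are hypothetical.

```python
import statistics

# Made-up sample: ten regular points near 10 and two gross outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 25.0, 26.0]

def three_sigma_flags(xs):
    """Classical rule: flag points farther than 3 standard deviations from the mean."""
    center = statistics.mean(xs)
    scale = statistics.stdev(xs)
    return [v for v in xs if abs(v - center) > 3 * scale]

def median_mad_flags(xs, cutoff=3.0):
    """Robust rule (assumed here): median as the fit, scaled MAD as the scale."""
    center = statistics.median(xs)
    mad = statistics.median([abs(v - center) for v in xs])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD under normality
    return [v for v in xs if abs(v - center) > cutoff * scale]

def tukey_flags(xs, k=1.5):
    """Boxplot rule: flag points outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return [v for v in xs if v < q1 - k * iqr or v > q3 + k * iqr]

classical = three_sigma_flags(data)  # the outliers inflate the SD and mask themselves
robust = median_mad_flags(data)
boxplot = tukey_flags(data)
```

In this sample the two gross outliers inflate the standard deviation so much that the three-sigma rule flags nothing, while both rules built on robust summaries flag exactly the two contaminated points.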
UNEMPLOYMENT RATE (UR) vs. MINIMAL WAGE (MW) IN FRANCE AND CZECH REPUBLIC IN A PERIOD OF 1999-2016
[Figure: two panels, "CZECH REPUBLIC 1999−2015" and "FRANCE 1999−2015"; horizontal axis MW, vertical axis UR; legend: LS, DeepReg, M, MM.]
EXAMPLE OF 1D TIME SERIES WITH OUTLIERS
[Figure: "The contaminated Box and Jenkins example"; vertical axis AirPassengers, horizontal axis ticks 1950 to 1960.]
AMPLITUDE AND PHASE VARIABILITY OF FDA OBSERVATIONS
Magnitude outliers: they lie outside the range of the vast majority of the data.
Shape outliers: they lie within the range of the rest of the data but differ from them in shape.
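The distinction can be made concrete with a toy example. The sketch below is an illustration under assumptions, not the construction used in the slides: the regular curves are vertical shifts of one sine wave, the two outlying curves are made up, and `inside_envelope` and `roughness` are hypothetical helper names. A pointwise range check catches the magnitude outlier but not the shape outlier, which only a shape-sensitive summary (here the total variation on the grid) separates from the rest.

```python
import math

m = 101
grid = [k / (m - 1) for k in range(m)]

# Regular curves (assumed): vertical shifts of a single sine wave.
shifts = [-1.0 + 0.1 * j for j in range(21)]
regular = [[math.sin(2 * math.pi * s) + c for s in grid] for c in shifts]

# Made-up outlying curves.
magnitude_outlier = [math.sin(2 * math.pi * s) + 3.0 for s in grid]
shape_outlier = [
    math.sin(2 * math.pi * s) + 0.8 * math.sin(10 * math.pi * s) for s in grid
]

# Pointwise range (envelope) of the regular curves.
lower = [min(curve[k] for curve in regular) for k in range(m)]
upper = [max(curve[k] for curve in regular) for k in range(m)]

def inside_envelope(curve):
    """True if the curve stays within the pointwise range of the regular curves."""
    return all(lo <= v <= hi for lo, v, hi in zip(lower, curve, upper))

def roughness(curve):
    """Total variation on the grid: sum of absolute successive differences."""
    return sum(abs(curve[k + 1] - curve[k]) for k in range(len(curve) - 1))

max_regular_roughness = max(roughness(curve) for curve in regular)
```

Here `inside_envelope(magnitude_outlier)` is false, while `inside_envelope(shape_outlier)` is true even though `roughness(shape_outlier)` is several times larger than `max_regular_roughness`.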
SAMPLE FROM FAR(1) MODEL WITH 5% SHAPE OUTLIERS
[Figure: two panels, "Sample from functional autoregressive model FAR(1) with outliers" and "Functional boxplot"; horizontal axis Time, vertical axis Value.]
Because of the dependence between consecutive curves, an observation can be a magnitude outlier despite lying in the central region of the dataset.
THE MODIFIED BAND DEPTH (MBD)
THE MBD
The generalized band depth (GBD, see López-Pintado and Jörnsten, 2007) and the modified band depth (MBD, see López-Pintado and Romo, 2009) of a curve x with respect to a functional sample X^n estimate how frequently the curve lies in the center of the sample. We have a sample of n functions X^n = {x_1, ..., x_n}. Let us define: GBD(x | X^n) =
2 n(n − 1)
X 1≤i1