Challenges for Theory and Applications of Robust Statistics

May 11, 2018 - Outliers in Functional Time Series – Challenges for Theory and Applications of Robust Statistics. Daniel Kosiorowski, Dominik Mielczarek.
Our proposal

Challenges

Summary

Outliers in Functional Time Series – Challenges for Theory and Applications of Robust Statistics

Daniel Kosiorowski1, Dominik Mielczarek2, Jerzy P. Rydlewski2

1 Cracow University of Economics, Poland
2 AGH University of Science and Technology, Krakow, Poland

XII International Conference in Honour of Prof. A. Zeliaś, 8–11 May 2018, Zakopane, Poland


MOTIVATIONS
Many economic phenomena may be described as time series of functions on a certain continuum, e.g., Internet traffic, air pollution intensity or electricity consumption within consecutive days and nights, or the utility curves of a local community over half a century. These phenomena may be studied effectively within the statistical methodology called functional data analysis (FDA). Functional outliers (FO) may adversely influence the performance of statistical procedures dedicated to FDA, such as functional analysis of variance, functional regression or discriminant analysis. FO may signal fraud in trade exchange or "systematic, not rapid" intrusions into transaction or information systems. Outliers may signal the occurrence of a new phenomenon. Time dependence decreases the quality of outlier detection tools. Well-established depth functionals can fail to recognize obvious outliers.


FUNCTIONAL TIME SERIES (FTS)

FUNCTIONAL OBSERVATIONS
We treat observations as random curves $X = \{x(t),\ t \in [0, T]\}$, where $T$ is fixed, i.e., as random elements of the separable Hilbert space $L^2([0, T])$ with the usual inner product $\langle x, y \rangle = \int_0^T x(t)\,y(t)\,dt$ (see Horváth & Kokoszka, 2012). The space is equipped with the Borel σ-algebra. Furthermore, it has been proved that probability distributions do exist for functional processes with values in a Hilbert space (see Bosq, 2000).

FUNCTIONAL TIME SERIES
A functional time series (FTS) is a series of functions indexed by time.
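In practice, curves are observed on a finite grid, so the inner product has to be approximated numerically. The following is a minimal sketch, assuming an equally spaced grid on [0, T] and the trapezoidal rule. The helper name `l2_inner` and the toy sine curves are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def l2_inner(x, y, t):
    """Approximate <x, y> = integral over [0, T] of x(t) y(t) dt (trapezoidal rule)."""
    f = x * y
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)

T = 1.0                              # assumed length of the domain [0, T]
t = np.linspace(0.0, T, 1001)        # assumed observation grid

# A toy functional time series: row n holds the curve observed at time n.
fts = np.array([np.sin(2 * np.pi * t + 0.1 * n) for n in range(5)])

x = np.sin(2 * np.pi * t)
y = np.cos(2 * np.pi * t)
norm_sq = l2_inner(x, x, t)          # squared L2 norm of x, close to 0.5
cross = l2_inner(x, y, t)            # close to 0: x and y are orthogonal in L2([0, 1])
print(fts.shape, norm_sq, cross)
```

Storing the series as a two-dimensional array (curves by grid points) keeps the time index and the argument of each curve clearly separated.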


A CONCEPT OF OUTLIER
Outliers are observations that behave differently from the majority of the data. This may be caused by errors, but outliers could also have been recorded under exceptional circumstances, or they may belong to another population. They may have a harmful effect on the conclusions drawn from the data, but they may also contain valuable information.
Outlier detection procedure: find a fit that is close to the fit we would have found without the outliers, then identify outliers by their large deviation (distance or residual) from that fit. By choosing a robust fit we avoid the masking and swamping phenomena (known limitations of the three-sigma edit rule).
The classical method of outlier detection is based on the boxplot proposed by Tukey. It is known that this method is not appropriate, for example, for skewed distributions (swamping: good points are flagged as outliers).
Outlyingness depends on the assumed statistical model (P. Huber).
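The masking effect mentioned above can be illustrated on a small univariate sample. The sketch below compares the three-sigma edit rule with a median/MAD rule (one common choice of robust fit, used here as an assumption) and with Tukey's boxplot fences. The data values and function names are made up for illustration.

```python
import numpy as np

def three_sigma_flags(x):
    """Classical rule: flag points more than 3 sample standard deviations from the mean."""
    return np.abs(x - x.mean()) > 3.0 * x.std(ddof=1)

def robust_flags(x, cutoff=3.0):
    """Robust rule: the same idea, with median and rescaled MAD as location and scale."""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # 1.4826 makes MAD consistent at the normal
    return np.abs(x - med) > cutoff * mad

def tukey_flags(x, k=1.5):
    """Tukey's boxplot rule: flag points beyond k * IQR from the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Eight regular observations and two gross outliers (illustrative values).
x = np.array([9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 50.0, 55.0])

classical = three_sigma_flags(x)   # flags nothing: the outliers inflate mean and sd
robust = robust_flags(x)           # flags the last two points
boxplot = tukey_flags(x)           # flags the last two points
print(classical.sum(), np.where(robust)[0], np.where(boxplot)[0])
```

The two outliers pull the mean to 18.5 and the standard deviation to about 18, so neither exceeds the three-sigma threshold, while the median and MAD are essentially unaffected.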


UNEMPLOYMENT RATE (UR) vs. MINIMAL WAGE (MW) IN FRANCE AND CZECH REPUBLIC IN A PERIOD OF 1999-2016

[Figure: two panels, "CZECH REPUBLIC 1999−2015" and "FRANCE 1999−2015", each with MW on the horizontal axis and UR on the vertical axis; legend entries LS, DeepReg, M and MM.]


EXAMPLE OF 1D TIME SERIES WITH OUTLIERS

[Figure: "The contaminated Box and Jenkins example"; vertical axis AirPassengers (100 to 600), horizontal axis years 1950 to 1960.]


AMPLITUDE AND PHASE VARIABILITY OF FDA OBSERVATIONS

Magnitude outliers: they lie outside the range of the vast majority of the data.
Shape outliers: they lie within the range of the rest of the data but differ from them in shape.


SAMPLE FROM FAR(1) MODEL WITH 5% SHAPE OUTLIERS

[Figure, two panels: "Sample from functional autoregressive model FAR(1) with outliers" and "Functional boxplot", each with Time on the horizontal axis and Value on the vertical axis.]

Given the effect of dependence, an observation could be a magnitude outlier despite lying in the central region of the dataset.
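A sample of the kind described on this slide can be generated along the following lines. This is only a sketch under stated assumptions: the slides do not specify the kernel, the innovations or the contamination mechanism, so the Gaussian kernel, the sine/cosine innovations, the additive replacement of 5% of the curves by oscillating curves, and every name and parameter below are hypothetical.

```python
import numpy as np

def simulate_far1(n_curves=100, n_grid=50, kernel_norm=0.5,
                  outlier_frac=0.05, seed=0):
    """Simulate X_n(t) = integral psi(t, s) X_{n-1}(s) ds + eps_n(t) on a grid over [0, 1]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    dt = t[1] - t[0]

    # Assumed Gaussian kernel, rescaled so that its discretized
    # Hilbert-Schmidt norm equals kernel_norm < 1 (keeps the recursion stable).
    psi = np.exp(-(t[:, None] ** 2 + t[None, :] ** 2) / 2.0)
    psi *= kernel_norm / np.sqrt(np.sum(psi ** 2) * dt * dt)

    def innovation():
        # Assumed smooth noise: random mix of a constant, a sine and a cosine.
        a = rng.normal(size=3)
        return a[0] + a[1] * np.sin(2 * np.pi * t) + a[2] * np.cos(2 * np.pi * t)

    X = np.zeros((n_curves, n_grid))
    X[0] = innovation()
    for n in range(1, n_curves):
        X[n] = (psi @ X[n - 1]) * dt + innovation()

    # Shape contamination: replace a fraction of the curves by fast oscillations
    # whose amplitude stays within the range of the regular curves.
    n_out = int(round(outlier_frac * n_curves))
    idx = np.sort(rng.choice(n_curves, size=n_out, replace=False))
    for i in idx:
        X[i] = np.sin(8 * np.pi * t + rng.uniform(0.0, 2 * np.pi))
    return t, X, idx

t, X, outlier_idx = simulate_far1()
print(X.shape, outlier_idx)
```

Because the contaminated curves are replaced after the recursion has run, they do not propagate to later curves; contaminating inside the loop would give a different (innovation-type) outlier model.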


THE MODIFIED BAND DEPTH (MBD)


THE MBD
The generalized band depth (GBD, see López-Pintado and Jörnsten, 2007) and the modified band depth (MBD, see López-Pintado and Romo, 2009) of a curve $x$ with respect to a functional sample $X^n$ estimate how often the curve lies in the center of the sample. We have a sample of $n$ functions $X^n = \{x_1, \dots, x_n\}$. Let us define:

$$\mathrm{GBD}(x \mid X^n) = \frac{2}{n(n-1)} \sum_{1 \le i_1 < i_2 \le n} \dots$$
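A band depth of this type can be computed on a grid as sketched below. The summand is assumed to be the usual one from López-Pintado and Romo (2009): for each pair of sample curves, the proportion of the domain on which x lies inside the band they delimit. The function name and the toy sample of constant curves are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(x, sample):
    """Average, over all pairs of sample curves, of the fraction of grid
    points at which x lies between the two curves of the pair."""
    sample = np.asarray(sample, dtype=float)
    x = np.asarray(x, dtype=float)
    n = sample.shape[0]
    total = 0.0
    for i, j in combinations(range(n), 2):
        lower = np.minimum(sample[i], sample[j])
        upper = np.maximum(sample[i], sample[j])
        total += np.mean((lower <= x) & (x <= upper))
    return 2.0 * total / (n * (n - 1))

# Toy sample: five constant curves at levels 0, 1, 2, 3, 4 on a 10-point grid.
sample = np.array([np.full(10, level) for level in range(5)], dtype=float)

d_center = modified_band_depth(sample[2], sample)           # 0.8: 8 of 10 bands contain level 2
d_edge = modified_band_depth(sample[0], sample)             # 0.4: only the 4 bands using curve 0
d_outside = modified_band_depth(np.full(10, 10.0), sample)  # 0.0: outside every band
print(d_center, d_edge, d_outside)
```

The pairwise loop costs O(n^2) band evaluations, which is adequate for illustration; rank-based formulas are typically used for large samples.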
