Robust Dynamic Plots for Detection of Regression Outliers and Mixtures

D. Perrotta¹, F. Torti²,³ and M. Riani³

¹ European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen, I-21027 Ispra, Italy
² Dipartimento di Statistica, Università di Milano Bicocca, I-20126 Milano, Italy
³ Dipartimento di Economia, Sezione di Statistica, Università di Parma, I-43100 Parma, Italy

Keywords: Forward Search, outlier detection, statistical dynamic graphics, fraud detection

Abstract

One of the biggest challenges for statisticians is to present and communicate statistical results effectively (Tufte 1983, Spence 2001). This is particularly true when working on applied problems with non-statisticians. For example, at the Joint Research Centre (JRC) of the European Commission, the authors and other statisticians cooperate with the European Anti-Fraud Office (OLAF) to highlight, in European trade data, potential cases of fraud, data quality issues and other oddities related to specific international trade contexts. Among the methods used for this purpose is the Forward Search of Atkinson and Riani (2000) and Atkinson et al. (2004), a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. A natural extension of the Forward Search to the estimation of regression mixtures (Riani et al. 2008) has also been tested on trade data. Unlike other robust approaches, the Forward Search is a dynamic process that produces a sequence of estimates: starting from a small, robustly chosen subset of the data, it fits the model to subsets of increasing size, and we monitor the evolution of residuals, parameter estimates and inferences as the subset size increases. We present our results as forward plots, which show the evolution of the quantities of interest as a function of subset size.

One long-standing problem of the Forward Search has been the lack of an automatic link between the many plots that are monitored. We have therefore recently developed interactive tools that fill this gap by dynamically linking the information from different robust plots and by providing the user with flexible selection tools. In this poster we describe these new robust graphical tools and show how they can be used to explore complex structures in the data, such as groups of outliers and linear mixtures. A live demonstration will be given using our MATLAB toolbox on several datasets, including trade data affected by various types of oddities.
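The forward loop and the plot-to-data link described above can be sketched as follows. This is a minimal Python illustration for simple linear regression, not the authors' MATLAB toolbox. Several pieces are assumptions made only for illustration: the function names (`lms_start`, `forward_search`, `entry_steps`, `units_entering`), the exhaustive least median of squares fit over pairs used for the starting subset, the minimum deletion residual as the monitored statistic, and the toy data. `units_entering` is a hypothetical stand-in for the kind of link the abstract describes. It maps a range of steps selected on a forward plot back to the units that joined the subset in those steps, and a scatter plot could then highlight those units.

```python
import math
from itertools import combinations
from statistics import median

# Sketch of a Forward Search for y = a + b*x (p = 2 parameters).
# All function names are hypothetical; this is not the authors' toolbox API.


def fit_line(xs, ys):
    """OLS fit of y = a + b*x; also returns the subset mean of x and Sxx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    b = sum((u - mx) * (w - my) for u, w in zip(xs, ys)) / sxx
    return my - b * mx, b, mx, sxx


def lms_start(x, y):
    """Starting subset of size p = 2: the pair whose line minimises the
    median squared residual over all units (exhaustive LMS, an assumed choice)."""
    best, best_crit = None, math.inf
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a regression fit
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = median((y[k] - a - b * x[k]) ** 2 for k in range(len(x)))
        if crit < best_crit:
            best, best_crit = {i, j}, crit
    return best


def forward_search(x, y):
    """Return the subset S(m) for m = p..n and the minimum deletion
    residual among units outside S(m) (the monitored statistic)."""
    n, p = len(x), 2
    subset = lms_start(x, y)
    history, mdr = {}, {}
    for m in range(p, n):
        history[m] = subset
        a, b, mx, sxx = fit_line([x[i] for i in subset], [y[i] for i in subset])
        res = [y[k] - a - b * x[k] for k in range(n)]
        if m > p:
            s = math.sqrt(sum(res[i] ** 2 for i in subset) / (m - p))
            if s > 0:
                # deletion residual e_k / (s * sqrt(1 + h_k)), where
                # h_k = 1/m + (x_k - mean)^2 / Sxx uses the subset only
                mdr[m] = min(
                    abs(res[k]) / (s * math.sqrt(1 + 1 / m + (x[k] - mx) ** 2 / sxx))
                    for k in range(n) if k not in subset
                )
        # S(m + 1): the m + 1 units with the smallest squared residuals
        order = sorted(range(n), key=lambda k: (res[k] ** 2, k))
        subset = set(order[: m + 1])
    history[n] = subset
    return history, mdr


def entry_steps(history, n):
    """For each unit, the step from which it stays in the subset up to m = n."""
    entry = []
    for i in range(n):
        m = n
        while m - 1 in history and i in history[m - 1]:
            m -= 1
        entry.append(m)
    return entry


def units_entering(entry, first, last):
    """Hypothetical brushing link: steps first..last selected on a forward
    plot are mapped back to the units that joined the subset there."""
    return [i for i, m in enumerate(entry) if first <= m <= last]


# Toy data (illustrative): 20 units near y = 2 + 0.5x, plus 3 units
# shifted upwards by 10 (indices 20, 21, 22).
x = [float(i) for i in range(20)] + [5.0, 10.0, 15.0]
y = [2 + 0.5 * i + 0.1 * math.sin(1.7 * i) for i in range(20)]
y += [2 + 0.5 * v + 10 for v in (5.0, 10.0, 15.0)]

history, mdr = forward_search(x, y)
entry = entry_steps(history, len(x))
print(units_entering(entry, 21, 23))  # prints [20, 21, 22]
```

On the toy data the three shifted units are the last to join the subset, and the minimum deletion residual at m = 20, just before the first of them enters, is much larger than at the neighbouring steps. Selecting steps 21 to 23 on that forward plot therefore returns exactly these units.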

References

A. C. Atkinson and M. Riani (2000). Robust Diagnostic Regression Analysis. Springer-Verlag, New York.
A. C. Atkinson, M. Riani and A. Cerioli (2004). Exploring Multivariate Data with the Forward Search. Springer-Verlag, New York.
M. Riani, A. Cerioli, A. C. Atkinson, D. Perrotta and F. Torti (2008). Fitting Mixtures of Regression Lines with the Forward Search. In: Fogelman-Soulié et al. (eds.), Mining Massive Data Sets for Security, pp. 271-286. IOS Press, Amsterdam, Netherlands.
R. J. Spence (2001). Information Visualization. Addison-Wesley, California.
E. R. Tufte (1983). The Visual Display of Quantitative Information, 2nd Edition. Graphics Press, CT.
