Uses of Symbolic Objects in Official Statistics

Anjeles Iztueta (1), Patricia Calvo (2), Seppo Laaksonen (3) and Edwin Diday (4)

(1, 2) EUSTAT, Basque Statistical Office, c/Duque de Wellington, 2, 01012 Vitoria-Gasteiz, Spain.
(3) Statistics Finland.
(4) University Paris IX Dauphine and INRIA, France.

1. Introduction

Symbolic objects, introduced by E. Diday, are a new way of representing complex data. They are new statistical units that summarize groups of individuals, or data expressed as confidence or guarantee intervals. These units take the quality of the data into account and allow a joint analysis of independent samples coming from different official statistics organizations. This is one of the aims of the SODAS project (an Esprit project of the European Commission), together with the development of software for symbolic data analysis and management.

The construction of symbolic objects may start from queries to a relational database. These dynamic queries automatically extract groups of individuals with common characteristics, for example families or regions, so each generated object defines a group of individuals. The symbolic objects created by these queries are also stored in tables.

Table: Symbolic data table

| Object | Sex | Wages | Profession or office |
| --- | --- | --- | --- |
| So 1 | {woman (0.33), man (0.67)} | {[25:57]} | {Superior technicians and professionals (0.35), Managers (0.25), Administrative managers (0.4)} |
| So 2 | {woman (0.5), man (0.5)} | {[18:42]} | {Merchants and salespersons (0.55), Administrative (0.45)} |
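The query-based construction described in the introduction can be sketched in Python. This is a minimal illustration under our own assumptions, not the SODAS implementation: individuals are plain dicts, the function `build_symbolic_objects` and its parameters are hypothetical, categorical variables (such as sex or profession) are summarized as relative-frequency distributions, and numeric variables (such as wages) as the observed [min:max] range of the group. The paper describes the wages interval as a confidence interval; the min/max range used here is only one simple choice of interval. The toy records are invented so that they reproduce the sex and wages entries of the symbolic data table.

```python
from collections import Counter, defaultdict


def build_symbolic_objects(records, group_by, categorical, numeric):
    """Summarize each group of individuals as a symbolic object (illustrative sketch).

    records: list of dicts, one per individual (e.g. rows returned by a query).
    group_by: field whose value identifies the group (hypothetical layout).
    categorical: fields summarized as relative-frequency distributions.
    numeric: fields summarized as the group's observed (min, max) interval.
    """
    groups = defaultdict(list)
    for row in records:
        groups[row[group_by]].append(row)

    objects = {}
    for key, rows in groups.items():
        n = len(rows)
        description = {}
        for var in categorical:
            counts = Counter(row[var] for row in rows)
            # Share of each modality within the group, rounded to two decimals
            description[var] = {mod: round(c / n, 2) for mod, c in counts.items()}
        for var in numeric:
            values = [row[var] for row in rows]
            # Observed range of the group, i.e. [low:high] in the table's notation
            description[var] = (min(values), max(values))
        objects[key] = description
    return objects


# Invented toy individuals, chosen only to reproduce two columns of the table
records = [
    {"group": "So 1", "sex": "woman", "wages": 25},
    {"group": "So 1", "sex": "man", "wages": 40},
    {"group": "So 1", "sex": "man", "wages": 57},
    {"group": "So 2", "sex": "woman", "wages": 18},
    {"group": "So 2", "sex": "man", "wages": 42},
]
objects = build_symbolic_objects(records, "group", ["sex"], ["wages"])
print(objects["So 1"])  # {'sex': {'woman': 0.33, 'man': 0.67}, 'wages': (25, 57)}
```

Each group of rows collapses into one description, so only the summary, not the individual records, needs to be stored or passed on.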

In this symbolic data table, each object represents a group of individuals, and the whole table corresponds to one query to a relational database. The value a variable takes on a symbolic object can be a probability distribution, as for the "sex" and "profession" variables, or a confidence interval, as for the "wages" variable.

2. One application: Data fusion

In recent years, techniques for linking data from different files have been developed and applied in further analyses. These techniques must safeguard statistical confidentiality, and they start from data files that share a common part but come from independent samples. Here, matching is planned as group linkage, not record linkage: groups, rather than individual records, are matched. An example of this new use would be linking the continuous Labour Force samples of different European Union countries, since all of them share common sociodemographic variables (sex, marital status, age, level of education and relation to the labour force).

The process of building symbolic objects starts once the common variables and their modalities have been defined for the different surveys. Groups are defined by the Cartesian product of the modalities of the common variables, and each group is described by a symbolic object. The symbolic objects of the different data files are created separately and linked afterwards. The linked symbolic objects summarize the information of all the data files and form the basis for further analysis. The following graph, created with the SODAS software, shows a symbolic object resulting from the join of two EUSTAT surveys, Use of Time (EPT) and Living Conditions (ECV).
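The group-linkage procedure of this section can be sketched as follows. This is a hedged illustration, not the SODAS join: the function names, the `modalities` dictionary and the variables `lunch_at_home` and `sport` are hypothetical, and the survey records are invented. Each survey is summarized separately, one symbolic object per combination of the common variables' values; the groups are enumerated as the Cartesian product of those modalities, and two objects are joined only when both surveys observe the same group. No individual record from one survey is ever matched to a record from the other.

```python
from collections import Counter, defaultdict
from itertools import product


def describe(rows, variables):
    """Summarize survey-specific variables of one group as frequency distributions."""
    n = len(rows)
    return {
        var: {mod: round(c / n, 2) for mod, c in Counter(r[var] for r in rows).items()}
        for var in variables
    }


def objects_by_group(records, common, specific):
    """One symbolic object per combination of values of the common variables."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[v] for v in common)].append(r)
    return {key: describe(rows, specific) for key, rows in groups.items()}


def fuse(survey_a, survey_b, common, modalities, specific_a, specific_b):
    """Group linkage: join the two surveys' symbolic objects on the shared group key."""
    objs_a = objects_by_group(survey_a, common, specific_a)
    objs_b = objects_by_group(survey_b, common, specific_b)
    fused = {}
    # Candidate groups are the Cartesian product of the common modalities
    for key in product(*(modalities[v] for v in common)):
        # A group can be linked only if both surveys contain it
        if key in objs_a and key in objs_b:
            fused[key] = {**objs_a[key], **objs_b[key]}
    return fused


# Invented toy data: two common variables plus one specific variable per survey
common = ["sex", "age"]
modalities = {"sex": ["woman", "man"], "age": ["16-39", "40+"]}
survey_a = [
    {"sex": "woman", "age": "16-39", "lunch_at_home": "yes"},
    {"sex": "woman", "age": "16-39", "lunch_at_home": "no"},
    {"sex": "man", "age": "40+", "lunch_at_home": "no"},
]
survey_b = [
    {"sex": "woman", "age": "16-39", "sport": "yes"},
    {"sex": "man", "age": "40+", "sport": "no"},
    {"sex": "man", "age": "16-39", "sport": "yes"},
]
fused = fuse(survey_a, survey_b, common, modalities, ["lunch_at_home"], ["sport"])
print(fused[("woman", "16-39")])
# {'lunch_at_home': {'yes': 0.5, 'no': 0.5}, 'sport': {'yes': 1.0}}
```

Groups observed in only one survey, such as ("man", "16-39") above, are left out of the fused result; whether to keep them as partial objects instead is a design choice the sketch does not address.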

Figure: Zoom star of a symbolic object
Variables of Living Conditions: jorn (duration of the working day), comt (returns home for lunch), ract (branch of economic activity). Variables of Time Use: ropa (participation in caring for clothes), prac (practice of sport), limp (participation in cleaning), prpc (time spent preparing meals, in minutes).

3. Another application: Analysis of interval data

Symbolic objects may also be applied to data expressed as confidence or guarantee intervals. In estimation problems, symbolic objects would hold the confidence intervals of the estimates instead of being restricted to point estimates. In the case of guarantee intervals, symbolic objects would be used as a data protection measure: the privacy of sensitive data in a database would be ensured if a query returned a symbolic object of interval type instead of the data themselves.

REFERENCES

DIDAY, E. (1992). Analyse des données et classification automatique numérique et symbolique. Seminario Internacional de Estadística en Euskadi, Volume 27. EUSTAT.
DIDAY, E. (1998). Symbolic Data Analysis: a theory and tool for Data Mining. Invited conference at IFCS'98, Rome. Springer-Verlag.
DIDAY, E. and HEBRAIL, G. (1998). Symbolic Data Analysis: some in and out. KESDA'98.
IZTUETA, A. and CALVO, P. (1998). Utilities and Applications of Symbolic Data Analysis to Official Statistics. KESDA'98.
NOIRHOMME-FRAITURE, M. and ROUARD, M. (1998). Representation of Sub-populations and Correlation with Zoom star. Proc. NTTS'98, Sorrento, Italy.

SUMMARY

Symbolic objects are used in official statistics to summarize large quantities of data, and they can describe population groups, regions, etc. These new statistical units form a natural bridge between databases and statistics. This paper presents some applications of symbolic objects, such as data fusion and the analysis of confidence intervals.