Using Software Tools for Estimation of Monthly Unemployment Rates in Bulgaria – Software Review
Alexander Naidenov, PhD, Assist. Prof., Department of Statistics and Econometrics, UNWE, Sofia, Bulgaria
Abstract. The accelerating pace of economic processes creates a need for faster and more accurate information on key indicators, so that these processes can be monitored and, eventually, controlled. The unemployment rate, one of these important indicators, is usually produced on a quarterly and annual basis. In recent years Eurostat has been discussing with the Member States the possibility of producing monthly labour market indicators. Bulgaria, as an EU Member State, experimented with monthly unemployment rate estimation in 2007. Given the sophisticated estimation methodology, special software tools are needed to produce reliable and precise monthly unemployment rate estimates. This paper examines the software tools available for this purpose and their application to practical problems, emphasizing their pros and cons in the estimation of the Bulgarian monthly unemployment rates. The tools discussed are SPSS ver.20, Demetra+ 1.0.3, g-Calib ver.2.0 and Ecotrim 1.01. Even though these tools are widely used across the EU, they have some pitfalls and issues, which are examined here for the Bulgarian case using data from the Labour Force Survey of the Bulgarian National Statistical Institute. Screenshots of the software tools are also provided.
Keywords: software, tools, estimation, monthly, unemployment, review
1. Introduction
The unemployment rate, as a key economic indicator, is monitored not only by the public sector but also by the private one. A few years ago it was enough to know the overall tendency of the unemployment rate (i.e. on a quarterly or annual basis); now, driven by a fast-changing economic situation, the temporal disaggregation of unemployment time series (i.e. on a monthly basis) is becoming more and more indispensable. As an EU member, Bulgaria is obliged to follow the Eurostat regulations in order to achieve methodological harmonization across the EU Member States. There is therefore a great need to improve the methodology for disaggregating unemployment time series, especially because data are currently produced on a quarterly and annual basis only¹. Even though there were successful attempts at disaggregating unemployment data in 2007, it is now time to build new working tools for estimating the monthly unemployment rates in Bulgaria. Given the complexity of the statistical methods involved, software tools are needed to facilitate the production of these estimates. As described below, no single software solution covers the whole problem, which leads to the combined use of several software tools.
¹ For more information see the Bulgarian LFS methodology (http://www.nsi.bg/otrasalmetodologia.php?otr=26&a1=158&url=img%2FMRUBRIK%2FLFS_Methodology.pdf)
2. Monthly unemployment rate estimation approaches
In practice, the estimation of the unemployment rate is based on the estimation of its two main components: the number of employed persons and the number of unemployed persons. The rate is derived from these two components as follows:
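$$UR\,(\%) = \frac{U}{E + U} \times 100 \qquad (1)$$
where U is the number of unemployed persons and E is the number of employed persons, so that E + U is the labour force.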
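As a worked illustration of equation (1), take purely hypothetical levels of E = 2 700 thousand employed and U = 300 thousand unemployed persons (these are not NSI figures):

```latex
UR\,(\%) = \frac{300}{2\,700 + 300} \times 100 = \frac{300}{3\,000} \times 100 = 10\%
```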
According to EU practice, monthly estimation produces 8 main time series covering 8 demographic groups: employed and unemployed persons, each broken down by sex and by age group (15-24 and 25+ years):
Unemployment rate
  Employed: Males 15-24 y.o., Males 25+ y.o., Females 15-24 y.o., Females 25+ y.o.
  Unemployed: Males 15-24 y.o., Males 25+ y.o., Females 15-24 y.o., Females 25+ y.o.
Fig.1. Monthly unemployment rate estimation groups
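The sketch below shows how the eight series of Fig.1 feed equation (1): one rate per sex/age group, plus an overall rate. It assumes that the overall rate is computed from the summed employed and unemployed levels rather than by averaging the group rates; the function names and all figures are hypothetical and for illustration only.

```python
from itertools import product

SEXES = ("Males", "Females")
AGE_GROUPS = ("15-24", "25+")


def unemployment_rate(employed: float, unemployed: float) -> float:
    """Equation (1): unemployed persons as a percentage of the labour force."""
    return 100.0 * unemployed / (employed + unemployed)


def group_rates(employed: dict, unemployed: dict) -> dict:
    """Rate for each sex/age group plus the overall rate.

    employed, unemployed: dicts keyed by (sex, age_group) holding one month's
    estimate for each of the eight series in Fig.1.
    """
    rates = {
        key: unemployment_rate(employed[key], unemployed[key])
        for key in product(SEXES, AGE_GROUPS)
    }
    # Overall rate built from summed levels, not from an average of group rates.
    rates["Total"] = unemployment_rate(sum(employed.values()), sum(unemployed.values()))
    return rates


# Hypothetical monthly levels (thousands of persons).
employed = {("Males", "15-24"): 100, ("Males", "25+"): 1300,
            ("Females", "15-24"): 80, ("Females", "25+"): 1220}
unemployed = {("Males", "15-24"): 30, ("Males", "25+"): 120,
              ("Females", "15-24"): 20, ("Females", "25+"): 130}

rates = group_rates(employed, unemployed)
print(round(rates["Total"], 2))             # 10.0
print(round(rates[("Males", "15-24")], 2))  # 23.08
```

Building the total from levels keeps it consistent with equation (1) even when the groups differ greatly in size.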
For each of these groups (time series), a further detailed statistical analysis is carried out using two main types of estimation approaches:
• Direct: using only the raw data from the quarterly Labour Force Survey (LFS) conducted by the Bulgarian National Statistical Institute (NSI).
• Indirect: using LFS data together with advanced time series modelling techniques (including interpolation and extrapolation).
Both approaches have their pros and cons, but a detailed description of the estimation methodology is not given here, because this paper is devoted to the review of the software tools only² (a short description of each technique is provided in the next section). It should also be noted that the literature on these estimation procedures offers a great variety of methods for monthly unemployment estimation; however, given the data availability and specifics in Bulgaria, the choice is limited to four possible techniques (see the next section). The concrete estimation results are not shown here for confidentiality reasons³. The data used for experimentation span 48 months, from January 2008 to December 2011, and were provided by the NSI of Bulgaria.
3. Software estimation tools review
As mentioned above, because of the complexity of the estimation, the required monthly unemployment rate estimates cannot be obtained without software tools. The tools are specific to each technique and are summarized in the following table:
Table 1. Approaches, techniques and software tools for monthly unemployment rate estimation
Approach   Technique                                    Software
Direct     Calibration weighting                        SPSS & g-Calib
Direct     Moving averages                              SPSS & g-Calib
Indirect   Extrapolation (incl. seasonal adjustment)    Demetra+
Indirect   Interpolation (data disaggregation)          Ecotrim
The following subsections briefly review each technique and the corresponding software.
² More information about the methods can be found in Unemployment – LFS adjusted series, Eurostat, 2012.
³ The results are part of an internal NSI project.
3.1. Technique No.1 – Calibration weighting using SPSS & g-Calib
Technique (short description): Calibration is a statistical technique mainly used to adjust survey data to already known population totals or margins (for example, the sum of employed, unemployed and economically inactive persons is adjusted to reproduce the total number of persons aged 15+, according to the most recent demographic data available). Calibration is usually based on regression analysis and is described in some articles as inverse regression. Calibration makes it possible to combine the available LFS data with demographic data in order to produce the monthly estimates.
Software (short description): SPSS (Statistical Package for the Social Sciences) is a widely known program used for survey authoring and data collection, data mining, text analytics and, above all, statistical analysis⁴. g-Calib was developed in the SPSS language by Statistics Belgium for data calibration. It can import SPSS or MS Excel data, which are then adjusted to the required margins using the dedicated g-Calib programming language⁵.
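To make the idea of adjusting weights to known margins concrete, the following is a minimal sketch of raking (iterative proportional fitting), one common calibration method. It is not g-Calib code and not necessarily the calibration function used by the NSI; the function `rake`, the variables and the margins are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rake(weights: List[float], records: List[Dict[str, str]],
         margins: Dict[str, Dict[str, float]],
         max_iter: int = 100, tol: float = 1e-9) -> List[float]:
    """Raking (iterative proportional fitting) of design weights to known margins.

    weights: initial design weight of each respondent
    records: each respondent's category for every calibration variable
    margins: for each variable, the known population total of each category
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            totals: Dict[str, float] = defaultdict(float)
            for wi, rec in zip(w, records):
                totals[rec[var]] += wi
            # Rescale the weights so that this variable's margins are met exactly.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # all margins were already met (within tol) on this pass
            break
    return w


# Hypothetical example: four respondents, margins by sex and by age group.
records = [{"sex": "M", "age": "15-24"}, {"sex": "M", "age": "25+"},
           {"sex": "F", "age": "15-24"}, {"sex": "F", "age": "25+"}]
margins = {"sex": {"M": 300, "F": 500}, "age": {"15-24": 200, "25+": 600}}
calibrated = rake([100, 100, 100, 100], records, margins)
print(calibrated)  # [75.0, 225.0, 125.0, 375.0]
```

Raking keeps the weights positive; a linear (regression-type) calibration would instead solve for the weights directly and can produce negative weights.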
Fig.2. Screenshot of SPSS ver.20
⁴ For more information see http://www-01.ibm.com/software/analytics/spss/
⁵ For more information see http://www2.unine.ch/files/content/sites/statistics/files/shared/documents/camille.pdf
Fig.3. Screenshot of g-Calib ver.2.0
Pros&cons: Only the pros and cons of g-Calib are considered here, since SPSS has already been examined extensively in many other papers and web reviews.
Pros: simple interface; semi-automated calibration process; wide functionality (many calibration options available); support for well-known data formats (.sav, .xls, etc.); a well-written help file.
Cons: programming skills needed; very limited operating system compatibility (works only on Windows 2000 SP1⁶); many software bugs and errors (especially concerning missing parameters or software libraries); unclear process messages; no software support available.
⁶ g-Calib can also be used on other operating systems, but only in virtual machines running MS Windows 2000 SP1.
3.2. Technique No.2 – Moving averages using SPSS & g-Calib
Technique (short description): The moving averages technique compiles data from three consecutive periods, e.g. months (such as February, March and April), and weights these data using the latest available data. An averaging procedure then produces a mean estimate that represents the middle month (e.g. March). This method is rarely used, because producing the data for a given month requires the data for the following month, so the information is produced with a considerable lag.
Software (short description): SPSS (see Technique No.1 for more information).
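The lag described above can be seen in a minimal sketch of a 3-month centred average. It deliberately omits the weighting step described in the text, so it illustrates only the averaging part; the function name and the input figures are hypothetical.

```python
from typing import List, Optional


def centred_three_month_average(monthly: List[float]) -> List[Optional[float]]:
    """Simplified 3-month centred moving average.

    The estimate for month t is the mean of months t-1, t and t+1, so the first
    month has no estimate and the latest month has none until the following
    month's data arrive (the lag discussed in the text). The weighting step
    described in the text is not modelled here.
    """
    result: List[Optional[float]] = [None] * len(monthly)
    for t in range(1, len(monthly) - 1):
        result[t] = (monthly[t - 1] + monthly[t] + monthly[t + 1]) / 3
    return result


# Hypothetical monthly inputs (e.g. thousands of unemployed), January to June.
series = [10, 12, 14, 13, 15, 17]
averages = centred_three_month_average(series)
print(averages)  # [None, 12.0, 13.0, 14.0, 15.0, None]
```

The trailing `None` is the practical drawback: the June figure cannot be published until July's data are in.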
3.3. Technique No.3 – Extrapolation (seasonal adjustment) using Demetra+
Technique (short description): Data observed over long periods of time (time series) make it possible to analyse any patterns they contain. These patterns are usually analysed by removing the seasonal component from the time series. In the literature, seasonal adjustment is interpreted broadly as the decomposition of a time series into four main components: seasonal, trend, cyclical and irregular. Eliminating seasonality reveals the “real” tendency of a given time series. Once the trend is known, possible future movements in the data can be predicted (extrapolated), assuming that the trend persists in the following periods. There is a great variety of seasonal adjustment methods, but two sophisticated algorithms are most frequently used today: TRAMO&SEATS (Time series Regression with ARIMA noise, Missing values and Outliers & Signal Extraction in ARIMA Time Series) and X-12-ARIMA. These methods make it convenient to decompose the employed and unemployed time series and then to extrapolate their trends in order to predict the future dynamics of the derived unemployment rate time series.
Software (short description): Demetra+, the successor of Demetra 2.2, is a seasonal adjustment tool developed and published by Eurostat that implements the TRAMO&SEATS and X-12-ARIMA methods. Both methods can be divided into two main parts: a pre-adjustment step, which removes the "deterministic" component of the series by means of a regression model with ARIMA noise, and a decomposition step. The two methods follow a very similar approach in the pre-adjustment step but differ in the decomposition step⁷.
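The decompose-then-extrapolate idea can be sketched with classical additive decomposition, a much simpler technique than TRAMO&SEATS or X-12-ARIMA (which fit regARIMA models and use model-based or X-11 filters). This is only an illustration under stated assumptions: the trend is a centred 2x12 moving average, the seasonal index is the mean detrended value per calendar month, and the extrapolation is a naive straight-line fit. The function names and the synthetic series are hypothetical.

```python
from typing import List, Optional, Tuple


def classical_additive_decomposition(
    y: List[float], period: int = 12
) -> Tuple[List[Optional[float]], List[float], List[float]]:
    """Return (trend, seasonal_index, seasonally_adjusted) for an additive model.

    Trend: centred 2x12 moving average (undefined for the first and last
    period/2 observations). Seasonal index: mean detrended value for each
    calendar month, centred so that the indices sum to zero.
    Assumes an even period and at least two full years of data.
    """
    n, half = len(y), period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]  # period + 1 observations
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    sums, counts = [0.0] * period, [0] * period
    for t, tr in enumerate(trend):
        if tr is not None:
            sums[t % period] += y[t] - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period
    seasonal_index = [r - mean_raw for r in raw]

    adjusted = [y[t] - seasonal_index[t % period] for t in range(n)]
    return trend, seasonal_index, adjusted


def extrapolate(adjusted: List[float], seasonal_index: List[float], horizon: int) -> List[float]:
    """Fit a least-squares straight line to the adjusted series, project it
    `horizon` months ahead and re-add the seasonal index of each future month."""
    n, period = len(adjusted), len(seasonal_index)
    x_mean = (n - 1) / 2
    y_mean = sum(adjusted) / n
    sxy = sum((x - x_mean) * (a - y_mean) for x, a in enumerate(adjusted))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * t + seasonal_index[t % period] for t in range(n, n + horizon)]


# Synthetic 48-month series: linear trend plus a fixed seasonal pattern (not NSI data).
pattern = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
y = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal_index, adjusted = classical_additive_decomposition(y)
forecast = extrapolate(adjusted, seasonal_index, horizon=3)
```

On this synthetic series the method recovers the seasonal pattern and the linear trend exactly; on real LFS data the straight-line assumption is the weakest link, which is one reason model-based tools such as Demetra+ are preferred.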
⁷ For more information see http://www.cros-portal.eu/page/demetra
Fig.4. Screenshot 3.1 of Demetra+ ver. 1.0.3
Pros&cons:
Pros: user-friendly interface; compatibility with all Windows-based operating systems; support for a wide range of data file formats; powerful processing and visualization engines; easy-to-interpret indicators and statistical characteristics; good possibilities for exporting data and results; parallel processing of multiple time series; many well-written manuals available.
Cons: the current version 1.0.3 has a serious bug that makes it impossible to run TRAMO&SEATS; slow software support (it takes about 3 weeks to answer a customer's request).
3.4. Technique No.4 – Interpolation (data disaggregation) using Ecotrim
Technique (short description): In statistical practice it is often not feasible and/or not cost-effective to produce data at a high frequency, e.g. monthly instead of quarterly, or quarterly instead of annually. Sometimes, however, the need for higher-frequency time series data (especially for unemployment) calls for a solution. One such solution is the so-called interpolation technique. It is a complex method for constructing new data points within the range of a discrete set of already known data points (also known as curve fitting), using mathematical functions and statistical techniques. Interpolation for this purpose is mostly based on regression analysis and uses methods such as random walk, Denton movement preservation, Litterman and Fernandez, as well as univariate methods, among many others. Since LFS data are available quarterly, monthly estimates can be produced using interpolation techniques.
Software (short description): Ecotrim is a program for the temporal disaggregation of time series developed by Eurostat⁸.
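A minimal sketch of the idea behind movement-preserving disaggregation follows, in the spirit of Denton's additive first-difference method with no indicator series (equivalently, a constant indicator) and without fixing the first value. It is not Ecotrim's implementation. It assumes that each quarterly LFS figure is the average of its three months, and it finds the monthly series with the smallest sum of squared month-to-month changes that reproduces those averages, by solving the corresponding constrained least-squares (KKT) system. The function names and the quarterly figures are hypothetical.

```python
from typing import List


def solve_linear_system(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for small systems)."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def denton_quarterly_to_monthly(quarterly: List[float]) -> List[float]:
    """Monthly series whose 3-month averages reproduce the quarterly values
    while minimising the sum of squared month-to-month changes.

    Solves the KKT system  [A  C'] [x ]   [0]
                           [C  0 ] [mu] = [q]
    where A = D'D for the first-difference operator D and C averages each quarter.
    """
    n_q = len(quarterly)
    m = 3 * n_q
    size = m + n_q
    k = [[0.0] * size for _ in range(size)]
    for t in range(m - 1):  # accumulate D'D, one difference (x[t+1] - x[t]) at a time
        k[t][t] += 1.0
        k[t + 1][t + 1] += 1.0
        k[t][t + 1] -= 1.0
        k[t + 1][t] -= 1.0
    for q in range(n_q):  # constraint: mean of the quarter's three months = quarterly value
        for j in range(3):
            k[m + q][3 * q + j] = 1.0 / 3.0
            k[3 * q + j][m + q] = 1.0 / 3.0
    rhs = [0.0] * m + [float(v) for v in quarterly]
    return solve_linear_system(k, rhs)[:m]


# Hypothetical quarterly averages (e.g. thousands of unemployed), four quarters.
quarterly = [300.0, 320.0, 310.0, 290.0]
monthly = denton_quarterly_to_monthly(quarterly)
```

Regression-based methods such as Chow-Lin, Fernandez or Litterman extend this by letting a related monthly indicator series shape the month-to-month profile instead of a pure smoothness criterion.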
Fig.5. Screenshot of Ecotrim ver.1.01
Pros&cons:
Pros: well-structured interactive data processing mode; a rich variety of temporal disaggregation methods; use of standard MS Excel files; compatibility with Windows-based systems (64-bit versions not supported); single and multiple processing modes; basic export possibilities; a well-written manual.
Cons: Ecotrim has not been developed since 2002, so no updates and no support are available.
In practical use, from a statistical point of view, the estimation tools discussed here (considering their pros and cons) satisfy the needs of monthly unemployment rate estimation. However, using these tools requires good knowledge not only of statistics but also of computer science (especially operating systems and programming), which considerably narrows the audience of potential users.
⁸ For more information see http://circa.europa.eu/Public/irc/dsis/ecotrim/library
4. Conclusion
In summary, the software tools used for estimating monthly unemployment rates in Bulgaria have both positives and negatives: on the one hand they make it possible to solve the estimation problems raised, but on the other hand they prevent users from benefiting fully from all the software functionality. Apart from SPSS and Demetra+, the software tools need serious updating, not only in their visual, interactive and functional aspects but also to reflect recent advances in statistical methodology.
5. References
1. National Bank of Poland: Demetra+ User Manual ver.4 (2011)
2. Eurostat: ESS guidelines on Seasonal Adjustment (2009)
3. ILO: Recommendations on Seasonal Adjustment for Employment and Unemployment Data (2010)
4. Eurostat: Unemployment – LFS adjusted series (2012)
5. UN: Working Instructions for Seasonal Adjustment with Demetra (2009)
6. Infante, E.: Using DEMETRA+, Paris (2011)
7. Kocak, N.: An Analysis of German Industrial Production with DEMETRA+, Turkish Statistical Institute (2011)
8. Eurostat, http://epp.eurostat.ec.europa.eu/
9. National Statistical Institute of Bulgaria, http://www.nsi.bg/
10. Collaboration between Researchers and Official Statisticians, http://www.crosportal.eu/page/seasonal-adjustment