Istat statistical process modelling and the Generic Statistical Business Process Model: a comparison
Giovanna Brancato, Giorgia Simeoni
Italian National Statistical Institute - Istat

Abstract: The Istat metadata and quality documentation system (SIDI/SIQual) relies on a framework that was developed in the late 1990s for surveys and later extended to describe statistical compilations. The Istat model integrates three main layers: current statistical production activities (phases and operations); quality control actions (special activities aimed at preventing, monitoring and evaluating errors); and the use of generalised software to perform the current activities. Further areas of documentation complement this scheme: a detailed area of basic documentation on the content of the process, a repository of all related documents (regulations, questionnaires, manuals, quality reports, …), and the standard quality indicators. The system was implemented to meet quality requirements and is therefore strongly oriented to quality and methodology issues. Recently, the Generic Statistical Business Process Model (GSBPM) has become popular and widespread across statistical agencies. The GSBPM is a flexible tool for describing processes that produce official statistics. It is based on a four-level structure integrated with some attributes (inputs, outputs, purpose, …). This paper compares the two models and analyses the strengths and weaknesses of both approaches. The results of this comparison are also used to improve the thesauri of the SIDI/SIQual system so that it better serves its uses.

1. Introduction
In recent years great attention has been paid to statistical process documentation models, and the Generic Statistical Business Process Model (GSBPM) is becoming increasingly popular [14]. Istat has long-standing experience in statistical process and quality documentation. It designed a structured system named SIDI/SIQual in the late 1990s and launched its implementation in 2001 [4,5,6]. So far, the system has been successfully used for many purposes, among them:
- responding to Eurostat's evaluation of compliance with the Code of Practice;
- supporting direct quality assessment activities by producing information for audit and self-assessment procedures;
- providing indirect quality evaluation by generating quality reports for the decision-making process of Istat top management.

This paper analyses the GSBPM and the Istat model. Section 2 briefly presents the GSBPM and reviews its main applications. Section 3 illustrates the Istat model for process documentation. Section 4 highlights similarities and divergences between the two approaches. Section 5 draws some general conclusions and also considers different possible applications.

2. The Generic Statistical Business Process Model and its application
The GSBPM was developed within the Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata [14]. It is intended as a comprehensive model for documenting any kind of official statistics business process, from traditional surveys to administrative data acquisition and statistical compilations. The GSBPM is organised in four levels:
- Level 0, the statistical business process;
- Level 1, the nine phases of the statistical business process;
- Level 2, the sub-processes within each phase;
- Level 3, a description of those sub-processes.

It also recognises several over-arching processes that apply across the statistical business process; the most relevant are quality management and metadata management. When applying the model, it is also strongly recommended to identify some attributes for each sub-process (Level 2), such as inputs, outputs, purpose, owner, guides, enablers and feedback loops or mechanisms. The GSBPM is meant to be interpreted very flexibly: some elements can be skipped, the steps need not be followed in strict order, and certain sub-processes can be repeated, creating loops if necessary. The METIS Steering Group approved the final version of the GSBPM for public release in April 2009.
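The Python sketch below makes the four-level structure concrete. It encodes Levels 0 to 2 as dataclasses, stores the recommended Level 2 attributes as fields, and uses a free-text field in place of the Level 3 description. The class and method names, the sub-process codes and labels, and the `describe_run` check are illustrative assumptions; they are not part of the GSBPM. The check only mirrors the flexibility described above: steps may be skipped, taken in any order, or repeated as loops.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SubProcess:
    """GSBPM Level 2 sub-process; `description` stands in for Level 3."""
    code: str
    name: str
    description: str = ""
    # Attributes recommended for each Level 2 sub-process.
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    purpose: str = ""
    owner: str = ""
    guides: list[str] = field(default_factory=list)
    enablers: list[str] = field(default_factory=list)
    feedback_loops: list[str] = field(default_factory=list)


@dataclass
class Phase:
    """GSBPM Level 1: one of the nine phases."""
    number: int
    name: str
    sub_processes: list[SubProcess] = field(default_factory=list)


@dataclass
class BusinessProcess:
    """GSBPM Level 0: the statistical business process as a whole."""
    name: str
    phases: list[Phase] = field(default_factory=list)

    def sub_process_codes(self) -> set[str]:
        return {sp.code for ph in self.phases for sp in ph.sub_processes}

    def describe_run(self, run: list[str]) -> dict:
        """Summarise an executed sequence of sub-process codes.

        No order is enforced: codes may be skipped or repeated, as the
        GSBPM allows. Codes that occur more than once are reported as loops.
        """
        known = self.sub_process_codes()
        unknown = [code for code in run if code not in known]
        if unknown:
            raise ValueError(f"codes not in the model: {unknown}")
        counts = Counter(run)
        return {
            "skipped": sorted(known - set(run)),
            "loops": sorted(code for code, n in counts.items() if n > 1),
        }


# Hypothetical example: three sub-processes of phase 5 of a survey.
survey = BusinessProcess("Hypothetical survey", [
    Phase(5, "Process", [
        SubProcess("5.2", "Classify and code", owner="coding unit"),
        SubProcess("5.3", "Review, validate and edit"),
        SubProcess("5.4", "Impute"),
    ]),
])
print(survey.describe_run(["5.3", "5.4", "5.3"]))
```

In this run, "5.2" is reported as skipped and "5.3" as a loop, and no error is raised for the out-of-order sequence.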
In less than three years, several National Statistical Institutes (NSIs) and International Organisations have adopted the model or are adopting it. In other cases, where a model was already in use within an institution, it has been mapped to the GSBPM for international comparability, much as is done in this paper, which shows the broad applicability of the GSBPM [1,2,8]. The original purpose of the GSBPM was to provide a standard terminology for discussing metadata and statistical processes. Its applications have reported further valuable benefits, such as support for process standardisation, quality improvement and efficiency savings. In recent years, many NSIs have been carrying out corporate projects aimed at improving quality and efficiency and reducing costs. These projects aim to gradually replace the traditional stovepipe organisation of production processes with one based on the broad sharing of harmonised methods and generalised tools. In general terms, the GSBPM is seen as one of the pillars of these standardisation and industrialisation projects [10], and it is widely and successfully applied as a reference model to support them. In particular, it is used to: i) analyse and describe the different phases and sub-phases of current production processes; ii) identify which of them are common; and iii) prioritise investments [8,11]. It may also support the rationalisation of IT or methodological tools. In some applications, all existing software or manuals are classified into the GSBPM sub-processes. The result is used to rationalise, standardise and share the tools and/or to identify gaps [2,8,11].

Furthermore, the GSBPM proved worthwhile as a reference model for quality assessment procedures. At Statistics Canada, for example, it was used to guide the discussions during the 2009 Quality Assurance Reviews [8]. It proved fit for use in all kinds of statistical processes. It was also valuable for identifying the riskiest phases and those where solutions could have wider applicability. At Statistics Denmark, the GSBPM was used within the corporate standardisation project started in 2009 to analyse several processes in depth and to prepare a list of observations and possible improvement actions for each process. Six months later, a follow-up on the processes showed that many of the identified improvement actions had already been implemented [11].

3. Istat standard for statistical process documentation
Since the late 1990s, Istat has developed an information system named SIDI [4,5,6]. SIDI documents the quality of statistical production processes, i.e. processes aimed at producing statistical information, such as surveys and statistical compilations. The system documents standard quality indicators and the reference metadata that support the correct interpretation of the quality measures.
In 2006 a navigation system known as SIQual was released on the Istat website, and the English version was made available in 2009 (http://siqual.istat.it/SIQual/). Appropriate standards for terminology, conventions, documentation and formulas, as well as navigation functionalities, were defined in order to develop the quality documentation system and to represent metadata and quality indicators properly. The SIDI/SIQual system is rather broad and interacts with other planning, management and dissemination information systems within the Institute. This paper focuses only on the metadata for documenting statistical production processes. The most important feature of the Istat approach to describing statistical business processes is a multilayer framework with three main strata, closely connected with one another:

i) phases and operations; ii) quality control actions; iii) generalised software.

In detail, the following entities are documented for each statistical production process:
- the activities that manipulate data or are oriented to their production, namely operations;
- the quality control actions aimed at preventing, monitoring and evaluating errors that affect accuracy;
- the generalised software used to perform both.

In addition, the Istat approach separates some process-related topics from the operations and documents them in more depth, because they are considered particularly relevant and require detailed metadata. This is the case for sampling design and index production methodology. It is also the case for other topics such as process periodicities, i.e. the frequencies of data collection, treatment and dissemination. Besides the activities performed within the quality control system (quality control actions), the quality documentation area includes quality reports and the process and product standard quality indicators defined for surveys and for statistical compilations. Interpreting quality also requires an area reporting metadata on the content of the process. The system therefore contains a description of the main objectives of the process, the observation units, the statistical domains of interest and the questionnaires used to observe them. Another important area is the document repository. For each process, it stores a set of documents classified according to their nature, ranging from questionnaires, manuals and regulations to more operational documents such as interviewers’ instructions.

The framework is represented in the following scheme.

[Figure: the SIDI/SIQual documentation framework]
Content
In-depth documentation: Sampling design; Index production methodology
Operations: Phases > Operations > Sub-operations
Quality control actions: Phases / Non-sampling errors > Preventing, Monitoring, Evaluating
Generalised software: Phases > Software
Documents repository: Regulations, Manuals; Questionnaires; Documents (field operations, methods, standards, …)
Standard quality indicators: Process oriented; Product oriented
Quality reporting


Almost all metadata elements defined in SIDI/SIQual have a mandatory descriptor named “validity period”. For example, when an operation, action or software is documented, the process editions to which it applies have to be specified. This makes it possible to tell whether it is performed occasionally or on a regular basis, and whether in the same or in different process editions. In general, the items used to document a statistical production process are included in a “thesaurus”, that is, a list of standard terms shared among different processes. The quality pilots in charge of documenting Istat processes continuously update these lists and can add new items. New items are then subject to a centralised control and validation procedure. This procedure checks the need for their inclusion, their position in the hierarchy (if applicable), their wording and their English translation.

A further element of SIDI, not reported in the above scheme, is the set of actors involved in the operations/actions. The initial version of the system also documented the “enabler” in charge of carrying out each operation/action (interviewer, coder, software, …) and its main characteristics, and generalised software was not yet available in thesaurus form. Later on, since Istat staff in charge of a process carried out most of its activities, this information was reduced. The focus was kept on the following “enablers”: reporting units, external units or bodies involved in data collection, and the software used to perform the activity. For software, an ad hoc thesaurus was created and designed as a further dimension of the process, alongside the operations and the quality control actions. For reporting units, some relevant characteristics can be specified: definition, frame and typology, e.g. whether they are also observation units, or whether and at which stage they are used in the sample design.
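The Python sketch below illustrates the thesaurus workflow and the validity period. It is a minimal illustration under stated assumptions, not the SIDI implementation. The class names are hypothetical, and the validity period is modelled simply as the set of process editions (years) in which an element applies. The central validation is reduced to two of the checks mentioned in the text: a valid position in the hierarchy and the presence of an English translation. An element counts as "regular" if it applies in every edition and as "occasional" otherwise, which is also an assumed rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThesaurusItem:
    """A standard term shared among processes (e.g. an operation)."""
    wording: str
    english: Optional[str] = None  # English translation, checked centrally
    parent: Optional[str] = None   # key of the parent item, if any
    validated: bool = False


@dataclass
class Thesaurus:
    items: dict = field(default_factory=dict)

    def propose(self, key: str, item: ThesaurusItem) -> None:
        """A quality pilot adds a new item; it starts out unvalidated."""
        self.items[key] = item

    def validate(self, key: str) -> bool:
        """Simplified central control: the parent, if any, must exist and be
        validated, and an English translation must be present."""
        item = self.items[key]
        if item.parent is None:
            parent_ok = True
        else:
            parent = self.items.get(item.parent)
            parent_ok = parent is not None and parent.validated
        item.validated = parent_ok and bool(item.english)
        return item.validated


@dataclass
class DocumentedElement:
    """An operation/action/software documented for one process."""
    item_key: str
    validity_period: set  # process editions (years) in which it applies

    def usage(self, all_editions: set) -> str:
        """'regular' if applied in every edition, else 'occasional'."""
        return "regular" if all_editions <= self.validity_period else "occasional"


def document(thesaurus: Thesaurus, key: str, editions: set) -> DocumentedElement:
    """Only validated thesaurus items may be used to document a process."""
    item = thesaurus.items.get(key)
    if item is None or not item.validated:
        raise ValueError(f"{key!r} is not a validated thesaurus item")
    return DocumentedElement(key, set(editions))


# Hypothetical entries: Italian wording plus English translation.
thesaurus = Thesaurus()
thesaurus.propose("coding", ThesaurusItem("Codifica", "Coding"))
thesaurus.propose("auto_coding", ThesaurusItem(
    "Codifica automatica", "Automatic coding", parent="coding"))
thesaurus.validate("coding")
thesaurus.validate("auto_coding")

element = document(thesaurus, "auto_coding", {2010, 2011})
print(element.usage({2009, 2010, 2011}))  # occasional
```

A child item can only be validated after its parent is, which is one simple way to enforce the "position in the hierarchy" check.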
Phases and operations, quality control actions and generalised software have a hierarchical structure. In general, current activities aimed at producing statistical information are documented by main phase, and operations and sub-operations are specified within each phase. The operations are expressed so as to document how the activity is performed, not simply that it is carried out. More than one operation can be selected within each phase. The number of levels in the hierarchy is not pre-defined, but it cannot exceed four. In addition, thesauri are tailored to the type of statistical production process. They differ for surveys and statistical compilations, while sharing the appropriate common phases/operations, quality control actions and software. Statistical processes derived from administrative sources have a different data collection phase compared with direct surveys, and they place an additional focus on administrative features. For any direct survey, the phases included in the Istat model are:

For each phase, the most detailed applicable operations have to be selected. For example, as reported in the figure below, the specific type of coding has to be chosen. Quality control actions can be documented for each operation. As already mentioned, where applicable, these actions are organised into preventing, monitoring and evaluating activities and grouped by the main sources of error (e.g. unit nonresponse, interviewer effect, …). Naturally, generalised software applies only to operations/quality actions performed by electronic means. To continue the previous example, some quality actions are training for coders, debriefing, and control surveys to evaluate coding error. The use of ACTR (Automatic Coding by Text Recognition) for automatic coding and of BLAISE for assisted coding can be documented as generalised software.

[Figure: example of SIDI/SIQual documentation for coding]
Phases and operations: Data preprocessing > Manual revision; Coding (Manual coding, Computer assisted coding, Automatic coding); Data entry (Controlled data entry); …
Quality control actions: Data preprocessing > Coding control > Prerequisites for coding control; Prevention of coding errors (Initial training for coders); Coder control during field operations; Coder debriefings; Ex-post evaluation of coding error (Control surveys to evaluate coding error); …
Generalised software: Data preprocessing > Automatic or computer assisted coding > ACTR, BLAISE
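The coding example can also be read as a small data structure with three parts:
- a hierarchy of phases and operations limited to four levels;
- quality control actions attached to an operation and grouped into preventing, monitoring and evaluating activities;
- generalised software linked to the operations performed by electronic means.

The Python sketch below assumes this representation, and its names are hypothetical. Placing “Coder control during field operations” under monitoring is also our assumption, because the figure does not make that grouping explicit.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_LEVELS = 4  # the hierarchy has at most four levels
ACTION_TYPES = ("preventing", "monitoring", "evaluating")


@dataclass
class Node:
    """A phase, operation or sub-operation of the hierarchy."""
    name: str
    children: list = field(default_factory=list)

    def levels(self) -> int:
        """Number of levels from this node down to its deepest descendant."""
        return 1 + max((child.levels() for child in self.children), default=0)

    def find(self, name: str) -> Optional["Node"]:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


def check_levels(root: Node) -> None:
    if root.levels() > MAX_LEVELS:
        raise ValueError(f"{root.name!r} has more than {MAX_LEVELS} levels")


# Hypothetical extract of the data preprocessing phase (from the coding figure).
preprocessing = Node("Data preprocessing", [
    Node("Manual revision"),
    Node("Coding", [
        Node("Manual coding"),
        Node("Computer assisted coding"),
        Node("Automatic coding"),
    ]),
])
check_levels(preprocessing)

# Quality control actions attached to the coding operation, grouped by type.
quality_actions = {
    "Coding": {
        "preventing": ["Initial training for coders"],
        "monitoring": ["Coder control during field operations"],  # assumed grouping
        "evaluating": ["Control surveys to evaluate coding error"],
    },
}

# Generalised software applies only to operations performed electronically.
software = {"Automatic coding": ["ACTR"], "Computer assisted coding": ["BLAISE"]}


def documentation_for(operation: str) -> dict:
    """Collect sub-operations, grouped quality actions and software."""
    node = preprocessing.find(operation)
    if node is None:
        raise KeyError(operation)
    actions = quality_actions.get(operation, {})
    return {
        "sub_operations": [child.name for child in node.children],
        "quality_actions": {t: actions.get(t, []) for t in ACTION_TYPES},
        "software": sorted({name for child in node.children
                            for name in software.get(child.name, [])}),
    }


print(documentation_for("Coding")["software"])  # ['ACTR', 'BLAISE']
```

Because every operation returns the same three action groups, an empty group directly shows which kind of quality control is not yet documented.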

4. Istat SIDI/SIQual model and GSBPM: an analysis
The Istat SIDI/SIQual and GSBPM frameworks were developed for different purposes. The former is strongly oriented to quality documentation and assumes that a good knowledge of the business process is required to interpret quality correctly. As a consequence, the statistical production process and other relevant characteristics are documented in detail and used to interpret the trends of standard quality indicators. In addition, great emphasis has been placed on making the metadata and quality indicators usable by different users, through SIQual, a navigation system that allows the documentation to be exploited. The GSBPM was primarily intended to document any official statistics business process. The two approaches describe the statistical business process in a similar way, through hierarchically organised flows of activities. Both have provided a means of standardising terminology, creating a common language at national and international level, respectively. We performed a thorough analysis of the meaning of each GSBPM element and compared it with the SIDI/SIQual phases/operations and quality control actions. The following figure shows the higher-level mapping.

[Figure: higher-level mapping between GSBPM phases and Istat SIDI/SIQual phases]
GSBPM: 1. Specify needs; 2. Design; 3. Build; 4. Collect; 5. Process; 6. Analyse; 7. Disseminate; 8. Archive; 9. Evaluate
ISTAT SIDI/SIQual: Planning; Frame development; Re-planning; Data collection; Data pre-processing; Editing and imputation; Data processing; Data validation; Statistical disclosure control; Dissemination; Data storage; Documentation; Evaluating quality control actions

There is an overall correspondence between Level 1 of the GSBPM and the phases of the Istat SIDI/SIQual model, although the two models place different emphasis on different phases. For example, SIDI/SIQual splits the activities that the GSBPM includes in “5. Process” into different phases: “Pre-treatment”, “Editing and imputation” and part of “Data processing”. In other cases, the two models view the phases differently and therefore organise them differently. In particular, the Istat model gives prominence to Planning, a phase aimed at designing the complete process and carried out before the first process edition. The development and testing of tools, which the GSBPM documents in phase “3. Build”, are structured in SIDI/SIQual within the quality actions thesaurus. Given its quality orientation, the Istat approach stresses the “Frame development” step, which is a relevant phase in sample surveys and a source of coverage errors. In addition, redesign is documented through the “Re-planning” phase. GSBPM phase “9. Evaluate” does not seem to be represented in the Istat standard. However, the evaluating quality control actions already cover some of the issues described under this phase. More generally, the Istat approach structures quality management documentation more fully, by means of an ad hoc thesaurus, whereas the GSBPM treats it as an over-arching process without detailing it in the model. Overall, the Istat approach appears more detailed and structured. As already mentioned, a substantial difference between the two models is that the GSBPM states whether an activity is performed, whereas SIDI/SIQual provides general elements on how it is performed. However, the GSBPM would presumably reach such detail at Level 3, which the model has not standardised, and this makes the GSBPM more flexible.
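The correspondences stated above can be recorded as a lookup table, which makes gaps and structural differences easy to query. The Python sketch below encodes only the links made explicit in the text:
- the split of “5. Process”;
- “3. Build”, handled through the quality actions thesaurus;
- “9. Evaluate”, partly covered by the evaluating quality control actions.

It deliberately leaves the other GSBPM phases unmapped rather than guessing the arrows of the figure. The data layout, the target labels and the function names are hypothetical.

```python
GSBPM_PHASES = [
    "1. Specify needs", "2. Design", "3. Build", "4. Collect", "5. Process",
    "6. Analyse", "7. Disseminate", "8. Archive", "9. Evaluate",
]

# Targets are (kind, name) pairs: kind "phase" is a SIDI/SIQual phase,
# kind "quality_action" refers to the quality control actions thesaurus.
# Only correspondences stated in the text are filled in.
MAPPING = {
    "3. Build": [("quality_action", "development and test of tools")],
    "5. Process": [
        ("phase", "Pre-treatment"),
        ("phase", "Editing and imputation"),
        ("phase", "Data processing (part)"),
    ],
    "9. Evaluate": [("quality_action", "evaluating quality control actions")],
}


def not_yet_mapped(mapping: dict) -> list:
    """GSBPM phases with no recorded SIDI/SIQual counterpart."""
    return [phase for phase in GSBPM_PHASES if not mapping.get(phase)]


def covered_only_by_quality_actions(mapping: dict) -> list:
    """GSBPM phases that SIDI/SIQual documents through quality actions
    rather than through a phase of its own."""
    return [
        phase for phase in GSBPM_PHASES
        if mapping.get(phase)
        and all(kind == "quality_action" for kind, _ in mapping[phase])
    ]


print(covered_only_by_quality_actions(MAPPING))  # ['3. Build', '9. Evaluate']
print(not_yet_mapped(MAPPING))
```

If the remaining entries were filled in from the full figure, `not_yet_mapped` would become a completeness check on the mapping, which is the kind of exercise that reveals under-represented areas in a thesaurus.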
The Istat model defines standard items that describe the process down to a detailed level. This supports searching and the automatic identification of common practices, methods and software. A possible drawback is that the standard metadata items may not properly reflect process specificities. For this reason, SIDI/SIQual documentation can be enriched with several free-text descriptions. Both models allow some steps to be skipped and, in principle, process cycles to be documented. In SIDI/SIQual the ordering of the phases is pre-set, and a cycle can be documented by adding new items to the thesaurus. Nevertheless, the navigation side of the system does not represent process cycles effectively. As already mentioned, the GSBPM is not a linear model, and certain sub-processes can be repeated, creating loops. However, this great freedom implies an additional effort, namely defining the workflow, which the model leaves unstructured.

Finally, the GSBPM has also been successfully applied to business register maintenance. The main advantages reported for this application are improved terminology standardisation and support in identifying possible synergies among processes. However, these advantages have not been demonstrated in practice [12]. The Istat model is not suitable for documenting business register maintenance. SIDI/SIQual has not pursued this goal because Istat has its own information system for administrative archives and registers. The two systems are integrated with respect to the information they have in common.

5. Conclusions
The GSBPM proved to be a worthwhile model for statistical business process documentation. It is widespread at international level and therefore useful for international applications. The Istat SIDI/SIQual model can be mapped onto it, so that Istat statistical processes can be represented through the GSBPM if necessary. The mapping exercise carried out in this paper allowed us to identify gaps and drawbacks in the SIDI/SIQual thesauri, especially those referring to statistical compilations. Since the thesauri are easy to extend, they have been extended to cover previously under-represented topics. For any other use of such models, we believe that the SIDI/SIQual approach can be considered as suitable as the GSBPM. In particular, the Istat model already supports statistical audit and self-assessment procedures by providing base documentation automatically derived from SIDI [9]. Other NSIs have done something similar with the GSBPM [8,11]. Regarding wider standardisation and industrialisation projects, many organisations are adopting the GSBPM as a reference model [7,8,11]. However, as Bergdahl and Blomqvist [1] state, any model would suit such projects as long as it captures most of the activities and establishes a common and stable framework that the organisation can rely on.
In addition, caution is needed when introducing a new model if the institution has already adopted a reference model [7]. Istat is now launching a corporate programme that includes the standardisation and industrialisation of statistical business processes, and we are quite confident that Istat SIDI/SIQual would serve this purpose properly. A further benefit of adopting it would be the completeness the system has reached: it covers all Istat surveys and the most important statistical compilations.

References
[1] Bergdahl M., Blomqvist K. (2011) National Implementation of the GSBPM – The Swedish Experience, UNECE Workshop on Statistical Metadata, Geneva, Switzerland, 5-7 October 2011


[2] Booleman M., Linnerud J. (2010) Cooperation based on the GSBPM, UNECE Work Session on Statistical Metadata (METIS), Geneva, Switzerland, 10-12 March 2010
[3] Brancato G., Carbini R., Simeoni G. (2009) Metadata and Quality Indicators to Report on Editing and Imputation to Different Users, UNECE Work Session on Statistical Data Editing, Neuchâtel, Switzerland, 5-7 October 2009
[4] Brancato G., Carbini R., Pellegrini C., Signore M., Simeoni G. (2006) Assessing quality through the collection and analysis of standard quality indicators: the Istat experience, Proceedings of the European Conference on Quality in Survey Statistics (Q2006), Cardiff, UK, 24-26 April 2006
[5] Brancato G., Pellegrini C., Signore M., Simeoni G. (2004) Standardising, Evaluating and Documenting Quality: the implementation of Istat Information System for Survey Documentation – SIDI, European Conference on Quality and Methodology in Official Statistics (Q2004), Mainz, Germany, 24-26 May 2004
[6] Brancato G., D’Angiolini G., Signore M. (1998) Building up the Quality Profile of Istat Surveys, Proceedings of the Joint IASS/IAOS Conference, Statistics for Economic and Social Development, Aguascalientes, Mexico, 1-4 September 1998
[7] Hamilton A. (2010) Applying the GSBPM within an NSI: Experiences and examples from Australia, UNECE Work Session on Statistical Metadata (METIS), Geneva, Switzerland, 10-12 March 2010
[8] Reedman L., Julien C. (2010) Current and future applications of the Generic Statistical Business Process Model at Statistics Canada, European Conference on Quality in Official Statistics (Q2010), Helsinki, Finland, 4-6 May 2010
[9] Signore M., Carbini R., D’Orazio M., Brancato G., Simeoni G. (2010) Assessing Quality through Auditing and Self-assessment, European Conference on Quality in Official Statistics (Q2010), Helsinki, Finland, 4-6 May 2010
[10] Statistics Netherlands (2011) Strategic vision of the High-level group for strategic developments in business architecture in statistics, UNECE 59th Plenary Session of the Conference of European Statisticians, Geneva, Switzerland, 14-16 June 2011
[11] Stender H., Molrup-Nielsen J. (2010) The adoption of the METIS GSBPM in Statistics Denmark, UNECE Work Session on Statistical Metadata (METIS), Geneva, Switzerland, 10-12 March 2010
[12] UNECE (2011) Applying the Generic Statistical Business Process Model to business register maintenance, UNECE Conference of European Statisticians, Group of Experts on Business Registers, Twelfth session, Paris, France, 14-15 September 2011
[13] Vale S. (2010) Exploring the relationship between DDI, SDMX and the Generic Statistical Business Process Model, 2nd Annual European DDI Users Group Meeting, Utrecht, Netherlands, 8-9 December 2010
[14] Vale S. (2009) Generic Statistical Business Process Model, Version 4.0, Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS), April 2009
