METADATA FOR THE LONGITUDINAL DATA LIFE CYCLE
By Larry Hoyle, Fortunato Castillo, Benjamin Clark, Neeraj Kashyap, Denise Perpich, Joachim Wackerow, and Knut Wenzig
03/31/2011
DDI Working Paper Series – Longitudinal Best Practice, No. 3
This paper is part of a series that focuses on DDI usage and how the metadata specification should be applied in a variety of settings by a variety of organizations and individuals. Support for this working paper series was provided by the authors’ home institutions; by GESIS - Leibniz Institute for the Social Sciences; by Schloss Dagstuhl - Leibniz Center for Informatics; and by the DDI Alliance.
THE ROLE AND BENEFIT OF METADATA MANAGEMENT AND REUSE
PROBLEM STATEMENT/DESCRIPTION: This paper focuses on the unique characteristics of longitudinal studies in which the generation of data and metadata is repeated over time. These types of studies might involve multiple waves, either for a person or a population, or might involve ongoing continuous data collection. Some of the issues that are unique to longitudinal studies follow from the repetitive nature of their data collection. Other issues arise simply due to the extended period over which they are conducted, leaving more opportunity for unanticipated events. It is important to realize that studies which are not initially intended to be longitudinal may evolve into longitudinal studies. It is therefore best practice for all studies to structure initial metadata to be compatible with this potential repurposing across the data life cycle. Each stage in the workflow may be of particular interest to different groups.

Note: In this document words in italics denote DDI elements, e.g., StudyUnit. Also, note that the term "published" when referring to a DDI entity means a DDI instance that has been made available for use outside of the immediate group of its creators. This is denoted by the "isPublished" attribute being set to "true", and carries with it the requirement that versioning be begun.
APPROACH: We first listed a number of possible forms longitudinal studies might take:
• Surveillance, data collected through observation over time
• Event-driven data collection
• Panel studies / cohort studies, open cohort studies
• Retrospective studies (probably not "longitudinal", unless collected at multiple time periods)
• Interventions or trials
• Repeated cross-sections
From this list we chose open cohort studies, one of the more complex designs, as our exemplar, with the thinking that challenges for simpler designs would also be present for the more complex design. We also decided to discuss potential issues in life cycle order as described in Figure 1 below. We wanted to explore what is of particular importance with respect to the temporal aspect of the data. We also drew the distinction between longitudinal use of the data and longitudinal management of repeated passes through the life cycle stages.

To set the context, the DDI basic life cycle is diagrammed below, with DDI modules connected to the stages of the life cycle for which they are most relevant. For longitudinal studies, other arrows exist in a somewhat different life cycle (see examples in the Repurposing and Redesign section of this paper).

DOI: http://dx.doi.org/10.3886/DDILongitudinal03
Figure 1: From: Just Enough DDI 3.ppt, Arofan Gregory - Dagstuhl 2010 Longitudinal Data Workshop 10422
Study Concept

Universes

Longitudinal studies require careful documentation of a number of aspects throughout their course. The initial study population and study concepts should be described and then described again as they change over time. Sampling procedures, including any number boosting procedures, should be documented thoroughly. The ability to generate an accurate description of the study universe at any given time is essential. The metadata should allow for retrieval of data belonging to any version of a universe or sub-universe from the successive stages of the study. If the universe expands or changes in any way, the corresponding Universe element must have its version updated.

Changes in instruments over time may generate changes in universes. An example might be a revision of a skip pattern, causing a question to be answered by a different population. Partitions of the universe may be described hierarchically as Universe elements within the parent Universe element. See section 3.3 Universe of the DDI 3.1 User Guide for a description of hierarchical universe structures.
A Comparison element should be used to describe differences between universe versions. The VersionRationale may be used to provide a textual description of the changes.

The example below shows documentation of a change in the Universe element. Note also that with the change in version of the Universe, all of its ancestors (UniverseScheme, ConceptualComponent, StudyUnit, and DDIInstance) have a version update. A VersionRationale is included for each. Also note the use of a LifecycleEvent to document the external event and its related change in the universe.

Example 1 shows a change in Universe documented in DDI 3.1. A Group element contains a Purpose and a Comparison. The Comparison contains a UniverseMap pointing to the initial version 1.0.0 (SourceSchemeReference) and the updated Universe, version 1.1.0 (TargetSchemeReference). The Correspondence element describes Commonality and Difference between the versions. A LifecycleEvent describes the external event precipitating the change. Note how the TargetSchemeReference includes a reference to both the Universe and its parent maintainable (UniverseScheme).

Example 1 – A Change in Universe
Universe updated to version 1.1.0
A group to contain comparisons for changes
{not shown}
example_UniverseChange_UniverseScheme us.example 1.0.0 example_UniverseChange_FrenchWorkers us.example 1.0.0
example_UniverseChange_UniverseScheme us.example 1.1.0 example_UniverseChange_FrenchWorkers us.example 1.1.0
Both versions are intended to cover people of pre-retirement age in France. In 2010 the French Senate voted to raise the retirement age to 62.
Universe updated to version 1.1.0 Example Study Unit An example for: Best Practices: Metadata for the Longitudinal Data Life cycle The Role and Benefit of Metadata Management and Reuse. This example demonstrates the documentation of a change in a universe. example_UniverseChange_UniverseScheme us.example 1.1.0 example_UniverseChange_FrenchWorkers us.example 1.1.0 An example of documenting the change in a Universe description
Universe updated to version 1.1.0
Universe updated to version 1.1.0
In 2010 the French Senate voted to raise the retirement age to 62 from 60.
People of Working age in France People of Working age in France. Note that in 2010 the French Senate voted to raise the retirement age to 62 from 60. The Universe then included more people and the mean age of those in the universe increased.
example_UniverseChange_OrgSch us.example 1.1.0 example_UniverseChange_Org us.example 1.1.0 EXAMPLE An imaginary organization used for examples
2010-10-27 example_UniverseChange_OrgSch us.example 1.1.0 example_UniverseChange_Org us.example 1.1.0 The French Senate and National Assembly voted to raise the retirement age to 62. This changes the universe of working age people.
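The rule described for Example 1, that a version change in a Universe also updates every ancestor and records a VersionRationale on each, can be sketched in a few lines of Python. This is only an illustration, not DDI tooling: the `Versionable` class, the `bump_minor` and `update_with_ancestors` helpers, and the choice to reset the last number on a minor change are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Versionable:
    """Hypothetical stand-in for a versioned DDI element (not a DDI API)."""
    name: str
    version: str = "1.0.0"
    parent: Optional["Versionable"] = None
    rationales: List[str] = field(default_factory=list)


def bump_minor(version: str) -> str:
    """Increment the middle number; resetting the last one is an assumption."""
    major, minor, _ = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def update_with_ancestors(element: Versionable, rationale: str) -> List[str]:
    """Bump the element and each ancestor, recording a rationale on every one."""
    updated = []
    node: Optional[Versionable] = element
    while node is not None:
        node.version = bump_minor(node.version)
        node.rationales.append(rationale)
        updated.append(node.name)
        node = node.parent
    return updated


# Ancestor chain named in the text, from the outermost element inward.
instance = Versionable("DDIInstance")
study = Versionable("StudyUnit", parent=instance)
component = Versionable("ConceptualComponent", parent=study)
scheme = Versionable("UniverseScheme", parent=component)
universe = Versionable("Universe", parent=scheme)

touched = update_with_ancestors(universe, "Universe updated to version 1.1.0")
```

After the call, `touched` lists the Universe followed by its four ancestors, and each of them carries version 1.1.0 together with the same rationale text.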
Versioning

The extended time frame of longitudinal studies makes changes to metadata likely. These changes necessitate versioning of "published" metadata. The strategy used for documenting versioning should be carefully described at the outset. This will be especially important with DDI 3.2 with its more flexible versioning notation. (Note that the versioning rules and version format have changed with DDI 3.1 and will change again with DDI 3.2.)
In Example 2, a ResourcePackage contains the Organization information to be used by reference. The Organization element contains a Note element outlining the versioning structure as recommended by the Best Practices paper on versioning (see DDI Working Paper Series -- Best Practices, No. 8, http://dx.doi.org/10.3886/DDIBestPractices08, based on DDI 3.0). That paper also points out the importance of versionDate and VersionRationale and indicates that versioning events should be documented in LifecycleEvents as seen above in Example 1. The Content of the Note documents the particular organization’s unique rubric for versioning. Each organization may have its own rules for doing versioning.

Example 2 – Documenting the Versioning Method

This resource package contains common information about the example organization
Example Us.example is an organization name used for documentation
AboutOrganization_OrgSch us.example 1.0.0 AboutOrganization_Org us.example 1.0.0
The versioning scheme in this DDIInstance is as follows: All versions will consist of a string with three numbers separated by periods. Major changes produce an increment in the number to the left of the first period. Minor but meaningful changes produce an increment in the number between the decimal points. Very minor changes, such as correction of typographical errors, produce an increment in the number to the right of the second period. Late binding is not used. Elements will be marked as “isPublished” when the DDI is posted to the public site. versionDate is updated on unpublished metadata. Our initial version is always 1.0.0.
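The three-number rubric in the Note above lends itself to a small helper. The following Python sketch is one possible reading of that rubric, not part of DDI or of any organization's tooling: the function name `next_version`, the change labels, and the decision to reset the lower-order numbers to zero after a larger change are assumptions, since the Note does not say what happens to them.

```python
import re

# Three non-negative integers separated by periods, e.g. "1.0.0".
_VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def next_version(current: str, change: str) -> str:
    """Return the next version string for a 'major', 'minor', or 'very_minor' change."""
    match = _VERSION_PATTERN.match(current)
    if match is None:
        raise ValueError(f"not a three-part version string: {current!r}")
    major, minor, very_minor = (int(group) for group in match.groups())

    if change == "major":
        # Assumption: lower-order numbers restart at zero.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Assumption: the last number restarts at zero.
        return f"{major}.{minor + 1}.0"
    if change == "very_minor":
        return f"{major}.{minor}.{very_minor + 1}"
    raise ValueError(f"unknown change type: {change!r}")


initial = "1.0.0"  # the rubric's stated initial version
after_universe_change = next_version(initial, "minor")
after_typo_fix = next_version(after_universe_change, "very_minor")
after_redesign = next_version(initial, "major")
```

Under these assumptions a minor change to version 1.0.0 yields 1.1.0, which matches the Universe change in Example 1.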
Concepts

Concepts may also evolve as the study progresses. Structured concepts may be useful in longitudinal studies. They can be created with a hierarchy of ConceptGroup (using references to single concepts) in ConceptualComponent. Nesting of Concepts within Concepts will be available in DDI 3.2. An example from a demographic surveillance site (DSS) study like the INDEPTH Network would be "household at location", then "social group", where the concept of social groups refines over time (INDEPTH Network site: http://www.indepth-ishare.org/). Best practice in documenting concepts is to use an existing controlled vocabulary, a thesaurus, or a DDI ResourcePackage when they exist. For more see Jääskeläinen et al.

In Example 3, two Concepts, "social conservative" and "economic conservative", are grouped in a higher level ConceptGroup, "conservative".

Example 3 – Structured Concepts

This Resource Package contains example structured concepts
SocialConservative Self identified as socially conservative. Persons identifying themselves as socially conservative StructuredConcepts_ConceptScheme us.example 1.0.0 StructuredConcepts_CG_conservativeEconomic us.example 1.0.0 EconomicConservative Self identified as economically conservative Persons identifying themselves as economically conservative StructuredConcepts_ConceptScheme us.example 1.0.0
StructuredConcepts_CG_conservativeSocial us.example 1.0.0 Conservative Self identified as conservative Persons identifying themselves as conservative StructuredConcepts_ConceptScheme us.example 1.0.0 StructuredConcepts_CG_conservativeSocial us.example 1.0.0 StructuredConcepts_ConceptScheme us.example 1.0.0 StructuredConcepts_CG_conservativeEconomic us.example 1.0.0
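The grouping in Example 3 can also be pictured as a small tree in which a group holds references to concepts or to other groups. The Python sketch below is a simplified illustration only: the `Concept` and `ConceptGroup` classes and the `leaf_labels` method are hypothetical names for this sketch and do not reproduce the DDI schema.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified concept: a label plus a description."""
    label: str
    description: str


@dataclass
class ConceptGroup:
    """Hypothetical group holding references to concepts or nested groups."""
    label: str
    description: str
    members: List[Union[Concept, "ConceptGroup"]] = field(default_factory=list)

    def leaf_labels(self) -> List[str]:
        """Collect the labels of all concepts below this group, depth first."""
        labels: List[str] = []
        for member in self.members:
            if isinstance(member, ConceptGroup):
                labels.extend(member.leaf_labels())
            else:
                labels.append(member.label)
        return labels


social = Concept("SocialConservative", "Self identified as socially conservative")
economic = Concept("EconomicConservative", "Self identified as economically conservative")
conservative = ConceptGroup(
    "Conservative",
    "Self identified as conservative",
    members=[social, economic],
)

grouped = conservative.leaf_labels()
```

Because the group stores references rather than copies, a concept that is refined over time only needs to be updated in one place.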
Study Unit

A Group structure can enable representing metadata common to waves; for an example using DDI 3.0, see Goebel and Wackerow 2007. Metadata describing the relationship of StudyUnits may be placed in a parent Group. A SeriesStatement may also be used to document relationships among waves. Metadata to be shared among StudyUnits are better represented in a ResourcePackage. The consensus was that a ResourcePackage is the more machine-actionable structure.
Example 4 shows a SeriesStatement pointing to the series to which its StudyUnit belongs. Any additional StudyUnit in the series would contain a similar SeriesStatement.

Example 4 – Using a Series Statement

Example Study Unit This example demonstrates the use of a SeriesStatement. example_SeriesStatement_UniverseScheme us.example 1.1.0 example_SeriesStatement_FrenchWorkers us.example 1.1.0
us.example/LongitudinalLifecycle {external} Longitudinal Life Cycle Example Study Units LongLifeStudies This is a hypothetical series of study units for the Longitudinal Life Cycle Metadata paper
An example of documenting the use of a SeriesStatement Universe updated to version 1.1.0 In 2010 the French Senate voted to raise the retirement age to 62 from 60 People of Working age in France People of Working age in France. Note that in 2010 the French Senate voted to raise the retirement age to 62 from 60. The Universe then included more people and the mean age of those in the universe increased.
Data Collection

A longitudinal study is particularly well suited to metadata sharing practices. These include management and reuse of Instruments, QuestionSchemes, and CollectionEvents, including ModeOfCollection. An overall explicitly described versioning practice will become important here as well as the careful use of Comparison.

The following example shows the documentation of a change in a question from QuestionItem version 1.0.0 to QuestionItem version 2.0.0 and their associated categories, CategoryScheme version 1.0.0 and CategoryScheme version 2.0.0. A picture is also added to the revised version of the question. A QuestionMap, CategoryMap and ItemMap document the changes, both commonalities and differences. Note that if there were associated codes, GenerationInstructions could be used to describe the relationship of values from the revised version to the original values in a machine-actionable way. No such facility seems to exist for categories. The proper value for a CommonalityWeight for changes in the "Other" category in this case is unclear.
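The commonalities and differences that a CategoryMap records can be approximated by comparing the category labels of the two scheme versions. The Python sketch below is an illustration under stated assumptions, not a DDI facility: the function `compare_categories` and its result keys are hypothetical, and the category lists are taken from the example questions in this section.

```python
from typing import Dict, List


def compare_categories(source: List[str], target: List[str]) -> Dict[str, List[str]]:
    """Compare two category lists by label, preserving their original order."""
    return {
        "commonality": [label for label in source if label in target],
        "added": [label for label in target if label not in source],
        "removed": [label for label in source if label not in target],
    }


# Category labels from the original and revised example question.
categories_v1 = ["Pencil", "Pen", "Other"]
categories_v2 = ["Pencil", "Pen", "Brush", "Other"]

comparison = compare_categories(categories_v1, categories_v2)
```

A comparison by label reports "Other" as unchanged, even though its meaning narrowed once "Brush" became a separate category. That gap is one way to see why the appropriate CommonalityWeight for "Other" is hard to state.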
Two ControlConstructSchemes are also included to show how a sequence of questions is instantiated. Note that the version numbers within the ControlConstructSchemes are not required to match the version numbers of the questions they reference.

Original Question (version 1.0.0): Which writing implement do you prefer?
• Pencil
• Pen
• Other

Revised Question (version 2.0.0): Which writing implement do you prefer?
• Pencil
• Pen
• Brush
• Other

Figure 2: Example Questions

Example 5 – When Questions Change

This Resource package contains an example of a change in a question Compares v1.0.0 and v2.0.0 of QuestionChange_CatSch Compares v1.0.0 and v2.0.0 of QuestionChange_QScheme and QuestionChange_CatSch. Version 2.0.0 added the category "Brush" QuestionChange_QScheme us.example 1.0.0