Grounded metadata: Validating a semantic web vocabulary for technical assistance to educators

Michael Knapp, Green River Data Analysis, USA
Brandt Kurowski, Green River Data Analysis, USA
Sara Dexter, University of Virginia, USA
David Gibson, The Vermont Institutes, USA
Robert McLaughlin, International Graduate Center, VI

Abstract: Development of metadata (data about data) can be challenging because the vocabulary needs to be reliable yet flexible and extensible, locally relevant yet widely applicable and interoperable, and both machine and human readable. This article describes the process of creating “grounded metadata” for education by engaging experts from technical assistance organizations, and the challenges the project faced. The initial ontology for educational technical assistance resources that resulted was then used to catalogue resources for dissemination via the Semantic Web.

Introduction

The Semantic Web (Berners-Lee & Miller, 2002) offers much promise for users, potentially allowing them to work better with web-based applications that can share and process the “meaning” of data (e.g., how one source of data relates to another, what data a user might be interested in today, what new data an application needs now in order to better serve a user). Metadata (data about data) defines and links the “meaning” level of data on the Internet and is essential to a vision of the Semantic Web as a new approach to network-based information systems. But metadata development can be challenging. The vocabulary and organization selected for exchanging information among machines and applications need to be reliable yet flexible and extensible, highly locally relevant yet widely applicable and interoperable, and both machine and human readable.

The explosion of information on the Internet has spawned a sub-industry in the science and art of metadata “language development”: how to name, organize, access, and use web-based information resources. The conceptual and technical protocols that govern the development of such languages, developed thus far mostly by web-savvy information engineers, are highly subjective, time consuming, and expensive (Nilsson, Palmer & Naeve, 2002). More efficient and user-friendly methods for generating metadata, especially by engaging or harvesting the natural interactions of professionals in various fields of knowledge, would encourage the development of professionally relevant vocabularies and structures for knowledge and thus accelerate the deployment of Semantic Web applications in those fields (Koivunen & Miller, 2001).

In the field of education, the existing metadata structures have been constructed (primarily in computer science departments) to accommodate resources for instruction such as “learning objects.” The vocabularies refer to considerations like “Interactivity Level” (grade level of intended audience) and “Interactivity Type” (type of interactivity with learning resource) (IEEE, 2004). In contrast with people concerned with courseware, educational research and technical assistance experts have yet to develop a structure or vocabulary for their knowledge and practices. Without ontologies and vocabularies to describe them, the technical assistance field will not be able to take advantage of the Semantic Web. Technical assistance in education bridges the gap from research to practice by moving recent findings and best practices into the field of practice, but the delays across that gap are often great, and the volume and pace of new findings make it difficult to “keep up.”

Examples of potential Semantic Web applications that would help include an application that would scan the information contained in an artifact (e.g., a research paper, a poem, a drawing) and bring similar resources to the user as needed, for comparison or contemplation. Another application could take a measure (e.g., a test, a survey, an observation) and build a relevant collection of readings and other resources for further study. A teacher could assemble a collection of readings for a class by “giving” an application the outline of course content. School leaders could give a survey to their faculty and receive an analysis and supplemental readings for various subgroups within the school. Responsive applications like these could “disseminate” resources (a vocabulary word in the technical assistance field) that would assist networks of educators across the country who support teachers and school leaders.
Seeing this potential, we have focused our recent Semantic Web efforts on networks of experts in education topics such as professional development, urban teacher preparation, equity, and school improvement in order to develop and refine the ontology and vocabulary needed for network-based applications in technical assistance to teachers, schools and larger educational systems. The resulting metadata has been used since 2001 to tag collections of resources in a Semantic Web application in order to “responsively disseminate” (Gibson, 2003; Sherry, Havelock & Gibson, 2004) articles, reports, conferences and other resources. In the following sections, we outline the process of grounded metadata development in education, its challenges and outcomes.

The Grounded Metadata Development Process

Organizations that participated in the effort included Great Cities' Universities (GCU), the Urban Network to Improve Teacher Education (UNITE), the Society for Information Technology and Teacher Education (SITE), the International Society for Technology in Education (ISTE), the National Staff Development Council (NSDC), the newly formed National Center for Culturally Responsive Educational Systems, the US Department of Education, and various US state governments and non-profit educational organizations, as well as individuals. Prior to working with us, these organizations' representatives had not been concerned about cataloging and metadata, and had no knowledge of the Semantic Web. Their work with us was initially motivated by their desire to support educational reform practices through the sharing of knowledge among the networks of practitioners who shared their passions and goals for improving education. Their work was either voluntary or done on limited resources. To keep them motivated to stick with the task, and for these organizations to use our Semantic Web application at all, we had to provide efficient and relatively natural methods for generating metadata and develop tools through which they realized concrete benefits from its creation. To help their members access resources, we had to first work with them to develop and refine metadata.

These organizations represent communities of practice (Wenger, 1998). They thought of the specialized collections of resources they wanted to disseminate in terms of their local knowledge. That is, their conceptual frameworks and descriptions had been developed over time as that community socially constructed knowledge together. Sometimes these ways of describing important ideas and resources were not explicit as a framework, so at the outset of the metadata development process they found it difficult to specify beforehand a comprehensive and cohesive set of qualifiers (definitions and data types) and control vocabularies (examples, keywords, and phrases). Because it was both time-consuming and frustrating for them when we tried to engineer aspects of a metadata vocabulary in advance, we switched to a “grounded” co-development process to generate the metadata. We use the term “grounded metadata” because we started with the knowledge of these communities of practice and let the metadata vocabulary arise from it, so it could best convey their meanings.

Our strategy was to let the organizations rely upon their own community's vocabulary for organizing resources and modify it over time as they saw fit. We also wanted the vocabularies to aid the search of the resources by their eventual users, so it was critical that the ontology we created across the communities allow each one to use descriptors relevant for its group of users. As subject headings slowly evolved, they had to be combined into a meaningful hierarchy. As we engaged these educators in the process of knowledge representation, we had to refine our cataloguing tool and web site to accommodate the constantly evolving schema.

Cataloging resources proved to be a major bottleneck. Any public user of the web site could contribute three kinds of resources: files, Web sites, and references to people. The contributions came from suggestions made by end-users and by collection editors from the organizations. User-suggested resources were queued up for review by the editors, who would verify that selection criteria were met. Accepted resources were then sent to a librarian for extensive cataloging. There were bottlenecks at every juncture.

We came up with several solutions. The simplest was to have end-users provide some of the metadata. Even if the quality of these user-supplied metadata is lower than what a librarian would generate, limiting users to a few choices that place the resource “close” to ideal significantly improved the cataloging process. We also stopped using metadata qualifiers that required extensive time investment by a professional librarian. For example, we originally cataloged all resources to the Dewey Decimal System, but this approach turned out to be the most costly part of the process and was not that helpful to users. Users were instead more interested in browsing the subject headings presented by the organizations and professional communities that were building the sites. Users were also interested in the beneficiary of the resource, the age of the intended beneficiary, and the vocabulary that described the beneficiary (student) and mediator (instructor). As a result, we began to develop implementations that minimized our dependency on professional library-quality cataloging and focused on tagging those terms that end-users seemed to value, based on server log analyses.

Over time, we used an increasingly limited subset of the “professional” metadata. The professional organizations were interested in adding and using the metadata they needed to meet the immediate demands of the people they supported. We found that the Dublin Core (DC) elements “Author”, “Title”, and “Description” and the Dublin Core-Education (DC-Ed) elements “Standards” and “Audience” were extensively used. Many of the other elements and qualifiers were ignored or used inconsistently, suggesting they were not of relevance to this community.

Another bottleneck has been the need for volunteer or paid reviewers to screen user-suggested resources to determine whether they should be included. In the first stages of our work, we relied on end-users to nominate resources and on paid editors to serve as “gatekeepers” of content. Later, we piloted an alternative approach in which certain members of a community were given log-in access and authority to publish new resources directly to the Web site, reducing the time and expense involved in screening nominated materials. This also enhances the sustainability of the work, as it is a vehicle by which networks of experts share the modest cost of cataloging and publishing content.
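The two publishing routes described in this section (suggestion, editorial review, and cataloging on one hand; direct publication by trusted community members on the other) can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions: the class names, status values, and method names are hypothetical and are not taken from the project's actual software.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUGGESTED = "suggested"   # nominated by an end-user, awaiting review
    ACCEPTED = "accepted"     # an editor confirmed the selection criteria
    PUBLISHED = "published"   # cataloged and visible on the site


@dataclass
class Resource:
    title: str
    kind: str                 # "file", "web site", or "person"
    contributor: str
    metadata: dict = field(default_factory=dict)  # user-supplied tags
    status: Status = Status.SUGGESTED


class Catalog:
    """Hypothetical model of the review queue and the direct-publish pilot."""

    def __init__(self, trusted_members=()):
        self.trusted = set(trusted_members)
        self.review_queue = []
        self.published = []

    def submit(self, resource):
        # Trusted members publish directly; everyone else is queued.
        if resource.contributor in self.trusted:
            self._publish(resource)
        else:
            self.review_queue.append(resource)
        return resource.status

    def review(self, resource, meets_criteria):
        # An editor screens a queued suggestion against selection criteria.
        self.review_queue.remove(resource)
        if meets_criteria:
            resource.status = Status.ACCEPTED
        return meets_criteria

    def catalog(self, resource, extra_metadata):
        # A cataloger adds metadata to an accepted resource, then publishes.
        if resource.status is not Status.ACCEPTED:
            raise ValueError("only accepted resources can be cataloged")
        resource.metadata.update(extra_metadata)
        self._publish(resource)

    def _publish(self, resource):
        resource.status = Status.PUBLISHED
        self.published.append(resource)
```

In this sketch a visitor's suggestion passes through `submit`, `review`, and `catalog` before it appears, while a member listed in `trusted_members` reaches `PUBLISHED` in a single `submit` call, which is where the savings in screening time would come from.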

The EdRef Metadata Framework

When initiating development of our ontology in Spring 2001, we decided to adopt the Dublin Core (DC) metadata set because it appeared to be the most widely recognized metadata standard for online resources as well as the de facto standard element set for the Resource Description Framework (RDF) (W3C, 1999). At that time, the 15 elements and qualifiers in the DC provided a generalized framework for describing resources (DC, 2004). Since we recognized that DC was general in scope and would not provide sufficient detail on which to develop our application, we chose to leverage DC's modularity and extensibility, which allow refined metadata to be added within the DC framework.

Following the recommendations of Duval, Hodgins, Sutton and Weibel (2002), we began the process of developing the Education Reform Metadata Framework (EdRef) by selecting relevant elements from existing standards (Table 1). To DC's 15 basic elements and qualifiers we added the two DC education (DC-Ed) elements “Audience” and “Standard” (DCMI, 2004). We added elements and control vocabularies selected from the Gateway to Educational Materials (GEM, 2004) and the Institute of Electrical and Electronics Engineers Learning Objects Metadata (IEEE, 2004). For example, our control vocabulary for the element “Resource Type” includes control vocabularies from GEM, IEEE LOM, and DC. Similarly, under “Audience” from DC-Ed, we included the qualifier “Typical Age Range” as derived from IEEE LOM. We added our own vocabulary to the ontology only if we could not locate any existing metadata to meet community needs.

Because we worked with communities from specialty areas of education, we postulated that each would require highly specific metadata to describe resources and organize collections according to its conceptual frameworks. We allowed for this specificity through the use of qualifiers and control vocabularies. For example, we added a control vocabulary under the DC-Ed qualifier for “Audience”. We termed this control vocabulary “Expertise Values” because it allows organizations to present resources appropriate to the experience and knowledge of the learner.

In addition, a control vocabulary termed “Systemic Values” was included under both the “Mediator” (teacher) and the “Beneficiary” (student) qualifiers of the DC-Ed “Audience” element. Among the educational communities we worked with, the distinction between educator and student needed to be blurred, since the same learner could be both a mediator and a beneficiary of the same resource. In other words, the learner could be receiving instruction from an instructor and also coaching a fellow student, and therefore acting as both mediator and beneficiary. To reduce the distinction between teachers and students, these “Systemic Values” were the same for both qualifiers. We subdivided the control vocabulary of the “Systemic Values” into five categories: “Individuals”, “Work Groups”, “Learning Organizations”, “Regions”, and “The Globe”. Under each of these headings, another level of control vocabulary was added (some from the GEM metadata and some of our own) to further refine each of the five.
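One way to picture the structure just described is as a record type in which the “Mediator” and “Beneficiary” qualifiers draw on a single shared vocabulary. This is a hedged sketch only: the class and field names are hypothetical, only a handful of the elements from Table 1 are shown, and the text does not list the “Expertise Values”, so that field is left as free text here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five top-level "Systemic Values" categories named in the text.
# Mediator and Beneficiary deliberately share this one list.
SYSTEMIC_VALUES = (
    "Individuals",
    "Work Groups",
    "Learning Organizations",
    "Regions",
    "The Globe",
)


def check_systemic(value: str) -> str:
    """Reject any value that is not in the shared control vocabulary."""
    if value not in SYSTEMIC_VALUES:
        raise ValueError(f"unknown Systemic Value: {value!r}")
    return value


@dataclass
class Audience:
    """DC-Ed 'Audience' element with the qualifiers described above."""
    mediator: Optional[str] = None           # Systemic Value
    beneficiary: Optional[str] = None        # Systemic Value (same list)
    typical_age_range: Optional[str] = None  # qualifier derived from IEEE LOM
    expertise: Optional[str] = None          # EdRef "Expertise Values" (assumed free text)

    def __post_init__(self):
        for value in (self.mediator, self.beneficiary):
            if value is not None:
                check_systemic(value)


@dataclass
class EdRefRecord:
    """A subset of the EdRef elements; field names are illustrative."""
    title: str
    creator: str
    description: str
    subjects: List[str] = field(default_factory=list)
    audience: Audience = field(default_factory=Audience)
    standards: List[str] = field(default_factory=list)
```

Because both qualifiers are validated against the same tuple, a record can mark a learner as mediator and beneficiary with identical values, which mirrors the intent of blurring the teacher/student distinction.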

Subject Headings in the EdRef Metadata Framework

A central task in building the EdRef metadata framework, and one that is always underway, is the generation of lists of new subject headings, which serve as control vocabularies for the “Subject and Keyword” elements. This task was deemed necessary because neither Dewey nor Library of Congress headings offered a “currently relevant” vocabulary pertaining to the content of the organizations with whom we worked. In addition, technical assistance providers in education tend to need, and thus borrow, terms from a variety of fields including systems theory, organizational development, politics, finance, cognitive psychology, and equity. Our subject headings needed to combine, span, and integrate these fields of knowledge to create a comprehensive framework. We also had to develop new subject headings for another reason: the vocabularies for technical assistance need to address concepts and terms understood and used not only by researchers but also by practitioners and learners. It was important to organize the new subject headings we developed according to the most credible and widely recognized conceptual framework or set of standards. This approach offered the advantage that content could be organized according to issues of widespread concern to the communities we worked with. Our hope was that this would better enable technical resource providers (or funding agencies, policymakers, and professional developers) to assess and target resources and respond to practitioners' needs.

Subject headings were constructed by participating communities with specialized expertise in each topic area. For example, the US Department of Education at one time had an active Digital Equity Task Force whose mission was to address inequitable access to technology tools, computers, and the Internet. Its work with us led to the establishment of five subheadings within the digital equity subject area (Wallace & McLaughlin, 2004):

1. Technology resources (access to the learning technology resources, including hardware, software, wiring, and connectivity)
2. Quality content (access to high-quality digital content to achieve digital equity)
3. Cultural relevance (access to high-quality, culturally relevant content)
4. Effective use (access to educators skilled in using these resources effectively for teaching and learning)
5. Content creation (opportunities for learners and educators to create their own content)

The process of defining the dimensions of topics was carried out by all participating organizations. The Urban Network to Improve Teacher Education (UNITE), a consortium of college and university education faculty with a research interest in issues of relevance to urban education in the US, defined the subheadings of the urban teacher preparation subject heading. The National Staff Development Council (NSDC), an organization developing standards for professional development, identified the subject headings describing professional development. Each of the organizations working with us was asked to develop a set of “dimensions” by which they could organize and present their framework. We labeled the results of this process the EdRef subject headings and here provide two examples, for “Research on Best Practice” and “Urban Teacher Education” (Table 2).
Avoiding Duplication of Terms

As we expanded our scope to describe additional areas of education and to accommodate the involvement of a growing number of organizations, we insisted on data sharing between organizations. We had numerous related, topic-specific cataloging efforts sharing one dataset, and it made sense to provide an interface to the complete set of resources. In deciding how to implement this, we realized that our EdRef subject headings overlapped a great deal, and that continuous work was needed to integrate the efforts of the organizations into a single hierarchical set of subject headings. For example, UNITE, working on urban teacher preparation, had developed the subject heading “Urban teacher preparation - continuing professional growth - professional development”. At the same time, NSDC had an entire set of subheadings focusing on professional development. As we reviewed, and anticipated, different kinds of overlapping subject headings, we concluded that we had to develop a methodology to simplify and organize the vocabulary that was evolving. It concerned us that the subject headings reflected the biased selection process that led to particular organizations becoming involved in the project. It was also clear to us that, as participation broadened, we would increasingly be handling subject headings that are conceptually contained within other subject headings. For example, the “Digital Equity” subject heading is probably contained within the “Equity” heading as well as within the “Technology Planning” and “Technology Applications” headings. Moreover, some headings have the same meaning. For example, both the “Technology Planning” subject heading and the “Digital Equity” subject heading have subheadings that describe access to technology resources.
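The first kind of overlap mentioned above, the same heading appearing at different places in different organizations' hierarchies, can be detected mechanically if each heading is stored as a path from its root. The sketch below assumes that representation; the function names are hypothetical and the sample paths are taken from the examples in the text.

```python
from collections import defaultdict


def normalize(term):
    """Case- and whitespace-insensitive form used for comparison."""
    return " ".join(term.lower().split())


def find_overlaps(paths):
    """Return terms that label more than one distinct node.

    Each path is a tuple of headings from the root down. A node is
    identified by its full prefix, so a shared root is counted once.
    """
    nodes = defaultdict(set)
    for path in paths:
        for depth in range(1, len(path) + 1):
            prefix = tuple(path[:depth])
            nodes[normalize(prefix[-1])].add(prefix)
    return {
        term: sorted(where)
        for term, where in nodes.items()
        if len(where) > 1
    }


if __name__ == "__main__":
    headings = [
        ("Urban Teacher Preparation", "Continuing Professional Growth",
         "Professional Development"),          # UNITE example from the text
        ("Professional Development",),         # NSDC's top-level heading
        ("Digital Equity", "Technology Resources"),
    ]
    for term, locations in find_overlaps(headings).items():
        print(term, "appears at", locations)
```

A report like this only flags candidates; deciding whether two occurrences should be merged, or whether one heading is conceptually contained in another, would still fall to the oversight role described in this section.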
An essential, ongoing oversight role is the continual simplification and reorganization of the evolving metadata to avoid overlap between organizations wherever possible. We are exploring possibilities for automating this function. While working on the EdRef metadata vocabulary, we noticed that different people tend to use different words to mean the same thing. This could easily become a problem in a system that lets people add their own subject headings at will, since a proliferation of synonyms could make the vocabulary unwieldy. We envision that using cluster analysis to find terms expressing similar concepts might allow us to combine terms and thus eliminate redundancy.

We have also been working to ensure that resources tagged by one organization are readily available for tagging by another. This minimizes duplication and shares the cost of cataloging among the organizations using the application. As more organizations become involved in the project, and as the metadata matures, we see increasing value in presenting resources to learners from the full collection, regardless of which organizations entered the data into the application.
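As a rough illustration of how similar terms might be grouped, the sketch below clusters headings by word overlap (Jaccard similarity) with single-linkage merging. This is a simple stand-in chosen for illustration, not the cluster analysis the project envisioned; the function names and the threshold are assumptions, and purely lexical similarity cannot detect true synonyms that share no words.

```python
def tokens(term):
    """Lower-cased set of words in a heading."""
    return frozenset(term.lower().replace("-", " ").split())


def jaccard(a, b):
    """Share of words two headings have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(terms, threshold=0.5):
    """Group terms whose word overlap meets the threshold.

    Single linkage: a term joins a cluster if it is similar enough to
    any member. Only clusters with two or more terms are returned.
    """
    parent = list(range(len(terms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    token_sets = [tokens(t) for t in terms]
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, term in enumerate(terms):
        groups.setdefault(find(i), []).append(term)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    headings = [
        "Technology resources",
        "Access to technology resources",
        "Professional development",
        "Staff development",
        "Equity",
    ]
    print(cluster_terms(headings))
```

With the default threshold, the two “technology resources” headings are grouped, while “Professional development” and “Staff development” are not, since they share only one of three distinct words. Lowering the threshold catches that pair at the cost of more false matches, which is one reason a human reviewer would likely still need to confirm any proposed merge.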

Conclusions

Organizations that provide technical assistance to educators can be engaged in grounded metadata development that catalogues and presents their research-validated recommendations. Application frameworks for organizing information using metadata can evolve to keep metadata terms relevant to the communities of practice that they serve. The grounded metadata process contributes to the provision on the Semantic Web of (1) content about effective professional practices, (2) resource providers available to assist educators in planning for and implementing these practices, and (3) research and evaluation data that attest to the efficacy of these practices and resource providers.

Berners-Lee and Miller posit that “the most exciting feature of the Semantic Web is not what we can imagine doing with it, but what we can't yet imagine it will do” (Berners-Lee & Miller, 2002). However, the Semantic Web depends on widespread adoption of interoperable metadata (Nilsson, Palmer et al., 2002). By definition, communities of practice, such as those described in this paper, have specialized knowledge, which could make it more likely that their metadata will not be interoperable. When determining what knowledge to export to the Semantic Web, they have an opportunity to make a formal commitment to interoperability. Interoperability can be gained by a cross-walk of their specific terms to the DC; however, treating this as a simple technical practice could result in losing specific meanings that are key to that community of practice. Our metadata co-development process helped these communities of practice determine what knowledge to export. Through the integration of their new metadata with existing standards, elements, and control vocabularies, we ensured that the metadata was both relevant to them and interoperable.

Metadata cannot be rigid and inflexible; they must evolve and be constantly refined (Nilsson, Palmer et al., 2002). As new stakeholders use Semantic Web cataloging software, they exert pressure to ensure that the metadata evolve. Since it is often easier to be redundant than to collaborate, useful Semantic Web applications must facilitate detecting and addressing duplication of metadata on a continual basis. Effective collaboration between organizations will remain a central challenge to realizing the vision of the Semantic Web. For us, operationally, this has meant the development of software that: allows constant refinement of metadata categories to avoid overlap; allows organizations to make incremental improvements to an integrated ontology; and minimizes the labor of cataloging, combining, and re-cataloging resources.
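The cross-walk idea discussed in these conclusions can be sketched as a mapping from community-specific qualifiers to the general elements they refine, with the original qualifier carried along so the specific meaning is not lost. The mapping below follows the element and qualifier pairings in Table 1; the function name, the record layout, and the “Local Note” field are hypothetical.

```python
# Qualifier -> general element, following the pairings in Table 1.
CROSSWALK = {
    "EdRef Subject Headings": "Subject and Keywords",
    "GEM Type": "Resource Type",
    "Typical Age Range": "Audience",
    "Expertise": "Audience",
    "Mediator": "Audience",
    "Beneficiary": "Audience",
}


def crosswalk(record):
    """Regroup qualified values under their general elements.

    Returns (general, unmapped). Each value keeps the name of the
    qualifier it came from, so a consumer that only understands the
    general element can still use it while the community-specific
    meaning stays recoverable. Qualifiers with no mapping are reported
    rather than silently dropped.
    """
    general, unmapped = {}, []
    for qualifier, value in record.items():
        element = CROSSWALK.get(qualifier)
        if element is None:
            unmapped.append(qualifier)
            continue
        general.setdefault(element, []).append(
            {"value": value, "qualifier": qualifier}
        )
    return general, unmapped


if __name__ == "__main__":
    record = {
        "EdRef Subject Headings": "Digital Equity",
        "Mediator": "Individuals",
        "Beneficiary": "Work Groups",
        "Local Note": "hypothetical field with no mapping",
    }
    general, unmapped = crosswalk(record)
    print(general)
    print("needs review:", unmapped)
```

Reporting unmapped qualifiers separately reflects the caution raised above: a cross-walk treated as a purely technical step can discard exactly the terms that matter most to a community, so anything without a mapping is surfaced for a decision instead.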

Acknowledgments

The authors would like to thank Laura Sperazi and Ben Tucker for their work editing this manuscript. The work has been funded by the US Department of Education Preparing Tomorrow's Teachers to Use Technology Catalyst Grant (The TEN Project) Number P342B000036, and (SimSchool) Number P342A030033, the US Department of Education Technology Innovation Challenge grant programs, and the National Science Foundation under Grant Number 0092129. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the US Department of Education or the National Science Foundation.

Table 1. EdRef, a metadata framework for technical assistance in education

Dublin Core Metadata          Qualifier
Title (DC)                    text
Creator (DC)                  text
Subject and Keywords (DC)     Dewey Subject Headings
                              Library of Congress Subject Headings
                              EdRef Subject Headings (see Table 2)
Description (DC)              text
Publisher (DC)                text
Contributor (DC)              text
Date (DC)                     Created
                              Cataloged
Resource Type (DC)            Dublin Core Type
                              GEM Type (GEM)
                              Learning Resource Type (IEEE LOM 5.2)
Resource Identifier (DC)      URI (DCQ)
Language (DC)                 ISO 639-2
Coverage (DC)                 Spatial
Rights Management (DC)        text
Audience (DC-Ed)              Typical Age Range (IEEE LOM 5.7)
                              Expertise (EdRef)
                              Mediator (DC-Ed)
                              Beneficiary (DC-Ed)
Standard (DC-Ed)              ISTE Standards
                              NSDC Standards
                              ISLLC Standards
                              NBPT Standards

NOTE: The () after an attribute identifies its source, where DC = Dublin Core, DC-Ed = Dublin Core Education committee, IEEE = IEEE Learning Objects Metadata, GEM = Gateway to Educational Materials, and EdRef = our own metadata developed for this project.

Table 2. EdRef subject headings for the collections on “Best Practices” and “Urban Teaching”

Research on Best Practices
    Assessment
    Classroom Management
    Curriculum
        Early Literacy
        English/Language Arts
        Mathematics
        Science
        Social Studies
    Data-driven Decision-Making
    Equity
    Instruction
        Cooperative learning
        Generating and testing hypotheses
        Homework and practice
        Identifying similarities and differences
        Nonlinguistic representations
        Questions, cues, and advance organizers
        Reinforcing effort and providing recognition
        Setting objectives and providing feedback
        Summarizing and note taking
    Leadership
    Learning Climate
    Professional Development
    Teacher Preparation
    Technology
        Digital Equity
        Preservice Technology Infusion
        Research on Education Technology
        Technology Applications for Learning
        Technology Planning

Urban Teacher Education
    Continuing Professional Growth
        Advocacy and Equity
        Communities of Practice
        License and Credential
        Partnerships
        Professional Development
        Retention
        Technology
        Teacher Leadership
    Induction
        Mentoring and Coaching
        Models
        Partnerships
        Resources
        Socialization and networking
    Preservice Curriculum
        Advocacy and Equity
        Assessing Field Experience
        Foundations
        Learning Environment
        Learning and Development
        Partnerships
        Planning
        Teaching
        Technology
    Recruitment
        Diversifying the pool
        Incentives
        Meeting critical needs
        Selection
        Strategic partnerships
        Technology

References

Berners-Lee, T. and E. Miller (2002). The semantic web lifts off. ERCIM News.
DC (2004). Dublin Core Metadata Initiative. DublinCore.
DCMI (2004). DCMI Education Working Group.
GEM (2004). Gateway to Educational Materials.
Gibson, D. (2003). Measuring needs and finding resources: XML-based responsive dissemination. Proceedings of the Society for Information Technology and Teacher Education conference, Albuquerque, NM 2003, SITE.
IEEE (2004). IEEE Learning Technology Standards Committee.
Koivunen, M. and E. Miller (2001). W3C semantic web activity. Semantic Web Kick-off Seminar in Finland.
Nilsson, M., M. Palmer, et al. (2002). Semantic web metadata for e-learning - some architectural guidelines. W3C 2002 conference.
Sherry, L., B. Havelock, et al. (2004). Responsive dissemination: A model for scaling and sustaining educational innovations. Society for Information Technology and Teacher Education, Atlanta, SITE.
W3C (1999). Resource Description Framework (RDF) model and syntax specification. W3C Recommendation 22 February 1999.
Wallace, J. and B. McLaughlin (2004). Digital Equity EdReform Network.
Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. New York, Cambridge University Press.
