
Extracting and Ingesting DDI Metadata and Digital Objects from a Data Archive into the iRODS extension of the NARA TPAP using the OAI-PMH

Jewel H. Ward, SILS, UNC-CH
Antoine de Torcy, NC-DICE, UNC-CH
Mason Chua, Odum Institute, UNC-CH
Jonathan Crabtree, Odum Institute, UNC-CH

Abstract

This prototype demonstrated that the migration of collections between digital libraries and preservation data archives is now possible using automated batch loading for both data and metadata. We used this capability to enable collection interoperability between the H.W. Odum Institute for Research in Social Science (Odum) Data Archive and the integrated Rule Oriented Data System (iRODS) extension of the National Archives and Records Administration's (NARA) Transcontinental Persistent Archive Prototype (TPAP). We extracted data and metadata from a Dataverse data archive and ingested them into the iRODS server and metadata catalog using the OAI-PMH, Java, XML/XSL, and iRODS rules and microservices. We validated the ingest of the files and retained the required Terms & Conditions for the social science data after ingest.

1. Introduction

Over the past 15 years, digital librarians and archivists have focused on interoperability across collections in order to share resources and provide access to materials for as many users as possible. The work of these librarians has led, for example, to a preference for metadata sharing via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1] over federated searching [2]. In the meantime, archivists have adapted their preservation model to the digital domain [3]. Researchers in the data grid community have focused on developing interoperable virtual collaborative environments that support distributed but coordinated scientific and engineering research [4,5]. These researchers, librarians, and archivists share an interest in preserving data for the indefinite long term, and have been collaborating to develop persistent collections.

One approach to come out of this merging of the digital library information model, the archival preservation model, and data grid technology has been the integrated Rule Oriented Data System (iRODS), developed by the Data Intensive Cyber Environments group (DICE). The developers of iRODS had already gained ample experience with their previous generation of data grid software, the Storage Resource Broker (SRB), which is used in production by several international collaborative projects. They designed iRODS to be, like SRB, middleware that enables digital collections shared by multiple organizations to be organized logically while physically distributed across heterogeneous storage resources [4]. In addition, they implemented a rule engine at the core of iRODS that gives administrators the ability to customize and automate community-based management policies.

Digital archivists at the National Archives and Records Administration (NARA) have been applying iRODS in the construction of the Transcontinental Persistent Archive Prototype (TPAP). Their goal has been to federate independent data grids into a viable preservation environment. Researchers and administrators at NARA have used the TPAP as a test bed to evaluate methods for preserving electronic data [6].

As one part of this effort to enable collection interoperability, we developed a prototype to extract both digital objects and metadata from a data archive and ingest them into the preservation grid, based on standards from the digital library, Web, archival, and data grid domains. We used the Odum Institute Data Archive as our test bed. We harvested the metadata via the Odum Archive's OAI-PMH Data Provider (DP) and bulk ingested it into the iRODS metadata catalog (iCAT) via a microservice¹ that performs XSL transformations on the original OAI-PMH output. We batch ingested the digital objects into the iRODS server via HTTP and the Java interface to iRODS. As part of the extraction and ingest process, we retained the Terms and Conditions of the data's owner and loaded the license agreement text into the iCAT; guaranteed that access to particular collections remains restricted based on existing written policies and legal agreements; uploaded the digital objects and the complete metadata file into the iRODS server; ingested 26 of its metadata elements into the iCAT to enable bibliographic searching within iRODS; and validated that this new replica of the archive is correct.

¹ A microservice is a small, well-defined procedure or function, written in C by systems and application programmers, that performs a specific task when compiled into iRODS server code [7].

Our successful development of this prototype has demonstrated that end-to-end integration between digital libraries and an iRODS-based preservation data grid is possible using existing standards familiar to each domain. In particular, we have shown that librarians and archivists, who generally work with limited time, personnel, and funding, can upload their collections into the preservation grid without having to learn or create new standards or technologies. By enabling bibliographic searching and policy enforcement, we have demonstrated that iRODS is not just a dark archive; the system is flexible enough to support multiple types of data management applications and to act as a transfer and dissemination system for both digital archives and grid computing.

The remainder of this paper is organized as follows. We summarize previous and related work in Section 2. We discuss the ingest model in Section 3. We describe the method we chose to extract metadata and digital objects from the digital archive and ingest them into the iCAT and iRODS server in Section 4. We discuss and analyze our work on this project in Section 5. We present our conclusions and consider our ongoing and future work with this prototype in Section 6.

2. Previous and Related Work

The following sections discuss the archival preservation model, the digital library information model, and data grid archival storage technology. We then discuss iRODS, the NARA TPAP, the Odum Institute Dataverse Network (DVN), and the OAI-PMH. We end this section with a discussion of our previous work developing the initial proof-of-concept.

2.1. Archival Preservation Model

Techniques and methods for preserving materials have changed over time as information has migrated from stone tablets, papyri, manuscripts, and books to digital formats, but the essential functions have remained the same [3]. Regardless of format, archivists must acquire, appraise, arrange, describe, store, preserve, and provide access to the information and materials with which they are charged. The primary change in responsibility for archivists with regard to digital information, as compared to physical objects, is that they must preserve and curate representation information for the digital object as well as the digital object itself [8]. For example, an archivist who has curated a digital image of a 17th-century paper map must preserve not only the digital image, but also representation information that describes the digital image format, authenticity information about the physical source, the types of processing that can be applied to the digital image, and integrity checksums to detect data corruption.
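As a minimal sketch of the integrity checksums mentioned above, the following Python records a digest when an object is ingested and recomputes it later to detect corruption. It assumes SHA-256 via the standard `hashlib` module; the function names and the sample file name are illustrative, not part of any archival standard.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """True if the file still matches the checksum recorded at ingest time."""
    return file_checksum(path, algorithm) == recorded_checksum


# Usage: record a checksum at ingest, then re-verify after the file changes.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "map_scan.tif"          # hypothetical file name
    image.write_bytes(b"original image bytes")
    recorded = file_checksum(image)
    intact = verify_fixity(image, recorded)     # True: bytes unchanged
    image.write_bytes(b"corrupted image bytes")
    corrupted = verify_fixity(image, recorded)  # False: bytes differ
```

Recording the algorithm name alongside the digest matters in practice, since a preserved checksum is only useful if the same function can be recomputed years later.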

2.2. Digital Library Information Model

Kahn and Wilensky defined nine basic entities and structures that together form a high-level framework for digital library services: digital objects, handles, metadata, repositories, handle generators, originators, users, naming authorities, and a repository access protocol [9]. This framework has provided the basic information model for digital libraries upon which preservation services and persistent collections may be built. Davis and Lagoze applied a modified version of the Kahn-Wilensky Framework (KWF) to create the Dienst protocol, an open architecture for federated distributed document libraries [2]. The authors outlined four core services in the architecture: a user interface, a collection service, a repository, and an index. They applied Dienst within the Networked Computer Science Technical Report Library (NCSTRL), providing a searchable, interoperable digital library network for computer science technical reports. However, Davis and Lagoze realized that while the overall architecture was a success, distributed searching did not scale well. They also recognized that the high effort required to install and maintain Dienst prevented its wider adoption within the digital library community. Lagoze and Van de Sompel improved upon the cross-archive capabilities demonstrated with Dienst by creating the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1]. They provided a solid foundation for digital library interoperability with a low-barrier protocol over which metadata could be shared across multiple institutions. Digital librarians have used this protocol to access outside metadata and to provide services to their users, such as cross-repository searching of federated collections.
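Part of what makes the OAI-PMH low-barrier is that a harvest is just a sequence of HTTP GET requests returning XML. The Python sketch below is illustrative only (the prototype described in Section 4 used the Perl module Net::OAI::Harvester): it builds `ListRecords` requests with the optional `set`, `from`, and `until` arguments and follows `resumptionToken` paging. The endpoint URL and the `ddi` metadata prefix are hypothetical, and network access is replaced by an injected `fetch` function so the sketch runs offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, set_spec=None,
                     from_date=None, until_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: when it is present,
    no argument other than the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if not metadata_prefix:
            raise ValueError("metadataPrefix is required for an initial request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urllib.parse.urlencode(params)


def harvest(base_url, metadata_prefix, fetch, set_spec=None):
    """Yield <record> elements, following resumption tokens until the list ends.

    `fetch` maps a URL to response bytes (for example, a thin wrapper around
    urllib.request.urlopen); injecting it keeps the sketch testable offline.
    """
    url = list_records_url(base_url, metadata_prefix, set_spec=set_spec)
    while url:
        root = ET.fromstring(fetch(url))
        yield from root.iterfind(".//oai:ListRecords/oai:record", OAI_NS)
        token = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
        # An empty resumptionToken element marks the last page of the list.
        if token is not None and token.text and token.text.strip():
            url = list_records_url(base_url, resumption_token=token.text.strip())
        else:
            url = None


# Usage with two canned response pages instead of a live Data Provider.
BASE = "https://dp.example.org/oai"  # hypothetical endpoint
PAGE = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        '<record><header><identifier>{id}</identifier></header></record>'
        '<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>')
PAGES = {
    list_records_url(BASE, "ddi"): PAGE.format(id="study-1", token="page2").encode(),
    list_records_url(BASE, resumption_token="page2"): PAGE.format(id="study-2", token="").encode(),
}
ids = [r.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
       for r in harvest(BASE, "ddi", PAGES.__getitem__)]
```

Because the Data Provider decides page size and issues the tokens, the harvester stays stateless apart from the current URL, which is one reason such clients are small.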

Researchers in the data grid domain have defined the digital library information model slightly differently. Moore described it in terms of the properties associated with a collection of digital objects (authenticity, integrity, chain of custody) and the properties that govern the digital preservation environment (management policies, procedures, and state information) [10]. Like the KWF, his model builds upon the concept of infrastructure independence: the ability to manage the properties of a collection independently of the choice of storage technology. He described data grids as managing the collection properties while enforcing management policies, automating administrative tasks, and validating assessment criteria. The DICE team implemented a modified version of this information model in iRODS.

2.3. Data Grid Archival Storage Technology

Researchers have designed data grids to implement the concept of infrastructure independence, which they define as the extraction of a digital object from the environment in which it was created and its ingestion into an environment that the archivist controls [10]. In this process, the archivist imposes a logical name on each digital entity, imposes an arrangement (collection hierarchy) on the digital objects, selects the properties that will be maintained about each digital entity, enforces those properties with management policies, and validates them by evaluating assessment criteria.

The DICE team applied this archival storage model in iRODS, a second-generation data grid. They developed the software infrastructure needed "to support the management, collaboration, controlled sharing, publication, replication, transfer, and preservation of distributed data" [11]. The features they implemented include logical name spaces that identify the digital objects, the users (archivists), and the storage resources; separation of the client interface from the remote storage protocols; integrated authentication and authorization on all accesses; integrated data movement under the control of the data grid, so that changes in data location are tracked; and support for user-defined descriptive metadata. The DICE group extended SRB technology to give administrators the ability to create and automate management policies tailored to individual communities [12]. They have facilitated the management of distributed and shared library collections across heterogeneous storage systems and organizations, and have created a system whereby management policies are enforced across the multiple independent administrative domains that control the physical storage resources.

They designed iRODS to implement infrastructure independence through data and metadata virtualization [5]. Users interact with logical name spaces rather than with the names used at the physical storage locations, and the data grid manages the mappings from the logical name spaces to the physical infrastructure. The data grid uses a peer-to-peer server architecture to access remote storage resources. The DICE team mapped client actions to sets of standard functions, called microservices, that encapsulate multiple I/O operations and so simplify the expression of client actions. For a replication action, for example, a microservice opens the designated file, opens the new file at the correct location, copies the bytes between the files, and then registers metadata that describes the location of the replica. The microservices express their I/O using a standard protocol based on extensions to POSIX I/O, and storage resource drivers map that protocol to the specific I/O protocol required by each type of storage device, whether a tape archive or a Unix or Windows file system.

The researchers also devised a method for mapping management procedures to computer-actionable rules. These rules are stored in a rule base at each storage location and executed by a rule engine installed at each storage location. The rules control the execution of microservices; they can be chained and interpreted at run time, can be recursive, and can be triggered by an event or executed periodically. The microservices communicate through shared parameters, memory structures, and a high-performance message queue.
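The logical-name mapping, the replication microservice, and the event-triggered rules described above can be illustrated with a toy model. This Python sketch is hypothetical and is not the iRODS API: a dictionary stands in for the iCAT, two temporary directories stand in for storage resources, and a `post_put` rule (loosely analogous to the `acPostProcForPut` policy hook in the iRODS rule base) chains a replication step after every upload. The names `Catalog`, `msi_replicate`, `put`, and `RULES`, and the logical path, are all illustrative.

```python
import shutil
import tempfile
from pathlib import Path


class Catalog:
    """Toy catalog mapping logical names to (resource, physical path) replicas."""

    def __init__(self):
        self.replicas = {}

    def register(self, logical_name, resource, physical_path):
        self.replicas.setdefault(logical_name, []).append((resource, Path(physical_path)))

    def locate(self, logical_name):
        return self.replicas[logical_name]


def msi_replicate(catalog, resources, logical_name, target_resource):
    """Replication microservice: open source, open target, copy bytes, register replica."""
    _, source_path = catalog.locate(logical_name)[0]
    target_path = resources[target_resource] / source_path.name
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    catalog.register(logical_name, target_resource, target_path)


# Rule base: event -> ordered chain of (microservice, extra keyword arguments).
RULES = {"post_put": [(msi_replicate, {"target_resource": "archive_disk"})]}


def put(catalog, resources, logical_name, data, resource="cache_disk"):
    """Store bytes on one resource, register them, then fire the post_put rule chain."""
    physical_path = resources[resource] / Path(logical_name).name
    physical_path.write_bytes(data)
    catalog.register(logical_name, resource, physical_path)
    for microservice, kwargs in RULES.get("post_put", []):
        microservice(catalog, resources, logical_name, **kwargs)


# Usage: two temporary directories stand in for heterogeneous storage resources.
with tempfile.TemporaryDirectory() as cache, tempfile.TemporaryDirectory() as archive:
    resources = {"cache_disk": Path(cache), "archive_disk": Path(archive)}
    catalog = Catalog()
    name = "/tempZone/home/odum/harris/study1.xml"  # hypothetical logical path
    put(catalog, resources, name, b"<codeBook/>")
    locations = [resource for resource, _ in catalog.locate(name)]
    replica_bytes = catalog.locate(name)[1][1].read_bytes()
```

Because clients only ever use the logical name, adding a third resource or changing where replicas live would alter the catalog entries but not the name users query.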

2.4. The NARA TPAP

The archivists at NARA are tasked with preserving data for "the life of the republic" [13]. However, the best method for preserving electronic data has not been determined, so archivists have used test beds such as the TPAP to evaluate a variety of methods. Multiple researchers have worked on this prototype as part of an ongoing eight-year collaboration between the University of North Carolina at Chapel Hill, the University of California at San Diego, the University of Maryland, the Allegheny Ballistics Laboratory, the Georgia Institute of Technology, and the National Archives and Records Administration. The TPAP collaborators have used a test bed of seven independent data grids, each with its own environment, metadata catalog, and storage system [14], located in North Carolina, California, West Virginia, Maryland, and Washington, D.C. The researchers federated the data grids by creating policies that establish trust relationships and control the sharing of collections and resources, and they replicated NARA's digital holdings across the data grids so that there is no single point of failure. They built two versions of the NARA TPAP: the first used SRB and the second iRODS. The administrators of the NARA TPAP currently manage over 6 million records and more than 5 terabytes of data. Internationally, they distribute more than 200 million files and more than 3 petabytes of collection data.

2.5. The Odum DVN

The H.W. Odum Institute for Research in Social Science, founded in 1924, is the oldest institute at the University of North Carolina at Chapel Hill; its founders created it to provide teaching, research, and social science data services. The data librarians at the Odum Institute have used an open-source platform called the Dataverse Network (DVN) to publish and disseminate their extensive collection of social science research studies and related data files. They have built the Odum Data Archive to comprise seven Dataverses that contain 24,299 studies (i.e., collections containing bibliographic metadata about each social science study) with 614,887 digital data files associated with those studies. They chose the DVN digital library software because it is OAI-PMH compliant, uses Data Documentation Initiative (DDI) social science metadata, and provides federated searching. When they chose this system, the data librarians realized that the DVN does not provide preservation capabilities, so they have set it up such that, in the event of data loss, they can recover the data via backup and replication agreements with other social science institutions.

2.6. The OAI-PMH

The creators of the OAI-PMH based it on a client-server architecture, in which a Service Provider (SP), or harvester, requests metadata records, or information about those records, from a Data Provider (DP): a digital library or data archive whose administrator has "exposed" the repository's metadata according to the protocol specification. An SP may restrict its requests to sets defined by the DP administrator or to a range of date stamps. The authors of the OAI-PMH require DP administrators to expose their metadata using the Dublin Core Metadata Element Set (DCMES), although they encourage the use of richer metadata formats as well. Many developers of digital library systems have made their software OAI-PMH-compliant, including, but not limited to, DSpace, FEDORA, GNU EPrints, and CONTENTdm. Digital librarians have considered the OAI-PMH a de facto standard since its release in 2001. As of August 2009, just under 1,100 repositories were registered on the OAI-PMH web site. We have not been able to find a current estimate of how many metadata records these repositories hold, but as of 2006, administrators had registered 776 repositories containing over 10 million metadata records [15].

2.7. Proof-of-Concept

During the initial phase of this research we created a proof-of-concept in order to understand how to extract metadata from the Odum DVN and ingest it into iRODS [16]. We "mapped" the DDI bibliographic metadata to Attribute-Value-Units² (AVU) triplets for upload into the iCAT, which posed an early challenge with regard to combining the archival/digital library model with data grid archival storage technology. We transformed the hierarchical structure of the DDI XML metadata into the flat structure of iCAT AVU triplets using an XSLT. We created a process by which social science metadata is transferred from the Odum Data Archive into the NARA TPAP extension of iRODS via the OAI-PMH, XML, and iRODS microservices. Our successful development of this prototype enabled bibliographic metadata-based queries within iRODS and provided one method by which administrators of digital libraries and archives with an OAI-PMH-compliant Data Provider may upload their metadata into the preservation grid.

² A metadata Attribute-Value-Units (AVU) triplet consists of an Attribute-Name, an Attribute-Value, and one or more optional Attribute-Units [17].

3. Ingest Model

While the authors of the OAI-PMH have provided a standard method for extracting metadata from a data provider to a service provider, as of this writing no standard software protocol has been designed that can ingest the metadata, much less the data, into the service provider. There are many reasons for this, ranging from technical to administrative to resource-related. In this section, therefore, we limit ourselves to a discussion of the general methodological framework behind the ingest process, and we provide examples of its application in the academic library and social science domains.

The Consultative Committee for Space Data Systems (CCSDS) released the Reference Model for an Open Archival Information System (OAIS) in January 2002. The members of the committee defined a model for an "archive, consisting of an organization of people and systems, that has accepted responsibility for preserving information and making it available for a Designated Community" [18]. The authors of the OAIS model established Ingest as one of six functional entities, along with Access, Administration, Archival Storage, Data Management, and Preservation Planning. The committee members detailed five functions of the Ingest entity: receive submission, quality assurance, generate AIP (Archival Information Package), generate descriptive information, and co-ordinate updates. The CCSDS discussed archive interoperability within the framework of four categories of archive association: independent, cooperating, federated, and shared resources. Repository administrators have considered the OAIS model to be the standard for repositories since the CCSDS released the recommendation; although they have found it time-consuming to implement, no competing preservation model has been developed. The developers of iRODS have implemented the OAIS recommendations within the system architecture itself by providing policy virtualization, which gives individual communities a mechanism to define their preservation policies as rules and microservices. Digital librarians have adapted the models above to their domain.
For example, digital librarians at Tufts created an ingest guide for the preservation of electronic records using the FEDORA repository software. They developed a process, based on the OAIS model, that moved electronic records from a record-keeping system to a preservation system while maintaining their authenticity and integrity [19]. The ingest guide has two primary sections: one lists the steps for negotiating the submission agreement, and the other the steps for transferring and validating the digital objects. In the former, the librarians provided the steps required for a Producer and an Archive to develop a Submission Agreement; in the latter, they described the transformation, transfer, and validation of the records. The team developed the guide to detail the steps university archivists must take to ensure trustworthy ingest.

Two data archivists at the Inter-university Consortium for Political and Social Research (ICPSR), of which the Odum Institute has long been a member, tested the conformance of the ICPSR data archive to the OAIS model [20]. They analyzed the repository's submission, ingest, data release and archival storage, and access functions, as well as its OAIS entities and pipeline. They concluded that overall the repository fulfilled many of the key functions described in the OAIS model, but found two areas for improvement: the administrators of the ICPSR repository needed to publish a preservation policy, and they needed to clearly label and complete the preservation description information. Overall, they concluded that social science data and metadata are in an excellent position to be preserved.

4. Method

We defined the following steps for the extraction and ingest of metadata and digital objects (Figure 1). Prior to running the batch process, we manually checked a random sample of metadata and digital objects to ensure completeness. Once we had validated those files, we ran a batch process script that performed the following steps:

1. Harvested the DDI metadata from the DVN via the OAI-PMH with Net::OAI::Harvester, a Perl module from CPAN.
   a. Cached the metadata in a temporary file.
   b. Loaded the cached file directly into iRODS.
2. Ingested the metadata. Ensured the file was uploaded to iRODS by checking the replica's time stamp.
   a. Read the iRODS server's clock.
   b. Checked the creation time of the metadata file in the iRODS server after it was uploaded.
   c. Made a list of the data objects to extract and ingest.
   d. Pulled the list from the XML nodes in the DDI metadata that stored the URL and file name of each object to be downloaded.
   e. Extracted the URL and file name pairs using an event-based SAX parser, which works even when the XML file being processed is extremely large.
   f. Deleted the cached copy of the DDI XML metadata, so that the ingest process only required about as much disk space as the largest study.
   g. Used an iRODS rule to call the XSLT and metadata ingest microservices, which reformatted the DDI metadata into AVUs and ingested them into the iCAT.
3. Harvested the data objects from the DVN into iRODS by calling the included Java program on each URL/file name pair in the DDI XML metadata. The program:
   a. Contacted the DVN by HTTP and accepted the click-through agreement by calling code from the DVN source tree, then downloaded the object by HTTP.
   b. Uploaded each object referenced by the URL/file name pair to the appropriate collection within iRODS using Jargon, the Java interface to iRODS.
4. Validated the metadata and its ingest into the iCAT, using an iRODS rule to compare the metadata to a user-provided XML schema. Queried the iCAT to validate, against a user-provided list of required attributes, that all expected metadata attributes are present. Ensured that there is an isomorphism between the ingested data objects and the references in the metadata.

Figure 1 - Level 1 Dataflow of extraction and ingest process (High Level Diagram)
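Steps 2c to 2e and the isomorphism check in step 4 can be sketched in Python with the standard `xml.sax` module, which streams the document instead of loading it into memory. The element and attribute names (`fileDscr`, its `URI` attribute, and `fileTxt/fileName`) follow DDI Codebook 2.x conventions and are an assumption about where the DVN records each file's URL and name; the URLs and file names are made up. The prototype's own parser and language are not specified beyond "a SAX parser".

```python
import xml.sax
from collections import Counter


class FileRefHandler(xml.sax.ContentHandler):
    """Stream through DDI and collect (URI, file name) pairs from <fileDscr>."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._uri = None
        self._in_name = False
        self._chars = []

    @staticmethod
    def _local(name):
        return name.split(":")[-1]  # drop any namespace prefix

    def startElement(self, name, attrs):
        tag = self._local(name)
        if tag == "fileDscr":
            self._uri = attrs.get("URI")
        elif tag == "fileName" and self._uri is not None:
            self._in_name, self._chars = True, []

    def characters(self, content):
        if self._in_name:
            self._chars.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        tag = self._local(name)
        if tag == "fileName" and self._in_name:
            self._in_name = False
            self.pairs.append((self._uri, "".join(self._chars).strip()))
        elif tag == "fileDscr":
            self._uri = None


def check_isomorphism(referenced_names, ingested_names):
    """A one-to-one match needs no missing, unreferenced, or duplicated names."""
    referenced, ingested = set(referenced_names), set(ingested_names)
    return {"missing": sorted(referenced - ingested),
            "unreferenced": sorted(ingested - referenced),
            "duplicated": sorted(n for n, c in Counter(referenced_names).items() if c > 1)}


# Usage on a tiny, made-up DDI fragment.
DDI = b"""<codeBook>
  <fileDscr ID="f1" URI="http://dvn.example.org/FileDownload/?fileId=1">
    <fileTxt><fileName>poll.por</fileName></fileTxt></fileDscr>
  <fileDscr ID="f2" URI="http://dvn.example.org/FileDownload/?fileId=2">
    <fileTxt><fileName>codebook.txt</fileName></fileTxt></fileDscr>
</codeBook>"""
handler = FileRefHandler()
xml.sax.parseString(DDI, handler)
report = check_isomorphism([fname for _, fname in handler.pairs], ["poll.por"])
```

Here `report` flags `codebook.txt` as referenced in the metadata but absent from the collection, which is the kind of mismatch step 4 is meant to catch.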
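Step 2g relied on XSLT run by an iRODS microservice. The same hierarchical-to-flat mapping can be sketched in plain Python with `xml.etree.ElementTree`, as a swapped-in technique rather than the prototype's code: walk the DDI tree, keep only whitelisted element paths, and emit (attribute, value, unit) triples. The four DDI Codebook paths and attribute names below are an illustrative subset chosen for this example, not the 26 elements the prototype used, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

# Illustrative DDI paths (relative to <codeBook>) mapped to catalog attribute names.
SELECTED = {
    "stdyDscr/citation/titlStmt/titl": "title",
    "stdyDscr/citation/rspStmt/AuthEnty": "author",
    "stdyDscr/stdyInfo/subject/keyword": "keyword",
    "stdyDscr/dataAccs/useStmt/conditions": "terms_and_conditions",
}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def ddi_to_avus(xml_bytes, selected=SELECTED):
    """Flatten hierarchical DDI into (attribute, value, unit) triples."""
    avus = []

    def walk(element, path):
        for child in element:
            child_path = f"{path}/{local_name(child.tag)}" if path else local_name(child.tag)
            text = (child.text or "").strip()
            if child_path in selected and text:
                avus.append((selected[child_path], text, ""))  # units left empty
            walk(child, child_path)

    walk(ET.fromstring(xml_bytes), "")
    return avus


SAMPLE = b"""<codeBook><stdyDscr>
  <citation><titlStmt><titl>Example Poll Study</titl></titlStmt>
    <rspStmt><AuthEnty>Example Author</AuthEnty></rspStmt></citation>
  <stdyInfo><subject><keyword>elections</keyword><keyword>voting</keyword></subject></stdyInfo>
  <dataAccs><useStmt><conditions>Users must accept the terms of use.</conditions></useStmt></dataAccs>
</stdyDscr></codeBook>"""
avus = ddi_to_avus(SAMPLE)
```

Repeated elements such as `keyword` become repeated attributes, which the flat AVU model allows. In iRODS itself, each triple could then be attached to the study's DDI object, for example with the `imeta add -d` icommand.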

5. Results and Discussion

We found many benefits to migrating our metadata and digital objects to a second-generation data grid system using this tool. We built the prototype using existing, validated standards such as the OAI-PMH and XML to provide batch extraction and ingest of metadata and data files. Using these standards allowed us to take advantage of in-house knowledge, and will allow others to apply the tool to their own collections, albeit with some modifications. We enabled bibliographic metadata-based queries within iRODS that cover both the studies themselves and the related data files; this allowed us to provide access to the digital material at the same level we provided via the DVN. We maintained the Terms and Conditions associated with the social science data, including access controls, rights, and privacy issues, by using iRODS access controls that allowed us to write customizable management policies. We automatically validated the ingest of the content of 26 DDI elements into the iRODS iCAT, and of the DDI XML files and related data files into the iRODS server, using user-defined XML templates and iRODS iquery commands. We therefore knew that the metadata had actually been ingested into the iCAT and the iRODS server. We have retained the original complete metadata files for preservation purposes and guaranteed that access to particular collections remains restricted based on existing policies and legal agreements. We confirmed, by testing the access mechanisms, that we control who can and cannot access the data we migrated into iRODS. We tested the tool on other Odum collections and determined that it was generic enough to apply to all of Odum's digital collections, albeit with some slight modifications to the XSLTs.

During the initial proof-of-concept we planned to enter all elements of the DDI metadata into the iCAT, and we did not plan to upload the complete DDI XML file into the iRODS server [16]. We soon realized, however, that not all of the content in the DDI file is relevant for users searching for a particular study. For example, we decided against ingesting into the iCAT elements that contain case-level information, such as a "2", that would not return meaningful results from a query. After examining the search fields currently used by the administrators of the Odum DVN, we decided to ingest 26 of the available DDI metadata elements into the iCAT, but also to upload the entire bibliographic DDI file into the iRODS server as an object stored with the data files associated with the metadata record. We therefore retained a complete copy of the original metadata file for preservation purposes, while also providing access to the relevant data and metadata via iCAT queries.

A primary area of concern for us was how to maintain the rights and restrictions attached to these social science studies.
We determined that we have upheld the required legal declarations by replicating the written Terms and Conditions into the iCAT as part of the metadata ingest, by uploading the entire DDI metadata file, which contains the Terms and Conditions, into the iRODS server as an object, and by using a software program that "clicks" through the Terms and Conditions before downloading the digital objects. In addition, we defined our policies as rules within iRODS itself, thereby restricting access to the data according to our legal agreements. We tested the rights restrictions by federating with a different iRODS data grid, and confirmed that iRODS maintained the appropriate access to the digital objects and metadata.

While we chose to validate the metadata as part of the ingest process, digital and data librarians may also validate metadata against a template at any time during the retention period of a collection. They can validate metadata during ingest and decide to accept, reject, or stage new objects according to their preservation policies. In the event of a policy change, these library administrators can apply the new policy to an existing collection in order to identify objects whose metadata must be updated to become compliant.

We tested the performance of the extraction and ingest process. We defined a "study" as a collection that includes the DDI XML file that describes the social science data plus the digital objects referenced in the DDI metadata, such as SPSS, SAS, or text files. We used Harris poll results as our test collection. We found that it takes about 10 seconds to load one study, 30 seconds to load three studies, 31 minutes to load 175 studies, and 3 hours to load 1,074 Harris studies; that is, roughly 10 to 11 seconds per study at every scale tested. We determined that the factor with the greatest effect on load time is the network speed between the DVN and the computer running the batch upload script. We plan to improve efficiency by calling the script from within iRODS rather than from outside both the DVN and iRODS.

We designed an extraction and ingest process that archivists and librarians can replicate easily, since they are often limited both in the technical resources available to them and in their ability to obtain funding for technical expertise. They are tasked with preserving objects that are culturally, historically, legally, and politically important, and they often do so under legal mandate. We chose standards familiar to librarians and archivists in order to minimize the technical burden of extracting metadata and digital objects from a digital library and ingesting them into the iRODS preservation data grid. Librarians and archivists may adopt this process for their own collections, freeing them to focus on developing those collections rather than on developing new extraction and ingest technology. We have provided documentation for this purpose³.
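The validation policy described in this section (accept, reject, or stage new objects, then re-check an existing collection after a policy change) can be sketched as a required-attribute check over AVU triples. The prototype compared metadata to a user-provided XML schema with an iRODS rule; this Python sketch covers only the required-attribute part, and the three-way decision rule, the `Decision` type, and the attribute names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    STAGE = "stage"    # hold for a librarian to review
    REJECT = "reject"


def missing_attributes(avus, required):
    """Required attribute names that have no non-empty value in the AVU list."""
    present = {attribute for attribute, value, _unit in avus if value}
    return sorted(set(required) - present)


def apply_policy(avus, required, recommended=()):
    """Hypothetical policy: reject on a missing required attribute, stage on a missing recommended one."""
    if missing_attributes(avus, required):
        return Decision.REJECT
    if missing_attributes(avus, recommended):
        return Decision.STAGE
    return Decision.ACCEPT


def noncompliant(collection, required):
    """Objects in an existing collection whose metadata fails a (possibly new) policy."""
    return sorted(name for name, avus in collection.items()
                  if missing_attributes(avus, required))


collection = {
    "study1": [("title", "Study One", ""), ("terms_and_conditions", "Accepted.", "")],
    "study2": [("title", "Study Two", "")],
}
first = apply_policy(collection["study1"], ["title"], ["terms_and_conditions"])
second = apply_policy(collection["study2"], ["title"], ["terms_and_conditions"])
# A later policy change makes terms_and_conditions mandatory for every study.
to_update = noncompliant(collection, ["title", "terms_and_conditions"])
```

Keeping the policy as data (the `required` and `recommended` lists) rather than code is what lets the same check be re-run over an existing collection when the policy changes.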

³ Project documentation and code are available at: https://www.irods.org/index.php/Collaborative_Projects and http://verity.irss.unc.edu/pm/index.php?n=DVN-iRODS.HomePage.

6. Conclusion and Future Work

In conclusion, the migration of collections between digital libraries and preservation data archives is possible using a seamless, automated batch upload for both data and metadata. We achieved this using existing standards and methods such as the OAI-PMH, XML, Java, and iRODS rules and microservices. We are confident that a program based on this prototype would be a useful addition to the archivist's tool kit, and that it is one small step toward the development of a standard ingest protocol.

Our next steps will involve refining the ingest process to compare checksums in the DVN with the original set of harvested objects and metadata in iRODS. We plan to test the applicability of the process by collaborating with other organizations within and outside UNC to develop case studies. We also plan to create a process by which we can synchronize the contents of the Odum DVN with the various replicas within the preservation grid. Our experience has shown that the Odum Data Archive's collection is fairly static, but we foresee a need to add, delete, or change existing metadata files and their associated data files. Finally, to build on our current work, we are formalizing social science data archive preservation policies from the Odum Institute and ICPSR and implementing them as iRODS rules.
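The checksum comparison and synchronization planned above reduce to diffing two manifests. The Python sketch below assumes, hypothetically, that both the DVN and iRODS can export a mapping from file name to checksum; the file names and the shortened digests are made-up placeholders.

```python
def diff_manifests(source, replica):
    """Compare {name: checksum} manifests from the source archive and a replica.

    Returns the names the replica must add, delete, or refresh to match the source.
    """
    add = sorted(source.keys() - replica.keys())
    delete = sorted(replica.keys() - source.keys())
    update = sorted(name for name in source.keys() & replica.keys()
                    if source[name] != replica[name])
    return {"add": add, "delete": delete, "update": update}


# Placeholder digests; real manifests would hold full checksums.
dvn = {"study1.xml": "9f1c", "poll.por": "a3b2", "new_wave.sav": "77e0"}
irods = {"study1.xml": "9f1c", "poll.por": "0000", "withdrawn.txt": "5d5d"}
plan = diff_manifests(dvn, irods)
```

Names under `update` are exactly the objects whose content differs from the source, so a sync pass only needs to re-harvest those, plus the additions, rather than the whole collection.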

9. Acknowledgements

The authors would like to thank Reagan Moore for his comments on this paper. We would like to thank Patrick King for his work on the data flow diagram. This work is funded by the NSF grant OCI-0848296 and is a collaboration with NARA on the development of the "NARA Transcontinental Persistent Archive Prototype". The initial work on this project was funded by the NARA supplement to NSF SCI 0438741, “Cyberinfrastructure; from Vision to Reality” – Transcontinental Persistent Archive Prototype (TPAP) (2005-2008).

10. References

[1] Lagoze, C. & Van de Sompel, H. (2001). The Open Archives Initiative: building a low-barrier interoperability framework. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, June 24-28, 2001, Roanoke, VA. pp. 54-62.
[2] Davis, J. R. & Lagoze, C. (2000). NCSTRL: design and deployment of a globally distributed digital library. Journal of the American Society for Information Science, 51(3), 273-280.
[3] Tibbo, H.R. (2003). On the nature and importance of archiving in the digital age. Advances in Computers, 57, 1-67.
[4] SDSC. (2008). Storage Resource Broker. Retrieved January 28, 2009 from http://www.sdsc.edu/srb
[5] DICE. (2009). integrated Rule-Oriented Data System. Retrieved July 28, 2009 from https://www.irods.org/pubs/iRODS_Overview_0903.pdf
[6] Tooby, P. (2007). Award-Winning TPAP Digital Preservation Prototype Keeps Growing. D-Lib Magazine, 13(7/8). Retrieved January 9, 2009 from http://www.dlib.org/dlib/july07/07inbrief.html
[7] DICE. (2009). Micro-services. Retrieved June 11, 2009 from https://www.irods.org/index.php/Micro-Services
[8] InterPARES. (2001). The long-term preservation of authentic electronic records: findings of the InterPARES project. Retrieved October 5, 2007, from http://www.interpares.org/ip1/ip1_index.cfm
[9] Kahn, R., & Wilensky, R. (1995). A Framework for Distributed Digital Object Services. Retrieved August 2, 2009, from http://www.cnri.reston.va.us/k-w.html
[10] Moore, R. (2006). Building Preservation Environments with Data Grid Technology. American Archivist, 69(1), 139-158.
[11] Moore, R., Schroeder, W., Wan, M., Rajasekar, R., & Marciano, R. (2007). Information Management and Distributed Data. Presentation given at TeraGrid 2007, Madison, WI. Retrieved January 9, 2009 from http://www.teragrid.org/events/teragrid07/archive/presentations/wednesday/srb-teragrid.ppt
[12] SRB. (2008). SRB main page. Retrieved January 9, 2009 from http://www.sdsc.edu/srb/index.php/Main_Page
[13] Thibodeau, K. (2007). The Electronic Records Archives Program at the National Archives and Records Administration. First Monday 12(7). Retrieved January 15, 2009 from http://firstmonday.org/issues/issue12_7/thibodeau/index.html
[14] RENCI. (2008). NARA Transcontinental Persistent Archive Platform. Retrieved January 9, 2009 from http://www.renci.org/focusareas/hass/nara_tpap.php
[15] McCown, F., Liu, X., Nelson, M.L., & Zubair, M. (2006). Search engine coverage of the OAI-PMH corpus. IEEE Internet Computing, 10(2), 66-73.
[16] Ward, J., de Torcy, A., Mantooth, J., Chua, M., & Crabtree, J. (2009). Integrating Metadata into the NARA Transcontinental Persistent Archive Prototype via the OAI-PMH. In Proceedings of DigCCurr 2009: Digital Curation Practice, Promise and Prospects, Chapel Hill, NC, April 1-3, 2009. Retrieved September 30, 2009, from http://www.lulu.com/content/paperback-book/proceedings-of-digccurr2009-digital-curation-practice-promise-and-prospects/6800574
[17] DICE. (2009). iRODS iMeta. Retrieved July 29, 2009 from https://www.irods.org/index.php/imeta
[18] Consultative Committee for Space Data Systems. (2002). Reference model for an Open Archival Information System (OAIS) (CCSDS 650.0-B-1). Washington, DC: National Aeronautics and Space Administration (NASA). Retrieved April 3, 2007, from http://nost.gsfc.nasa.gov/isoas/
[19] Fedora and the Preservation of University Records Project. (2006). 2.1 Ingest Guide, Version 1.0 (tufts:central:dca:UA069:UA069.004.001.00006). Retrieved April 16, 2009, from the Tufts University, Digital Collections and Archives, Tufts Digital Library Web site: http://repository01.lib.tufts.edu:8080/fedora/get/tufts:UA069.004.001.00006/bdef:TuftsPDF/getPDF
[20] Vardigan, M. & Whiteman, C. (2007). ICPSR meets OAIS: applying the OAIS reference model to the social science archive context. Archival Science, 7(1). Netherlands: Springer. Retrieved February 20, 2008, from http://www.springerlink.com/content/50746212r6g21326/

