Semantic Middleware for E-science Knowledge Spaces Joe Futrelle, Jeff Gaynor, Joel Plutchak, James D. Myers, Robert E. McGrath, Peter Bajcsy, Jason Kastner, Kailash Kotwani, Jong Sung Lee, Luigi Marini, Rob Kooper, Terry McLaren, Yong Liu National Center for Supercomputing Applications University of Illinois at Urbana-Champaign 1205 W. Clark St., Urbana IL, 61801, USA {futrelle, jgaynor, plutchak, jimmyers, mcgrath, pbajcsy, jkastner, kkotwani, jonglee, lmarini, kooper, tmclaren, yongliu}@illinois.edu
ABSTRACT
The Tupelo semantic content management middleware implements Knowledge Spaces that enable scientists to locate, use, link, annotate, and discuss data and metadata as they work with existing applications in distributed environments. Tupelo is built using a combination of commonly-used Semantic Web technologies for metadata management, content management technologies for data management, and workflow technologies for management of computation, and can interoperate with other tools using a variety of standard interfaces and a client and desktop API. Tupelo’s primary function is to facilitate interoperability, providing a Knowledge Space “view” of distributed, heterogeneous resources such as institutional repositories, relational databases, and semantic web stores. Knowledge Spaces have driven recent work creating eScience cyberenvironments to serve distributed, active scientific communities. Tupelo-based components deployed in desktop applications, on portals, and in AJAX applications interoperate to allow researchers to develop, coordinate and share datasets, documents, and computational models, while preserving process documentation and other contextual information needed to produce a complete and coherent research record suitable for distribution and archiving.
Categories and Subject Descriptors
C.2.4 [Distributed Systems]: Distributed applications

Keywords
Semantic web, content management, e-science
1. INTRODUCTION
Scientific research is becoming increasingly distributed and multi-disciplinary, which brings with it new challenges of integrating the work of scientific communities across organizational and technical boundaries [29, 31]. A number of best-practice technologies from digital libraries, enterprise computing, and web publishing have been applied to scientific work, but have met with limited success because most of the technologies focus on centralized management of static content collections and
are therefore primarily used to archive or disseminate scientific results after the fact, which does little to help scientists produce better results more efficiently [36]. Science automation work has instead focused largely on workflow and Grid technologies for automating routine computational analysis, which not only makes dynamic exploratory development of models cumbersome (e.g., by requiring that scientists adopt a batch programming approach) [12] but also in practice leaves most of the intermediate data products in complex scientific work processes unaccounted for outside of the immediate execution context that produced them.

Because of the limitations of these approaches, much scientific data has been embedded in structural containers (e.g., file systems, specialized databases) that are typically organized based on assumed subject matter, level of granularity, hierarchical organization, and object structure, preventing users with different assumptions or organizational schemes from finding or accessing relevant information. Knowledge is typically managed by embedding it as metadata in content objects or in similarly rigid containers, such as scripts or application code, limiting its ability to be shared and “remixed” with other metadata to enable new ways of exploring data and to augment existing knowledge.

Several existing approaches address some of these issues but are missing capabilities that are critical for large-scale e-Science. Semantic web technologies provide explicit semantic representations and global identification, providing strong guarantees that heterogeneous metadata can be represented and linked without reducing its semantic specificity. But semantic web technologies are difficult to apply to the scientific use case because much of the tooling available provides only centralized indexing and querying of metadata, with little attention given to linking it to data or providing services for collaborative authorship and exchange of metadata descriptions. Content management systems (e.g., Jackrabbit [1], Drupal [5]) and institutional repositories (e.g., Fedora [7], DSpace [6]) as well as more automated systems such as iRODS [25] provide generalized management of content and provide support for curation and collaborative authorship (e.g., OAI-ORE [18]), but assume centralized control of data and metadata, and either preserve metadata as a static content object or provide only “dumbed down” metadata support, such as
tagging [2] or static schemas, limiting the ability to migrate or reuse content objects as they are actively developed in complex, distributed, heterogeneous work processes.

An emerging “semantic grid” practice has begun to address parts of this problem by applying semantic web technologies to e-Science [4]. Efforts such as CombeChem [8] and MyExperiment [3] represent a new emphasis on active, shared development of content and workflow by scientists using a variety of web-based, desktop, and handheld interfaces that are integrated using semantic metadata. Other efforts such as nanoHUB [16] extend the notion of Grid computing to include more contextual support, such as linking scientific computation through social networks. In part this new practice represents the influence of “web 2.0” practices on science automation [10]. In our view, it also points to a broader vision of digital scholarship enabled by situating scientific activity in systems that provide support not just for creating and sharing processes and data, but also for dynamically recontextualizing, annotating, revising, and tracking information as it is disseminated and used across widely separated communities and disciplines.
We have developed the Tupelo semantic middleware to enable new, more integrated Knowledge Spaces that combine the strengths of semantic web technologies and content management and address their shortcomings [30]. Knowledge Spaces enable users to locate, use, link, annotate, and discuss data and metadata as they work, without having to co-locate all data and metadata at a single institution or in a single repository, or having to abandon existing applications and services. Instead of requiring that data and metadata be restructured, packaged, and submitted to a storage management service, Tupelo allows users and applications to manage descriptions and linked information alongside existing content, as well as providing mechanisms to dynamically locate and recombine content from otherwise uncoordinated sources at various levels of granularity and specificity. At the publication phase, scientists can use knowledge spaces to publish intermediate data and executable descriptions of analytical processes so that other researchers and the public can reproduce, modify, and further share the content of complex ongoing scientific investigations, reducing time to discovery and providing a richer and more complete research record for preservation.

The implementation of knowledge spaces in Tupelo middleware is guided by several important principles derived from best practices in data and metadata interoperability [36]:
• Data and metadata should retain their meaning when moved from one container to another, because otherwise that meaning will degrade as content migrates through the network.
• Metadata should be automatically interpretable as far as possible, because manual effort is not available at scale.
• An account of how data was produced is often more valuable than the data itself, and can span multiple, independent processes.
To implement these principles we have adopted semantic web technologies for representing metadata, ideas from content management systems (CMS) for managing data, and have co-developed and implemented the Open Provenance Model (OPM) [27] for describing complex process and data provenance. Rather than assuming that a single deployment framework such as a service-oriented architecture (SOA) or workflow engine can subsume all the distributed resources that make up an e-Science environment, we have developed a “Context” abstraction that provides applications in a variety of deployment scenarios (e.g., desktop, web, Grid) with a semantic content “view” of the resources at hand (e.g., file systems, databases, web services). Contexts can be “wrapped” around existing data providers, storage technologies, query engines, and services. Implementations are provided for file systems, relational databases, and RDF triple stores. Contexts can also be aggregated to provide unification, mirroring, failover, and a variety of other configurations that coordinate access to distributed, heterogeneous sources. Finally, Contexts can be used to perform computational inference, extract metadata from data, and enforce local access rules and policies.

Tupelo has been used to develop a suite of interoperable, context-aware tools, including the CyberIntegrator provenance-aware exploratory workflow tool, the CyberCollaboratory web-based collaboration tool, and the Digital Synthesis Framework for publishing interactive datasets. These tools have been deployed to create Knowledge Spaces supporting environmental and other sciences, and to provide provenance support for a growing collection of workflow projects as part of the Provenance Challenge workshop series [26, 28], which has brought together developers of workflow systems such as Kepler [21] and Taverna [32] in an attempt to achieve interoperability.
2. TUPELO ARCHITECTURE
The design of Tupelo has been informed by a number of other middleware architectures, most notably content management systems (CMS) (e.g., [1, 5]) and Grid computing [42]. Like content management systems, Tupelo manages information using an extensible content model that is decoupled from the storage and indexing technology used to manage it. Like Grid computing, Tupelo assumes that operations can be delegated and transported over the network, allowing large-scale distributed resources to be used. Unlike both approaches, Tupelo can provide uniform access to local or remote resources, even resources that are not under its control.

Tupelo is based on an abstraction called a “Context”, which represents a kind of semantic “view” of distributed resources. Context implementations are responsible for performing “operators”, which are atomic descriptions of requests to either retrieve or modify the contents of a context. Two primary kinds of operations are provided:
1. Metadata operations, including asserting and retracting RDF statements and searching for statements that match a query; and
2. Data operations, including reading, writing, and deleting binary large objects (BLOBs), each of which is identified by a URI.
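To make the two operation kinds concrete, the following is a minimal Java sketch of what a context exposing them might look like. The interface name, method signatures, and the in-memory implementation are illustrative assumptions for this paper's discussion, not the actual Tupelo API.

```java
import java.net.URI;
import java.util.*;

// Illustrative triple: subject, predicate, object, all as strings for simplicity.
record Triple(String subject, String predicate, String object) {}

// Hypothetical context interface covering the two operation kinds described above.
interface Context {
    // Metadata operations
    void assertTriple(Triple t);
    void retractTriple(Triple t);
    List<Triple> match(String subject, String predicate, String object); // null = wildcard

    // Data (BLOB) operations, keyed by global URI
    void writeBlob(URI id, byte[] data);
    byte[] readBlob(URI id);
    void deleteBlob(URI id);
}

// Trivial in-memory implementation standing in for a triple store plus blob store.
class MemoryContext implements Context {
    private final List<Triple> triples = new ArrayList<>();
    private final Map<URI, byte[]> blobs = new HashMap<>();

    public void assertTriple(Triple t) { triples.add(t); }
    public void retractTriple(Triple t) { triples.remove(t); }
    public List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || t.subject().equals(s))
                    && (p == null || t.predicate().equals(p))
                    && (o == null || t.object().equals(o))) {
                out.add(t);
            }
        }
        return out;
    }
    public void writeBlob(URI id, byte[] data) { blobs.put(id, data); }
    public byte[] readBlob(URI id) { return blobs.get(id); }
    public void deleteBlob(URI id) { blobs.remove(id); }
}
```

An aggregating context could implement the same interface by delegating each call to a list of child contexts, which is roughly how the mirroring, failover, and unification configurations described below can be layered.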
Tupelo typically implements metadata operations by delegating them to an RDF triple store, but it can also wrap non-RDF data sources or sinks by interpreting them using arbitrary RDF ontologies and vocabularies. For instance, Tupelo provides a context implementation that can describe the hierarchical structure of a local filesystem using RDF terms for directories, files, and permissions. Above the Tupelo context layer, facts about the contents of the filesystem can then be inferred using non-filesystem-specific metadata operations. Data operations are performed against Tupelo contexts by delegating them to a storage service, such as a filesystem or database. Because most of these back-end technologies address data items using local identifiers rather than URIs, Tupelo provides a number of configurable mechanisms for mapping between global URIs and local identification schemes.

In addition to wrapping existing content and services, contexts can also delegate operations to other contexts, a capability that forms the basis of Tupelo’s aggregation mechanism. Because operations are stateful and can be transformed before and after they are performed, chains or networks of contexts can negotiate with each other in response to an application request and can cooperate to perform mirroring, failover, validation, notification, rules-based inference, metadata extraction, and other capabilities that may be required for an application domain. The dynamic, modular nature of contexts and operators can help bootstrap simple context implementations by allowing them to use existing contexts to implement portions of their capabilities. For example, Tupelo can perform a SPARQL [34] query operation directly against a Context backed only by an RDF/XML file (which normally has no query engine) by staging the RDF/XML data into an in-memory Context that can service the SPARQL query.

Tupelo Contexts provide a point of interaction with Tupelo and also serve as an extension point for both data provider and consumer applications. They are not necessarily points of ownership and control, however, so they can be used when multiple users or applications want to access the same information in different ways; in this case, different Contexts provide different views into the same information. Similarly, a variety of information may need to be combined for a single use or application; in this case, a single Context aggregates and unifies access to the multiple resources providing and managing that information.

Tupelo currently provides a number of context implementations, including RDF triple stores (Jena [15], Sesame [33]); file systems (local, via SSH); HTTP and WebDAV [11] clients; and services (e.g., RSS). By combining these Context implementations, a number of novel capabilities can be implemented with relatively minimal configuration. For example, filesystems that do not support annotation and tagging can be annotated and tagged by combining a filesystem-backed context with a context backed by an RDF triple store, allowing RDF statements to be made about files and their contents without having to manage copies of the files in a metadata-capable repository.

Tupelo middleware provides uniform global identification across the metadata and data in a Context and across Contexts. This makes it possible to use Semantic Web linking even when interacting with resources with non-standard or non-global identification schemes. Tupelo cannot solve all problems with identifiers, but it allows different identification policies and mappings to be applied in different contexts, without requiring underlying applications to agree on a single identification scheme or adopt a centralized identification service.
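The staging pattern described above for answering SPARQL over a plain RDF/XML file can be approximated directly with a generic RDF toolkit. The sketch below uses Jena [15] rather than Tupelo's own context machinery; the class names follow recent Apache Jena releases, and the file name and query are placeholders.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class StagedSparqlQuery {
    public static void main(String[] args) {
        // Stage the RDF/XML file into an in-memory model (the "in-memory Context").
        Model model = ModelFactory.createDefaultModel();
        model.read("file:annotations.rdf");   // placeholder file name

        // A query the flat file alone could not answer, since it has no query engine.
        String sparql =
            "PREFIX dc: <http://purl.org/dc/elements/1.1/> " +
            "SELECT ?file ?title WHERE { ?file dc:title ?title }";

        QueryExecution qe = QueryExecutionFactory.create(sparql, model);
        ResultSetFormatter.out(System.out, qe.execSelect());
        qe.close();
    }
}
```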
2.1 Server Implementation
To facilitate use in web applications and other networked components, Tupelo provides a client/server protocol that is compatible with, and extends, Nokia’s “URI Query Agent” (URIQA) protocol [38]. The protocol provides a REST endpoint for data objects such as binary streams, as well as extended HTTP methods allowing clients to query and modify RDF metadata by URI (the “MGET” and “MPUT” methods). Tupelo extends this protocol to allow clients to issue SPARQL queries and retrieve results in standard SPARQL results document formats, and adds extensions for traversal of non-hierarchical relationships expressed in RDF (e.g., social networks). Tupelo’s server implementation provides tools for configuring and monitoring back-end contexts, as well as JAAS-based authentication [40] and context-level access control via the web server, allowing integration with a variety of authentication methods and access control policies.

The Tupelo server can be used in much the same way as a WebDAV server, with the added benefit that it can manage RDF nodes and blobs with arbitrary URIs, not just URIs that share the server’s URL prefix. Like other Tupelo contexts, the content in multiple Tupelo servers can be combined with a simple generic aggregation construct, which is more lightweight than the batch restructuring and reorganization required to merge content collections in the content management systems or institutional repositories backing typical portals. By the same token, a Tupelo server can be used to filter, augment, or transform the content of another Tupelo server, allowing data and metadata from one context to be shared in environments with differing requirements for terminology, level of specificity, and granularity.
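Because the extended URIQA verbs are not standard HTTP methods, a plain HttpURLConnection cannot issue them; a client can instead write the request over a raw socket, as in the sketch below. The host, port, resource path, and expected headers are placeholder assumptions here, not the configuration of any particular Tupelo server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class MGetExample {
    public static void main(String[] args) throws IOException {
        String host = "tupelo.example.org";          // placeholder server
        String resource = "/context/some-dataset";   // placeholder resource path

        try (Socket socket = new Socket(host, 80);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), "US-ASCII");
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), "UTF-8"))) {

            // MGET asks for an RDF description of the resource named by the request URI.
            out.write("MGET " + resource + " HTTP/1.1\r\n");
            out.write("Host: " + host + "\r\n");
            out.write("Accept: application/rdf+xml\r\n");
            out.write("Connection: close\r\n\r\n");
            out.flush();

            // Dump the status line, headers, and RDF body for inspection.
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}
```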
2.2 Client Interfaces
On the client side, Tupelo provides a number of APIs and libraries, including a Java “kernel” API allowing Java applications to use Tupelo capabilities, a Python client to the Tupelo web service, and user interface models for trees, tables, and networks that can dynamically present multiple structured interactive views based on user-specified semantic relationships. For Java clients, Tupelo provides a dynamic Java-to-RDF mapping that allows Java Beans [39] to be mapped to RDF vocabularies, enabling applications to read and manage RDF data as Java objects. Mappings can be altered at runtime and shared as RDF metadata, so applications with different Java APIs can interoperate by being configured to map the relevant parts of those APIs to shared RDF terms. Tupelo provides mappings for a number of standard and proposed RDF vocabularies defining relevant core abstractions such as people, datasets, documents, workflow, and provenance. For AJAX clients [9], Tupelo’s server implementation provides several JSON endpoints enabling basic context operations and client-side processing of query results.
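The idea behind bean-to-RDF mapping can be illustrated without Tupelo's own mapper: the sketch below uses plain Java Beans introspection to turn one bean's readable properties into RDF triples. The Person class, property URIs, and N-Triples output are illustrative choices, not Tupelo's shipped vocabularies or mapping syntax.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanToRdf {
    // A simple Java Bean standing in for a domain object.
    public static class Person {
        private final String name = "Ada Lovelace";
        private final String email = "ada@example.org";
        public String getName() { return name; }
        public String getEmail() { return email; }
    }

    public static void main(String[] args) throws Exception {
        Person p = new Person();
        String subject = "<http://example.org/people/ada>";  // placeholder subject URI
        String vocab = "http://example.org/vocab/";           // placeholder vocabulary

        // Emit one N-Triples statement per readable bean property (Object's are skipped).
        for (PropertyDescriptor pd :
                Introspector.getBeanInfo(Person.class, Object.class).getPropertyDescriptors()) {
            Object value = pd.getReadMethod().invoke(p);
            System.out.println(subject + " <" + vocab + pd.getName() + "> \"" + value + "\" .");
        }
    }
}
```

Sharing such a mapping as RDF metadata, as the text describes, would amount to publishing the property-name-to-predicate table itself so another application can apply the same translation.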
2.3 Computational Inference
Semantic web technologies provide the ability to infer metadata from other metadata using declarative languages such as OWL and SWRL [13]. These techniques greatly ease the process of linking or aggregating metadata from a variety of sources, as well as providing a means of codifying and implementing domain-specific assumptions, taxonomies, and other aspects of description and organization. But these techniques are essentially limited to simple statements of predicate logic, and so are unable to cope with the kinds of computations scientists routinely perform in order to validate, clean, resample, extract features from, and classify scientific data [23], although some work has been done to support more restricted kinds of computation [37].

To address this missing capability, Tupelo can serve as a computational extension mechanism that can execute arbitrary domain-specific code and workflows as users interact with data, automating the production and management of metadata which can then be used to annotate, organize, or contextualize the data for the task at hand. This uses a general “plug-in” capability similar to Jena plug-ins, which have been used in similar ways [17]. Specifically, this feature can be used to support “computational inference”, in which analytic code is used to infer metadata in one vocabulary from metadata in another. For example, Tupelo can use GIS code to infer place names from location metadata and information about the location and extent of named places, so that a point located inside a complex region called “Illinois” can be tagged as being located in Illinois, allowing applications without GIS capabilities to search by place name. This capability is analogous to the “datablade” concept (e.g., [41]), but is more general in that it can be used not just for indexing but also for hypothesis testing (e.g., validating computationally-derived inferences against ontologies) and other data-driven computational processes.

The computational inference mechanism can also be used to support automated translation and metadata extraction. Such capabilities could be implemented via format-specific external translators or, as is currently being pursued by Yigang Zhou as part of a Google Summer of Code fellowship [43], through generic tools that interpret declarative descriptions of how metadata is embedded in data formats [24].
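As an illustration of the computational inference pattern, the sketch below runs a point-in-polygon test and, when it succeeds, asserts a place-name triple that a non-GIS application could then query. The region outline, URIs, and output vocabulary are simplified placeholders; a real deployment would delegate the geometry to proper GIS code and write the inferred statement back into a context.

```java
import java.awt.geom.Path2D;

public class PlaceNameInference {
    public static void main(String[] args) {
        // Crude illustrative outline standing in for the Illinois boundary (lon, lat pairs).
        double[][] outline = {
            {-91.5, 42.5}, {-87.5, 42.5}, {-87.5, 37.0}, {-89.2, 36.9}, {-91.5, 39.0}
        };
        Path2D.Double region = new Path2D.Double();
        region.moveTo(outline[0][0], outline[0][1]);
        for (int i = 1; i < outline.length; i++) {
            region.lineTo(outline[i][0], outline[i][1]);
        }
        region.closePath();

        // Location metadata for some observation point (placeholder values near Urbana, IL).
        String pointUri = "<http://example.org/obs/point-17>";
        double lon = -88.2, lat = 40.1;

        // "Computational inference": derive a place-name statement from coordinate metadata.
        if (region.contains(lon, lat)) {
            System.out.println(pointUri
                + " <http://example.org/vocab/locatedIn> <http://example.org/places/Illinois> .");
        }
    }
}
```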
3. APPLICATION USE CASES
The following sections highlight areas where the Knowledge Spaces approach has enabled new capabilities and reduced the effort required to build and deploy domain-specific applications.
3.1 Virtual Rainfall Sensors
In the NCSA virtual sensor project [19], spatial, temporal, and thematic transformations of real sensor data, implemented as computational models, are used to create virtual streams of sensor data. Tupelo middleware is used to characterize virtual sensors spatiotemporally and to manage provenance metadata. Computation is orchestrated via CyberIntegrator, which executes the spatial, temporal, and thematic transformations and tracks provenance [20].

To describe virtual sensors, a Tupelo context combines descriptions from several ontologies. First, a streaming data ontology is used to model temporal aspects such as “most recent frame” and “last 10 frames” [35]. Second, for the spatial aspect of a virtual sensor, we reuse geometric primitives from GML (Geography Markup Language); for point-based virtual sensors, we reuse WGS84 concepts such as geo:latitude and geo:longitude. Third, for the thematic aspect of a virtual sensor, we define a new concept, “hasThematicInterest”, which describes the meaning of the observation produced by the virtual sensor. For example, the National Weather Service (NWS) NEXRAD (Next Generation Weather Radar) Level II data measures reflectivity, which can be used to derive rainfall rate. In our current work, ThematicInterest is a list of values (such as rainfall rate or rainfall accumulation) from which a virtual sensor can pick one. Fourth, we leverage the Open Provenance Model (OPM) and use concepts such as “wasDerivedFrom” to record the causal relationships between raw and/or other virtual sensors and the new virtual sensor. This aids the interpretation and validation of products derived from virtual sensor data.
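A virtual sensor description of this kind can be sketched as a small RDF graph; the fragment below builds one with Jena [15]. The sensor URIs, the hasThematicInterest namespace, and the OPM namespace shown are stand-ins for the project's own ontologies, while the geo:lat/geo:long terms follow the public WGS84 vocabulary.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class VirtualSensorDescription {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();

        String geo = "http://www.w3.org/2003/01/geo/wgs84_pos#";   // WGS84 vocabulary
        String vs  = "http://example.org/virtualsensor#";           // placeholder project namespace
        String opm = "http://example.org/opm#";                     // illustrative OPM namespace

        Resource rainSensor = m.createResource("http://example.org/sensors/virtual/rainfall-42");
        Resource nexradFeed = m.createResource("http://example.org/sensors/nexrad/KILX-level2");

        // Spatial aspect: a point-based virtual sensor located with WGS84 terms
        // (the vocabulary's short names for latitude and longitude are "lat" and "long").
        rainSensor.addProperty(m.createProperty(geo, "lat"), m.createTypedLiteral(40.15));
        rainSensor.addProperty(m.createProperty(geo, "long"), m.createTypedLiteral(-89.33));

        // Thematic aspect: what the derived observations mean.
        rainSensor.addProperty(m.createProperty(vs, "hasThematicInterest"), "rainfall rate");

        // Provenance: the virtual stream was derived from the raw NEXRAD reflectivity feed.
        rainSensor.addProperty(m.createProperty(opm, "wasDerivedFrom"), nexradFeed);

        m.write(System.out, "N-TRIPLE");
    }
}
```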
3.2 DigiDirt: Plant Growth Modeling for Education
Developed in conjunction with the University of Illinois College of Education and University of Illinois Extension, the DigiDirt project provides middle school students with a web-based educational environment in which they can run state-of-the-art computational models to explore factors that influence plant growth, assess the effect of climate change on yield, and consider the implications for agricultural decision-making. Student activities are implemented as scientific workflows in the CyberIntegrator scientific workflow management system [22]. The workflows wrap the WIMOVAC model [14] and specify how to execute the model and post-process the results. Each time a student submits a new execution, a CyberIntegrator server from a service pool handles the request, launches five executions of the model, post-processes the results, and provides the web application with the information it needs to visualize the results.

Tupelo’s architecture and associated tool suite made it possible to rapidly assemble the plant growth application from reusable components designed to manage general capabilities such as workflow execution and metadata management, with less development effort required to adapt the tools to a specific deployment profile. Generic desktop, web-based, and server-side components all used the same core set of Tupelo operations to adapt to domain-specific metadata descriptions, alleviating the need to “hard-wire” domain semantics into the component design.
4. CONCLUSION
Knowledge Spaces are a novel approach to supporting digital scholarship, allowing scientists to manage not just digital objects such as files and datasets, and processing records such as workflow executions and provenance information, but also domain-level descriptions that link the digital objects together into a coherent account of scientific discovery (e.g., associating CAD data, GIS data, and JPEG imagery into a composite digital representation of what is known about a particular building, which can then be shared and annotated directly in domain-relevant terms rather than being referred to and treated as a collection of data structures or files). Tupelo’s interoperability-based architecture allows it to connect to existing software stacks (i.e., without replacing or displacing them) to add context and help integrate the heterogeneous aspects of large-scale scientific work, including observation, analysis, organization, and publication. Contexts enable domain-specific views of the Knowledge Space, as well as aggregation and computational inference, without requiring centralized stores or advance agreement about metadata. Tupelo has made it possible to develop and deploy Knowledge Spaces for a variety of applications, providing end-to-end provenance tracking, active curation of data and documents as well as the links between them, and collaborative annotation and tagging. In addition to reducing the development effort required to support scientific domains, Tupelo’s architecture enables an active view of scientific work with strong guarantees of reusability based on explicit semantics and declarative descriptions of analytic processes, opening new opportunities for more effectively disseminating and preserving the fruits of ongoing, evolving scientific discovery.
5. ACKNOWLEDGEMENTS
This material is based upon work supported by the National Science Foundation (NSF) under Awards No. BES-0414259, BES-0533513, and SCI-0525308 and the Office of Naval Research (ONR) under Award No. N00014-04-1-0437. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of NSF or ONR.
6. REFERENCES
[1] Apache. Jackrabbit. 2006, http://incubator.apache.org/jackrabbit/.
[2] DCMI Usage Board. DCMI Grammatical Principles. 2007, http://dublincore.org/usage/documents/principles/.
[3] De Roure, David, Carole Goble, and Robert Stevens, The design and realisation of the Virtual Research Environment for social sharing of workflows. Future Generation Computer Systems, 25(5):561-567, 2009.
[4] De Roure, David and James A. Hendler, E-science: The Grid and the Semantic Web. IEEE Intelligent Systems, 19(1):65-71, February 2004.
[5] drupal.org. drupal.org | Community plumbing. 2008, http://drupal.org/.
[6] dspace.org. dspace.org - Home. 2008, http://www.dspace.org/.
[7] fedora-commons.org. Fedora Commons. 2008, http://www.fedora-commons.org/.
[8] Frey, J., D. De Roure, K. Taylor, J. Essex, H. Mills, and E. Zaluska, CombeChem: A Case Study in Provenance and Annotation Using the Semantic Web. Lecture Notes in Computer Science, 4145:270, 2006.
[9] Garrett, Jesse James. Ajax: A New Approach to Web Applications. 2005, http://www.adaptivepath.com/ideas/essays/archives/000385.php.
[10] Goble, C., e-Science is me-Science: What do Scientists want?, in Enabling Grid for E-Science. 2006: Geneva, Switzerland.
[11] Goland, Y., E. Whitehead, A. Faizi, S. Carter, and D. Jensen, HTTP Extensions for Distributed Authoring - WEBDAV. IETF RFC 2518, 1999.
[12] Hannay, Jo Erskine, Hans Petter Langtangen, Carolyn MacLeod, Dietmar Pfahl, Janice Singer, and Greg Wilson, How Do Scientists Develop and Use Scientific Software?, in Second International Workshop on Software Engineering for Computational Science and Engineering. 2009, IEEE.
[13] Horrocks, Ian, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean, SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission, 2004. http://www.w3.org/Submission/SWRL/
[14] Humphries, S. W. and S. P. Long, WIMOVAC: a software package for modelling the dynamics of plant leaf and canopy photosynthesis. Comput. Appl. Biosci., 11(4):361-371, August 1995.
[15] JENA. JENA. 2003, http://www.hpl.hp.com/semweb/jena.html.
[16] Klimeck, Gerhard, Michael McLennan, Sean P. Brophy, George B. Adams III, and Mark S. Lundstrom, nanoHUB.org: Advancing Education and Research in Nanotechnology. Computing in Science and Engineering, 10(5):17-23, 2008.
[17] Kumar, Vijay S., Sivaramakrishnan Narayanan, Tahsin Kurc, Jun Kong, Metin N. Gurcan, and Joel H. Saltz, Analysis and Semantic Querying in Large Biomedical Image Datasets. Computer, 41(4):52-59, 2008.
[18] Lagoze, C., H. Van de Sompel, P. Johnston, M. L. Nelson, R. Sanderson, and S. Warner, Open Archives Initiative Object Reuse and Exchange (OAI-ORE). Technical report, Open Archives Initiative, December 2007. Available at: http://www.openarchives.org/ore/0.1/toc.
[19] Liu, Yong, D. Hill, A. Rodriguez, L. Marini, R. Kooper, J. Myers, Xiaowen Wu, and B. Minsker. A new framework for on-demand virtualization, repurposing and fusion of heterogeneous sensors. In International Symposium on Collaborative Technologies and Systems (CTS '09), 2009, 54-63.
[20] Liu, Yong, David J. Hill, Alejandro Rodriguez, Luigi Marini, Rob Kooper, Joe Futrelle, Barbara Minsker, and James D. Myers, Near-real-time precipitation virtual sensor using NEXRAD data, in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2008, ACM: Irvine, California.
[21] Ludäscher, Bertram, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao, Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10):1039-1065, 2006. http://dx.doi.org/10.1002/cpe.994
[22] Marini, L., R. Kooper, J. Myers, and P. Bajcsy, Towards Digital Watersheds using Dynamic Publications. Cyberinfrastructure special issue of Water Management, September 2008.
[23] McGrath, Robert E. and Joe Futrelle, Reasoning about Provenance with OWL and SWRL, in AAAI 2008 Spring Symposium "AI Meets Business Rules and Process Management". 2008: Palo Alto.
[24] McGrath, Robert E., Jason Kastner, Alejandro Rodriguez, and Jim Myers, Towards a Semantic Preservation System. National Center for Supercomputing Applications, Urbana, 2009. http://cet.ncsa.uiuc.edu/publications/Semantic_Preservation_System.pdf
[25] Moore, Reagan and MacKenzie Smith. Assessment of RLG Trusted Digital Repository Requirements. In Workshop on "Digital Curation & Trusted Repositories: Seeking Success" at the Joint Conference on Digital Libraries (JCDL 2006), 2006.
[26] Moreau, Luc, Special Issue: The First Provenance Challenge. Concurrency and Computation: Practice and Experience (on-line), November 2007.
[27] Moreau, Luc, Juliana Freire, Joe Futrelle, Robert McGrath, Jim Myers, and Patrick Paulson, The Open Provenance Model: An Overview, in Provenance and Annotation of Data and Processes. Springer, Berlin, 2008, 323-326.
[28] Moreau, Luc, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer Rana, Andreas Schreiber, Victor Tan, and Laszlo Varga, The provenance of electronic data. Communications of the ACM, 51(4):52-58, April 2008.
[29] Myers, James D. and Thom H. Dunning. Cyberenvironments and Cyberinfrastructure: Powering Cyber-research in the 21st Century. In Foundations of Molecular Modeling and Simulation (FOMMS 2006), 2006.
[30] Myers, James D., Joe Futrelle, Jeff Gaynor, Joel Plutchak, Peter Bajcsy, Jason Kastner, Kailash Kotwani, Jong Sung Lee, Luigi Marini, Rob Kooper, Robert E. McGrath, Terry McLaren, Alejandro Rodriguez, and Yong Liu. Embedding Data within Knowledge Spaces. 2009, http://arxiv.org/abs/0902.0744v1.
[31] Myers, James D. and Robert E. McGrath, Cyberenvironments: Ubiquitous Research and Learning (in press), in Ubiquitous Learning, B. Cope and M. Kalantzis, Editors. University of Illinois, Urbana, 2009.
[32] Oinn, Tom, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe, Taverna: lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, 18(10):1067-1100, 2006.
[33] openrdf.org. Sesame. 2008, http://www.openrdf.org/.
[34] Prud'hommeaux, Eric and Andy Seaborne, SPARQL Query Language for RDF. W3C Recommendation, 2008. http://www.w3.org/TR/rdf-sparql-query/
[35] Rodriguez, Alejandro, Robert E. McGrath, and Jim Myers, Semantic Management of Streaming Data (submitted), in Workshop on Semantic Sensor Nets at the International Semantic Web Conference. 2009: Washington, DC.
[36] Ross, S., Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries, in 11th European Conference on Digital Libraries. 2007: Budapest. http://www.ecdl2007.org/Keynote_ECDL2007_SROSS.pdf
[37] Sánchez-Macián, Alfonso, Encarna Pastor, Jorge López de Vergara, and David López, Extending SWRL to Enhance Mathematical Support, in Web Reasoning and Rule Systems, 2007, 358-360. http://dx.doi.org/10.1007/978-3-540-72982-2_30
[38] Stickler, Patrick, URIQA: The Nokia URI Query Agent Model. Nokia, 2004. http://swdev.nokia.com/uriqa/URIQA.html
[39] Sun Microsystems. Java SE Desktop Technologies - Java Beans. http://java.sun.com/javase/technologies/desktop/javabeans/index.jsp.
[40] Sun Microsystems Inc. Java SE Security. 2009, http://java.sun.com/javase/technologies/security/.
[41] Ubell, Michael, The Montage extensible DataBlade architecture. SIGMOD Rec., 23(2):482, 1994.
[42] von Laszewski, Gregor, Ian Foster, Jarek Gawor, and Peter Lane, A Java commodity grid kit. Concurrency and Computation: Practice and Experience, 13(8-9):645-662, 2001. http://dx.doi.org/10.1002/cpe.572
[43] Zhou, Yigang. Integration of Tupelo and Defuddle for supporting arbitrary data translation and management using DFDL (Google Summer of Code Student Fellowship). 2009, http://socghop.appspot.com/student_project/show/google/gsoc2009/ncsa/t124022775909.