Learning Semantic Web Technologies with the Web-Based SPARQLTrainer

Daniel Gerber, Marvin Frommhold, Sören Auer, Michael Martin, Sebastian Tramp
Universität Leipzig
ABSTRACT
The success of the Semantic Web in research, technology and standardization communities has resulted in a large variety of different approaches, standards and techniques. This diversity and heterogeneity make it increasingly difficult to become acquainted with Semantic Web technologies. In this work, we present the SPARQLTrainer approach for educating novices in semantic technologies in a playful way. With SPARQLTrainer, educators can devise a SPARQL course by defining a number of exercises either generically or for a specific domain. Learners complete courses by answering questions of increasing complexity step by step. These questions usually require the learner to build a SPARQL query that queries a certain knowledge base and uses certain SPARQL features. The SPARQL queries created by a learner are compared with example solutions given by the instructor. This comparison takes possible variations into account and gives specific feedback to the learner.

Keywords
SPARQLTrainer, SPARQL, RDF, training, e-learning
1. INTRODUCTION
Recently, the Semantic Web and related technologies have visibly gained traction. Oracle has, for example, integrated support for semantic knowledge management into its database product, Google has started to evaluate RDFa annotations, and the W3C has recently released the second revision of the Web Ontology Language (OWL) standard. The success of the Semantic Web in research, technology and standardization communities has, however, also resulted in a large variety of different approaches, standards and techniques. For example, a variety of knowledge representation formalisms with different expressivity is available with RDF, RDF-Schema, and various OWL flavors. Moreover, there exist different serializations such as RDF/XML, N3, N-Triples, RDFa and TriX, while the Semantic Web technology space is complemented by a wealth of different reasoners, triple stores, rule processors, Semantic Web service infrastructures, various APIs, etc. This diversity and heterogeneity make it increasingly difficult to become acquainted with Semantic Web technologies.

In this work, we present the SPARQLTrainer approach for educating novices in semantic technologies in a playful way. We selected the SPARQL query language as a means for getting acquainted with Semantic Web technologies at large for the following reasons:
• In many ways SPARQL (e.g. its syntax) is inspired by, and very similar to, the well-known SQL query language for querying relational databases. This makes it easy for learners to get started.
• By exploring different features of SPARQL, learners do not only get acquainted with the query language itself, but also with the underlying RDF data model, various serializations, vocabularies and ontologies (such as RDF-Schema and OWL as well as domain ontologies) and with semantic technology pillars such as triple stores.

With SPARQLTrainer, educators can devise a SPARQL course by defining a number of exercises either generically or for a specific domain. Learners complete courses by answering questions of increasing complexity step by step. These questions usually require the learner to build a SPARQL query that queries a certain knowledge base and uses certain SPARQL features. The SPARQL queries created by a learner are compared with example solutions given by the instructor. This comparison takes possible variations (such as a different triple pattern order and variable naming) into account and gives specific feedback to the learner for improving her solution. The SPARQLTrainer is complemented by functionality for user and course management as well as email notifications.
This demonstration paper is structured as follows: We give an overview of the general architecture of the SPARQLTrainer in Section 2. We describe course creation and execution in Section 3 and conclude with related and future work in Section 4.
2. ARCHITECTURAL OVERVIEW
We have implemented the SPARQLTrainer as a typical Java web application that runs as a servlet within a servlet container (e.g. Apache Tomcat¹). The SPARQLTrainer is based on an extensible and modular architecture whose main components are depicted in Figure 1. We used the Jena framework [2] and the SPARQL processor ARQ² to query local or remote ontologies. The components of SPARQLTrainer are:
• The course factory is responsible for generating course objects. Courses represented in the SPARQLTrainer XML format are loaded into the course pool and checked for errors.
• The course pool is a Java representation of a collection of XML files. Each file represents a single course and is linked to a Java object as described in [3]. A detailed description of courses follows in Section 3.1.
• The course runtime is responsible for evaluating submitted solutions and provides the user with corresponding feedback. It guides the learner question by question through the course, offers help and collects all answers for subsequent analysis.
• User management enables administrators to add, delete and edit users.
• The core component comprises a large set of standard functionality. This includes servlet elements like dispatchers, classes for configuration and multilingualism, HTML template engines as well as the Spring framework³ and session management helpers. With regard to SPARQL, the core also contains forms, tables, templates and query execution classes.
• The email utility sends the collected course results to the responsible tutor and the participant.
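The interplay of course factory and course pool can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the Java/JAXB implementation of the SPARQLTrainer. The XML element and attribute names (`course`, `questions`, `required`, `element`, `solution`) and the function `load_course` are assumptions made for illustration; the actual SPARQLTrainer XML schema is not shown in this paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical course file; the real SPARQLTrainer schema may differ.
COURSE_XML = """
<course active="true">
  <title>SPARQL basics</title>
  <questions required="2"/>
  <element>
    <title>All names</title>
    <question>List the names of all persons.</question>
    <solution>SELECT ?name WHERE { ?p foaf:name ?name }</solution>
  </element>
</course>
"""

@dataclass
class Element:
    title: str
    question: str
    solution: str

@dataclass
class Course:
    title: str
    required: int
    elements: List[Element] = field(default_factory=list)

def load_course(xml_text: str) -> Tuple[Course, List[str]]:
    """Parse one course file and collect consistency errors."""
    root = ET.fromstring(xml_text)
    errors: List[str] = []

    title = (root.findtext("title") or "").strip()
    if not title:
        errors.append("course has no title")

    questions = root.find("questions")
    required = int(questions.get("required", "0")) if questions is not None else 0

    elements: List[Element] = []
    for node in root.findall("element"):
        solution = (node.findtext("solution") or "").strip()
        if not solution:
            errors.append("element without solution query")
        elements.append(Element(
            title=(node.findtext("title") or "").strip(),
            question=(node.findtext("question") or "").strip(),
            solution=solution,
        ))

    # A course cannot require more questions than it defines.
    if required > len(elements):
        errors.append(f"{required} questions required but only {len(elements)} defined")
    return Course(title, required, elements), errors

course, errors = load_course(COURSE_XML)
print(course.title, errors)
```

In this sketch a course with errors is still returned together with its error list, so that a caller could decide whether to reject the course or merely report the problems to the administrator.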
3. FEATURES OF THE SPARQLTRAINER
The main purpose of the SPARQLTrainer, i.e. explaining and practicing SPARQL with students, is realized by two distinct workflows. First, tutors need to elaborate a didactic concept, which comprises finding or generating a suitable dataset and composing various questions on the basis of this dataset to illustrate the foundations of SPARQL and RDF. The process of creating courses based on this concept is described in Section 3.1. As soon as this process is completed and the course is properly deployed, learners are able to participate in the course as described in Section 3.2.

¹ http://tomcat.apache.org/
² http://jena.sourceforge.net/ARQ/
³ http://www.springsource.org/
3.1 Course Creation
A course consists of a set of elements containing questions related to a specific topic. These elements should cover different levels of difficulty and deal with various aspects of the SPARQL standard [5]. For example, a tutor can create exercises which introduce a particular SPARQL feature, such as the query forms (SELECT, ASK, DESCRIBE and CONSTRUCT), pattern matching (OPTIONAL, UNION) or testing values (FILTER conditions and operators). Apart from some general information, the author of a course is able to configure different settings to enable different usage scenarios. Besides passing through a course sequentially, it is also possible, for example, to have each participant solve a random selection of questions. The currently available parameters for course configuration are:
• Title: This element defines a short title of the course, which is also used in the course selection menu.
• Description: This description should include information on the topic which is covered by the individual elements of the course.
• Course activation: It is possible to disable a course without having to remove it from the course folder.
• Email of the tutor: The protocols of participants who successfully completed the course will be sent to this email address. If omitted, no protocols will be created.
• Number of questions: This parameter represents the number of questions the participant has to solve to pass the course successfully.
• Random question selection: The set of questions which have to be solved by the participant will be randomly selected from the set of available questions.
• Question execution: Sometimes it is important that the participant can only solve the questions one by one, for example, if the subsequent question contains hints about the solution of the previous one.
• Password protection: By setting a password, it is possible to secure the course against public access.
• Set of course elements: The set of all available course elements (i.e. exercises) from which the elements of an individual course instance will be selected.

Figure 1: Architectural overview of the SPARQLTrainer.

Now, we explain the representation of course elements in more detail. A course element deals with a particular task according to the course topic which has to be solved by the participant. To solve such a task, the participant has to find a correct SPARQL query for the given question. In this context, a correct query means that the result set of this query has to be minimal and complete (i.e. it must contain all information answering the question and nothing beyond it). A course element is described by the following parameters:
• Title: A short title, which describes the course element and will be shown as the heading of the course element view.
• Question: A textual description of the information the participant has to retrieve with the query to be entered.
• Dataset: The dataset which will be used for evaluation of the query if no dataset description⁴ is provided.
• Solution query: The query which determines the correct result set for comparison with the result set of the participant's solution query.
• Set of conditions: For some queries the order of elements of the result set is crucial. It is, for example, necessary that the order of the individual rows in the result set of the solution query and the result set of the participant's query is the same if the question requires the data to be sorted (ORDER BY clause).
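The notion of a correct query and the role of the ordering condition can be illustrated with a minimal sketch. It is written in Python and is not the Java implementation of the SPARQLTrainer. The function name `results_match`, the `ordered` flag, and the representation of a result set as a list of tuples compared by column position (so that variable names play no role) are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[str, ...]

def results_match(solution: Sequence[Row],
                  answer: Sequence[Row],
                  ordered: bool = False) -> bool:
    """Check whether the learner's rows are minimal and complete.

    Rows are compared by column position, so variable naming is ignored.
    Without an ordering condition, the rows are compared as multisets,
    which makes the check independent of the row order.
    """
    if ordered:
        # ORDER BY questions: the rows must appear in the same sequence.
        return list(solution) == list(answer)
    return Counter(solution) == Counter(answer)

solution_rows = [("Alice",), ("Bob",)]
learner_rows = [("Bob",), ("Alice",)]

print(results_match(solution_rows, learner_rows))                # True
print(results_match(solution_rows, learner_rows, ordered=True))  # False
```

A missing row makes the learner's result incomplete and an extra row makes it non-minimal; in both cases the multiset comparison fails.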
The definition of a course has to be represented in XML and adhere to SPARQLTrainer's XML schema definition. For deployment, a course has to be put into a specific folder in the webapp directory, and a reload of all courses has to be triggered in the administration view. This starts a validation process, which checks all course XML files for consistency and the solution queries of all course elements for correct SPARQL syntax.

3.2 Course Execution

While Section 3.1 gave an overview of how a course is defined, this section describes how courses are executed. There are two kinds of courses, public and restricted ones. Everybody may participate in public courses without registering, while restricted courses require login credentials and are subject to access control.

A typical question view presents the question's title, a description, the default graph, a set of predefined namespaces and an input field to enter the desired query. In order to give the participant the opportunity to comprehend the knowledge base to be queried, we provide the possibility to review it in different RDF serialization syntaxes by using the Triplr⁵ web service. Additionally, a variety of statistically helpful information is shown, such as the overall course time, the time elapsed for the current query and a counter of the number of submitted queries per question and course.

If a participant submits a query, it is executed against the knowledge base and the returned result set is compared with the one given in the course element definition. If the results match, the user may proceed to the next course element. If the query contains syntax errors, a warning message is displayed as shown in Figure 2. If the query is syntactically correct but does not deliver the desired solution, different forms of feedback are displayed:
• Columns: The number of columns returned by the query does not match the correct answer.
• Rows: The number of rows returned by the query does not match the correct answer.
• Triple pattern: The number of triple patterns in the query does not match the correct answer. Note that this does not necessarily indicate an issue with the given query.
• FILTER/OPTIONAL: The number of FILTER and OPTIONAL clauses used is too high or too low.

After a user has successfully finished a course, all entered queries and some statistical information can be sent to the user and the responsible tutor.

⁴ The query contains no FROM or FROM NAMED clause.
⁵ http://triplr.com
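How the feedback categories of this section might be derived can be illustrated with a minimal sketch. It is written in Python and is not the Java implementation of the SPARQLTrainer. The `Result` class, the `feedback` function and the message texts are assumptions made for illustration. Counting keywords with a regular expression is a deliberately naive stand-in for inspecting the parsed query: it would also count keywords inside string literals or comments. Counting triple patterns requires a real SPARQL parser and is therefore left out.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Result:
    columns: List[str]            # projected variable names
    rows: List[Tuple[str, ...]]   # one tuple per solution row

def count_keyword(query: str, keyword: str) -> int:
    """Naively count whole-word, case-insensitive keyword occurrences."""
    return len(re.findall(rf"\b{re.escape(keyword)}\b", query, flags=re.IGNORECASE))

def feedback(solution_query: str, solution: Result,
             learner_query: str, learner: Result) -> List[str]:
    """Return hints for a syntactically correct but wrong query."""
    hints: List[str] = []
    if len(learner.columns) != len(solution.columns):
        hints.append("Columns: the number of columns does not match.")
    if len(learner.rows) != len(solution.rows):
        hints.append("Rows: the number of rows does not match.")
    for keyword in ("FILTER", "OPTIONAL"):
        expected = count_keyword(solution_query, keyword)
        actual = count_keyword(learner_query, keyword)
        if actual != expected:
            direction = "many" if actual > expected else "few"
            hints.append(f"{keyword}: too {direction} {keyword} clauses.")
    return hints

solution_query = "SELECT ?name WHERE { ?p foaf:name ?name . OPTIONAL { ?p foaf:mbox ?m } }"
learner_query = "SELECT ?n ?m WHERE { ?p foaf:name ?n . ?p foaf:mbox ?m }"
solution = Result(["name"], [("Alice",), ("Bob",)])
learner = Result(["n", "m"], [("Alice", "mailto:alice@example.org")])

for hint in feedback(solution_query, solution, learner_query, learner):
    print(hint)
```

The hints only report counts that differ and never reveal the solution query itself, which matches the idea of guiding the learner towards the answer rather than showing it.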
4. RELATED AND FUTURE WORK
Related Work. Web-based training of query languages is known from many different areas of computer science. Especially at the Institut für Informatik of the University of Leipzig, there are a number of projects which provide web-based services for learning and training query languages. One of these projects is the Leipzig Online Test System, abbreviated as LOTS [1], which is used, for example, to train SQL queries⁶ and XQuery⁷. In the area of the Semantic Web, too, there already exist a number of web-based services for working with SPARQL queries. For example, it is possible to validate the syntax of SPARQL queries with the help of the SPARQL-Validator⁸. However, it does not help in understanding the semantics of SPARQL queries. In order to understand the influence of the various parts of the SPARQL query language specification on the result set, it is important to be able to test different SPARQL queries on a given RDF data model. SPARQLer⁹ is an RDF query demo that executes SPARQL queries on a small example model. SPARQLer can also be used to query any other graph on the web. However, this service does not provide a feature to solve a given set of successive questions. To the best of our knowledge, there is no e-learning environment for SPARQL available which offers didactically useful courses, consisting of successive questions, to learn and train SPARQL queries in a guided way. It is particularly important to guide the user towards the correct solution through appropriate feedback.

Figure 2: Screenshot of SPARQLTrainer during course execution.

Future Work. Experience with SPARQLTrainer has shown that the currently available features are sufficient to support learners in comprehending the SPARQL query language. However, one area for improvement is the administration view of the web interface. This includes the uploading of courses, the editing of course configurations as well as the generation of user accounts and groups. Since the current SPARQL specification does not include data manipulation features, an update language, known as SPARQL-Update [4], has been introduced. In order to provide a similar learning experience for SPARQL-Update, it is necessary to extend the SPARQLTrainer to support these new features. For example, it has to be considered how to evaluate SPARQL-Update queries and how to provide learners conveniently with appropriate feedback.

⁶ http://lots.uni-leipzig.de/sql-training/
⁷ http://lots.uni-leipzig.de/xqtrain/index.jsp
⁸ http://www.sparql.org/validator.html
⁹ http://www.sparql.org/query.html
5. REFERENCES
[1] T. Böhme, E. Rahm, and D. Sosna. Konzeption und Einsatz eines Online-Übungssystems an der Universität Leipzig. Technical report, Universität Leipzig, Institut für Informatik, 2005.
[2] J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: implementing the semantic web recommendations. In WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pages 74–83, New York, NY, USA, 2004. ACM.
[3] J. Fialli and S. Vajjhala. Java Architecture for XML Binding (JAXB) 2.0. Java Specification Request (JSR) 222, October 2005.
[4] P. Gearon and S. Schenk. SPARQL 1.1 Update. W3C Working Draft, W3C, Oct. 2009. http://www.w3.org/TR/2009/WD-sparql11-update-20091022/.
[5] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. W3C Recommendation, W3C, Jan. 2008. http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/.