A semantic Repository for Storing and accessing e-learning Objects

0 downloads 0 Views 310KB Size Report
ROSA: a Repository of Objects with Semantic Access for e-Learning. Fábio Porto1,2, Ana Maria de C. ...... UF Query By Example. AT Relational Database.
ROSA: a Repository of Objects with Semantic Access for e-Learning Fábio Porto1,2, Ana Maria de C. Moura1, Fabio José da Coutinho da Silva1 Military Institute of Engineering - IME/RJ1 Department of Computer Engineering - Rio de Janeiro - Brazil [fporto,anamoura,fabiojcs]@de9.ime.eb.br Laboratório Nacional de Computação Científica – LNCC2

Coordenação de Ciência da Computação – Petrópolis – Brasil [email protected] Abstract. Learning Content Management Systems (LCMS) supports e-learning applications with storage and efficient access for elearning objects (LO)s. ROSA is a LCMS built as a semantic layer on the top of an XML native DBMS. In this paper, we present ROSA Data Model designed as an extension to RDF data model with ordered collections and relationship properties and cardinalities. An algebra is proposed with operations supporting queries over LO metadata and navigation through LO conceptual maps. Equivalence properties of relationships are implicitly considered, during navigation, by a query engine that supports ROSA operations and data structure. Additionally, a thesaurus is used during query processing extending the domain vocabulary with synonyms, generic/specific and associated terms. Finally, the system implements web semantic technology based on a formal data model tailored for an efficient storage and access of e-learning resources.

1. Introduction Distance Education, also known as open, flexible or distributed learning, is a mode of education whereby learners are physically separated from the institution, and where the learning process takes place outside the education establishment. Students learn where and when it suits them, at their own pace [BC01]. This education mode resorts to various educational technologies to facilitate both the learning and the communication processes between tutors and learners. LCMSs are being developed with the intention of abstracting e-learning material storage and access from applications. The fundamental unit of reference on e-learning data modeling has been specified as a Learning Object (LO). A LO is a collection of reusable material used to support learning, i.e., an entity that can be digital or not, and can be used for learning, education or training [Fri01]. A LO is identified by a set of metadata descriptors established by an international metadata standard, such as LOM (Learning Object Metadata) [IEEE02] and SCORM (Sharable Content Object Reference Model) [ADL02], an extension of LOM. The data elements that constitute a metadata instance of a LO are organized into a hierarchy, providing information about: its general characteristics, life cycle, meta-metadata, technical requirements, educational characteristics, intellectual property, relationships between other LOs, LOs annotations and LO classification system. In a LCMS, the process of designing e-learning courses may be supported by query processing techniques that explore metadata and semantic relationships to aid in LOs search. This is especially useful in an academic community willing to

publish and to share their courses materials. Thus ROSA has been conceived to support the design phase of e-learning courses, where tutors want to share and search for course material. In this work we propose an environment that provides the ability to store and retrieve LOs based on their semantic contents. Hence, an academic user such as a professor, could create or improve his own course exploiting subjects not contemplated in other courses already published in the system, as well as a student might access the system to get information on a subject based on specific characteristics. So, queries such as: “which topics are relevant to database learning?”; “which course material comprehends an OO Database topic?”; “which subjects are basis for teaching Query Optimization?”; or even “which courses are prerequisite for a database course?” are supported by ROSA system, which takes into account predicate semantic content. Questions related to synonyms, specific/generic and associated terms are dealt by a domain thesaurus that helps during query processing. The ability to express queries over ROSA conceptual maps with thesaurus support leads to simple and powerful search engine, adequate for supporting large communities of tutors. In addition, students may also navigate through conceptual maps looking for interesting LOs. The relationships defined among LOs aid in both discovering materials and in establishing a didactic sequence of learning. This project uses Semantic Web technology to enable semantic query capability into LO repositories. The Semantic Web is built upon two W3C standards, corresponding to the second and third layers respectively of Berners-Lee’s architecture [Ber00]: the XML language [W3C03], which provides standard syntax for data representation, ensuring data exchange; and the framework Resource Description Framework – RDF [RDF], a standard data model for expressing structure and representing metadata vocabularies, serving as a metadata language for the Semantic Web. This model allows the user to define relationships between resources as semantic graphs on the Web, using XML as the syntax language to transport data. In addition, RDF includes a schema, named RDF Schema (RDFS) that can be used to define application specific vocabularies, defining taxonomies of resources and properties used by specific RDF descriptions. This paper focuses on presenting ROSA data model as a formal basis for the development of ROSA LCMS, in the context of the Semantic Web. LO is the main structure of the data model, which is complemented with semantic and aggregation types of relationships. Thus ROSA has been conceived to support the design phase of e-learning courses, where tutors want to share and search for course material associations. These are modeled as an extension to RDF statements. ROSA data model provides a powerful algebra that makes it possible to query on LOs metadata, as well as on the semantic associations between LOs. ROSA is designed as a semantic layer middleware on the top of a DBMS. The project has been sponsored by Consist, the Brazilian Software AG representative, which provides the Tamino XML native DBMS as a storage layer [TAM]. As a result, some of the propositions in this paper are influenced by the target XML data model, although mappings to other data models may also be envisaged.

2

The main contributions of this work are as follows: • specify a data model tailored to represent and evaluate queries over a semantic repository to be used as a basis for query evaluation and optimization; •

specify ROSAQL, a query language supporting the proposed data model;



allow users to name and classify relationships. Relationship types can be classified as reflexive, symmetric and transitive, which provides implicit semantic processing of relationships, during query evaluation;



allow users to query, data and relationships indistinctly, as both are expressed as part of the instance database, which offers a great advantage when compared to OO models;



integrate a thesaurus into the semantic query processing strategy in order to extend the semantic capabilities;



simplify, with respect to RDF, the representation of attribute values, by using a traditional database attribute representation approach, which both simplify query formulation and predicate definition. Additionally, this model extends RDF tuple concept considering n-ary cardinality embedded into relationship collections;



model collections of objects associated through predicates, giving associated objects an uniform treatment;



implement a query engine using a Query Execution Engine Framework (QEEF) to support operations over LOs and their associations, using XML internal representation;



define a tuple data structure for an XML internal LO representation;

The rest of the paper is organized as follows: section 2 presents a LO conceptual map that serves as a use case to describe the data model in section 3. This section gives a comprehensive description of its structure and algebra, with examples based on the queries presented before. Section 5 gives an overview of the system implementation.. Section 6 presents related work, and finally, section 7 concludes the paper with additional comments and future work.

2. LO conceptual map This section presents a use case for the ROSA project. It comprehends a RDF style representation of a LO conceptual map for a Master program in Computer Systems and a class diagram representing its schema. Use case data and queries reflect ROSA objective of supporting the design of e-learning courses. LOs metadata and their relationships offer a rich semantic model for searching the repository during class preparation. A conceptual map is an abstract mechanism that helps the e-learning designer to graphically model relationships among LOs of his e-learning application. It is represented by a directed-labeled graph, where nodes represent LOs, identified by their names, and directed labeled edges model relationships or semantic predicates the origin node has with the target one, like predicates in RDF. Figure 1 presents a partial LO conceptual map of the Master Program in Computing Systems design at IME and the corresponding schema describing and classifying LOs and their relationships. The classes Program, Course and Topics, which

3

correspond to domain specific types of LO, represent the schema. In fact, they model the main composition structure of courses in most school programs in Brazil. In this map, some LOs are associated to their classes in the schema through a dashed line. Relationships among LOs can express, for example, how courses of a specific program can be related to the topics they cover, or when it should be taught, whether before or after a certain topic or course according to the corresponding semantic of predicates, such as is-prerequisite for, is basis for or depends on.

comprehends

Program

Course

covers

Topic covers

Master Program

PHD Program

E-learning SQL nts SQL me a d fun ts Relat. Relat. en m Algebra Algebra s a r d ve fun co QUEL Query covers Relat. Lang. covers Calculus

Data Struct comprehends

is_prereq_of

IME Computing Computing Systems comprehends IME Systems comprehends

is _

s ba

....

rs ve co Data covers Query Optimiz. bases co ve rs r fo is _

PUC Computing Systems

...

XOR

QBE is_basis_for

Storage Technique

Conc Control

Figure 1: Master Program in Computing Systems – a LO Conceptual Map In addition, the classes e-Learning Program, Master Program and PhD Program specialize the Program class. In order to give a general idea of the semantic meaning expressed by the LO conceptual map of Figure 1, the following assertions can be made:



IME Computing Systems program comprehends some courses, such as Data Structure and Databases, where the first is prerequisite of the latter;



the Database course covers some topics such as Storage Techniques, Query Languages, Query Optimization, and Concurrency Control; however, Storage Techniques should be taught before Query Optimization, since the first is basis_ for the latter;

2.1.



Query languages cover Relational Algebra and Calculus topics and these fundament SQL learning.



Relational Calculus covers one of the query languages (exclusive OR): QUEL or QBE.

Relationships

Relationships in a conceptual map are also referred to as predicates. They can be classified according to their semantic as aggregation type or domain specific. They place LOs into context by meaningfully associating them with other LOs. 4

Users may choose terms to represent predicates that best represent their semantic in a domain. The computability of such terms is obtained by classifying them according to known types for which equivalence properties (reflexive, symmetric and transitive) have been defined. In the context of e-learning, aggregation type relationships are specially relevant because they express the set of LOs to be visited when covering a certain subject (LO). Thus, predicates of type aggregation appear very frequently in conceptual maps. They present a transitive characteristic that is implicitly explored during query evaluation by the query engine. Predicates may also be defined in a thesaurus that extends the conceptual map with associated terms, synonyms, specific/generic terms, etc. To better illustrate the use of predicates, consider those appearing in the conceptual map of Figure 1: • Aggregation type predicates a. comprehends and its synonyms: has, covers, contains, includes, etc.; • Domain specific type predicates b. is_prerequisite or is preceded _by; c. fundaments and its synonyms: is basis for, is condition for, etc.;

3. The ROSA Data Model In this Section, a data model for the ROSA system is proposed. This data model is characterized by two basic structures: LOs and relationships associating them. The LO structure is composed of a set of metadata attributes in the Education domain and of a Navigational Search Context (NSC). A NSC is a runtime attribute updated during the conceptual map navigation that links a source to a target LO through all the predicates traversed during the navigation. The set of LOs metadata have been extended from the IEEE-LOM standard [IEEE02] and it contains information such as identifier, title, language, contributors, version, etc. In this Section, a Data Model for the ROSA system is proposed. Initially, the structural aspect of the model is specified, followed by an algebra defining a set of valid operations. 3.1.

Structure

Figure 2 presents the LO data model expressed in a UML class diagram. The ComplexResource class is the root of the class hierarchy, providing a common representation for LOs, Relationships and Dynamically created objects, which represent views in the database. LO class derives directly from ComplexResource class. It includes metadata attributes according to the LOM metadata standard. LOs are uniquely identified by a resource identification attribute and are classified according to a type attribute.

5

Complex Resource Relationship Type

LO

Collection

Relationship

LOM attributes

1..n

NSC Relationship Collection

1..n

1..1

Inner Collection

1..n

1..1

LogicalLO PhysicalLO 1..1

collection

Ci List

Bag

Set

min card max card

min card max card

Cj

Hierarchy

Figure 2: The Rosa Data Model The NSC structure of a LOi includes a set of pairs . Each pair represents a partial path navigated from a LO (identified by its id) through a relationship (ri). Hence, concatenated pairs comprehend a complete path arriving at a specific LOi. In fact, a single LO can be retrieved from different navigational paths. Thus a NSC is represented by the following format: {, ..., }. When a LO is directly retrieved from the database repository, the corresponding NSC is empty, represented by the symbol ε. Figure 3 shows an example of LOs association. Considering that LO3 will be retrieved, the set of expression paths in the NSC are: •

NSC = ε if LO3 is retrieved without navigation;



NSC = {}if LO3 is recuperated through LO2;



NSC = {< LO1 ~fundaments~ LO2 ~ is_pre_requ_of >}if LO3 is recuperated via navigation from LO1.

LOs instances represent concepts (Logical LO) and documents (PhysicalLO) that take part in an e-learning repository. LogicalLOs correspond to different levels of aggregations in the domain, such as: Topic, Course and Program in Figure 1. PhysicalLO corresponds to files and their metadata. An instance of PhysicalLO is considered atomic, which means that it

6

cannot be composed by other LOs, whereas LogicalLOs may contain a collection of other LOs associated by aggregation and semantic relationships. The Relationship class contains all verbs employed in a conceptual map. Its instances are classified according to a relationship type that defines common equivalence properties for its instances. As mentioned in Section 2.2, relationship types are classified according to the equivalence properties they present: reflexive, symmetric and transitive. The system explores implicitly such properties when evaluating a query over relationships. As an example, a conceptual map may use the verbs comprehends and cover indistinctly since they are classified as being of the same type, thus providing the same transitive equivalence property. Collection is a class that represents physical collections of objects. A collection may impose a certain order to be followed during a visit to contained LOs. Thus, the model specializes the collection class into: set, bag, list or hierarchy. Set and list represent the corresponding mathematical concepts, whereas bags may contain duplicates in the collection. Hierarchy corresponds to LOs whose contents follow a tree structure. The set and bag classes include attributes to identify maximum and minimum cardinalities. Those attributes introduce restrictions over the collections that specify, respectively, the maximum and minimum number of elements that may be extracted from the collection. In Figure 1, for instance, the exclusive or aggregation relationship between Relational Calculus and QUEL and QBE can be modeled by a set collection

with maximum cardinality 1 and minimum cardinality 1, meaning that at least one and at maximum one of the LOs in the set should be exhibited when presenting Relational Calculus. The representation of sets and bags with unlimited access to its elements is represented by maximum cardinality=unbounded. Associations among LOs, whose cardinality is typically one to many, are modeled as relationship collections, which inherit both from Relationship and Collection classes conceptually define triples t(subject, property, object), extended to nary relationships, where subject and objects are of type LO and property is a relationship. The schema specifies valid associations through triples T(class, relationship type, class). Thus, valid instances of t are in accordance to T. User queries may join LOs from different database collections (or database repository), according to a certain boolean predicate. The Inner collection Class presents the similar concept of tuple in the relational model. It is used to support combination of LOs resulted from a join operation. As an example, considers a query that wishes to retrieve LOs stored at IME repository and at LNCC repository whose author is “Fabio Porto”. For pairs of LOs that agree on the join predicate, an inner collection is formed containing both LOs. Initially, when a LO is retrieved from a repository a single LO inner collection is created to hold it. Subsequent join operations would eventually fill inner collection with joined LOs. fundaments

LO1

LO2

is_prereq_ of

LO3

Figure 3: NSC for LO3 retrieval

7

Finally, relationships implement semantics similar to that of statement in RDF. The data model, nevertheless, differ from RDF data model. Firstly, LOM metadata attributes are specified as properties of the LO class, rather than statements about the LO resource. Hence, attribute values are not identifiable and are not considered first class objects, as in the case of literals in RDF [FHV02]. Aggregation and semantic relationships are modeled as collections of LOs. This is more general than in RDF, since it requires a bag resource and a specific list of predicates to model collections, leading to a complex representation and query evaluation. Moreover, relationship cardinalities in ROSA may restrict the participation of members in collections, which is not available in the RDF statement semantic. When representing, for example, the predicate covers of Figure 4a, which associates Query Language to Relational Algebra and Calculus in RDF, it is not possible to ensure that all the collection members would be visited when reaching the Query Language LO. On the other hand, the proposed model considers the collection members as a whole unit, as shown in Figure 4b, assuring that all its elements are effectively visited. Query Language

covers

rdf:bag

Relational Algebra LO

rdf:type bagid

rdf:_1 rdf:_2

Relational Algebra

Query Language LO

covers

Relational Calculus

4a- Collections in RDF

Relational Calculus LO

4b- Collections in ROSA

Figure 4: Relationship Collections

In addition, ROSA data model represents relationships as part of their instance data. So users may query associations and data indistinctly. This differentiates ROSA from OO data model, where relationships are part of the schema and cannot be queried. 3.2.

The ROSA Algebra

In order to be complete, a data model for LOs requires the specification of valid operations, which is considered the model algebra. The algebra definition makes it possible to express a high-level query language into a formal canonical representation, as well as the conception of optimization strategies. Many algebras have been proposed covering operations in different data models [Cod70,CBB00, W3C02, FHV02, FHP02]. This work presents an algebra, which, in some sense, borrows from all these works. Firstly, the majority of the operators are adapted from their relational algebra counterparts. Path expressions through LO relationships are similar to path expressions in the OO data model [CBB00].

8

In this algebra, operators receive collections of LOs as input and equally produce collections of LOs as output. The closure of the model enables the composition of operators in an algebra expression. The algebra includes operators to explore relationships between LOs as one of the semantic enrichments provided by the proposed model. This is achieved by exploring relationship type equivalence properties within the semantics of navigation operators. Relationships of aggregation type receive special treatment once their transitive property is known to the systems. Operators over domain specific type of relationships consider the equivalence class of its corresponding relationship type, such as: reflexive, symmetric and transitive. As an example, reflexive predicates include, implicitly, the association between a LO and itself through that predicate. Transitive predicates make it possible the evaluation of transitive closure operators and a symmetric predicate enables answering queries independently of the LO role in an association, as subject or object. A list of operators and their semantics are presented next: •

Object functions

Object functions implement object properties access. Three types of object functions are proposed, corresponding to: access to LO attributes, navigating through aggregation and domain specific type of relationships. In the following presentation, a target LO is considered the one the one over which the function is being invoked. gettype() – obtains a literal corresponding to the type of the target LO. getLOM_attributei() – returns a literal corresponding to the value of the attribute named LOM_attributei; semantic() – returns a collection of LOs associated to the target LO through any domain specific type of relation-

ship; semanticrelationshipi() – returns a collection of LOs associated to the target one through the relationship named semanticrelationship i. comprehends() - returns a collection of LOs associated to the target LO by an aggregation type of relationship. property() – returns a collection of relationships associated to target LO has with other LOs. •

Relationship fuctions getId() – invoked on relationship objects, returns the literal corresponding to the value of its id attribute.



Collection Operations

Collection is a generic type over which ROSA algebra operations process. Such processing may require the invocation of object functions over collection members. When these functions return collections, a set-collapse operator groups the individual collections into a single one. As presented in Section 3.1, relationship collections may implement specific ordering behavior through classes: bag, set, list and hierarchy. In operations involving collections of different ordering type, the 9

resulting collection produced by ROSA operations is of type bag, otherwise the collection ordering type is maintained in the output collection. Source (c(expression)) – This operation produces a collection c of LOs from an expression. The expression syntax is

left to be defined by an implementation. It is equivalent to its homonym operation in [NWM+01].

o

getNext ([r1,..]c) – This function obtains the next element in a collection. If applied over a relationship collec-

tion, retrieves the following inner collection. On the other hand, if applied over an inner collection, returns the following LO in that collection. The navigation order is defined by the operation implementation according to the collection ordering type. If the collection has a maximum cardinality n specified, then a set of LOs {r1, r2,.., rk}, where k ≤ n, identifies LOs to be retrieved, otherwise, the first n LOs are retrieved. Projection Π [p1[^p2[^..^[pn]]] ] (c) – This operation receives a collection c of type C of LOs and produces a new

collection with complex resources according to p, which defines the projection behavior. The behavior associated to p modifies the resulting collection structure. The specification of a list of pi defines a conjunction of each pi behavior. Possible projection behaviors are presented next: a)

Structure - Π(a1,a2,...,an) (c) 

if p is a list of LO attributes (i.e. (a1,a2,...,an) ) then the resulting LO structure is limited to the specified attribute list. Attribute values are obtained by invoking the getLOM_attributei() function on each object in c, for ≤ i ≤ n. o

Which are the difficulty levels of the registered Topics? Π

o

(difficultylevel)

(IME.Topic)

What are the formats of PhysicalLOs of type File? Π (format) (IME.File)

b) LO - Π (c) If p is not present, then the resulting collection is a copy of the input collection. o

Which are the registered Programs? Π (IME.Program)

c)

If p is a relationship function, the resulting collection corresponds to the union of all the collections obtained by invoking the respective function on each LO in c.

10

Generic association - Π(semantic()) (c) o

Which are the Topics associated to the Database Course? Π (semantic()) (c) – where c is a collection composed of the Database course LO.

Domain specific relationship - Π (semanticrelationshipi()) (c) o

Which are the prerequisite Courses for the Database Course? Π (prerequisite) (c) – where c is a collection composed of the Database Course LO.

Aggregation relationship - Π (comprehends()) (c) o

Which Topics does the Database Course comprehend? Π (comprehends()) (c) - where c is a collection composed of the Database Course LO.

d) Relationship - Π (property()) (c) if p is the property() function, then the resulting collection will include all complex resources of type Relationship associated to the objects in c. o

Which are the associations established with Program LOs? Π (Property())(IME.Program)

e)

Path expression - Π (semanticrelationshipi().semanticrelationshipn ()) (c)

If p is a path expression, then the resulting collection correspond to the union of the collections obtained by successively applying the relationship functions, starting from LOs in collection c. This operation updates the NSC. o

Which Topics are covered by the Database Course and are the basis for other Topics? Π (comprehends().isbasisfor()) (c) – where c is an unitary collection containing the DatabaseCourse LO.

Selection σ

(δai)

(c) – restricts the set of LOs (source) in c according to a boolean expressions δai (logical predi-

cates) specified against LO metadata or relationship verbs. When δ includes a path expression, then the boolean expression is evaluated against LOs (target) in the collection obtained following the path. In this case, source LOs are selected as long as at least one target LOs returns true in δai. Relationship equivalence properties extend the scope of LOs to be evaluated. When searching for target LOs through relationships, its semantic is taking into consideration. As an example, reflexive relationships include subject LO in the collection of LOs to be evaluated.

11

a)

Selection on metadata attribute σ (δai) (c)

For each object in c, evaluate the expression δ on LOM attribute ai, including in the resulting collection those evaluating as true. o

Which Topics are classified with level ‘b’? Π (σ (difficultylevel= ”b”) LO)

b) Selection based on a relationships. For each LO in a collection, invokes the corresponding function, obtain the value of attributei and evaluate the δ expression over it. For those evaluated as true, return the subject LO. based on a generic domain specific type of relationship σ (δsemantic().attributei()) (c). o

Which are the LOs associated to the Database Course? Π (σ ((semantic().id=‘Database’) (IME.Course) )

based on a domain specific relationship σ (δsemanticrelationshipi ().attributei) (c)

Which are the LOs that fundament the Topic “SQL”? Π (σ ((fundaments().id=‘SQl’) (IME.Topic) ) based on relationship identification σ (δproperty ()) (c)

For each LO in c, evaluate the existence of an association with a specific id. If it evaluates as true, return the LO to the resulting collection. o

Which are the fundamental LOs? Π (σ ((property()=‘fundaments’) (IME.LO) )

based on a path expression σ (δ(semanticrelationshipi ()..semanticrelationshipn ().ai) (c)

For each LO in c, obtain the target collection by iteratively invoking the semantic functions in the path expression and evaluating a boolean expression over attribute ai. o

Which Topics created by “Ana Maria Moura” are covered by the Database Course and are the basis for other Topics? Πσ

(δ(comprehends().isbasisfor().author=’Ana Maria Moura’)

(c) – where c is a unitary collection containing the

DatabaseCourse LO. Transitive Closure ϕ (relationship [iteration max], up|down) (c) – Obtains a collection by recursively invoking the referred re-

lationship function over LOs, starting in collection c. Relationships must qualify as transitive property. Aggregation

12

relationship type is transitive by definition. Other semantic relationships must have transitive property set. The transitive closure iteration ends when either there are no more paths to follow or, no more LOs are produced, or an iteration limit, such as iteration max, is reached. The path from a LO may follow a down direction (default), where relationships are followed from LOs playing the subject role towards LOs playing object roles, or the opposite direction up, conversely. This operation updates the NSC. a)

Transitive closure (down and up) o

Which Topics do the Program MS in Computing Systems comprehend?

Π ϕ (comprehends())(c) – where c is a unitary collection containing the LO MS in Computing Systems. o

Which Topics do fundament SQL?

Π ϕ (fundaments(), up)(c) – where c is a unitary collection containing the SQL LO. Union (c1 ∪ c2) - The resulting collection contains the distinct LOs in collections c1 and c2. The LO equality op-

eration considers the identification attribute value for duplicate elimination. o

Which Topics and Courses were created by ‘Ana Maria Moura’?

Π (σ(autor=”Ana Maria Moura”) (Topic) ∪ σ(autor=”Ana Maria Moura”) (Course)) Join

(c1)p(c2)- The resulting collection includes the LOs in collections c1 and c2 whose attribute values match

the predicate in p. The join operation produces inner collections containing joined LOs, so that pairs can be identified apart in a collection. LOs in a inner collection keep a metadata attribute identifying their original collection, so future operations can be evaluated over LOs of a specific collection. o

Produce LOs from collections Topic and Course that agree on their authors?

Π ( (t Topic) (t.author=c.author) (c Course) )

4. The Query Language Ad-Hoc data base queries are directly submitted to the ROSA system through ROSAQL query language. The system also includes a QBE like environment for submitting queries following a user-friendly interface. In the next subsection the ROSAQL query language is briefly presented. 4.1.

The ROSAQL return [[p1[^p2[^..^[pn]]] ] [C] (c)] [from Ei(Ci) [v] [in δ ai] [, Ej(Cj)] [on δ] [where {(δai)Ci, δ (collection-function),

13

δ (p1^p2^...^δpn)}] [start at [Ci with δpi] through [relationship, . ] {up,down} [until limit]] [order by {LOM attribute } {asc,desc] [union] a) return – specifies the complex resource structure to be produced. Its behavior is defined by the pi term ac-

cording to the algebra. In the absence of the from clause, a collection c may be specified as a source for the database collection. b) from – specifies the input collections to be operated on by the query. A collection is specified by a pair

(schema, class) where schema (E) identifies a database and class c is the collection class. A list of n terms in this clause should be followed by n-1 join predicates between the collections. An instance variable v may be defined to iterate over a collection, in which case a selection may restrict the collection elements. c)

where – specifies a selection predicate over collections involved in the query. Selections may be specified

over LOs metadata attributes, or over relationship identifications. The user may also write path-expressions to navigate through a list of relationships and define a terminal collection over which a predicate would operate. d) Start – specifies the transitive closure operation. The input collection may be specified in the from clause in

conjunction with the where clause or directly in the start clause. The semantic relationship defined to guide the inference must have the transitive property set, unless a dot ‘.’ is specified, in which case an aggregation relationship is assumed. The inference stops when one of the events occurs: no more relationships to follow; no new LOs are obtained; a limit of recursion is achieved. The resulting collection contains LOs obtained from each recursive navigation. The clauses, up and down, indicate the path navigation direction to be followed. A down clause processes relationships from a subject LO to an object LO, whereas an up clause works in the opposite direction. 4.2.

Some Query Examples

(a)

Which Topics are covered by the Database Course and are the basis for other Topics? return comprehends().isbasisfor() Topic (IME.Course) c where c.id=’Database’

(b)

Which Topics do the Program MS in Computing Systems comprehend? return Topic from IME.Program p in p.id= ‘MS in Computing Systems’ start at comprehends() Which Topics and Courses were created by ‘Ana Maria Moura’? return Topic where author= ‘Ana Maria Moura’ union return Course where author= ‘Ana Maria Moura’

(c)

14

5. Implementation Issues ROSA is being implemented as a semantic query processing system based on ROSA data model. The prototype extends the framework QEEF [APM03], designed to support new data processing requirements. This Section, briefly introduces QEEF and discusses the extensions required to adapt it to ROSA data model. 5.1.

QEEF (Query Execution Engine Framework)

The objective of QEEF is to specify a framework that abstracts common aspects of query execution engines that take part in most of know implementations, as well as identify those that depend on specific requirements of application data model. QEEF identifies five main components of a query engine that must be implemented for a new application: logical operators, physical operators and control operators, tuple data structure and data sources. Logical operators correspond to the implementation of algebraic counterparts; physical operators correspond to data access operations; control operations provide for flow control between operators and data source correspond to data providers. Developers of a new query engine identify the components that need to be implemented or borrowed from other implementations. In order to implement ROSA using QEEF, a major concern was to model a tuple for representing LO structure and its relationships and the implementation for logical operators of ROSA algebra. 5.2.

ROSA extension in QEEF

This prototype considers an XML representation for the storage of LO and its associations. Basically, a data source collection corresponds to the root of the XML tree, followed by LOM metadata and associations. LOM metadata data is represented in a subtree and each predicate leaving a LO is represented by an element with predicate cardinality represented as attributes. The ids of LOs taking part in a predicate collection are specified as child elements. 5.2.1.

A LO tuple

Once an internal representation for objects in the data model has been conceived, it is possible to specify an adequate mapping for a tuple data structure, which is the basis for specifying the implementation of logical operations. As presented in other works[JKC+02], the representation of XML data into a tuple data structure is considerably more complex than in the Relational model. This is mainly as a result of the variable structure of an XML document, with missing or duplicate elements, which could produce tuples of different formats and sizes. As a result, this implementation considers the concepts of pattern and witness trees as presented in [JKC+02]. Pattern trees are produced from references to data elements in the user query. It can be seem as a function q(d):p, that maps a document d by a query q into a pattern tree p, with the following properties: o

The root element of d is directly mapped into the root element of p;

o

Each reference for a data element in q has its path included into p; 15

o

If a reference to an element occurs as part of a final projection, then all referenced element subtree is also included into p, in the same relative position;

A resulting tree r is produced by mapping the pattern tree p against an input document d. A mapping may produce many r trees, as is the case of LOs with many authors (repeating elements), in which case, each occurrence of the author ele-

ment produces a new tuple. Having a model for representing LO tuples, it was possible to conceive an efficient data structure for holding tuples and accessing their values. The main concern was to have a structure where one could conduct a direct access to element values, without having to traverse the whole XML hierarchy. Again, the intuition was to use access patterns defined by user queries as the basis for an efficient design. As an example, if a selection predicate is specified against a LO title attribute (LOM/general/title – LOM attribute hierarchy example), then the corresponding selection operation requires direct access to the title element. As such, the conceived tuple data structure is composed of two parts. The first part uses java MapTree data structure to implement a B+tree based access pattern to nodes in the r tree. The access keys for the MapTree correspond to the ids associated to the r tree nodes, traversed in pre-order, and assigned to them during the tree construction. A value is stored associated to each leaf node in the MapTree. For LOM attributes, this corresponds to the respective attribute value, whereas for associations (predicates), this value includes: cardinality attributes, and an array with ids of LOs taking part in that collection. Observe that the MapTree provides both direct access to elements according to their ids and a indexed sequential access for retrieving a whole subtree of LOs. The second data structure offers direct association between a referenced element path and its corresponding id in the MapTree data structure. This is obtained through a java Hashtable collection, having access paths as keys and ids in the r tree as objects. As a result of these two data structures, access to data during logical operation evaluation is standardized even having a variable XML representation of a LO. Access to each element node in the r tree is achieved by first getting its id in the Hashtable, using the access path and probing the MapTree using the id. Finally, a tuple stores the content of inner collections (refer to Section 3.1). When a join combines pairs of LOs they are physically store in a java Vector data structure. 5.2.2.

Implementing algebra operations

Selection and Projection operations evaluate by matching their attributes and path expressions against data stored in tuples. Projections and Selections over LOM attributes, simply access parts of the tree structure that match the attribute list. This operation is facilitated by the tuple structure that stores LOM paths in operation arguments as access keys to data (see Section 5.2.1).

16

When navigation is required than a special caching mechanism is used so that the same LO does not have to be reloaded every time a reference is done to it. Is interesting to observe that relationships on ROSA are mostly self-referencing, in the sense that both, source and target LOs, are of the same collection. As a result, to efficiently implement navigation a caching table indicates if a certain loaded LO has being evaluated by the query, once it might have been loaded as a function of a path expression reference. 5.2.3.

Thesaurus support

Since ROSA data model and query language have been presented, we are now able to describe how a query is processed by the system, taking into account a domain thesaurus support. In order to describe this processing, consider the query below, followed by its representation in ROSAQL: a) Which LOs does the topic “Query-By-Example” cover? ROSAQL (Q1): Return (IME.LO)l Where l.comprehends().id= ‘Query-By-Example’

During execution, the following steps are performed: Step1: Q1 is submitted to the system; Step2: Plan Generator transforms Q1 into a QEP with operations in ROSA algebra: Π (σ ((comprehends().id=‘Query-By-Example’) (IME.Topic) ) Step3: Query engine evaluates the plan and returns results to user;

Now suppose the user is not satisfied with the LOs retrieved by Q1, and he wants to continue his search using a thesaurus support, from whom synonymous and associated terms can be derived. Figure 5 presents some terms of a thesaurus, designed in the context of the database domain. In ROSA, the user selects a thesaurus to support queries in a certain conceptual map. The thesaurus is also modeled according to ROSA data model, where its association types are represented through domain specific relationships. The thesaurus is stored in Tamino using the same schema as the one used for storing LOs conceptual maps, allowing queries on thesaurus terms and associations to be processed uniformly within the system. Step 4: the user chooses a reference term for his search, selected from RS1 or Q1. This corresponds to rephrasing his

initial query with synonymous or associated terms, which are associated to the reference term. Step 5: the new request is received by the query processing module that submits a query to obtain the synonymous and

the associated terms from the thesaurus. In our example, the reference term “Query By Example” corresponds to “QBE”, as it is preceded by the association USE (or preferred term), and its associated term (AT) refers to “Relational Database”, as illustrated in the thesaurus (Figure 5); Step 6: the query re-writing module rewrites Q1 with the terms obtained from the thesaurus search. Hence,

Q2: Return (IME.LO)l 17

Where l.comprehends().id= ‘QBE’ Q2 is transformed into ROSA Algebra and then resubmitted to the system. The same procedure is followed for the associated terms; Step 7: after executing step 3 the user can still continue filtering his initial query Q1 as previously, or by visualiz-

ing the thesaurus in its systematic representation from a term selected from Q1 or RS1. Step 8: the systematic thesaurus is presented to the user, who eventually re-submits the query using one of its terms.

Languages Programming Languages OO Programming Languages C++ JAVA Relational Query Languages SQL QBE QUEL .......

(a) Systematic View

Programming Languages BT Languages ST OO Programming Languages QBE BT Query Languages UF Query By Example AT Relational Database Query Languages BT Languages AT Database Query By example USE QBE BT: broader term ST: specific term …….. AT: associated term UF: use for (b) Alphabetic View Figure 5: Terms of a Database Thesaurus

6. Related Work The research on algebra languages has been very intense since Codd [Cod70] presented the relational algebra. The majority of the algebras proposed since then adapted the basic relational operators set to new data models, in addition to proposing new operators to deal with special aspects of the data model. Research on e-learning data models is still in its infancy and, to the best of our knowledge, no previous work has proposed a complete algebra for query languages in this domain. A very interesting project, however, is EDUTELLA [NWQ+02]. Edutella is a RDF infra-structure for P2P-networks aiming at supporting educational resources sharing between institutions. Resources and their relationships are represented through a data model, based on the Datalog nonprocedural query language. It expresses a RDF model as logical predicates in DataLog, which forms a knowledge base. Queries and results are expressed in accordance to the Datalog data model. Comparing to ROSA, EDUTELLA is built with the intention of integrating peers collections of educational material following a metadata standard, such as IEEE LOM. Its data model reflects a more generic view of LOs associations, whereas in ROSA, our intention is to conceive a data model that will give support to a flexible and meaningful LO conceptual model. In addition, the ROSA data model builds on the established success of relational and object-oriented data models.

18

In respect to works on algebras for the Semantic Web, RAL is an algebra for querying resources described through RDF [FHV02]. RAL models RDF as a finite set of triples composed of resources associated through properties, which form a directed labeled graph. The model integrates RDF and RDFS definitions. Nodes in the graph are either resources or literals, both labeled by unique identifiers. The algebra operators receive a collection of nodes and process then with adapted relational operators, which include: projection, selection, cartesian product, join, union, difference, intersection and loop operators that iterate over collection nodes. The operations navigate through the RDF graph nodes using uniformly the projection operation to access object values from resources, playing the role of subjects in triples. ROSA data model differs from RAL in many aspects. Firstly, the model accepts attribute definition for LOM metadata attributes, whose literal values are not identifiable. In addition, although conceptually forming triples, the data model represents relationships as one-to-many associations that take part in the LO structure. Moreover, associations in ROSA may be restricted by maximum and minimum cardinality definitions, and properties qualify associations as reflexive, symmetric and transitive. These aspects extend RDF data model towards a more semantic model. In the context of the algebra, ROSA includes path expressions and transitive closure operations that provide inferences on the intentional data model. Another important contribution in our work concerns the query processing supported by a domain thesaurus, not contemplated in other similar works.

7.

Conclusion

Nowadays, Distance Education (DE) is a rapidly expanding mode of education. Many courses traditionally taught in a classroom environment by a teacher, are now being transformed into an electronic sibling, in order to be adapted to this new education methodology. Learning objects have been used as a promising solution to develop standard infrastructures enabling tutors to create and organize their didactic material. This paper presented ROSA, a data model and a query language to access LOs, based on their semantic properties. The system includes a powerful data model whose algebra is completed described. A collection of LOs is the main processing unit in the model, in which LOs are described according to IEEE LOM metadata attributes and LO relationships. The proposed data model extends RDF by: simplifying the representation of attributes, which simplifies query formulation; considering relationship equivalence properties, implicitly used during query evaluation; modeling relationships as n-ary ordered collections and restricting LO access in a collection through maximum and minimum cardinalities. Applications interface with ROSA through ROSAQL, a query language adapted from SQL, OQL and RQL to support ROSA algebra operations. A first protoype of ROSA has been implemented using Tamino native XML DBMS Tamino 4.0 [TAM] as a repository of ROSA objects according to ROSA data model. The QEEF, query execution engine framework, was extended to support ROSA algebra and data structures. 19

The current prototype interfaces with Tamino by submitting direct XQuery queries. A domain thesaurus supports user queries, guided by user interaction. We are now working on implementing the plan generation module and the mapping between ROSA algebra expressions and the Tamino XQuery query language. These modules will give the ROSA system a flexible and efficient queryprocessing component. There is a vast spectrum of research issues to be addressed in the near future. The algebra can be extended to support inferences and analysis on the database schema. Equivalence rules and heuristics must be conceived in order to enable queries optimization.

References [APM03]

Ayres, F.M., Porto, F., Melo, R., Na Extensible Machine for Supporting New Query Execution Models (In Portuguese), XVIII Brazillian Database Symposium, Manaus, Oct., 2003.

[ADL02]

Advanced Distributed Learning Sharable content Object Reference Model Version 1.2 – The Scorm Overview, http://www.adlnet.org.

[Ber00]

Berners-Lee, T.,The Semantic Web - XML2000. http://www.w3.org/2000/Talks/1206xml2k-tbl/Overview.html.. The British Council, Educação a Distância, 2001 available at www.britishcouncilpt.org/education/distance.htm. Codd, E.F., A relational model of data for large shared data banks, Comm. Of the ACM, 13(6), June 1970, pp. 377-387. Catell, R.G.G., Barry, D.K., Berler, M., Eastman, J., Jordan, D., Russell, C., Olaf, S., Stanienda, T., and Velez, F., editors. Object Data Standard ODMG 3.0. Morgan Kaufman, January 2000. Francincar,F., Houben G., Vdovjak R., RAL: an Algebra for Querying RDF, 3rd Conf. on Web Information Systems Engineering, Singapore,12-14 Dec. 2002. Francincar,F., Houben G., Pau C., XAL: an Algebra for XML Query Optimization, in Database Technologies 2002, 13th Australasian Database Conference, Volume 5 of Conferences in Research and Practice in Information Technology, pp. 49-56. Austraolian Computer Society Inc., 2002. Friesen N., What are Educational Objects?, Interactive Learning

[BC01] [Cod70] [CBB00] [FHV02] [FHP02]

[Fri01]

Environments, Vol. 9, No. 3, Dec. 2001.

http://ltsc.ieee.org/wg12/, last access 31/01/2004 Jagadish, H.V., Al-Khalifa, S., Chapman, A., et al., TIMBER: A native XML database, The VLDB Journal, V(11), N(4), Dec., 2002. [NWQ+02] Nejdl, W., Wolf, B., Qu, C., et all, “EDUTELLA: A P2P Networking Infraestruture Based on RDF, 11th Intl. World Wide Web Conference, Honolulu, Hawaii, USA, May, 2002. [NWM+01] Naughton, J., DeWitt, D., Maier, D. et all, The Niagara Internet Query System, IEEE Data Engineering Bulletin 24(2): 27-33 (2001). Resource Description Framework (RDF) Model and Syntax Specification. [RDF] [IEEE02] [JKC+02]

[TAM] [W3C01] [W3C02]

1999. http://www.w3.org/TR/PR-rdf-syntax/, 1999. Last access Oct. 2001. Tamino, www.softwareag.com/tamino, last access 31/01/2004

W3C (World Wide Web Consortium), XQuery 1.0: An XML Query Language, Working Draft, May 2003, http://www.w3.org/TR/xquery/, last access, June 2003. W3C (World Wide Web Consortium), The XML Query Algebra, Working Draft, May 2000,http://www.w3c.org/TR/2000/WD-queryalgebra-20001204, last access, June 2003.

20

Suggest Documents