Knowledge Management for Collaborative Learning
Ismail M. Bhana, David Johnson
University of Reading, UK
Key words: Knowledge Management, Semantic Web, RDF, Peer-to-Peer, Collaboration, e-Learning

Abstract: The power of e-learning lies in the flexibility and control that learners are able to exercise over the learning process. Collaborative learning systems allow individuals and groups to manage the progression of teaching and learning in a way that fits the abilities and skills of participants. By empowering learners with the ability to direct their learning, supported by their peers in a collaborative setting, learning becomes more effective and enjoyable. The term 'knowledge management' refers to the process of continually gathering, filtering, distributing, sharing, and analysing information. Knowledge management will be highly beneficial in collaborative learning systems and learning management systems as it facilitates the dissemination and gathering of relevant information and allows learners to build on the knowledge and experience gained by their peers. In this paper we present a knowledge management framework for collaborative learning that builds on a scalable Peer-to-Peer (P2P) infrastructure based on our own collaborative middleware.
1 Introduction

Knowledge flows from a variety of sources, and the popularity of the World Wide Web has seen the volume of information available for consumption on web-based media streams increase dramatically. Currently, knowledge capture and exchange occurs through the disparate (sometimes unconnected) streams of corporate or individual websites, forums, blogs, newsgroups, wikis, documents, email, IRC, and Internet chat. Powerful web search engines such as Google provide a limited capability for searching multiple unconnected streams of web-based content, but finding meaningful answers to one's questions often requires that we search multiple streams, read or post forum messages, or perhaps consult an expert via email or instant messaging. For the most part, the mass of information created, whether HTML pages or other media streams, lacks semantics. Capturing and extending the knowledge in these sources in a compact, accessible, and searchable way is a growing problem, particularly in e-learning. Using semantics, through metadata descriptions, enables us to link, interrelate, and extend knowledge and to unify relevant resources that might normally be unconnected, examples of which may include:
Annotation of documents – it is fairly common in collaborative workflows for participants to comment, amend, and edit the work of others. Annotations and comments relating to a resource or document that can be identified uniquely (by a URN, for example) can be automatically linked and related to relevant searches.
Responses to questions – Web forums traditionally provide an arena to discuss issues and ask questions related to some topic, but are inadequate in that they do not directly relate to learning material in a meaningful or structured way. In an e-learning context, where the time of teachers and experts is limited, learners may be supported by their peers. The ability to leverage the experience, knowledge, and expertise gained by others is desirable in an e-learning system.
References to related material – references to related material can be provided in a way that is meaningful. Instead of using keywords for relevance, relevance can be determined by individual or combined properties of a resource, such as author, subject, or publisher, as sketched in the example after this list.
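For example, a resource description might point to related material explicitly through a relation property rather than through keyword overlap, using RDF and the Dublin Core vocabulary discussed later in this paper. The following is a minimal sketch; the URNs and property values are illustrative:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="urn:cid:example-paper">
      <dc:creator>A. Author</dc:creator>
      <dc:subject>Computer Aided Learning</dc:subject>
      <!-- dc:relation links this paper to related material by identifier, not by keyword -->
      <dc:relation rdf:resource="urn:cid:example-lecture-notes"/>
    </rdf:Description>
  </rdf:RDF>

A search that matches the paper can then follow the dc:relation property to surface the related material directly, rather than relying on the two resources happening to share keywords.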
The purpose of this research is to develop a knowledge management framework for e-learning applications that supports collaboration between learners across different computing platforms and mobile devices, enabling them to self-organise, share tasks, workloads, and content. In particular, the framework enables learners to:
Form groups with shared interests and goals – groups may be used to represent any community of users, from schools to universities and companies, or used to partition users according to some shared interest, such as clubs or discussion forums, and may have an arbitrary, dynamic, and transient membership.
Share arbitrary content and documents between users – shared information, resources, and content are important aspects of collaborative work that allow users to exchange knowledge pertinent to some task or goal, coupled with the ability to annotate and extend the work of others.
There is also a desire to achieve interoperability between modular components. Building interoperability into the knowledge management architecture provides scope for third-party e-learning technologies and applications to integrate with, and transparently use, knowledge management services. Using open standards such as XML, SOAP [8], and WSDL [7] allows us to design systems with these interoperability requirements in mind. Our approach builds on our own Peer-to-Peer computing middleware called Coco, and our rationale behind this design decision is presented in the following sections of this paper. The JXTA CMS [1], the Edutella project [2], and the work of Hausheer and Stiller [3] are related research efforts that approach the problem of knowledge representation, dissemination, and search in a P2P context.
2 Knowledge Representation

Knowledge, in this context, consists of the set of resource descriptions available for consumption by users. Resources range from files and documents to services and computing resources, and include things that do not generally have a direct digital representation, such as people (represented, for example, using a vCard [4]). This formulation has much in common with the Resource Description Framework (RDF) [5], and it is, in fact, RDF that is used in our implementation as the language for resource description. Like XML, RDF is intended for situations in which information needs to be processed by applications. RDF incorporates standard mechanisms for representing semantics and facilitates interoperability between separate metadata sets whilst enabling disciplinary
communities to define their own vocabularies (e.g. Dublin Core [11]). RDF was initially intended for representing metadata about Web resources, but by generalising the concept of a "Web resource", RDF can also be used to represent information about things that can be identified even if the subject itself cannot be directly retrieved over the Web. RDF uniquely references Web resources using Web identifiers (such as a URL) and describes a resource in terms of properties and property values (where properties are unambiguously identified using namespaces), as shown below:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
    <rdf:Description rdf:about="urn:cid:...">
      <dc:subject>Computer Aided Learning</dc:subject>
      <dc:title>A Fictional Paper on Computer Aided Learning</dc:title>
      <dc:publisher>InFiction Productions, Inc.</dc:publisher>
      <pdf:Producer>AFPL Ghostscript 8.0</pdf:Producer>
      <dc:date>D:20040507154801</dc:date>
      <dc:format>application/pdf</dc:format>
      <dc:creator>Victor Von Doom, PhD.</dc:creator>
    </rdf:Description>
  </rdf:RDF>
To describe non-Web resources in a P2P context, they must be represented using a unique naming scheme such as Uniform Resource Names (URNs). In the above example, the resource is a document named A Fictional Paper on Computer Aided Learning and is represented by a content identifier (CID) generated using a cryptographic digest. Whilst it is normal to uniquely reference an RDF resource using a URL, there may be many situations in a replicated P2P network in which a given resource is duplicated across many network nodes or devices, and hence a location-dependent representation would be inadequate. The given representation allows a resource to be referenced without a priori knowledge of its location within the P2P network. Taking the above example further, we are also able to extend the existing descriptions and statements about a resource with additional meaningful data, such as comments or references to related material. This provides the basis for a powerful framework for representing knowledge in the system.
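For example, a comment on the paper above might be expressed as a further RDF description that references the annotated document by its content identifier. The following is a minimal sketch; the choice of Dublin Core properties is illustrative, and the comment text is elided:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- the annotation is itself a resource, linked to the annotated paper by its CID -->
    <rdf:Description rdf:about="urn:cid:...">
      <dc:creator>David Johnson</dc:creator>
      <dc:relation rdf:resource="urn:cid:..."/>
      <dc:description>...</dc:description>
    </rdf:Description>
  </rdf:RDF>

Because the annotation references the paper's CID rather than a URL, the link remains valid wherever replicas of the paper reside in the network.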
3 Knowledge Processing

Knowledge processing is handled by searching the network for resources whose descriptions match relevant queries. Currently, there are three methods of searching the content system for metadata related to resources – simple keyword search, property-based search, and a query-based search that builds search queries using Boolean operators. Most users are, of course, familiar with keyword searching from their experience of web search engines such as Google. However, users are also aware of the obvious weaknesses of this approach, particularly in relation to finding relevant learning materials on the web. Introducing semantics into web searching (via RDF and metadata) allows for a much more meaningful acquisition of relevant resources. Learners are now able to find papers 'published' by Einstein rather than pages that simply contain 'Einstein' as a keyword. The resulting RDF data model can be queried using XML query languages such as XPath or RDF query languages such as RDQL, which enable us to build more powerful query strings. For instance, we might want to search for papers published by Einstein on the subject of fluid dynamics, as sketched in the query below. Our system builds strongly on the use and generation of metadata. Metadata is processed locally by peers, where it is cached and manipulated according to local as well as group requirements. Peers are responsible for purging stale resources in order to keep (replicated) content current. Metadata is cached by peers in the network to ensure high availability and high performance of search queries. Peers are also able to link and relate resources, merge resources, or split them according to requirements. Primarily, however, the peer is responsible for servicing the queries of other peers and communicating metadata with them in order to generate meaningful results from searches.
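In RDQL, for example, such a query might take roughly the following shape (a sketch assuming resources are described with Dublin Core properties; the literal values are illustrative):

  SELECT ?paper
  WHERE  (?paper, <dc:creator>, "Albert Einstein"),
         (?paper, <dc:subject>, "Fluid Dynamics")
  USING  dc FOR <http://purl.org/dc/elements/1.1/>

Matching on the structured dc:creator property is what distinguishes papers written by Einstein from pages that merely mention him as a keyword.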
4 Knowledge Dissemination

Our rationale for designing a P2P system is based on the desire to achieve scalability, enabling a collaborative system to scale with a dynamically changing user base. Our goal was to support self-organisation by developing systems and network services that allow individuals to organise into groups with shared interests, enabling users to form dynamic collaborations. As a starting point, our model builds on the general lifecycle depicted in figure 1. This model places collaboration at the heart of the process of knowledge management. The figure illustrates that knowledge management is a continual process of content creation, collaboration, and dissemination.
Figure 1: The knowledge management lifecycle

The system is designed around a P2P network middleware called JXTA [6] that is responsible for routing messages and for the discovery and monitoring of peers. JXTA is a set of protocols that standardise the ways in which peers discover resources and services, organise into peer groups, communicate, and monitor each other. It is designed to support interoperability through homogeneous peer connectivity in heterogeneous networks by hiding the underlying complexity of the network via multiple layers of abstraction (figure 2). The JXTA protocol layer affords enormous flexibility in terms of network interoperability. JXTA protocols are platform independent and are designed from the standpoint of resilience and scalability. The capabilities of peers in the network may vary greatly, from simple resource-limited devices to network servers and supercomputers.
Figure 2: Layers of Abstraction

The network can be viewed as a hybrid P2P system built above the JXTA topology of figure 2, as illustrated by figure 3. The network consists of self-regulating regions called provider networks, which will typically represent some sort of real-world enterprise, such as a university, local education authority, company, or organisation, but may be any arbitrary collection of peers with shared interests. A provider network acts as a trusted region (a secure domain of trust) whose peers are not limited to the same physical network: they may be geographically dispersed, behind firewalls or routers, or on mobile devices.
Figure 3: Accessing live P2P content via the Web

Whilst peers are able to interact freely in a standard P2P manner, each provider network contains at least one peer (the portal peer) that acts as a gateway and portal into the provider network for external peers. It is this peer that enables internal peers to interact with external peers and assists in the process of authentication and access control. The portal peer is also able to act as a web proxy to the P2P network residing within the institution, enabling users to search and retrieve content over the Web (using the company or university website, for instance) without requiring them to install the relevant P2P software. This content is live, meaning that the state of the network is continually changing as peers appear and disappear at will. The system also enables agreements to be formed between provider networks, supporting (in future) logging and reporting. For networks to communicate in such a way it is important to define a set of interoperability standards – standard ways of representing resources as well as invoking operations on remote peers in the network. The system is deployed as a P2P Web service using open protocols such as WSDL and SOAP. Search queries are submitted using Web service invocations, as sketched below. Using open standards such as SOAP provides us with flexibility, as it decouples the service interface from the underlying implementation. Content can therefore be searched and retrieved over the P2P network as well as over the Web, as a standard Web service would allow – the individual peer acts as a web server and is able to tunnel requests through the P2P network. An XML metadata repository (using Xindice [9]) is used to store and retrieve metadata. RDF descriptions are simply added to a database of resources and can be retrieved later using XPath [10]. Metadata is normally created according to some formal schema, such as Dublin Core.
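A search invocation carried over SOAP might look roughly like the following. This is a sketch only: the searchResources operation and the urn:coco:knowledge namespace are hypothetical stand-ins for whatever operations the portal peer's WSDL interface actually defines:

  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <!-- hypothetical search operation exposed by the portal peer -->
      <searchResources xmlns="urn:coco:knowledge">
        <!-- the query string itself, e.g. an XPath expression over stored RDF -->
        <query>//rdf:Description[dc:subject='Computer Aided Learning']</query>
      </searchResources>
    </soap:Body>
  </soap:Envelope>

The portal peer receiving such a request can evaluate the query against its local Xindice repository and forward it to other peers in the provider network, merging the RDF results before returning them to the caller.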
5 Knowledge Use

The Coco system was designed primarily around e-learning requirements, with use in academia as well as corporate training in mind. It extends traditional popular P2P concepts (such as file sharing, messaging, and collaboration) with added elements of security and accountability that enable academics to exchange papers, share knowledge, comment, annotate, and collaboratively generate and publish searchable content. The KnowledgeSurge application (figure 5) is a conceptually simple prototype that has been developed to achieve these goals. It is still in an early development cycle, but users are able to join provider groups in which they can search networks for content specific to a domain – for instance, we have predefined domains for commercial enterprises, entertainment, education and academia, and non-profit organisations. These groups are intended to be broadly equivalent to Internet domains (such as com, edu, and org).
Figure 5: The KnowledgeSurge Application Interface

KnowledgeSurge displays dynamically detected provider networks in a list on the left of the screen, such as Reading University, as illustrated. Users are then able to query a given network (or the set of selected networks in the list) using the query bar at the top of the screen (or a custom query builder in the menus). The metadata results are returned, reformatted, and displayed using HTML. By following links, users are able to query the metadata repositories further to obtain annotations, version histories and issue-tracking information, links and references, or to attempt to obtain the resource itself.
6 Conclusion

One of the key challenges of knowledge management is the representation of data. Data needs to be compact, extensible, and easily manipulated by automated algorithms. RDF goes further: the basis of RDF's strength as a knowledge management tool is that it allows us to organise, interrelate, classify, and annotate knowledge, whilst supporting interoperability through XML, increasing the overall value of stored information. In addition, RDF provides a
powerful language for expressing the semantics of data, and it is likely to become a common tool for knowledge management in collaborative learning systems. Whilst the P2P approach has a number of advantages in terms of scalability, the side-effect is that it inevitably increases the complexity of the design of our systems and makes it more difficult to achieve many of the things that traditional client-server based systems do well. For instance, strong consistency is required to ensure that all elements of a distributed RDF data model are always accessible. Workflow is another area that is complicated by a decentralised model – it requires flexible organisational models that can be easily customised, which in turn rely on security and access control mechanisms.
References:

[1] Project JXTA CMS; http://cms.jxta.org/servlets/ProjectHome
[2] EDUTELLA: A P2P Networking Infrastructure Based on RDF; http://edutella.jxta.org/
[3] Hausheer, D., Stiller, B.: Design of a Distributed P2P-based Content Management Middleware. In: Proceedings of the 29th Euromicro Conference, IEEE Computer Society Press, Antalya, Turkey, September 1-6 (2003)
[4] vCard Overview; http://www.imc.org/pdi/vcardoverview.html
[5] Resource Description Framework (RDF), W3C Semantic Web; http://www.w3.org/RDF/
[6] Gong, L.: Project JXTA: A Technology Overview. Sun Microsystems, Inc., Palo Alto, CA, USA (2002); http://www.jxta.org/project/www/docs/jxtaview_01nov02.pdf
[7] Web Services Description Language (WSDL) 1.1; http://www.w3.org/TR/wsdl
[8] W3C SOAP Specification; http://www.w3.org/TR/soap/
[9] Apache Xindice; Apache Software Foundation; http://xml.apache.org/xindice/
[10] XML Path Language (XPath) Version 1.0; http://www.w3.org/TR/xpath
[11] Dublin Core Metadata Initiative (DCMI), interoperable online metadata standards; http://dublincore.org/
Author(s): Ismail M. Bhana The University of Reading, ACET Group School of Systems Engineering Reading, Berkshire, RG6 6AY United Kingdom
[email protected] David Johnson The University of Reading, ACET Group School of Systems Engineering Reading, Berkshire, RG6 6AY United Kingdom
[email protected]