
The Massive User Modelling System (MUMS)

Christopher Brooks¹, Mike Winter¹, Jim Greer², and Gordon McCalla²

¹ Department of Computer Science, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
{cab938, mfw127}@mail.usask.ca

² Department of Computer Science, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
{greer, mccalla}@cs.usask.ca

Abstract. Effective distributed user modelling in intelligent tutoring systems requires the integration of both pedagogical and domain applications. This integration is difficult, and often requires rebuilding applications for the specific e-learning environment that has been deployed. This paper puts forth both an architecture and a prototype implementation for achieving this integration. It focuses on providing platform- and language-neutral access to services, without commitment to any particular ontology.

1. Introduction

A recent trend within intelligent tutoring systems and related educational technologies research is to move away from monolithic tutors that deal with individual learners, and instead favour “adaptive learning communities” that provide a related variety of collaborative learning services for multiple learners [9]. An urgent challenge facing this new breed of tutoring systems is the need for precise and timely coordination that facilitates effective adaptation in all constituent components. In addition to supporting collaboration between the native parts of a tutoring system, an effective inter-component communication system must make it possible to know of and react to learner actions in external applications.

For example, consider the kinds of errors a student encounters when trying to solve a Java programming problem. If the errors are syntactic, a tutor may find it useful to intervene directly within the development environment the student is using. If the errors are related to higher-level course concepts, the tutor may instead find it useful to dynamically assemble and deliver external resources (learning objects) to the student. Finally, if no appropriate solution can be found to help the student resolve their errors, the tutor may find it useful to refer the student to a domain expert or peer who has had success at similar tasks. To provide this level of adaptation, the tutor must be able to form a coherent model of students as they work with different domain applications. The tutor must be able to collect, understand, and respond to user modelling “events” both in real time and on an archival basis.

These needs can be partially addressed by integrating intelligent tutoring system functionality within larger web-based e-learning systems, including learning management systems such as WebCT [28] and Blackboard [3] or e-learning portals like uPortal [26]. These applications provide an array of functionality meant to directly support learning activities, including social communication, learner management, and content delivery. An inherent problem with these e-learning systems is that they are often unable to capture interaction between a learner and other applications the learner may be using to complete their learning task. While one potential solution is to integrate every external application the student might use into the e-learning system, this task is difficult at best due to proprietary APIs and e-learning system homogeneity.

In [27] we proposed a method of integrating various e-learning applications using a multi-agent architecture, where each application was represented by an agent that negotiated with other agents to provide information about learners using the system. A learner using the system was then able to see portions of this information by interacting with a personal agent, which represented the tutor of the system. In this system, the tutor’s sole job was to match learners with one another based on learner preferences and competencies. This system was useful at a conceptual level, but it was difficult to implement and hard to scale up. Integrating agent features (in particular reasoning and negotiation) within every application instance required high computational power, forcing the user into a more centralized computing environment. Furthermore, to provide the required performance and reliability, agents had to be carefully crafted using a proprietary communication protocol. This hindered both agent interoperability and system extensibility.
This paper presents a framework and prototype specifically aimed at supporting the process of collecting and disseminating user information to software components interested in forming user models. This framework uses both semantic web and web service technologies to encourage interoperability and extensibility at both the semantic and the syntactic levels. The organization of this paper is as follows: Section 2 describes the framework at a conceptual level. Section 3 follows with an outline of the environment we are using to prototype the system, with a particular emphasis on the integration of our modelling framework with the legacy e-learning applications we are trying to support. Section 4 contrasts our work with similar work in the semantic web community. Finally, Section 5 concludes with a look at future goals.

2. The MUMS Framework

We present the Massive User Modelling System (MUMS) framework, which is inspired by traditional event systems such as CORBA [20] and JINI [25]. The principal artefact within MUMS is the modelling opinion being expressed. We define an opinion as a temporally grounded codification of a fact about a set of users from the perspective of a given event producer. Opinions are passed between three independent entities in the framework:

1. Evidence Producers: observe user interaction with an application and publish opinions about the user. These opinions can range from direct observations of the interaction that has taken place to beliefs about the user’s knowledge, desires, and intentions. While the opinions created can be of any size, the focus is on creating brief contextualized statements about a user, as opposed to fully modelling the user.
2. Modellers: are interested in acting on opinions about the user, usually by reasoning over these to create a user model. The modeller then interacts with the user (or with other aspects of the system, such as learning materials) to provide adaptation. Modellers may be interested in modelling more than one user, and may receive opinions from more than one producer. Further, modellers may be situated and perform purpose-based user modelling by restricting the set of opinions they are interested in receiving.
3. Broker: acts as an intermediary between producers and modellers. The broker receives opinions from producers and routes them to interested modellers. Modellers communicate with the broker using either a publish/subscribe model or a query/response model. While the broker is a logically centralized component, different MUMS implementations may find it useful to distribute and specialize its services for scalability reasons.

While the definition of an opinion centers on human users, it does not restrict the producer from describing other entities and relationships of interest. For instance, an evidence producer embedded within an integrated software development environment might express not just the particular compile-time errors a student receives, but also the context of the student’s history for this programming session, as well as some indication of how the tutor should treat the problem. The definition also allows evidence producers to hold disagreeing opinions about users, and allows a producer’s opinion to change over time. This three-entity system purposefully supports the notion of active learner modelling [17].
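To make the definition of an opinion concrete, the following Python sketch models one as an immutable record carrying a producer, a set of users, the codified fact, and a generation timestamp. The field names, the use of plain-string triples in place of RDF terms, and the `urn:example:` identifiers are illustrative assumptions, not part of the MUMS prototype.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet, Tuple

# Plain (subject, predicate, object) strings stand in for RDF terms.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Opinion:
    """A temporally grounded codification of a fact about a set of users,
    from the perspective of one evidence producer."""
    producer: str                    # which evidence producer holds this opinion
    users: FrozenSet[str]            # the set of users the opinion is about
    statements: Tuple[Triple, ...]   # the codified fact(s)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))  # when it was generated


# Two producers may hold different, even disagreeing, opinions about one user;
# nothing forces them into a single consensus model.
student = "urn:example:student42"
from_ide = Opinion("urn:example:ide-producer", frozenset({student}),
                   ((student, "urn:example:compileError", "';' expected"),))
from_quiz = Opinion("urn:example:quiz-producer", frozenset({student}),
                    ((student, "urn:example:masteryOf", "urn:example:loops"),))
```

Making the record immutable reflects that a producer whose opinion changes over time would publish a new, later-timestamped opinion rather than edit an old one; this is a design assumption of the sketch.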
In the active learner modelling philosophy, the focus is on creating a learner model situated for a given purpose, as opposed to creating a complete model of the learner. This form of modelling tends to be less intensive than traditional user modelling techniques, and focuses on the just-in-time creation and delivery of models rather than their storage and retrieval. The MUMS architecture supports this by providing both a stream-based publish/subscribe method and an archival query/response method of obtaining opinions from a broker. Both modes of event delivery require modellers to provide a semantic query describing the opinions they are interested in, as opposed to the more traditional event system notions of channel subscription and producer subscription. This approach decouples the producers of information from its consumers, and leads to a more easily adaptable system where new producers and modellers can be added as needed. The stream-based method of retrieving opinions allows modellers to perform just-in-time reasoning, while the archival method allows more resource-intensive user modelling to occur. All opinions transferred within MUMS include a timestamp indicating when they were generated, allowing modellers to build more complete or historical user models using the asynchronous querying capabilities provided by the broker. By applying the adapter pattern [8] to the system, a fourth entity of interest can be derived, namely the filter.


4. Filters: act as broker, modeller, and producer of opinions. By registering for and reasoning over opinions from producers, a filter can create higher-level opinions. This reduces the work a modeller must do to form a user model, while maintaining the more flexible decentralized environment. Filters can be chained together to provide any desired amount of value-added reasoning. Finally, filters can be specialized within a particular instance of the MUMS framework by providing domain-specific rules that govern the registration, processing, and creation of opinions.

Interactions between the entities are shown in Fig. 1. Some set of evidence producers publish opinions based on observations of the user to a given broker. The broker routes these opinions to interested parties (in this case, both a filter and the modeller towards the top of the diagram). The filter reasons over the opinions, forms derivative statements, and publishes these new opinions back to the broker and to any modellers registered with the filter. Lastly, modellers interested in retrieving archival statements about the user can do so by querying any entity that stores these opinions (in this example, the second modeller queries the broker instead of registering for real-time opinion notification).

Fig. 1. A logical view of the MUMS architecture
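The interactions in Fig. 1 can be sketched as a small in-process broker offering both delivery modes, plus one filter that consumes opinions and publishes a derived one back to the broker. This is a hypothetical illustration only: triple patterns with `None` wildcards stand in for RDQL queries, direct callbacks stand in for WS-Events notifications, and the `RepeatedErrorFilter` class and its threshold are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical stand-in for an RDQL query: None matches any term.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


@dataclass(frozen=True)
class Opinion:
    producer: str
    triple: Triple
    timestamp: float  # seconds since the epoch


def matches(pattern: Pattern, triple: Triple) -> bool:
    """True if every non-None position of the pattern equals the triple's term."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


class Broker:
    """Routes opinions by their content, not by producer or channel."""

    def __init__(self) -> None:
        self.archive: List[Opinion] = []
        self.subscriptions: List[Tuple[Pattern, Callable[[Opinion], None]]] = []

    def subscribe(self, pattern: Pattern, callback: Callable[[Opinion], None]) -> None:
        # Publish/subscribe mode: matching opinions are pushed as they arrive.
        self.subscriptions.append((pattern, callback))

    def publish(self, opinion: Opinion) -> None:
        self.archive.append(opinion)  # retained for later archival queries
        for pattern, callback in list(self.subscriptions):
            if matches(pattern, opinion.triple):
                callback(opinion)

    def query(self, pattern: Pattern, since: float = 0.0) -> List[Opinion]:
        # Query/response mode: archived opinions generated at or after `since`.
        return [o for o in self.archive
                if o.timestamp >= since and matches(pattern, o.triple)]


class RepeatedErrorFilter:
    """Hypothetical filter: consumes compile-error opinions, publishes a derived one."""

    def __init__(self, broker: Broker, threshold: int = 3) -> None:
        self.broker = broker
        self.threshold = threshold
        self.counts: Dict[str, int] = {}
        broker.subscribe((None, "compileError", None), self.on_opinion)

    def on_opinion(self, opinion: Opinion) -> None:
        user = opinion.triple[0]
        self.counts[user] = self.counts.get(user, 0) + 1
        if self.counts[user] == self.threshold:
            derived = Opinion("filter:repeated-errors",
                              (user, "needsHelpWith", "syntax"), opinion.timestamp)
            self.broker.publish(derived)  # the derived opinion re-enters the broker


# Usage: a modeller subscribes only to the filter's derived opinions.
broker = Broker()
alerts: List[Opinion] = []
broker.subscribe((None, "needsHelpWith", None), alerts.append)
RepeatedErrorFilter(broker)
for t in range(3):
    broker.publish(Opinion("ide", ("student42", "compileError", "';' expected"), float(t)))

# A second modeller uses the archival mode instead of subscribing.
recent_errors = broker.query((None, "compileError", None), since=1.0)
```

Note that neither modeller names the producer it wants to hear from; both describe only the content of the opinions, which is the decoupling the framework relies on.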

The benefits of this architecture are numerous. First, removing reasoning and negotiation abilities from the producers of opinions greatly reduces the complexity of creating new producer types. Instead of being rebuilt from scratch with user modelling in mind, existing applications (whether explicitly meant to support the learning process or domain-specific) can be easily extended and added to the system. Second, the decoupling between producers and modellers increases both the performance and the extensibility of the system. By adding more physical brokers to store and route messages, a greater number of producers or modellers can be supported. This allows for a truly distributed system, where modelling is done on different physical machines throughout the network. Third, semantic querying and the decoupling between modellers and producers allow arbitrary numbers of both types of application to be added dynamically to MUMS. Once these entities have joined the system, their participation can increase the expressiveness of the user models created, without requiring modifications to existing producers and modellers. Finally, the logical centralization of the broker allows administration policies, such as privacy rules and the maintenance of data integrity, to be set through the addition of filters.

All of these benefits address key challenges for adaptive learning systems. These systems must allow for the integration of both existing domain applications and learning-management-specific applications. This integration must take place with minimal effort to accommodate the various stakeholders within an institution (e.g. administrators, researchers, instructional designers), and must be centrally manageable to protect the privacy of user data. Last, the system must scale not just to the size of a single classroom, but to the needs of a whole department or institution.

3. Implementation Prototype

The MUMS architecture is currently being prototyped within the distributed e-learning environment in the Department of Computer Science at the University of Saskatchewan. This environment has been created over a number of years and involves applications from a variety of research projects. While initially aimed at garnering research data, these applications are all currently deployed in a support role within some of our computer science courses. There are four main applications:

1. A content delivery system, which deploys learning objects formatted according to IMS Content Packaging [15] to students using a web browser.
2. A web-based forum discussion system built around the notions of peer help (I-Help Public Discussions [11]).
3. An instant messaging and chat application (I-Help Instant Messenger).
4. A quizzing system, which deploys quizzes formatted according to IMS QTILite [14] and records evaluations of students.

We are currently in the process of adding new applications to this list, including development environments, e-learning portals, and web browsers. Each of these systems contributes to and benefits from models of the user, and hence requires a flexible user modelling environment. To accommodate this, we have implemented the MUMS architecture with three clear goals in mind: scalability, interoperability, and extensibility. The technical solutions we are using to achieve these goals are addressed in turn.


3.1 Interoperability

With the goal of distributing the system to as many domain-specific applications as necessary, interoperability is a key concern. To this end, all opinion publishing from producers is done using our implementation of the Web Services Events (WS-Events) [5] infrastructure specification. This infrastructure defines a set of data types and rules for passing events using web services. These rules cover administrative information about the producer or modeller (e.g. contact information, quality of service), a payload that contains the semantics of the opinion, and information on managing advertising and subscriptions. Using this infrastructure helps to protect entities from future changes in the way opinion distribution is handled. Further, modellers can either subscribe to events using WS-Events (publish/subscribe) or query the broker directly using standard web service technologies (query/response). This allows both the real-time delivery of new modelling information and access to archived information from the past, in a manner independent of platform and programming language.

We enhance semantic interoperability by expressing the payload of each event using the Resource Description Framework (RDF) [16]. This language provides a naturally extensible and ontology-neutral method for describing modelling information in a format that is easily machine readable. It has become the lingua franca of the semantic web, and a number of toolkits (notably Jena [13] and Drive [24]) have arisen to make RDF graph manipulation easier. When registering for events, modellers provide patterns to match using the RDF Data Query Language (RDQL) [23]. Finally, design-time interoperability is achieved by maintaining a separate ontology database which authors can inspect when creating new system components. This encourages the reuse of previously deployed ontologies, while keeping opinion routing independent of ontology.
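As a rough illustration of pattern-based registration, the sketch below parses a tiny RDF payload written in N-Triples and evaluates a conjunction of triple patterns containing `?variables` against it, which is roughly what the WHERE clause of an RDQL query expresses. It is a simplified stand-in that assumes only URI and plain-literal terms; a real broker would use an RDF toolkit such as Jena rather than this hand-rolled matcher, and the `mums` namespace is hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples: <s> <p> <o> .  or  <s> <p> "plain literal" .
NT_LINE = re.compile(r'^(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s*\.$')


def parse_ntriples(text: str) -> List[Triple]:
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = NT_LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.append((m.group(1), m.group(2), m.group(3)))
    return triples


def match_patterns(patterns: List[Triple], graph: List[Triple],
                   binding: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Return every variable binding that satisfies all patterns (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results: List[Dict[str, str]] = []
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Variable: bind it, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must match exactly.
                ok = False
                break
        if ok:
            results.extend(match_patterns(rest, graph, candidate))
    return results


MUMS = "http://example.org/mums#"  # hypothetical vocabulary namespace
payload = f"""
<urn:ex:student42> <{MUMS}compileError> "';' expected" .
<urn:ex:student42> <{MUMS}enrolledIn> <urn:ex:cmpt111> .
<urn:ex:student7> <{MUMS}compileError> "cannot find symbol" .
"""
# Roughly the WHERE clause of an RDQL query such as:
#   SELECT ?s, ?msg WHERE (?s, <mums:compileError>, ?msg),
#                         (?s, <mums:enrolledIn>, <urn:ex:cmpt111>)
#   USING mums FOR <http://example.org/mums#>
where = [("?s", f"<{MUMS}compileError>", "?msg"),
         ("?s", f"<{MUMS}enrolledIn>", "<urn:ex:cmpt111>")]
rows = match_patterns(where, parse_ntriples(payload))
```

Only `student42` satisfies both patterns, so the modeller registered with this query would receive that opinion and not the one about `student7`.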

3.2 Extensibility

Besides the natural extensibility afforded by the use of RDF as a payload format, the MUMS architecture provides for distributed reasoning through the use of filters. In general, a filter is a component that masquerades as any combination of producer, modeller, or broker of events. There are at least two specialized instances of a filter:

1. Reasoners: register or query for events with the goal of producing higher-level derivative events. For instance, one might create a reasoner that listens for events related to message sending from the web-based discussion and instant messenger producers, and then creates new opinions indicating the changing social structure amongst peers in the class.
2. Blockers: are placed between producers and modellers with the goal of modifying or restricting the events that are published. Privacy filters are an example of a blocker. These filters can anonymize events or require that a modeller provide special authentication privileges when subscribing.
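Both kinds of filter described above can be sketched as functions over event streams, so that chaining filters is simply function composition. The flat dictionary event format, the producer names `forum` and `im`, and the salted-hash pseudonym scheme are assumptions made for illustration; the paper does not specify how its privacy filters anonymize events.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical flat event format; real MUMS payloads are RDF.
Event = Dict[str, str]


def social_ties_reasoner(events: Iterable[Event]) -> List[Event]:
    """Reasoner: derive 'interactsWith' opinions from message-sending events."""
    ties: Counter = Counter()
    for e in events:
        if e.get("type") == "messageSent" and e.get("producer") in {"forum", "im"}:
            pair = tuple(sorted((e["sender"], e["recipient"])))  # undirected tie
            ties[pair] += 1
    return [{"producer": "reasoner:social", "type": "interactsWith",
             "sender": a, "recipient": b, "count": str(n)}
            for (a, b), n in sorted(ties.items())]


def anonymizing_blocker(events: Iterable[Event], salt: str) -> List[Event]:
    """Blocker: replace user identifiers with salted pseudonyms before delivery."""
    def pseudonym(user: str) -> str:
        digest = hashlib.sha256((salt + user).encode("utf-8")).hexdigest()
        return "anon-" + digest[:12]

    blocked = []
    for e in events:
        e = dict(e)  # copy, so upstream events are left untouched
        for key in ("sender", "recipient"):
            if key in e:
                e[key] = pseudonym(e[key])
        blocked.append(e)
    return blocked


raw = [
    {"producer": "forum", "type": "messageSent", "sender": "alice", "recipient": "bob"},
    {"producer": "im", "type": "messageSent", "sender": "bob", "recipient": "alice"},
    {"producer": "im", "type": "messageSent", "sender": "carol", "recipient": "alice"},
    {"producer": "quiz", "type": "quizSubmitted", "sender": "alice"},
]
# Chaining: the reasoner's output feeds the blocker, whose output reaches modellers.
derived = anonymizing_blocker(social_ties_reasoner(raw), salt="course-secret")
```

Because the same salt maps a user to the same pseudonym, a modeller downstream of the blocker can still see that one person is central to the social network without learning who that person is.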


While the system components in our current implementation follow a clear separation between producers and consumers of information, we expect most future components will add value to the network by reasoning over data sources before producing opinions. Thus we expect the majority of the network to be made up of chained reasoner filters, with a few blockers implementing administrative policies.

3.3 Scalability

Early lessons learned from testing the implementation prototype indicated that there are two main factors slowing down the propagation of opinions:

1. Message serialization: the deserialization of SOAP messages into native data types is an expensive process. This cost matters most to the broker, which shares a many-to-one relationship with producers.
2. Subscription evaluation: evaluating RDF models against an RDQL query is a time-consuming operation. Its cost grows with the complexity of the models, the complexity of the query, and the number of queries (modeller registrations) that a broker holds.

To counteract this, the MUMS architecture can be extended to include the notion of domain brokers. A domain broker is an ontology-aware broker that can provide enhanced quality of service because of this awareness. This quality of service usually comes in the form of more efficient model storage, and thus faster query resolution. Further, brokers are free to provide alternative transport mechanisms which may lead to faster data transfers (e.g. a binary protocol which compresses RDF messages could be used for mobile clients with error-prone connections, while a UDP protocol describing RDF using N-Triples [10] could be used to provide more real-time delivery of events). Domain brokers can be combined with reasoners and blockers to meet the performance, management, and expressiveness requirements of the system. Finally, the notion of a broker as a centralized entity is a logical one only. Physically, we distribute the load of the broker amongst a small cluster of machines connected to a single data store to maintain integrity. An overview of the prototype, including the technologies in use, is presented in Fig. 2.
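One way an ontology-aware broker might reduce the subscription-evaluation cost described above is to index subscriptions by the constant predicate in their pattern, so that an incoming triple is only checked against subscriptions that could possibly match. This is a hypothetical sketch of that idea, not the domain-broker design used in the prototype; `PredicateIndexedRouter` is an invented name, and subscriptions whose predicate is a wildcard fall back to being checked on every triple.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None is a wildcard


class PredicateIndexedRouter:
    """Hypothetical router that only evaluates subscriptions that could match."""

    def __init__(self) -> None:
        self.by_predicate: Dict[str, List[Tuple[str, Pattern]]] = defaultdict(list)
        self.unindexed: List[Tuple[str, Pattern]] = []  # wildcard predicate
        self.evaluations = 0  # pattern checks performed, for illustration

    def subscribe(self, modeller_url: str, pattern: Pattern) -> None:
        if pattern[1] is None:
            self.unindexed.append((modeller_url, pattern))
        else:
            self.by_predicate[pattern[1]].append((modeller_url, pattern))

    def route(self, triple: Triple) -> Set[str]:
        # .get avoids inserting empty lists into the defaultdict.
        candidates = self.by_predicate.get(triple[1], []) + self.unindexed
        targets: Set[str] = set()
        for url, pattern in candidates:
            self.evaluations += 1
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                targets.add(url)
        return targets


router = PredicateIndexedRouter()
for i in range(1000):
    router.subscribe(f"http://modeller{i}.example/notify", (None, f"urn:ex:pred{i}", None))
router.subscribe("http://dashboard.example/notify", (None, None, None))

hits = router.route(("urn:ex:student42", "urn:ex:pred7", "x"))
# Only 2 patterns were checked, instead of all 1001 registrations.
```

The index helps only to the extent that modellers register with constant predicates; knowing which predicates are common is the kind of ontology awareness a domain broker could exploit.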
Evidence producers are written in a variety of languages, including a Java producer for the course delivery system, a C# producer for the public discussion system, and a C++ producer (in the works) for the Mozilla web browser. The broker is realized through a cluster of Tomcat web application servers running an Apache Axis application which manages subscriptions and semantic routing. This application uses a PostgreSQL database to store both subscription and archival information. Subscriptions are stored as a tuple indicating the RDQL pattern that should be matched and the URL at which the modeller can be contacted. At this moment there is one Java-based modeller, which graphically displays aggregate student information from the I-Help public forums for instructors. Besides a description of student posting frequency, this modeller displays statistics for a whole forum, as well as a graphical picture of student interaction. In addition, two other applications are under development: a pedagogical content planner and a peer help matchmaker.

Fig. 2. Prototype implementation of the MUMS architecture
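The subscription tuple described above, together with an archive of opinions, could be laid out relationally as in the following sketch. The table and column names are assumptions, and Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs without a server; evaluating the stored RDQL pattern itself would still happen in the broker application, not in SQL.

```python
import sqlite3

# sqlite3 stands in for the prototype's PostgreSQL database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscription (
    id           INTEGER PRIMARY KEY,
    rdql_pattern TEXT NOT NULL,   -- the pattern a published opinion must match
    modeller_url TEXT NOT NULL    -- where matching opinions are delivered
);
CREATE TABLE opinion (
    id         INTEGER PRIMARY KEY,
    producer   TEXT NOT NULL,
    created_at TEXT NOT NULL,     -- ISO 8601 timestamp from the producer
    payload    TEXT NOT NULL      -- RDF payload, e.g. serialized as N-Triples
);
""")

conn.execute("INSERT INTO subscription (rdql_pattern, modeller_url) VALUES (?, ?)",
             ("SELECT ?s WHERE (?s, <http://example.org/mums#posted>, ?msg)",
              "http://modeller.example/notify"))
conn.executemany("INSERT INTO opinion (producer, created_at, payload) VALUES (?, ?, ?)", [
    ("forum", "2004-03-01T10:00:00Z", '<urn:ex:s1> <http://example.org/mums#posted> "hi" .'),
    ("forum", "2004-03-02T09:30:00Z", '<urn:ex:s2> <http://example.org/mums#posted> "q" .'),
])
conn.commit()

# Archival query/response: everything one producer said after a given time.
# ISO 8601 strings in a single fixed format sort chronologically, so text comparison works.
recent = conn.execute(
    "SELECT producer, created_at FROM opinion WHERE producer = ? AND created_at >= ? "
    "ORDER BY created_at", ("forum", "2004-03-02T00:00:00Z")).fetchall()
```

Keeping subscriptions and the archive in one store is consistent with the prototype's choice of a single data store behind a cluster of broker machines, since every node then sees the same registrations.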

4. Related Work

While inspired by the needs of distributed intelligent tutoring systems, we see this work overlapping three distinct fields of computer science: distributed computing, the semantic web, and learner modelling. Related research in each of these fields is addressed in turn.

The distributed systems field is a mature field that has provided a catalyst for much of our work. Both general and specific kinds of event systems are described throughout the literature, and a number of mature specifications, such as Java RMI and CORBA, exist. Unlike MUMS, these event systems require the consumers of events (modellers) to subscribe to events (opinions) based on the expected event producer or the channel (subject) the events will arrive on. This increases the coupling between entities in the system, requiring either that the consumer is aware of a given producer, or that they share a strict messaging ontology. In [4], Carzaniga et al. describe a model for content-based addressing and routing at the network level. We build upon this model by applying similar principles in the application layer, allowing modellers to register for those opinions which match some semantic pattern. This allows for the ad hoc creation and removal of both evidence producers and modellers within the system.

While the semantic web as a research area has been growing quickly for a number of years, the focus of this area has been on creating formalisms for knowledge management and representation. The general approach to sharing data over the semantic web is to consider it just “an extension of the current web” [2], and to follow a query/response communication model. Thus, a fair amount of work has been done in conjunction with database research to produce efficient mechanisms for storing (e.g. [22], [12]) and querying (e.g. [23]) data, but new methods for transmitting this data have been largely unexplored. For instance, the HP Joseki project [1] and the Nokia URI Query Agent Model [19] provide methods for publishing, updating, and retrieving RDF data models using HTTP. This approach is useful for large centralized models where data transfer uses more resources than data querying; however, it provides poor support for the real-time delivery of modelling information. Further, it supports the notion of a single model per user formed through consensus between producers, as opposed to the more lightweight situated user modelling suggested by active modelling researchers. We instead provide a method which completely decouples producers from one another, and offloads the work of forming user models to the consumers of opinions.

The work done by Nejdl et al. in [18] and by Dolog and Nejdl in [7] and [6] marries the idea of the semantic web with learner modelling. In these works the authors describe a network of learning materials set up in a peer-to-peer fashion. Resources are described in RDF using both general pedagogical metadata (in particular the IEEE Learning Object Metadata specification) and learner-specific metadata (such as IMS LIPS or PAPI). The network is searchable by end users through personal learning assistants, which can query peers in the network for learning resource metadata and then filter the results based on a user model. While this architecture distributes the responsibility for user modelling, it also limits communication to the query/response model.
Thus, personal learning agents must continually query data sources to discover new information about the student they are modelling. In addition, by arranging data sources in a peer network, the system loses the ability to control these sources centrally. For instance, an institution would need to control all of the peers in the network to provide for data integrity or privacy over the data being shared.

5. Conclusions and Future Work

As cited by Picard et al., the ITS working group of 1995 described tutoring systems as:

“…hand-crafted, monolithic, standalone applications. They are time-consuming and costly to design, implement, and deploy. Each development teams must redevelop all of the component functionalities needed. Because these components are so hard and costly to build, few tutors of realistic depth and breadth ever get built, and even fewer ever get tested on real students.” [21]

Despite the research invested in agent-based architectures for tutoring systems, tutors remain largely centralized in deployment. These tutors are generally domain specific, and are unable to easily interface with the various legacy applications that students may be using to augment their learning. When such interfacing is available, it comes at a high cost to designers, as integration requires both a shared ontology to describe what the student has done and considerable low-level software integration work. MUMS provides an alternative architecture where producers can be readily associated with legacy applications and where modellers and reasoners can readily produce useful learner modelling information.

This paper has presented both a framework and a prototype to support the just-in-time production and delivery of user modelling information. It provides a general architecture for e-learning applications to share user data, as well as details on a specific implementation of this architecture, which builds on technologies being used within the web services and semantic web communities. It provides an approach to student modelling that is platform, language, and ontology independent. Further, this approach allows for both the just-in-time delivery of modelling information and the archival and retrieval of past modelling opinions.

Our immediate future work involves further integration of domain-specific applications within this framework. We will use this new domain-specific information to provide more accurate resource suggestions to the learner, including both the acquisition of learning objects from learning object repositories and expertise location through peer matchmaking. Tangential to this, we are interested in pursuing the use of user-defined filters through personal learning agents. These agents can act as a “front-end” through which learners have input over the control and dissemination rights of their learner information. Finally, we are examining the issue of design-time interoperability through ontology sharing using the Web Ontology Language (OWL).

Acknowledgements. We would like to thank the reviewers for their valuable recommendations. This work has been conducted with support from a grant funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) for the Learning Object Repositories Network (LORNET).

References

1. Joseki - The Jena RDF Server. Available online at http://www.joseki.org/. Last accessed March 22, 2004.
2. Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American, May 2001. Scientific American, Inc.
3. Blackboard Inc. Blackboard. Available online at http://www.blackboard.com/. Last accessed March 22, 2004.
4. Carzaniga, A., Rosenblum, D. S., and Wolf, A. L. Content-Based Addressing and Routing: A General Model and its Application. Technical Report CU-CS-902-00.
5. Catania, N., et al. Web Services Events (WS-Events) Version 2.0. Available online at http://devresource.hp.com/drc/specifications/wsmf/WS-Events.pdf. Last accessed March 22, 2004.
6. Dolog, P. and Nejdl, W. Challenges and Benefits of the Semantic Web for User Modelling. In Workshop on Adaptive Hypermedia and Adaptive Web-Based Systems 2003, held at WWW 2003.
7. Dolog, P. and Nejdl, W. Personalisation in Elena: How to cope with personalisation in distributed eLearning Networks. In International Conference on Worldwide Coherent Workforce, Satisfied Users - New Services For Scientific Information. Oldenburg, Germany.
8. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (eds) Design Patterns, 1st edition. Addison-Wesley, 1995.
9. Gaudioso, E. and Boticario, J. G. Towards Web-based Adaptive Learning Communities. In Artificial Intelligence in Education 2003.
10. Grant, J. and Beckett, D. RDF Test Cases. Available online at http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/. Last accessed March 22, 2004.
11. Greer, J., McCalla, G., Vassileva, J., Deters, R., Bull, S., and Kettel, L. Lessons Learned in Deploying a Multi-Agent Learning Support System: The I-Help Experience. In Artificial Intelligence in Education 2001. San Antonio, TX, USA.
12. Harris, S. and Gibbins, N. 3store: Efficient Bulk RDF Storage. In Workshop on Semantic Web Storage and Retrieval 2003. Vrije Universiteit, Amsterdam, Netherlands.
13. Hewlett-Packard Development Company, L.P. Jena 2 - A Semantic Web Framework. Available online at http://www.hpl.hp.com/semweb/jena.htm. Last accessed March 22, 2004.
14. IMS Global Learning Consortium Inc. IMS Question & Test Interoperability Lite Specification, Version 1.2. 2002.
15. IMS Global Learning Consortium Inc. IMS Content Packaging Specification, Version 1.1.3. 2003.
16. Klyne, G. and Carroll, J. J. Resource Description Framework (RDF): Concepts and Abstract Syntax. Available online at http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/. Last accessed March 22, 2004.
17. McCalla, G., Vassileva, J., Greer, J., and Bull, S. Active Learner Modelling.
18. Nejdl, W., Wolpers, M., Siberski, W., Schmitz, C., Schlosser, M., Brunkhorst, I., and Löser, A. Super-peer-based routing and clustering strategies for rdf-based peer-to-peer networks. In 12th International World Wide Web Conference. Budapest, Hungary.
19. Nokia. URIQA: The Nokia URI Query Agent Model. Available online at http://sw.nokia.com/uriqa/URIQA.html. Last accessed March 22, 2004.
20. Object Management Group. Common Object Request Broker Architecture (CORBA/IIOP).
21. Picard, R. W., Kort, B., and Reilly, R. Affective Learning Companion Project Summary: Exploring the Role of Emotion in Propelling the SMET Learning Process. Available online at http://affect.media.mit.edu/AC_research/lc/nsf1.html.
22. Reggiori, A., van Gulik, D.-W., and Bjelogrlic, Z. Indexing and retrieving Semantic Web resources: the RDFStore model. In Workshop on Semantic Web Storage and Retrieval 2003. Vrije Universiteit, Amsterdam, Netherlands.
23. Seaborne, A. RDQL - A Query Language for RDF: W3C Member Submission.
24. Singh, R. Drive - An RDF Parser for the .NET Platform. Available online at http://www.driverdf.org/. Last accessed March 22, 2004.
25. Sun Microsystems, Inc. Jini Technology Core Platform Specification.
26. uPortal. uPortal by JA-SIG. Available online at http://uportal.org/. Last accessed March 22, 2004.
27. Vassileva, J., McCalla, G., and Greer, J. Multi-Agent Multi-User Modeling in I-Help. User Modeling and User-Adapted Interaction: Special Issue on User Modelling and Intelligent Agents, 13(1):1-31, 2002.
28. WebCT. WebCT.com. Available online at http://www.webct.com/. Last accessed March 22, 2004.