Expert Systems with Applications 32 (2007) 265–276. www.elsevier.com/locate/eswa

Evolving semantic web with social navigation

Ghassan Beydoun a,*, Roman Kultchitsky b, Grace Manasseh b

a School of Information Systems and Economics, University of Wollongong, Australia
b Department of Computer Science, American University of Beirut, Beirut

Abstract

The Semantic Web (SW) is a meta-web built on the existing WWW to facilitate access to it. The SW expresses and exploits dependencies between web pages to yield focused search results. Manual annotation of web pages towards building a SW is hindered by at least two user-dependent factors: users do not agree on an annotation standard that could be used to extract the inter-dependencies of their pages, and they are simply too lazy to undertake and maintain annotation of pages. In this paper, we present an alternative way to exploit web page dependencies: as users surf the net, they create virtual surfing trails which can be shared with other users; this parallels social navigation for knowledge. We capture and use these trails to allow subsequent intelligent search of the web. People surfing the net with different interests and objectives do not leave similar and mutually beneficial trails. However, individuals in a given interest group produce trails that are of interest to the whole group. Moreover, special interest groups are more highly motivated than casual users to rate the utility of the pages they browse. In this paper, we introduce our system KAPUST1.2 (Keeper And Processor of User Surfing Trails). It captures user trails as users search the internet and constructs a semantic web structure from the trails. The semantic web structure is expressed as a conceptual lattice guiding future searches. KAPUST is deployed as E-learning software for an undergraduate class. First results indicate that it is indeed possible to process surfing trails into useful knowledge structures which can later be used to produce intelligent searching.

© 2006 Published by Elsevier Ltd.

Keywords: Cooperative systems; Intelligent searching; Semantic web; E-learning application; Interactive knowledge acquisition; Formal concept analysis application; Machine learning application

1. Introduction

Social navigation (Dieberger, 1997) is an aspect of our daily life and a very efficient social mechanism for acquiring knowledge; examples are asking for directions on the road, consulting family members for advice, calling a doctor when feeling ill, or meeting with colleagues at university to discuss a research topic. The essence of social navigation is that people keep track of experiences and contexts, regardless of whether or not those contexts/experiences have been useful to a past goal of theirs. Harnessing social interactions for information retrieval on the net enables consulting and benefiting from other people's experiences. For

* Corresponding author. E-mail addresses: [email protected], [email protected] (G. Beydoun), [email protected] (R. Kultchitsky). 0957-4174/$ - see front matter © 2006 Published by Elsevier Ltd. doi:10.1016/j.eswa.2005.11.035

example, a shopper looking to buy a new car, searching car advertisements or online motor shops, can greatly benefit from someone else's car-shopping experience, even though the two might be looking for cars with different characteristics. A person who has just bought a car has a lot of useful advice for a prospective buyer; the former can at least point out models and their expected prices. If the prospective buyer could go online and see the pages that other people have found useful, it would make his/her search a lot easier. This social navigation inspires our approach, described in this paper, towards building a semantic web incrementally, in a distributed and collective way. In our view, the web-surfing experience of people is beneficial and should be stored and reused; this requires a user-friendly representation of the experience that all people can understand. We represent surfing experiences as surfing


trails. These are virtual surfing trails created by users as they surf the net, and they are typically lost by current browsers. We introduce our system, KAPUST1.2 (Keeper And Processor of User Surfing Trails). It captures user trails and organizes them according to the browsing topic. KAPUST then processes intersections of trails into a knowledge base (a conceptual lattice). This in turn allows intelligent search of the web. We advocate building a SW incrementally, created by the users themselves, where no intermediate expert is needed and ontologies are not predetermined. Users provide their topic of interest, within their interest group, and begin browsing web pages. Submitted topics of interest and the trails left behind by users form the raw information that constitutes the SW structure and determines how it evolves. We process this raw information using machine learning. Our approach is inspired by the work of Wexelblat and Maes at the MIT Media Lab, which highlighted the importance of tracking user trails and interactions to benefit from interaction history for information navigation (Wexelblat, 1999). We expand that idea by applying Formal Concept Analysis (FCA) (Ganter & Wille, 1999) to reason about the traces, instead of only displaying and browsing trails as in Wexelblat (1999). Our system KAPUST1.2 collects and stores user session names and their trails using a web browser plug-in interface, which also connects the browser to the reasoning component of KAPUST1.2. The reasoning component has a retrieval knowledge base (the actual SW) which integrates users' knowledge scattered in their left-behind surfing traces. We tested our approach for E-learning in a class environment in an undergraduate course at the Political Science and Public Administration Department at the American University of Beirut, PSPA289: Information Technology and Public Administration. Our results in that domain illustrate the utility of user trails in allowing intelligent web searching.
This paper is organized as follows: Section 2 reviews work related to our system KAPUST; Section 3 describes the architecture of KAPUST, detailing the steps involved from the interface, which collects user traces, to creating knowledge out of the traces and making use of this knowledge in later searching; Section 4 describes the distributed deployment of KAPUST; Section 5 describes experiments and results of using KAPUST for E-learning in an actual university environment; Section 6 discusses results and concludes with future plans for the work in e-markets.

2. Related work and our approach

The SW enables automated intelligent services such as information brokers, search agents and information filters. The first step to realizing the SW is making data smarter, using languages such as XML, XAL (Lacher & Decker, 2002) and HTML-A (Benjamins, Fensel, Decker, & Perez, 1999). A further step is creating ontologies to guarantee interoperability between "smart data" and to allow inference over the "smart data". Towards this, many technologies were developed, for example: OIL (Ontology Inference Layer) (Davies, Fensel, & Harmelen, 2003), DAML (DARPA Agent Markup Language) (Hendler, 2001), Haystack (Quan, Huynh, & Karger, 2003) and OntoPad (Benjamins et al., 1999). Other tools to manage collaborative ontology sharing and creation were also developed, such as OntoEdit (Sure et al., 2002), which combines methodology-based ontology development with capabilities for collaboration, and Annotea, a web-based shared annotation system for RDF documents (Kahan, Koivune, Prud'Hommeaux, & Swick, 2001). Regardless of whether the tool is collaborative or not, manual marking up is still an essential component of current methods and tools to build the SW. Marking up is a cumbersome, time-consuming process that can introduce errors. In this paper, we explore a novel approach using a machine learning technique, Formal Concept Analysis (FCA) (Ganter & Wille, 1999), that only requires users to name their browsing session, rather than heavily annotate pages. Pages visited are only labeled as good or bad hits (with an optional weight). This imposes very little effort on users. The system described in this paper is in fact an example of an interaction system: it captures user behaviors and stores them for analysis and reasoning. In capturing user traces, it is similar to Laus (2001), Wexelblat (1999) and Wexelblat and Maes (1999), which store interaction histories on a per-user basis. Interaction systems differ in the assumptions they make about users and in what kinds of interactions are logged and used. How users behave while browsing the web is affected by several factors related to the user himself, the tool being used and the domain under study. The kind of information of interest to us in this paper is the internet parallel of social navigation, which is human interaction in pursuit of information gathering (Dieberger, 1997).
In our work, we use user trails to model unintended and indirect social navigation over the web: users are not intentionally helping each other (e.g. following footsteps in a forest) and they do not directly communicate (Forsberg, Hook, & Svensson, 2001). In the process we build a complex information space (the SW), in which we analyze traces left by users using a Formal Concept Analysis (Ganter & Wille, 1999) algorithm. FCA has been found efficient when applied to document retrieval systems for domain-specific browsing (Kim & Compton, 2001). Our approach is similar to Footprints (Wexelblat, 1999; Wexelblat & Maes, 1999), where a theory of interaction history was developed and a series of tools were built to allow navigation of history-rich pages and to contextualize web pages during browsing. However, unlike our approach, Footprints does not use history to make recommendations, nor does it have an embedded reasoning technique. Following Forsberg's (Forsberg et al., 2001) characterization of design issues for any electronic social navigation system, our system KAPUST adheres to the following four: First is integration: KAPUST is integrated in an internet browser. Second is


presence of other users, which is a necessity for KAPUST; otherwise, social navigation is not applicable. Third is trust in the source of information; this varies in importance according to the domain of study, and KAPUST has an authentication step which can become mandatory if required. Fourth and last is privacy of the advice giver; this is handled during the deployment process of KAPUST. Some further properties used by Wexelblat and Maes (1999) to characterize the problem space of interaction history systems are also characteristic of KAPUST:
1. Proxemic versus dystemic: A proxemic space is one that users feel to be transparent, where they do not have to put in extra effort to understand the signs and structures used. Conversely, a nonproxemic (i.e. dystemic) space is opaque. The KAPUST interface is simple to use; a user familiar with an internet browser can use it immediately. It therefore provides a proxemic space (it is easy to learn how to use because its interface is familiar).
2. Active versus passive recording of the interaction history (from the point of view of the user). In KAPUST, the recording is passive.
3. Rate/form of change of visited data as interaction history information builds up. KAPUST currently assumes that URLs do not change or become obsolete after they are visited.
4. Degree of permeation between the history information and the object it represents. History information can be attached tightly to the object by being embedded in it, or stored as a separate document. In KAPUST's database, the traces are stored without the web pages themselves; only URLs to web pages are stored.
5. The kind of information collected, which depends on the domain the users are interested in and what they are trying to accomplish. KAPUST's architecture is domain independent. We later use it in an E-learning environment in an undergraduate political science class.
KAPUST has a dictionary module to prohibit entering invalid words (see Section 3.1 later). In the next section, we present details of our approach in KAPUST, illustrate its architecture and describe technical details.

3. Constructing the semantic web with KAPUST1.2

Learning from colleagues in the same community is of high value because of common interests and aims. Our


system KAPUST1.2 (see Fig. 2) converts surfing traces of users in the same community into a Semantic Web. For example, a group of students sharing an assignment problem usually discuss the assignment topic, meeting face to face every day at university. Using KAPUST1.2, it is as if those students are discussing their thoughts online, so that not only the students of their class benefit, but also students who take the same class in following semesters. User traces are stored as the sequence of URLs of the pages that a user from a particular interest group visits in a browsing session while searching for a specialized topic. For example, in E-learning, users are students who provide one or more keywords to identify their search domain at the beginning of each session (Fig. 1). Their trails consist of a sequence of URLs annotated by the session title word(s) entered at the beginning. Web page addresses and session title keywords are the building blocks of our SW. Initially, entered words are checked against a dictionary of the existing keywords in the database. This minimizes the redundancy of keywords (e.g. synonyms) and corrects syntactical errors by users. The evolved SW structure gives authenticated users recommendations in the form of categorized web page links, based on session keywords. In addition, they can browse any notes added previously by authorized users, who can also rate a page's relevance as Poor, Not Bad, Good, or Excellent. Page notes and their ratings provide another level of knowledge sharing between users. As user trails accumulate, browsing sessions begin to intersect one another to create new knowledge. For example, while a student A searches web course notes for pages about the "Public Sector", she comes across a web page p1, which has been visited by student X but was related to "IT, E-learning" in his session.
This creates new knowledge relating "IT, E-learning" and "Public Sector" due to the intersection of the corresponding trails.

3.1. KAPUST1.2 user interface

The KAPUST architecture (Fig. 2) has two components: an extensive interactive part (visible to the user) and a reasoning/knowledge-creating part (invisible to the user). In this section, the visible user interface is detailed (the reasoning component is detailed in the next section). The main role of the KAPUST user interface is gathering user trails and providing feedback to the users from the constructed SW. It is implemented as a browser plug-in. It has three modules:


Fig. 1. User trail: Student X, searching for web pages related to IT and E-learning, logs in, enters a session title, and browses for related articles. Web pages 1 to n will be recorded under any session with title keywords IT and E-learning.


Fig. 2. KAPUST1.2 architecture.

1. The login/logout module (Fig. 3): Tracing does not occur unless the user is validated by this module (see Fig. 4 for the life cycle of a user session). The user then identifies the session by a name containing one or more keywords. For example, in testing our tool in an E-learning

Fig. 4. Life cycle of a user’s session.

environment, students are expected to identify the topic and the question as they do their research assignment in any browsing session. A browsing session is delimited by a login and a logout. The use of session identifiers during reasoning is detailed in Section 3.2. Fig. 5 displays a flowchart of these steps.
2. The "View Recommended Pages" module shows search results as lists of rated and categorized links. Section 3.2 describes how the addresses of these web pages are

retrieved. To visit any of them, users may click on their links. This is where users benefit from the traces of other users in their community. In our E-learning example (detailed later), students get to exploit each other's views. They might find what they seek in those links, or they can continue searching through other pages and contribute to the trace database for the benefit of future users.

Fig. 3. Login module (left frame of the window).

Fig. 5. Flow chart for visiting a web page.

3. The Dictionary module minimizes syntactic mistakes in session keywords and detects synonyms. It applies the SOUNDEX function, a built-in utility of Microsoft SQL Server 2000. SOUNDEX converts an alpha string to a four-character code to find similar-sounding words or names. The first character of the code is the first character of the expression, and the second through fourth characters of the code are numbers. Vowels in the expression are ignored unless they are the first letter of the string. The DIFFERENCE function compares two SOUNDEX results and returns a number between 1 and 4; 4 represents the least possible difference between the two strings. We use this number to determine whether a similar string exists (see Table 1). For example, applying SOUNDEX to McDonnell gives the code M-235. Suppose the keyword 'IT projects' exists in the database's list of keywords, and a user starts a new search session and enters 'IT' or 'IT-projects'. He would get a suggestion telling him that a similar keyword 'IT projects' already exists. He can then either use the suggested keyword or ignore it.

In our Semantic Web, search ontologies evolve from the keywords that users enter to name their browsing sessions at the start. These keywords relate to users' search domains. How these naming keywords are transformed into a semantic web is described in the next section.
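To make the dictionary check concrete, here is a minimal Python sketch of the similarity test. The `soundex` and `difference` functions below are simplified re-implementations of the Microsoft SQL Server built-ins the paper relies on (KAPUST calls the server's own functions); the `suggest` helper and its threshold are hypothetical illustrations:

```python
def soundex(word: str) -> str:
    """Simplified SOUNDEX: first letter plus up to three digits from the
    Table 1 mapping; vowels (and unmapped letters) are dropped."""
    codes = {}
    for letters, digit in [("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in letters:
            codes[ch] = digit
    word = "".join(c for c in word.upper() if c.isalpha())
    if not word:
        return ""
    out, prev = word[0], codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip repeats of the same sound
            out += code
        prev = code
    return (out + "000")[:4]        # pad or truncate to four characters

def difference(a: str, b: str) -> int:
    """Crude stand-in for SQL Server's DIFFERENCE: number of matching
    positions in the two SOUNDEX codes (4 = most similar)."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))

def suggest(new_keyword, existing_keywords, threshold=4):
    """Return an existing keyword that sounds like the new one, if any."""
    for kw in existing_keywords:
        if difference(new_keyword, kw) >= threshold:
            return kw
    return None
```

For instance, `soundex("McDonnell")` yields `"M235"`, matching the paper's M-235 example, and a user entering 'IT-projects' would be pointed at an existing 'IT projects' keyword.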

3.2. From user trails to Semantic Web

KAPUST turns user traces into structured knowledge, in the form of a conceptual lattice, using FCA reasoning. This involves two steps: a matrix table is constructed showing the keywords that each page satisfies, and a conceptual lattice is then assembled from the matrix table, as detailed next.

3.2.1. Formal Concept Analysis (FCA)

FCA is a mathematical theory (Ganter & Wille, 1999) modeling concepts in terms of lattice theory. FCA starts with a context K = (G, M, I), where G is a set whose elements are called objects, M is a set whose elements are called attributes, and I is a binary relation between G and M ((g, m) ∈ I is read "object g has attribute m"). Formal concepts reflect a relation between objects and attributes: a formal concept C of a formal context (G, M, I) is a pair (A, B) where A ⊆ G is a set of objects (the extent of C) and B ⊆ M is a set of attributes (the intent of C). The set of all formal concepts of a context K, together with the order relation ≤, is a complete lattice F(G, M, I): for each subset of concepts there is always a unique greatest common subconcept and a unique least common superconcept.

In KAPUST, web page URLs form G, the set of objects, and keywords of session names form M, the set of attributes. A concept in the resulting conceptual lattice is formed of a set of page URLs as the extent and a set of keywords as the intent. Concepts can result from either a single user session or from multiple sessions that intersect each other. Fig. 6 displays an example of three different user sessions that share some common web pages in their trails. For example, "WebPage1" is visited by users A and C, who have different keywords identifying their sessions. This leads to the creation of a new concept in the lattice as a result of the intersection between their sessions. The new concept will have "WebPage1" as its page set and "IT, Political Science, Technology" as its keyword set. The concept having "WebPage1", "WebPage2", "WebPage3" as its page set and "IT" as its keyword set is an example of a concept resulting from a single user (A) session.

Table 1
SOUNDEX coding guide: the same number is used for similar-sounding characters

  Number   Represents the letters
  1        B, P, F, V
  2        C, S, K, G, J, Q, X, Z
  3        D, T
  4        L
  5        M, N
  6        R

Fig. 6. Example of user trails.
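The matrix table and its formal concepts can be illustrated with a brute-force Python sketch. KAPUST's actual FCA engine is incremental; this naive enumeration is only for intuition, and the trail data below is a hypothetical fragment shaped like the Fig. 6 scenario, not the figure's exact contents:

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts of a context.
    context: dict keyword -> set of pages carrying that keyword
    (the 'matrix table'); pages are the objects G, keywords the attributes M."""
    keywords = sorted(context)
    pages = set().union(*context.values())
    concepts = set()
    for r in range(len(keywords) + 1):
        for attrs in combinations(keywords, r):
            extent = pages.copy()               # pages having every attribute in attrs
            for a in attrs:
                extent &= context[a]
            if extent:                          # intent: all attributes shared by the extent
                intent = frozenset(a for a in keywords if extent <= context[a])
            else:
                intent = frozenset(keywords)    # empty extent -> full intent
            concepts.add((frozenset(extent), intent))
    return concepts

# Hypothetical trails: student A's "IT" session and student C's
# "Political Science, Technology" session intersect on WebPage1.
trails = {
    "IT": {"WebPage1", "WebPage2", "WebPage3"},
    "Political Science": {"WebPage1", "WebPage6"},
    "Technology": {"WebPage1", "WebPage6"},
}
concepts = formal_concepts(trails)
```

On this toy context, the intersection produces the concept ({WebPage1}, {IT, Political Science, Technology}), while A's session alone yields ({WebPage1, WebPage2, WebPage3}, {IT}), mirroring the two cases discussed above.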

Sets of session names and web page URLs are extracted from user traces and input to the FCA engine as XML documents. Traces are stored in a database together with the relations that exist between each web page URL and its session name. At this stage, the input to the FCA engine contains all user traces collected so far. As more traces are collected, they are incrementally fed to the FCA engine to update the existing matrix table with any new web page URLs and keywords (Fig. 7). On receiving the initial matrix table, and every time it is updated with new traces, the FCA engine reconstructs the conceptual lattice. This is computationally expensive and may take a few minutes to complete. We update the matrix table and the lattice on a weekly basis. This has been sufficiently efficient in our E-learning application, since students use KAPUST to do a weekly assignment. Lattice generation can be scheduled to run daily instead of weekly in case of higher usage of KAPUST. KAPUST's subsequent use of the generated lattice for query management and its intelligent interface are described in the next sub-section.

3.2.2. KAPUST query handling

Our querying algorithm takes a user's query (a set of keywords) and the conceptual lattice as input. It returns as output the web page links that best match the search query (Figs. 7 and 8). Lattice concepts are pairs of two sets (Pi, Ki), where Pi and Ki are sets of page links and keywords respectively (Fig. 7). To illustrate query processing by KAPUST, we denote the set of potential concepts that match the user request, together with their priorities, as PotentialConcepts = {(Pi, Ki), Priority}, where Priority determines how relevant the concept is to the user's query. We take it as the depth level of

a concept (Pi, Ki) in the conceptual lattice, in case a matching concept is found. Otherwise, we take it as a measure of how many of the keywords entered by the user at login, UK, exist in a concept (Pi, Ki). Referring to Fig. 8, the algorithm has the following steps. Step 2 prunes the set of keywords entered by the user, removing new keywords that do not exist in the concept lattice. Step 3 checks for a concept that has exactly the pruned set of keywords, UK'. Step 4 handles the case where no matching concept is found: all concepts whose keyword sets match one or more keywords in UK' are added as potential concepts. The priority is taken as (CountUK - CountK) to give the highest priority to the concepts with more matching keywords; if a concept has 2 matching keywords and UK' has 3 keywords, the priority is 1, which is higher than for a concept with 1 matching keyword, whose priority is 2. Steps 5 and 6 consider the case where a matching concept is found: they add superconcepts and/or subconcepts of the matching concept. Subconcepts receive a higher priority than superconcepts because they are more specialized. The most general and most specific concepts are not considered as super- or subconcepts. If no super- or subconcepts are found, the matching concept itself is added to the list of potential concepts. Step 7 orders the potential concepts for display. Step 8 displays the categories and the page links under each category, then retrieves the average rating of each page link to be displayed for the user.

The strategy of choosing super- and subconcepts gives the user a better perception and a wider range of relevant information. Subconcepts contain all extents of the concept itself; this allows us to categorize the page sets one level deeper. Returning to our example in Fig. 7, suppose the student logs in and enters the keywords "Hypertext, IT projects"; there is a matching concept containing exactly those keywords. Instead of returning the flat result 'Hypertext, IT projects → WebPage1, WebPage2, WebPage5, WebPage8', KAPUST gives the student better insight into those web pages with the following recommendations:

Hypertext, IT Projects, E-government: WebPage1
Hypertext, IT Projects, Virtual Classroom: WebPage2
Hypertext, IT Projects, Virtual Agency: WebPage5
Hypertext: WebPage6, WebPage7, WebPage8
IT Projects: WebPage3, WebPage4, WebPage8

Fig. 7. Conceptual lattice: an E-learning example. The concepts, as (keyword set, page set) pairs, are:
({Hypertext}, {Page1, Page2, Page5, Page6, Page7, Page8})
({IT Projects}, {Page1, Page2, Page3, Page4, Page5, Page8})
({Hypertext, IT Projects}, {Page1, Page2, Page5, Page8})
({Hypertext, IT Projects, E-government}, {Page1})
({Hypertext, IT Projects, Virtual Agency}, {Page5})
({Hypertext, IT Projects, Virtual Classroom}, {Page2})

Fig. 8. Processing queries using the lattice.
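The stepwise behaviour of the query algorithm can be sketched in Python. This is a simplified illustration, not KAPUST's code: concepts are (keyword-set, page-set) pairs, sub/superconcepts are approximated by strict intent containment, priorities are collapsed to 0 (subconcepts) and 1 (superconcepts) instead of true lattice depth, and the exclusion of the most general/specific lattice elements is reduced to skipping empty intents:

```python
def query(lattice, user_keywords):
    """Sketch of query steps 2-8. lattice: list of
    (frozenset_of_keywords, set_of_pages) concepts."""
    all_keywords = set().union(*(k for k, _ in lattice))
    uk = set(user_keywords) & all_keywords                # Step 2: prune unknown keywords
    potential = []
    match = next(((k, p) for k, p in lattice if k == uk), None)   # Step 3: exact match
    if match is None:                                     # Step 4: partial matches
        for k, p in lattice:
            hits = len(uk & k)
            if hits:
                potential.append(((k, p), len(uk) - hits))  # fewer misses = higher priority
    else:                                                 # Steps 5-6: neighbours of the match
        for k, p in lattice:
            if k > match[0]:                              # subconcept: strictly larger intent
                potential.append(((k, p), 0))
            elif k and k < match[0]:                      # superconcept: strictly smaller intent
                potential.append(((k, p), 1))
        if not potential:
            potential.append((match, 0))                  # fall back to the match itself
    potential.sort(key=lambda cp: cp[1])                  # Step 7: order by priority
    return [(sorted(k), sorted(p)) for (k, p), _ in potential]    # Step 8: display order

# The Fig. 7 lattice of this section:
lattice = [
    (frozenset({"Hypertext"}), {"Page1", "Page2", "Page5", "Page6", "Page7", "Page8"}),
    (frozenset({"IT Projects"}), {"Page1", "Page2", "Page3", "Page4", "Page5", "Page8"}),
    (frozenset({"Hypertext", "IT Projects"}), {"Page1", "Page2", "Page5", "Page8"}),
    (frozenset({"Hypertext", "IT Projects", "E-government"}), {"Page1"}),
    (frozenset({"Hypertext", "IT Projects", "Virtual Agency"}), {"Page5"}),
    (frozenset({"Hypertext", "IT Projects", "Virtual Classroom"}), {"Page2"}),
]
results = query(lattice, ["Hypertext", "IT Projects"])
```

Querying for "Hypertext, IT Projects" returns the three specialized subconcepts first, followed by the two superconcepts, reproducing the shape of the recommendation list above.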

The degree of generality of a concept and its priority are inversely related. Recommended pages are shown to the user in order of decreasing priority. A page is displayed once, unless it belongs to n concepts on the same level of generality in the lattice, in which case it is displayed n times. The next section presents results of employing KAPUST in an E-learning environment.

4. KAPUST1.2 for E-learning

A web server is needed to set up and store the semantic web component of KAPUST. A database server (SQL) is also required to store traces and execute the associated FCA engine. A client machine designated as administrator can access the FCA engine from any machine. Traces between clients and the KAPUST server are transferred in XML. In our E-learning environment, students can do their assignments from home. They are given an installation package for the client side of KAPUST. In this section, we discuss and present our deployment of KAPUST in an E-learning environment. We first describe specific features of the E-learning domain.

4.1. KAPUST for E-learning

E-learning refers to the systematic use of networked information and communications technology in teaching and learning (Horton & Horton, 2003). The emergence of E-learning is directly linked to the development of, and access to, information and communications technology infrastructure. Distance academic learning is one important application. E-learning techniques in the corporate world are also often used for residential workshops and staff training programs. E-learning is flexible, relatively cheap and supplies "just in time" learning opportunities. Creating an on-line collaborative learning environment is a necessary aspect of E-learning. Creating a sense of community and understanding the on-line behaviors of the participants are also crucial (Blunt & Ahearn, 2000). Several efforts have been made to create such environments. Notably, at George Mason University, under the Program on Social and Organizational Learning (PSOL), research is being done to create and maintain a Virtual Learning Community for the participants in the program. The purpose of that research is to study the learning of the community within the developed environment and to better understand the dynamics of collaborative dialogue, to enable more informed and sound decision making (Blunt & Ahearn, 2000). As an experimental workbench, we deploy KAPUST to provide an E-learning environment for an undergraduate class in the Political Science and Public Administration Department at the American University of Beirut, directed by Professor Roman Kultchitsky. Since this is an experimental setting, the presence of the professor is maintained throughout the semester. Students are free to use KAPUST at home and in some lab sessions.

The class has 12 students from various backgrounds. The course, PSPA 289: Information Technology and Public Administration, is a senior seminar course open to all majors. It focuses on the impact of IT on various aspects of public administration and policy around the world. PSPA289 students use KAPUST as part of their weekly assignments to answer assigned research questions. The system's database and website are deployed on an online server at the university. A setup package to install the internet browser add-on utility, together with a video manual for the tool, is prepared and distributed to each student so he/she can install and use it from home. For each of the first 12 weeks of the semester, the professor gives a research assignment on a new topic. Students are required to use KAPUST, either from their homes or from the common computer lab at the university, to browse for related articles and answer their assignment questions. For each question, students choose one or more keywords from the domain of the assignment question and enter them as session keywords before browsing any web pages. All visited web pages in the session are later annotated with these keywords. After the student logs in, he searches for articles to answer the question under study. Each assignment question is handled as a distinct browsing session by the annotation tool (refer to Fig. 4 to recall the life cycle of a user's session). At the session login the student chooses a new set of keywords representing the next question.

4.2. Data collected and observations

Browsing traces were collected on a weekly basis. Table 2 displays information about those traces. Fig. 9 shows how the tool behaved as the E-learning semantic web matured over the 12 weeks of experimentation (much of the fall semester of 2003 at AUB). As seen in Table 2 and Fig. 9, the annotation tool successfully collected user traces over the 12 weeks of experimentation. A large number of traces were collected, and the rate of trace collection increased over the last few weeks. We attribute this increase to the deployment and use of the lattice structure. Using the conceptual lattice gave the students extra motivation to work harder and helped their learning process. It is encouraging to get relevant links directly to the point one is

Table 2 Pages visited per week Week 1

Week 2

Week 3

# of pages visited # of crossing pages

77 9

30 0

23 0

Week 7

Week 8

# of pages visited # of crossing pages

15 0

4 0

Week 4 2 0

Week 5

Week 6

18 0

9 0

Week 9

Week 10

Week 11

Week 12

11 0

26 0

78 6

0 0

This table shows the number of pages visited by users per week, as well as the number of crossing pages per week (pages that have been visited by two different users). Crossing pages give us a measure of how much sharing of knowledge is occurring. This table was constructed from a Statistics database table created particularly for the purpose of analyzing the traces.
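The crossing-page counts in Table 2 can be derived from raw trace records by grouping visits by page and counting distinct visitors. The following is a minimal sketch of that computation, assuming traces are available as (user, url) pairs; the function and record names are hypothetical and not the actual KAPUST schema:

```python
from collections import defaultdict

def crossing_pages(traces):
    """Return pages visited by at least two distinct users.

    `traces` is an iterable of (user, url) visit records; a crossing
    page is one whose set of distinct visitors has size >= 2.
    """
    visitors = defaultdict(set)
    for user, url in traces:
        visitors[url].add(user)
    return {url for url, users in visitors.items() if len(users) >= 2}

# Example: two students cross on page p1 only.
traces = [("studentA", "p1"), ("studentA", "p2"),
          ("studentB", "p1"), ("studentB", "p3")]
print(crossing_pages(traces))  # {'p1'}
```

Counting distinct users per page (rather than raw visits) is what makes the measure reflect sharing of knowledge rather than individual browsing volume.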



Fig. 9. Behavior of the tool over the 12 weeks: Period A represents the testing phase, where students were getting familiar with the tool. In Period B, we observe trace collection, the building up of the semantic web structure, and knowledge sharing between students through the notes and rating features of the annotation tool. Period C represents the maturity of the semantic web structure and the construction of the first conceptual lattice. In Period D, the functionality of the system as a whole is observed: the semantic web structure is continually updated online, the matrix table and the lattice structure are incrementally updated and constructed on a weekly basis, students continue to share their experiences, and, most importantly, students are given recommendations for each new browsing session they initiate. Period E represents the end of experimentation and data collection.

searching for. Moreover, students started to cross one another's trails in Period D. This shows the power of the tool's inference engine, which began during this period to query the conceptual lattice and present recommendations to the students according to their search criteria. Throughout the 12-week term, the tool proved itself as an ontology-builder application: ontologies were automatically collected from users' trails, and a database of ontologies related to the domain of information technology and its impact on the political and public sectors was created. Generic creation of ontologies, however, tends to produce noise and syntactical mistakes in the keyword entry process. In our case, this problem was overcome by consulting a dictionary module of ontologies. The ontologies used by the dictionary module are the same ones that the system collected from user traces. Using the dictionary module to validate students' search criteria minimizes keyword redundancy and mistakes. These ontologies, which take the form of keywords, are discussed further in the following section on the lattice structure.
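The paper does not specify how the dictionary module matches a student's keyword against previously collected ones, so the following is only an illustrative sketch of such validation using approximate string matching; the function name, cutoff value, and matching strategy are assumptions, not KAPUST's actual mechanism:

```python
import difflib

def validate_keyword(keyword, dictionary, cutoff=0.8):
    """Map a student-entered keyword onto a known one when possible.

    `dictionary` is the set of keywords already collected from user
    traces. A sufficiently close match (similarity ratio >= cutoff)
    is treated as the same keyword, which limits redundancy and
    typos; otherwise the new keyword is accepted and added.
    """
    key = keyword.strip().lower()
    matches = difflib.get_close_matches(key, dictionary, n=1, cutoff=cutoff)
    if matches:
        return matches[0]
    dictionary.add(key)
    return key

known = {"globalization", "e-government", "privacy"}
print(validate_keyword("Globalizaton", known))  # "globalization" (typo absorbed)
print(validate_keyword("censorship", known))    # "censorship" (new entry)
```

Validating against keywords the system itself has collected, rather than an external vocabulary, keeps the dictionary aligned with the community's actual usage.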

Keywords, pages and concepts of the conceptual lattice per week:

                                 Week 8  Week 9  Week 10  Week 11  Week 12
Accumulated number of keywords     51      72       79       94       98
Accumulated number of pages       116     154      178      240      255
Total number of concepts           59      78       86      104      109

Fig. 10. Analysis of conceptual lattice evolution: the accumulated numbers of keywords and pages come from the Matrix Table, while the total number of concepts comes from the Lattice Structure. The updates of the lattice from week 9 till week 12 came from the exported traces showing additional browsing experience (so-called 'Delta' traces).

4.3. Conceptual lattice construction

Traces (in XML exported files, Delta and Full) are collected on a weekly basis. The first full exported file is fed to the inference engine during Period C (Fig. 9) to generate the first version of the conceptual lattice. Sixty user sessions are collected; 112 notes and 156 ratings are made. Fig. 10 shows how the conceptual lattice evolves during Period D by measuring the number of keywords, pages and concepts; concepts are scattered across eight levels (including the most specific and most general concepts). The final conceptual lattice is formed of 255 pages, 98 keywords and 109 concepts. Table 3 displays characteristics of the concepts at each level of the final lattice. Recall that a concept is formed of a set of pages and a set of keywords (characterizing the session). Level 1 contains the most general concept (see Table 3): it is formed of all the pages in its page set and contains the empty set as its keyword set. At level 8 we have the most specialized concept: it contains the empty set as its page set and all the keywords in its keyword set.

Table 3
Analysis of conceptual lattice by level: the number of keywords per concept increases, in reverse to the number of pages, as we move towards the more specialized concepts

Level                # of concepts   # keywords per concept   # pages per concept
Level 1 (general)          1                  0                      255
Level 2                   59                1 to 2                  1 to 22
Level 3                   21                2 to 3                  1 to 12
Level 4                   16                3 to 5                  1 to 9
Level 5                    6                5 to 8                  1 to 6
Level 6                    4                8 to 9                  1 to 4
Level 7                    1                 12                      1
Level 8 (specific)         1                 98                      0

The fact that the lattice has several levels indicates knowledge sharing by the students. It also means that the


tool has been successful in mapping concepts to the corresponding pages from different user trails, generating new concepts at various levels of the lattice structure. At level 7, the second most specialized level, we have one concept which refers to the Google search engine in its page set and which satisfies 12 keywords. This concept is not very relevant, because the Google search engine is not a content page but a general site; the concept resulted from improper usage of the tool and from testing or noise data during Period A (see Fig. 9). Constructing the lattice (Fig. 11) becomes time consuming as the number of concepts increases (around 5 min for 109 concepts). This is not critical in our case, since we perform the updates on a weekly basis without hindering the students' use of the tool. However, if the system is to be deployed in another domain where concepts grow fast and the lattice has to be updated more frequently, then our FCA algorithm may need to be reconsidered.
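To illustrate how formal concepts arise from the page-keyword matrix, the following sketch enumerates all concepts of a small binary context by closing intersections of page intents. This naive approach is for illustration only and is not KAPUST's actual FCA algorithm; its cost grows quickly with the number of pages, which is consistent with construction time growing as the lattice grows:

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate formal concepts of a {page: set(keywords)} context.

    A concept is a pair (extent, intent): the extent is the set of
    pages that contain all of the intent's keywords, and the intent
    is the set of keywords shared by every page of the extent. Every
    closed intent is an intersection of some pages' keyword sets
    (plus the full keyword set, for the empty-extent concept), so
    closing all intersections yields the whole lattice.
    """
    pages = list(context)
    all_keywords = set().union(*context.values()) if context else set()
    intents = {frozenset(all_keywords)}  # most specialized concept's intent
    for r in range(1, len(pages) + 1):
        for group in combinations(pages, r):
            shared = frozenset(set.intersection(*(context[p] for p in group)))
            intents.add(shared)
    concepts = []
    for intent in intents:
        extent = frozenset(p for p in pages if intent <= context[p])
        concepts.append((extent, intent))
    return concepts

# Toy context: three pages annotated with session keywords.
ctx = {"p1": {"politics", "internet"},
       "p2": {"politics"},
       "p3": {"internet", "privacy"}}
lattice = formal_concepts(ctx)
print(len(lattice))  # 6
```

In this toy context, the pair ({p1, p2}, {politics}) emerges as a concept even though no single trail named exactly that grouping — the lattice derives it from the overlap of trails, which is the sense in which concepts from different users' trails combine. Incremental algorithms such as next-closure avoid re-enumerating all subsets and would be a natural replacement where updates must be frequent.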

4.4. Discussion of results

The collected results are evidence of the suitability of reasoning over user traces using KAPUST. Our FCA algorithm used to construct the matrix table and lattice performed well in our domain. The method we use to query the lattice provides recommendations to the students in a categorized way, which gave the students a way to share their knowledge with their fellow students. When querying for a set of naming keywords where none of the concepts in the lattice contains an exact match, we take each keyword individually and leave the judgment to the user; this is like an ordinary keyword search and thus does not fully benefit from the conceptual lattice. Deploying KAPUST for another semester, we expect a more complex lattice to be constructed. Moreover, students in the following semester will get even more benefit from the tool, as they will be searching an existing lattice structure. This will further show the benefits of this approach in an E-learning environment through knowledge sharing among students who come from different generations.

Most concepts in our lattice are at level 2, which is too general. Several factors relating to the nature of our E-learning environment led to this: similarity in keywords, and the fact that a new research topic was introduced each week. Comparative research assignments produced a deeper lattice. This raises an important question: how does the research task assigned to the students impact the development of the conceptual structure? We are planning to explore the relationship between 'kinds' of research assignments and semantic web development. In finding trails for a given ''topic'', one can succeed with relatively high relevance by simply searching with appropriate keywords. The harder problem is to find the web pages that contain the answer to a given ''question'' within the given ''topic''. A possible extension to KAPUST towards this, currently explored by one of the authors (Prof. Kultchitsky), is to recognize a page as a collection of paragraphs, where each paragraph has its own keywords. This will be a major extension to KAPUST. In its current form, KAPUST provided an E-learning environment which gave the students and the professor involved a means of sharing information and experiences without requiring any paperwork. Even during the early stage of evolution of the semantic web for the class, the tool still contributed to the learning process. It assisted in collecting web pages under the domains of IT, public administration and political science, and created a structure out of these web pages which accelerated the learning of the class.

At the pedagogical level, use of KAPUST merges the reading and writing processes; a whole new dimension of implicit discourse between students evolves on top of the original documents, which lend themselves to new multiple reading pathways and non-hierarchical, polyarchic structures, without ruining the integrity of any original research texts.

5. Summary and future work

In this paper, we developed and explored a technique to help people mine the Internet more effectively. Instead of categorizing text-based search results, as some tools attempt, our work develops categories based on users from a similar community with similar interests. Trails of users with similar interests are processed to create a semantic web which is then used to provide intelligent search responses. Our approach to developing the semantic web is incremental, building meta-data describing the relationships amongst web pages. We exploited these dependencies in an intelligent search engine to yield a more focused search result than a standard text-based search. We developed an add-on tool to work with existing internet browsers. Our tool collects users' traces and generates a conceptual structure from a collection of similar traces. Our testbed community of users is a class of undergraduate students. Our tool provided an infrastructure for an E-learning environment. Our

Time needed per number of concepts to construct the lattice:

                                              Week 8  Week 9  Week 10  Week 11  Week 12
Number of concepts                              59      78       86      104      109
Time to construct the lattice structure (s)     83     146      181      266      292

Fig. 11. Relative time needed to construct a lattice of increasing size.


approach is innovative in that we use information which is typically ignored by other intelligent browsers. In current browsers, e.g. Internet Explorer, the virtual surfing trail is not commonly exploited: its use is limited to allowing backtracking within a browsing session by a single user. The surfing trail is not used to improve searching in future browsing sessions by other users. We have borrowed several rules from psychological studies of social navigation on what makes a good navigation system: we considered the integration of the tool with the browser, the presence of a community to share knowledge, and the proxemic space to provide a transparent environment for the users. This has made our tool easier to use. Our tool requires very little training: anyone familiar with an internet browser can directly recognize its functionalities. Our approach is non-intrusive to users, incremental, and adaptive to changes in the knowledge collected from the users. We combined ideas from manual ontology engineering and from automatic approaches based on machine learning and data mining, and we bypassed the need for manual annotation of web pages. Our hypothesis in this project has been that individuals in a given interest group produce surfing trails that are of interest to the whole group. Our software and our experiments have shown this hypothesis to be true: individuals of our interest group, students studying political science, produced surfing trails that were of interest to the whole group. Students' interactions with the course notes incrementally evolved into a Semantic Web describing the relationships amongst the contents of their course notes and ideas. These dependencies were utilized by our intelligent search engine to yield more focused search results for the students themselves to use as they learn. The whole class was collectively learning and evolving as the subject progressed.
The relationship between the pages and students' notes was automatically discovered by our intelligent software running in the background, ensuring that the technology we used remains non-intrusive and easy to use for non-technical students. Our approach bypassed the difficulties associated with manual annotation of web pages in two ways. First, we lowered the effort required by restricting the input from the user to naming the session only; this name then applies to all subsequently visited pages. Second, we added a dictionary module to ensure that similar names are associated with the same surfing trails. In our approach users do not have to agree on the use of names, and they can remain focused on the browsing task. The automatic part of the system, which organizes the trails and the keywords, is based on Formal Concept Analysis. The generated conceptual structure provides the students with a user-friendly, natural presentation of the evolved semantic web. The conceptual lattice structures the data from the most general to the most specific concept. It relates concepts to each other based on their intents and extents, thus providing a means for deriving new knowledge that was not directly perceived from the user trails. Moreover, querying


the lattice is an easy task, and the querying approach can vary so as to make the most of the structure. For instance, in our method, if a user is searching for a certain concept, we provide him with a categorized result formed of the upper and lower levels of the concept itself, in order to gain more insight into the extents constituting the concept.

5.1. Future work

Our system, KAPUST, is a form of open multiagent system. The external agents are the browsers (students in our case) and the internal agents are the intelligence-building algorithms and the intelligent interface modules (e.g. the dictionary). In this light, our framework for enhancing search results evolves the behavior of the electronic medium (info space) to improve its effectiveness for the users (students in our case), and is in general terms a form of multiagent system evolution. This form of evolution in response to external agents' behavior is of importance to electronic market agent systems. For example, keeping track of clients' preferences in a non-intrusive manner can be used to evolve market access and maintain product lines (service products or otherwise). The non-intrusive nature of our approach solves the problem of getting feedback from customers who may not have any true interest or inclination to give it. Towards this, we are currently investigating what kind of information can be obtained non-intrusively. In Beydoun, Debenham, and Hoffmann (2004), we looked at evolving e-markets in response to external evolutionary factors: the processes of other electronic institutions. Our work in this paper is a form of multiagent system evolution in response to external agents. How the two external evolutionary factors, external agents' behavior and external processes in other e-markets, can be integrated into a single framework for evolving e-markets is also work in progress in Beydoun et al. (2004). Recently, in Beydoun et al.
(2005), we have devised mechanisms to evaluate the result of cooperative modeling to integrate with KAPUST. We are in the process of integrating those mechanisms to devise a measure of the trust users can invest in the developing conceptual structure.

References

Benjamins, V. R., Fensel, D., Decker, S., & Perez, A. G. (1999). (KA)2: Building ontologies for the internet: a mid term report. International Journal of Human–Computer Studies, 51, 687–712.
Beydoun, G., Debenham, J., & Hoffmann, A. (2004). Integrating agent roles using messaging structure. In Pacific Rim multiagent system workshop (PRIMA 2004), Auckland: Auckland University.
Beydoun, G., Hoffmann, A., Breis, J. T. F., Béjar, R. M., Valencia-Garcia, R., & Aurum, A. (2005). Cooperative modeling evaluated. International Journal of Cooperative Information Systems, 14(1), 45–71.
Blunt, R., & Ahearn, C. (2000). Creating a virtual learning community. In The sixth international conference on asynchronous learning networks (ALN 2000), Maryland.
Davies, J., Fensel, D., & Harmelen, F. V. (Eds.). (2003). Towards the semantic web: Ontology-driven knowledge management. London: Wiley.


Dieberger, A. (1997). Supporting social navigation on the world wide web. International Journal of Human–Computer Studies, 46(6), 805–825.
Forsberg, M., Höök, K., & Svensson, M. (2001). Design principles for social navigation support. In C. Stephanidis (Ed.), User interfaces for all. Stockholm: Lawrence Erlbaum Assoc.
Ganter, B., & Wille, R. (1999). Formal concept analysis: Mathematical foundations. Springer-Verlag.
Hendler, J. (2001). Agents and the semantic web. IEEE Intelligent Systems, 16(2).
Horton, W., & Horton, K. (2003). E-learning tools and technologies. John Wiley and Sons.
Kahan, J., Koivunen, M.-R., Prud'Hommeaux, E., & Swick, R. R. (2001). An open RDF infrastructure for shared web annotations. In Tenth international world wide web conference (WWW10), Hong Kong.
Kim, M. H., & Compton, P. (2001). Formal concept analysis for domain-specific document retrieval systems. In 14th biennial conference of the Canadian society for computational studies of intelligence (AI 2001). Ottawa: Springer.

Lacher, M. S., & Decker, S. (2002). RDF, topic maps, and the semantic web. Markup Languages: Theory and Practice. MIT Press.
Laus, F. O. (2001). Tracing user interactions on world-wide webpages. Psychologisches Institut III, Westfälische Wilhelms-Universität, Germany.
Quan, D., Huynh, D., & Karger, D. R. (2003). Haystack: A platform for authoring end user semantic web applications. In The twelfth international world wide web conference, Budapest, Hungary.
Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., & Wenke, D. (2002). OntoEdit: Collaborative ontology development for the semantic web. In The first international semantic web conference (ISWC 2002), Sardinia, Italy.
Wexelblat, A. (1999). History-based tools for navigation. In IEEE's 32nd Hawaii international conference on system sciences (HICSS'99). IEEE Computer Society Press.
Wexelblat, A., & Maes, P. (1999). Footprints: History-rich tools for information foraging. In Conference on human factors in computing systems (CHI'99), Pittsburgh.