Intelligent Assistance for Navigating the Web

Christopher A. Welty
Computer Science Dept., Vassar College, Poughkeepsie, NY 12601
Tel: (914) 437-5992 Fax: (518) 272-5581
[email protected]

The Untangle Project is an attempt to apply KR&R techniques to the problem of finding information on the ever-expanding World Wide Web. Two key enabling technologies allow Untangle to work: a deep ontology of the kind of information that can be found on the web, and an HTML interface that provides on-the-fly access to the knowledge-base. The interface allows for queries formulated in the underlying representation language (Classic), which provides a far more expressive facility for searching than is currently available in any web navigation tool.

Subject Areas: Ontologies, Electronic and On-Line Information, KR&R User Interfaces, Representing large domains.

1 Introduction

There can be no doubt that the World Wide Web is growing at an incredible rate. The current default model for Web use is casual “surfing”: many users consider The Web a novelty and browse around without any specific goal. Clearly this model will have to change if The Web is to survive, and tools will have to be available that support users who are searching for specific information.

This paper describes aspects of the Untangle Project, an effort to provide intelligent assistance to someone searching The Web for information. This assistance is manifested by several technologies which are briefly described in this paper. The first technology is an ontology and knowledge-base implemented in the description logic Classic [Brachman, et al., 1989], a descendant of KL-ONE. The ontology provides a deep representation of the information that is available on The Web. The second technology is a web interface for Classic, which allows the knowledge-base to be accessed interactively through any web browser (such as Netscape or Mosaic). This access is not to a static set of HTML files that are generated automatically at periodic intervals, but to the live knowledge-base.

The paper closes with a discussion of how Untangle can be used as a vehicle to demonstrate the power and utility of KR in real applications. This paper is available in HTML format at http://www.cs.vassar.edu/faculty/welty/papers/untangle/flairs-96_1.html. The HTML version of the paper contains numerous hot links to the references and to live demos of the Untangle system.


2 Ontology for Electronic Information

From a KR perspective, the central issue in the Untangle project is the ontology for representing information that is in electronic form.

2.1 Previous Work

This work began before the World Wide Web became popular, and the initial goal was to support intelligent email distribution. The basic elements of the ontology remain unchanged from the KBEDS system originally described at FLAIRS-94 [Welty, 1994a]; however, some minor changes have been made to bring it in line with the standard bibliographic ontology that is part of the Ontolingua Ontology Library [Gruber, 1994]. This updated ontology is described in [Welty, 1995], and while the reader is referred to either of these previous papers for more in-depth details of the main ontological elements, a brief discussion follows.

The ontology of electronic information is broken into three disjoint parts: information-items, information-views, and event-or-objects. These concepts partition all the information in the knowledge base. An information-item is a piece of information that, typically, describes an event or object, and an information-view is a particular view of that piece of information.

Figure 1 shows individuals of these three concepts. In this figure, and throughout this paper, the dashed lines represent the individual-of relationship, the solid lines represent the named relationships (roles), the rounded boxes are individuals, and the sharp boxes are concepts. (The individual-of relationship is a special syntactic construct in Classic, and is therefore shown apart from other relationships.) The individual Chris, an object, represents a person. This person has a piece of information, a resume, associated with it through the has-information link. The inverse of this link, information-of (not shown), typically represents the fact that the information item is about the event or object. The information item, Chris-resume, has two views. It is important to realize that each of these views is the same information, simply in different formats, and therefore it would not make sense to represent this as Chris having two separate resumes. This is why the three kinds of individuals are distinct.

Having the subjects of the information available in the representation is the key advantage to this approach. While most users of the system will be after particular information items, it is the objects and events that tie the various pieces of information together and facilitate the inference that makes the system intelligent.

[Figure 1. The three kinds of individuals. The individual Chris (concept object) is linked by has-information to Chris-resume (concept information-item), which is linked by has-view to Chris-resume.html and Chris-resume.ps (concept information-view).]

A frequent source of confusion in the ontology is the representation of paper, a sub-concept of publication, as an object. Many see this as obviously an information item. A paper is, however, an object which has a title, an author, may reference and be referenced by other papers, may be

published in a book or journal, etc. It can have several information items associated with it as well: an abstract, a summary, a review, or the document text itself [Gruber, 1994]. Each of these latter kinds of items would be individuals of information-item. Once the existence of the document-text concept, which is clearly an information item that can have multiple views, is realized, the representation becomes clearer. Figure 3 shows an example of how a paper is represented. (In truth, all individuals have a generic name, such as paper-01 or person-159, and objects have a name or title role that is filled with the more conventional name.)

[Figure 2. Events and Objects. Concepts shown: event-or-object, event, object, person, conference, organization, publication, and college.]

[Figure 3. Representing a paper. Individuals shown: paper-01 and Chris (objects), paper-01-text and paper-01-summary (information items), and paper-01.ps, paper-01.html, and summary-01.txt (information views), connected by the roles has-item, has-view, author, and title (“Intelligent Assistance for Navigating ...”).]

2.2 New Goals of the Research

With the explosive growth and popularity of The WWW as a medium for disseminating information, it seemed far more interesting, practical, and trendy to apply the ontology for electronic information to this huge tangled mess. The original goals were therefore subtly altered: to provide intelligent assistance for navigating The Web.

The most popular navigational tool available on The Web is Yahoo, which presents a hierarchical view of subject areas that can be browsed top-down, or searched for keywords. The incredible popularity of this service indicates that it is useful and desperately needed, and clearly this form of information organization (a hierarchy of increasingly specific subjects) lends itself quite easily to expression in a KR system.

There are numerous shortcomings to Yahoo that can easily be overcome with KR technology. There is little place in the Yahoo taxonomy for


web pages that can’t be explicitly classified. A person’s home page, for example, will typically not be located on Yahoo, since it describes a person, not a specific topic.

The granularity of the Yahoo categories is often too large, and there is no other information about the web pages that might be used to constrain a search, other than keywords appearing in the link name. The Artificial Intelligence category, for example, has over 150 links. If you are interested in only AI conferences, however, you can either browse the 150 links, or try to find everything in the AI category that has the word “Conference” in its name. The latter search would miss such important items as FLAIRS-96 or the AAAI Spring Symposium Series, and clearly it completely lacks the much needed ability to search for “conferences in Florida.”

In addition, without something to tie them together, related web pages will simply be listed in alphabetical order, thus separating such things as the web pages that describe the various special tracks of FLAIRS. If there were knowledge that these different pages describe aspects of the same event, it could be helpful to someone searching for this information.

Clearly KR has much to offer as a technology for assisting web navigation, and the Untangle Project began as an effort to apply KR to this domain.

2.3 Obstacles

Criticisms of Yahoo aside, it is an extremely useful service, and the initial plan for the Untangle Project was to at least duplicate the functionality of Yahoo, and build from there. This deceptively simple goal, however, presented some obstacles whose solutions were fairly good lessons for ontological development.

The initial approach used to integrate the Yahoo subject taxonomy into the ontology of electronic information was to create a concept called web-page, below which were the subjects, as shown in Figure 4. This was essentially the Yahoo hierarchy, since Yahoo is only capable of representing Web pages, and not the objects or events behind them.

[Figure 4. Initial representation of Yahoo subjects. Concepts shown: information-item, web-page, science-page, compsci-page, ai-page, art-page, drama-page, and opera-page.]

This approach works for web pages, but when the individuals in the event-or-object category were considered, it quickly became clear that they, too, needed to be grouped by subject. Suddenly the ontology grew to the point where there was a subject taxonomy below the concept person (e.g. science-person, art-person, etc.), below conference (e.g. science-conference, art-conference, etc.), below publication, organization, and so on. Aside from the obvious problem of too many redundant concepts, there was nothing to indicate that sets of concepts such as ai-person, ai-page, ai-conference, etc., were somehow

related. It would seem like an obvious and desirable inference to be able to make that if a web page is an information item of a conference, and the web page is considered an ai-page, then the conference must be an ai-conference. Figure 5 shows a situation with the simple inference missing.

[Figure 5. Missing inference. FLAIRS-page is an individual of ai-page (a web-page) and is the information-item-of FLAIRS, an individual of conference; whether FLAIRS is an individual of ai-conference is marked “?”.]

Clearly this would be fairly easy to remedy with a rule that says any individual of ai-page is the information-item-of an individual of ai-conference. The problem is that a separate rule is needed to infer this between database-conference and database-page, art-conference and art-page, and between every pair of related subjects.

The next approach was to take out the subject sub-taxonomies and create a concept called subject, whose individuals were all the different subjects; these individuals could fill the role has-subject in all events, objects, web pages, etc. This worked a little better, since one rule would now cover all the cases mentioned above, but the hierarchical ordering of subjects was lost. It was possible to create a role called sub-subject between individuals of subject, but this was undesirable because Classic uses subsumption as its central form of inference [Brachman, et al., 1991], and the built-in subsumption facilities do not recognize any named roles as a specialization link.

2.4 Solution

[Figure 6. First step towards a solution. A subject taxonomy (subject, with cs, art, ai, drama, and database below it) merged with the previous example: FLAIRS (a conference) and FLAIRS-page (a web-page), linked by information-item-of.]

The obvious solution may seem fairly clear at this point, but it was stubbornly avoided for semantic reasons. The solution was to create a fourth kind of concept, call it subject for the moment, that was the parent concept of a taxonomy of subjects, such as computer-science, artificial-intelligence, art, drama, etc. Figure 6 shows an example of this fourth taxonomy merged with the previous example. With this approach the taxonomy of subjects is preserved, the natural subsumption relationship between Classic concepts is used, there is no duplication in the concept hierarchies, and a single rule can be used to infer the subject of an event or object if it is known of an information item. Every individual is an individual of one of event-or-object, information-item, or information-view, and in addition is an individual of any number of concepts in the subject taxonomy.

Clearly this is an elegant solution to the problem in an operative sense, but semantically there is a problem with it. Every individual of a concept is also an individual of all the concept’s parents, thus in Figure 6 the individual FLAIRS is an individual of cs and subject. Consider the normal meaning of being an individual: the individual is a whatever-the-concept-is. For FLAIRS, it makes sense to say that it is a conference, but is it a subject, or even an AI or CS? This semantic difficulty may seem trivial, but the Untangle Project was founded on the very notion that a deep and more accurate representation of information on the web would facilitate access to that information. Compromising on the semantics, on the very way the information is interpreted, seemed to violate the principles of the project and to create the possibility for confusion.

Religion aside, the final (perhaps current is more appropriate) solution was actually not far off. Simply by adding thing to the names of the concepts in the subject taxonomy, and changing the name of the top concept to represented-thing, the semantics are no longer confusing. FLAIRS is a conference, an ai-thing, cs-thing, and represented-thing.

3 Web Interface

Probably the most significant enabling technology that makes the Untangle Project able to actually demonstrate how KR can offer improvements over existing web navigation tools is the Common LISP Hypermedia Server (CL-HTTP) [Mallery, 1994]. This is an HTTP server implemented in Common LISP, which enables HTML forms, queries, and searches to invoke LISP functions. These LISP functions output HTML, which is interpreted by the client browser and

displayed. The significance of this development is profound in two ways:

1. It immensely simplifies the generation of a user interface, since the web browser handles all formatting, graphics, and viewing. The size of the Untangle Project team would have precluded a usable interface otherwise.
2. The actual data in the knowledge-base can be shared. Any number of people, anywhere on the net, can access the data and see live results in a presentation format (HTML) they are familiar with. Most knowledge bases can only be used by a single person at a time.
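The flow described above, in which a request invokes a function that emits HTML reflecting the current state of the knowledge-base, can be illustrated roughly as follows. This is only a sketch in Python rather than Common LISP; it does not use CL-HTTP or Classic, and the dictionary `KB`, the function `render_individual`, and the `/describe?name=` link format are all hypothetical names chosen for illustration.

```python
from html import escape
from urllib.parse import quote

# Hypothetical in-memory stand-in for a live knowledge base: each individual
# maps role names to lists of fillers (names follow Figure 1).
KB = {
    "Chris": {"has-information": ["Chris-resume"]},
    "Chris-resume": {
        "information-item-of": ["Chris"],
        "has-information-view": ["Chris-resume.html", "Chris-resume.ps"],
    },
    "Chris-resume.html": {"information-view-of": ["Chris-resume"]},
    "Chris-resume.ps": {"information-view-of": ["Chris-resume"]},
}


def render_individual(kb, name):
    """Build an HTML description of one individual from the KB as it is now."""
    lines = [f"<h1>{escape(name)}</h1>", "<ul>"]
    for role, fillers in kb[name].items():
        rendered = []
        for filler in fillers:
            if filler in kb:
                # Known individuals become links back to the describing function.
                href = f"/describe?name={quote(filler)}"
                rendered.append(f'<a href="{href}">{escape(filler)}</a>')
            else:
                rendered.append(escape(filler))
        lines.append(f"<li>{escape(role)}: {', '.join(rendered)}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_individual(KB, "Chris-resume"))
```

Because the page is built at request time, a change to `KB` shows up in the next call without regenerating any static files, which is the property the text attributes to the live interface.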

The project focused its early stages on developing a web interface for Classic. This interface displays concept and individual descriptions as dynamic web pages: the HTML is generated on the fly to reflect the state of the Classic object at the time of the query. References to other concepts or individuals in these descriptions are hot links to the functions that will display them. Concept taxonomies can also be displayed as nested HTML lists. Within concept or individual descriptions, role taxonomies are displayed as nested HTML lists as well.

KR systems are well known for producing lots of information, much of which is not useful to a person. The interface suppresses a lot of this using per-concept meta-information to control which roles are displayed. It also by default suppresses the display of information inherited along the role hierarchies if the original information is also displayed. All this default behavior can be overridden by the user using forms accessible through every description page.

The interface also overrides the default display of an individual if any of that individual’s parents have a display-function. This is intended particularly for information-items. The roles of an information item typically link it to the object or event the information is about, and the


views of that piece of information. For example, the individuals Chris-resume and Chris-resume.html from Figure 1 are shown below:

    Chris-resume::
      (and information-item description
           (fills information-item-of Chris)
           (fills has-information-view Chris-resume.html Chris-resume.ps))

    Chris-resume.html::
      (and information-view html-view
           (fills information-view-of Chris-resume)
           (fills URL "http://www.cs.vassar.edu/..."))

A user will rarely want to see these actual descriptions. When interested in a particular information item, such as Chris-resume, a user will want to see the information itself. Every information view has a URL role which is filled with a string that identifies the location of the information. When a user (through various mechanisms such as the result of a query or clicking on a role in another individual) selects an information item to be viewed, the display function for information-items bypasses the actual individual description, retrieves the string from the URL slot of the item’s view, and passes that back to the web browser (which should know what to do with a URL). If the item has more than one view, as is the case in the example above, the display function generates an HTML list in which each item is a link to the URL of one of the views. In other words, it lets the user choose the view.

4 Showing off KR

The Untangle Project has the potential of being a highly visible testimony to the power of knowledge representation techniques in making information more accessible. The advantages are more than simply adding the events and objects underlying web pages to Yahoo. Adding this depth to the representation then makes it possible to do inference, and it is inference that sets KR techniques apart from others [Welty, 1994b]. This section provides some examples of useful inference that the ontology currently supports.

4.1 Classifying Events and Objects

It is useful, as discussed in previous sections, to be able to infer where events and objects fit in the represented-thing taxonomy based on how their information items are classified. This is accomplished with a description rule in Classic.

Classic description rules are attached to concepts and fire on any individuals of that concept that pass a filter. This rule is attached to event-or-object, and the filter ensures that the individual has an information-item. When the rule fires, it returns the parents below represented-thing of the information item, and adds these to the parents of the event-or-object. Note that the converse of this rule does not work: an information item cannot be assumed to be of the same type as the event or object it describes. A person, for example, may have many interests and thus be classified under several unrelated subjects, yet not all the web pages associated with that person will describe all those subjects.

The utility of this rule can probably only be realized in practice. The work of populating the knowledge-base with individuals is triggered by discovering interesting web pages and other online information. When discovered, the appropriate information items and views are created, and placed in the represented-thing taxonomy. Especially in the case where a new information item describes an object or event that already exists in the knowledge-base, new subject information is typically not added to the object or event (for no other reason than that it is usually overlooked by the person doing it). This automated assistance then helps keep the knowledge-base more accurate.
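The behavior of the rule described in this section can be sketched outside Classic. The following Python fragment is an illustration under stated assumptions, not the Classic rule itself: the `Individual` class, the `TAXONOMY` table, and both function names are hypothetical, and the toy taxonomy contains only a few of the "-thing" concepts mentioned earlier.

```python
# Hypothetical toy taxonomy below represented-thing: child concept -> parent.
TAXONOMY = {
    "cs-thing": "represented-thing",
    "ai-thing": "cs-thing",
    "art-thing": "represented-thing",
    "drama-thing": "art-thing",
}


class Individual:
    """A named individual with a set of parent concepts."""

    def __init__(self, name, parents, information_items=()):
        self.name = name
        self.parents = set(parents)
        self.information_items = list(information_items)


def below_represented_thing(concept):
    """Return True if the concept lies strictly below represented-thing."""
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        if concept == "represented-thing":
            return True
    return False


def classify_from_information_items(individual):
    """Copy subject parents from information items to the event or object."""
    # Filter: fire only for event-or-objects with at least one information item.
    if "event-or-object" not in individual.parents:
        return individual
    if not individual.information_items:
        return individual
    for item in individual.information_items:
        for parent in item.parents:
            if below_represented_thing(parent):
                individual.parents.add(parent)
    return individual


if __name__ == "__main__":
    page = Individual("FLAIRS-page", {"information-item", "web-page", "ai-thing"})
    flairs = Individual("FLAIRS", {"event-or-object", "conference"}, [page])
    classify_from_information_items(flairs)
    print(sorted(flairs.parents))
```

The sketch deliberately propagates in one direction only, from information item to event or object, matching the observation that the converse inference is not sound.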


4.2 Common Navigational Rules

In addition to providing a deep representation of the knowledge that is behind web pages, the ontology also supports rules that capture the common navigational heuristics employed by “experts” at navigating the web. These rules can be used to find information that Untangle does not have. There are basically two kinds of these rules:

• Good places to start a search. The ontology supports an understanding that certain web sites, such as Yahoo in general, or others that are more domain-specific, are good places to start a search.
• How to construct an address. Most expert web navigators fall back on several simple rules about where web servers might be. For example, if you are looking for information about a college, try www.college.edu. This may seem inherently obvious to some, but based on experience at Vassar this useful piece of knowledge is not very widespread.
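The address-construction heuristic above can be written down as a small function. This is a minimal sketch, not how Untangle encodes the rule: the function `guess_host` and the table `TLD_BY_KIND` are hypothetical, and only the college-to-edu entry comes from the text. The result is a guess to try, with no guarantee that such a server exists.

```python
import re

# Top-level domain guessed for each kind of organization. Only the college
# entry comes from the text; any further entries would be assumptions.
TLD_BY_KIND = {"college": "edu"}


def guess_host(name, kind):
    """Guess a web server host name for an organization, or return None."""
    tld = TLD_BY_KIND.get(kind)
    if tld is None:
        return None
    # Keep only lowercase letters and digits so the label is a plausible host part.
    label = re.sub(r"[^a-z0-9]", "", name.lower())
    if not label:
        return None
    return f"www.{label}.{tld}"


if __name__ == "__main__":
    print(guess_host("Vassar", "college"))
```

Keeping the mapping in a table rather than in code mirrors the idea that such rules are knowledge to be added to, one kind of organization at a time.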

4.3 Automatic Classification

This section describes ideas that have not been implemented or explored deeply yet, and are the subjects of the next phase of research in the Untangle Project. Classic provides a powerful and flexible language for describing the sufficient conditions for subsumption. In other words, part of a concept description can be the sufficient conditions for classifying an individual under it. We intend to study how this can be used in conjunction with a web crawler (a program that follows links around the web) to automatically populate the knowledge-base.

This effort is not as far-fetched as it may seem. The development of HTML is moving towards providing more meta-information in the language, and encouraging HTML developers to make use of it. This meta-information can be of tremendous use in a system capable of doing inference.

4.4 Subsumption as Search

The subsumption language of Classic also provides an excellent query language. Without needing to know beforehand what kinds of queries will be done (and thus forming a hash table), subsumption-based searches can still be performed in a reasonable amount of time. The more you know about what you are looking for, and thus the more information that is provided in the query, the faster it goes. The ontology was specifically designed to help a user find previously seen web pages or papers with only a vague recollection of what they were. Here are some example queries supported by the ontology, and their accompanying translations into Classic:

• “Find all the AI Conference pages”

    (and information-item
         (all information-item-of (and conference ai-thing)))

• “Find all the home pages of people interested in AI.”

    (and home-page
         (all information-item-of (and person ai-thing)))

• “Find all the home pages of people interested in AI and who are in industry.”

    (and home-page
         (all information-item-of
              (and person ai-thing (all employee-of company))))

• “Find all AI papers available on-line”

    (and paper ai-thing (at-least 1 has-document-text))


• “Find all papers published in the FLAIRS-96 conference proceedings that are available online.”

    (and paper
         (at-least 1 has-document-text)
         (fills year-published 1996)
         (all published-in
              (all proceedings-of (fills name "FLAIRS-96"))))

These queries can be entered through the interface, and their results viewed as HTML. A primer on writing Classic queries will be available through the interface as well.

5 Conclusion

This paper does not present any startling new discoveries. It is the application of tried and true Knowledge Representation techniques to a highly visible domain: navigating the web. The contributions of this work are threefold:

• Making KR technology available on the WWW can potentially demonstrate to a much larger community the practical benefits of using KR&R. This paper has described a user interface that is completely generic to Classic, and has been used for other Classic knowledge-bases, yet is flexible enough to support the display of objects outside the representation (actual web pages, in this case). It provides a “look and feel” that is familiar to a very large and still growing community. Web interfaces to KR systems will be a prerequisite for success of any knowledge sharing system in the future.
• The ontology for electronic information merged with a subject taxonomy carries the standard bibliographic ontology [Gruber, 1994] one step further. After further practical testing and use, the new ontology will undergo more formal and rigorous analysis for submission to the Ontolingua Ontology Library [Gruber, 1993].
• The application domain for this project is quite real, and by no means of a toy scale. While Untangle is not near the size of, e.g., Yahoo, this is simply a matter of manpower. The size of the knowledge-base grows daily, and new uses and benefits of the deep ontology and inference capabilities are constantly being found.

The original motivation for moving this research from email distribution [Welty, 1994a] to the WWW was to demonstrate to the Digital Library community that KR techniques can improve on what is being done with Information Retrieval (IR) [Welty, 1994b]. The Untangle Project therefore hopes to be a showcase for KR&R. This will be tough going: Digital Libraries and the WWW in general are strongly tied to IR, but the deeper knowledge and inference capabilities that characterize KR do have quite a bit more to offer.

Acknowledgments

Anthony Schorr and Derek Gaasch have worked on various aspects of this project.

REFERENCES

[Brachman, et al., 1989] Brachman, R., Resnick, L., Borgida, A. and McGuinness, D. CLASSIC/DB: A Structural Data Model for Objects. AT&T Bell Labs Technical Report. 1989.

[Brachman, et al., 1991] Brachman, R., McGuinness, D., Patel-Schneider, P., Borgida, A. and Resnick, L. Living with CLASSIC: When and How to Use a KL-ONE-Like Language. Principles of Semantic Networks. Morgan Kaufmann. Pp. 401-456. May, 1991.

[Gruber, 1993] Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2):199-220, 1993.

[Gruber, 1994] Gruber, T. Introduction to the Bibliographic Data Ontology. Available as http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data.text.html.

[Lenat, et al., 1990] Lenat, D., Shepherd, M., Pratt, D., Pittman, K. and Guha, R. Cyc: Towards Programs with Common Sense. Communications of the ACM. 33(8). Pp. 30-49. Aug, 1990.

[Mallery, 1994] Mallery, J. A Common LISP Hypermedia Server. Proceedings of The First International Conference on The World-Wide Web. Geneva: CERN, May 25, 1994.

[Welty, 1994a] Welty, C. A Knowledge-Based Email Distribution System. Proceedings of the 1994 Florida AI Research Symposium (FLAIRS-94).

[Welty, 1994b] Welty, C. Knowledge Representation for Intelligent Information Retrieval. Proceedings of the CAIA-94 Workshop on Intelligent Access to Digital Libraries. March, 1994.

[Welty, 1995] Welty, C. An Ontology for Electronic Information. Submitted to DL-96, the 1996 ACM Conference on Digital Libraries. Available as http://www.cs.vassar.edu/faculty/welty/papers/on-line-rep/dl-96_1.html.
