
SAIRE - A SCALABLE AGENT-BASED INFORMATION RETRIEVAL ENGINE Jidé B Odubiyi; David J Kocur, and Stuart M Weinstein. Lockheed Martin Space Mission Systems, Seabrook, MD 20706 Email: (jideo, davek, stuartw)@groucho.sms.lmco.com and

Nagi Wakim; Sadanand Srivastava; Chris Gokey, and JoAnna Graham. Bowie State University, Bowie, MD 20715 Email: (nwakim, ssrivas, cgokey, jgraham)@cs.bowiestate.edu

Abstract

The information access and retrieval capabilities provided by several conventional search engines, while improving upon traditional techniques, lack support for conceptual information retrieval, natural language input, and individual user preferences. This paper presents SAIRE, a Scalable Agent-based Information Retrieval Engine: a multi-agent search engine (a.k.a. softbot) employing intelligent software agents, natural language understanding, and conceptual search techniques to support public access to Earth and Space Science data over the Internet. SAIRE provides an integrated user interface to distributed data sources maintained by NASA and NOAA, thereby hiding complex information retrieval protocols from users. Our experience developing SAIRE has demonstrated the feasibility of a multi-agent architecture, multi-agent collaboration, and communication between people and agents to support intelligent information access and retrieval. Our approaches to resolving these and other technical issues are described in this paper.

1. Introduction

SAIRE is one of the seven Digital Library Technology (DLT) projects being funded under NASA’s Information Infrastructures Technology and Applications (IITA) Program, a part of the High Performance Computing and Communications (HPCC) effort. SAIRE’s primary objective is to provide tools to aid the public in accessing and retrieving Earth and Space Science data over the Internet. SAIRE is one of NASA’s contributions supporting the goals of the National Information Infrastructure (NII).

The Earth Observing System (EOS) Data and Information System (EOSDIS) provides access to over 500 products and services of interest to users of science data. A Distributed Active Archive Center (DAAC) performs both ingest and distribution functions for science data at a particular data center. The EOSDIS Information Management System (IMS) provides a gateway to ten DAACs from NASA and NOAA with a uniform query language and data protocol for the Earth Science user community. NASA maintains a Global Change Master Directory (GCMD) where descriptions of actual data in the DAACs are stored.

Search engines developed to support the public’s access to Earth and space science data over the Internet should be easy to use, able to reason about user objectives, and employ innovative information navigational techniques[14]. Complex information location and retrieval protocols must be transparent to the user. Users should be able to specify what they want in any preferred mode of communication with the system (speech, text, graphics, etc.) without any concern for the complexities of information retrieval protocols or the location of information sources. SAIRE is being developed to provide these capabilities. It can accept a user’s natural language request and translate it into an Object Description Language (ODL) query that includes specific data centers, sensors, earth regions, and temporal constraints. While the query is being processed at the IMS gateway in the case of DAACs, or at the Global Change Data Center (GCDC) for metadata, status information on the query is available to the user. When query processing completes, results are made available to the user. SAIRE provides an opportunity for non-science users to answer questions and perform analyses using quality science data.

There are ongoing research activities at various government, academic, and industry research institutions to develop innovative technologies supporting public access to heterogeneous information sources distributed over the Internet (refer to Section 5 of this paper). At the highest level, the architectures of these digital library projects are very similar. Three groups of software program modules are apparent: a group for the user interface or the front-end; an interface group in the middle acting as a “mediator”[15], “facilitator”[3], or “coordinator”[11]; and the information processing environment in the back-end. The facilitator provides the resources needed for communicating requests and results between the front-end and the back-end. The back-end provides an interface with heritage software used to search and retrieve information from distributed information sources or “collections.” SAIRE’s architectural definition and goals, shown in Figure 1, are based on a similar architectural framework.
SAIRE employs innovative intelligent software agent technology to address several issues encountered in accessing and delivering information to the public from distributed sources over the Internet. In the context of a multi-agent system architecture, we are able to address several research issues such as handling words with multiple meanings (polysemy), user modeling, collaboration among agents, and communication between people and agents. Since each agent in SAIRE is a self-contained intelligent program, it can be readily adapted to support information access and retrieval from new domains, making scalability to a broader user community one of its strongest attributes.

In the remainder of this paper we will describe SAIRE’s multi-agent architecture in the framework of the three agent groups: the User Interface Agents (UIA), a Coordinator agent, and the Domain Specialist agents. We will also explain the three levels of protocols employed by the agents to support inter-agent communications and agent-to-human user interactions. The three protocol levels are: 1) SAIRE architecture-specific protocols that include an agent communication language (a modified version of the protocol originally described in Steven Laufmann’s coarse-grained agents’ communication protocol[7]), a natural language interface, and strategies for coordinating interactions among the agents; 2) domain-specific protocols used by Domain Specialist agents to interface with information sources; and 3) communication network protocols for supporting communications between agents residing at different processes or nodes on the Internet. We will explain the approaches used in SAIRE and identify how technological issues addressed in SAIRE will complement and not duplicate the capabilities of other intelligent information retrieval systems.

Figure 1. Architectural Overview of SAIRE, a Scalable Agent-based Information Retrieval Engine. (Diagram labels: User; Internet; SAIRE, comprising User Interface Agents, Coordinator, and Domain Specialist Agents; Distributed Information Sources Over the Internet, grouped as Communication Activities, Directory Services, Knowledge Archiving and Retrieval Services, Data Storage Access Services, and Information Processing Services. Objectives listed in the figure: transparent access to heterogeneous information sources for novice and expert users over the Internet; interactions through typed or spoken natural language, incomplete queries, or graphical menus; intelligent software programs that can learn and support changing user preferences.)

2. Rationale for Using Agent-Based Technology

Goal-oriented, autonomous intelligent software modules are being developed to enable SAIRE to achieve its intelligent information retrieval objectives. With these capabilities, a human user or an agent can delegate tasks to another agent with minimal supervision or intervention. Multi-agent system technology therefore offers a number of capabilities, including the following:
a. Agents can be independent and have non-deterministic behaviors. Each agent is a complete expert system with its own partitioned and private knowledge-base.
b. Agents can migrate to another location to accomplish a particular task and reduce system processing loads.
c. Agents can be persistent and may undertake tasks covering a lengthy period.
d. Agents may communicate with other agents in a robust agent communication language, while a user interface agent is able to interact with human users in a domain-specific natural language.
e. Agents are able to spawn or clone themselves to handle a large task, or multiple tasks that must be executed in parallel.

Additional capabilities planned for SAIRE agents include an ability to learn about changes in information sources and support individual user-centered preferences through user modeling. The agents communicate across processes with the aid of Transmission Control Protocol/Internet Protocol (TCP/IP) sockets through the SAIRE Socket Interface (SSI; see footnote 1), a persistent socket inter-process communication mechanism developed to meet the special needs of SAIRE’s high volume of asynchronous interprocess communications.

Footnote 1: SAIRE’s architecture requires message buffering, and once inter-process links are established, they must be kept open to support frequent agent communication between processes. We experimented with the TCP/IP software Data Transfer Mechanism (DTM) from the National Center for Supercomputing Applications (NCSA) but abandoned it because DTM demands that inter-process links be created on an as-needed basis and destroyed after the rendezvous between the communicating processes has been achieved.

3. Integrated Innovative Technologies to Support Effective Information Retrieval

Eight distinct innovative methodologies are being integrated to support intelligent agent behavior in SAIRE’s multi-agent architecture. These techniques and concepts are presented in Figure 2. In keeping with the focus of this paper, we will provide detailed discussions for only those concepts relating directly to our three technical issues: multi-agent architecture, collaboration between agents, and communication between people and agents.

Figure 2. Innovative Methodologies Applied to SAIRE. (Diagram labels: Conceptual Search Techniques; Multi-Modal Human-Computer Interaction Techniques; Agent Communication Language; User Modeling Strategies; Distributed Production System With Partitioned Knowledge Base; Multi-Agent Coordination Techniques; Agent Migration Strategies; Schema-Based Reasoning.)

3.1 Multi-Modal Human-Computer Interaction Mechanisms

As previously stated, to promote ease of use, an information retrieval system should provide users with a multi-modal interface. Users are then able to submit requests in a preferred input format (e.g., natural language, graphical menus, commands, etc.). Remote access to SAIRE is provided through Mosaic or Netscape World Wide Web (WWW) browsers with the aid of Java applets or the Common Gateway Interface (CGI), as illustrated in Figure 3. Presently, SAIRE accepts user queries as either domain-specific natural language text or graphical menu selections. Plans are under way to support a speaker-independent, continuous-speech, spoken language interface.

3.1.1 SAIRE’s Natural Language System. SAIRE’s Natural Language Parser (nl-parser) takes a natural language statement as input and outputs a frame containing the actions and important concepts embedded in that input. User input is usually in the form of sentences constructed with one or more phrases. These phrases give us a way to analyze the sentences and relate words for a more accurate interpretation. The nl-parser incorporates four elements: a dynamic dictionary, a grammar, a pre-processing function, and a chart parser. The dictionary and grammar are specific to the domain or domains in which SAIRE is working. The main dictionary contains words with semantic meanings related to SAIRE’s specific domains. These words can have only one meaning in the context of the current domain. A second dictionary, the user-dictionary, contains words that might have multiple meanings in the domain, as well as new words defined by users. These newly defined words are a result of SAIRE’s ability to learn new words. SAIRE interacts with the user to define unknown words as well as to clarify words with multiple meanings.
As user modeling progresses, each user will develop their own dictionary containing their preferred domain meanings. Overall, this allows SAIRE’s dictionaries to grow and not hinder other functions when a new term is presented in the domain.

3.1.2 Implementation of the Natural Language Parser in SAIRE. The nl-parser works by dividing input into phrases. A phrase is defined as the set of words located between key separator words. Separator words are connecting words like and, a, the, about, as well as certain forms of punctuation (e.g., commas and colons). The nl-parser scans input strings for separator words and, while scanning, creates a list of the phrases in the input. Each phrase is then parsed. Words are looked up in the dictionaries, and relations between words in the phrase are made when appropriate. If an unknown word exists and a definition cannot be derived for the word from the phrase, user clarification is needed to define the word. After a phrase has been parsed successfully, it is translated into the Agent Communication Language (ACL) message frame being constructed for the request. After all phrases have been successfully parsed and a complete ACL message exists, the ACL message is processed and routed through the system.
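
The phrase-splitting step described above can be sketched as follows. This is only an illustration in Python, not the nl-parser itself (which the paper describes as part of a CLIPS-based system): the separator list contains just the examples named in the text, and the function name `split_into_phrases` is hypothetical.

```python
import re

# Assumption: only the separators named in the text; the real list is not given.
SEPARATOR_WORDS = {"and", "a", "the", "about"}
SEPARATOR_PUNCTUATION = {",", ":"}


def split_into_phrases(text):
    """Return the runs of words found between separator words or punctuation."""
    # Tokenize into words and single punctuation marks.
    tokens = re.findall(r"[A-Za-z0-9'-]+|[^\sA-Za-z0-9]", text)
    phrases, current = [], []
    for token in tokens:
        if token.lower() in SEPARATOR_WORDS or token in SEPARATOR_PUNCTUATION:
            # A separator closes the phrase collected so far, if any.
            if current:
                phrases.append(current)
                current = []
        elif token[0].isalnum():
            current.append(token.lower())
        # Any other punctuation (e.g., a final period) is dropped.
    if current:
        phrases.append(current)
    return phrases
```

Each returned phrase would then be looked up word by word in the dictionaries; that lookup and the chart parser are outside the scope of this sketch.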

3.2 Individual User Modeling Strategies

The user modeling strategy employed in SAIRE is based on the concept of stereotypes employed in ARCHON[17] and PROTUM[13]. Modeling users by means of stereotypes is based on the premise that, given a specific application domain and known information about users, it is possible to generate stereotypes of user groups. Each user specifies a group preference at log-in. The user groups are science, general science, or non-science. Individual users can then be modeled by customizing an instance of their profile from the general class. Beliefs regarding each user can be modified by reasoning nonmonotonically with new information inferred from SAIRE’s operational environment. This supports dynamic adaptation to user preferences. User preferences are stored in individual user dictionaries and retrieved as needed.
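
A minimal sketch of stereotype-based user modeling, assuming a profile is a set of group defaults plus per-user overrides. The three group names come from the text; the attribute names (`detail`, `search`), their values, and the class `UserProfile` are hypothetical and are not taken from SAIRE.

```python
from dataclasses import dataclass, field

# Assumption: illustrative defaults only; the paper does not list the
# attributes stored for each stereotype.
STEREOTYPES = {
    "science": {"detail": "full-metadata", "search": "literal"},
    "general science": {"detail": "summary", "search": "conceptual"},
    "non-science": {"detail": "plain-language", "search": "conceptual"},
}


@dataclass
class UserProfile:
    user_id: str
    group: str  # chosen by the user at log-in
    overrides: dict = field(default_factory=dict)  # beliefs revised for this user

    def belief(self, name):
        """Return the individual belief if one exists, else the group default."""
        if name in self.overrides:
            return self.overrides[name]
        return STEREOTYPES[self.group][name]

    def revise(self, name, value):
        """New evidence replaces the default: a simple non-monotonic update."""
        self.overrides[name] = value
```

A revised belief shadows the stereotype default without changing it, so other users in the same group are unaffected.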

3.3 Conceptual Search Techniques

Intelligent information retrieval systems should be capable of understanding user intentions and should use this knowledge in the information retrieval process. To support an intelligent information retrieval capability, advanced navigational techniques such as relevance ranking, natural language queries, and concept searching[4,9] are required in evolving information retrieval systems. Conventional information retrieval systems provide results based strictly on specific keywords matching only the data values in the database (i.e., lexical matching). The information returned must match what Papazoglu[8] calls the “extensional portion” of the database. If the conventional database system does not contain a specific keyword contained in the query, it cannot deduce the intention of the user and transform the query into an “intentional query.” For example, a query such as “Tell me about Vegetation of Colorado” submitted to the U.S. GCMD may return a null result because information about Colorado is stored by specifying the latitude/longitude coordinates. If the information retrieval system knows that it can get the latitude/longitude information from other sources, it should be able to modify the request and retrieve the information according to the user’s intention.

Intelligent information retrieval systems should have the means to model user intentions and address problems of polysemy in word usage. Conceptual searching is typically implemented with semantic network, relevance ranking, or similarity-based classification techniques. The current build of SAIRE employs a semantic network and clarification dialogs. Conceptual search allows SAIRE to return more accurate information for a user’s requests. SAIRE’s conceptual search mechanism allows it to narrow general requests to specific domain topics in which a more detailed query can be made and words with multiple meanings can be clarified.

3.3.1 Implementation of Conceptual Search Mechanism in SAIRE. SAIRE’s conceptual search mechanism contains two major levels. The first level is the topic-term level and the second level is the keyword-term level. The terms of the query in the conceptual search mechanism are keywords parsed from the user’s request. The first step is to determine which topic in the domain the user’s request belongs to. This is done by looking up keywords parsed from the user’s request in a topic-term matrix. This matrix contains all known terms on one axis and all major topics in the domain on the other axis. Each keyword can be associated with one or more topics. The topics for each word are tracked, and when a complete list of topics for all of the user’s keywords has been made, a calculation is performed to determine the topic best matching the keywords. If no decision can be made, the user is asked to clarify which topic they want. SAIRE’s current operational prototype covers 13 topics in the Earth and space science disciplines, 721 terms, and more than 2,300 words.

After a topic has been chosen, we proceed to the next level. Here additional words are added to the user’s query. These additional words are chosen based on their relevance to the keywords from the user’s original request and the topic chosen in level 1. Relevance is determined in the following fashion: each keyword in each specific topic has a list of relevant terms. These terms are weighted according to their relevance to the keyword. A weight is the fraction of retrieved documents containing the keyword that also contain the term, scaled to a maximum of 5, i.e.,

\[
\text{weight} = \frac{\text{Number of documents with keyword and term}}{\text{Total number of documents with keyword}} \times 5
\]

Relevant terms are then chosen by their weight. If a user specifies a more literal search (less conceptual), high and low weighted terms are chosen. When a user specifies a more conceptual search, terms with higher weights are chosen. The new relevant terms are then added to the Agent Communication Language (ACL) message frame. This frame is then passed throughout the SAIRE system and processed by different agents. Database queries are made by agents using the terms and keywords stored in the ACL message. Weight relationships between keywords and terms are updated every time a query is made by the system. Query information is forwarded to the conceptual search mechanism and weights for any relevant terms are re-calculated based upon query results. This action works to keep the semantic network up to date, while also fine tuning itself to produce better search results.
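
The two levels can be illustrated with a short Python sketch. The paper does not say which calculation picks the best topic, so counting how many keywords map to each topic is an assumption here, as are the example matrix entries and the function names. The weight is the ratio of documents containing both keyword and term to documents containing the keyword, scaled to 5.

```python
from collections import Counter

# Assumption: a tiny illustrative topic-term matrix, stored as keyword -> topics.
TOPIC_TERMS = {
    "hurricanes": {"atmosphere", "oceans"},
    "rainfall": {"atmosphere", "hydrology"},
    "vegetation": {"land surface"},
}


def choose_topic(keywords):
    """Level 1: return the topic matched by the most keywords, or None
    when there is no match or a tie (the user would then be asked)."""
    votes = Counter()
    for keyword in keywords:
        for topic in TOPIC_TERMS.get(keyword, ()):
            votes[topic] += 1
    ranked = votes.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def term_weight(docs_with_keyword_and_term, docs_with_keyword):
    """Level 2: share of keyword documents that also hold the term, scaled to 5."""
    if docs_with_keyword == 0:
        return 0.0
    return 5.0 * docs_with_keyword_and_term / docs_with_keyword
```

For instance, if 40 retrieved documents contain a keyword and 30 of those also contain a candidate term, the term's weight is 3.75. Recomputing the two counts after each query is one way the weights could be kept up to date.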

3.4 SAIRE’s Agent Communication Language

SAIRE’s ACL is a modified version of one proposed by Steven Laufmann in his paper on coarse-grained agents[6], embellished with ideas from the draft standard for the Knowledge Query and Manipulation Language (KQML)[2] for an ACL. The message parameters, or fields 1 through 9, define the header for each message, while fields 10 and up define the body of the message. The fields are identified as:
1) Message-ID (a combination of hostname, random number, and current time);
2) User-ID (user log-in ID to support electronic messaging);
3) Sender agent;
4) Receiver agent(s), the final destination of this message;
5) Reply-constraint, used to limit when a reply is expected (e.g., asap, urgent, whenever);
6) Return-address, the destination(s) where completed requests may be sent;
7) Language (default is the C-Language Integrated Production System, CLIPS);
8) Message-type (e.g., request, response, repeat, cancel, status, etc.);
9) Performative (KQML-ACL and KQML-like actions performed on a task, e.g., Monitor, Generate, Display, Subscribe, Tell, Recommend, Standby, etc.); and
10) An unlimited number of the request frame’s attribute/value pairs defining the message body.

Examples of the attributes include: task-name or service (e.g., GCMD); result-action; GCMD keywords; sensor-name; discipline; parameters; storage-medium; etc. The following text is an example of an ACL message sent from the UIA-agent to the Coordinator-agent. The message would be generated internally from a natural language query such as “Tell me about hurricanes in Florida.”

((msg-id chico-gensym5-10/20/95-08:54:11) (usr-id John) (sender uia-agent) (receiver coordinator-agent) (reply-constraint whenever) (language CLIPS) (msg-type request) (performative parse) (input-string "Tell me about hurricanes in Florida"))

Each field in the message corresponds to a field defined in the list above. It is not necessary to provide values for all the fields.
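
To make the message layout concrete, here is a minimal Python sketch that renders a list of attribute/value pairs in the parenthesized form of the example message. The function name `format_acl` and the rule of quoting any value that contains a space are assumptions for illustration; SAIRE builds its messages inside CLIPS, not with this code.

```python
def format_acl(fields):
    """Render (name, value) pairs as a parenthesized ACL-style message."""
    parts = []
    for name, value in fields:
        text = str(value)
        # Assumption: values containing spaces are wrapped in double quotes.
        if " " in text:
            text = '"' + text + '"'
        parts.append("(" + name + " " + text + ")")
    return "(" + " ".join(parts) + ")"


# The example request from the text, with fields in header order.
example = format_acl([
    ("msg-id", "chico-gensym5-10/20/95-08:54:11"),
    ("usr-id", "John"),
    ("sender", "uia-agent"),
    ("receiver", "coordinator-agent"),
    ("reply-constraint", "whenever"),
    ("language", "CLIPS"),
    ("msg-type", "request"),
    ("performative", "parse"),
    ("input-string", "Tell me about hurricanes in Florida"),
])
```

Because fields are optional, a list of pairs (rather than a fixed record) lets a sender omit any field it has no value for, as the example does with the return address.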

3.5 Distributed Production System Concepts

Figure 3 illustrates the architecture of SAIRE as a multi-agent system built on distributed production systems, where each CLIPS process consists of an Agent Manager (AM) and one or more specialist agents. The AM and each specialist agent are implemented as modules. This module implementation gives the agents their own partitioned knowledge base. The agents in a CLIPS environment are organized in a hierarchical fashion. The AM is a level above the specialist agents in the hierarchy. The AM is allowed to see structures existing in the specialist agents. Because of this hierarchical organization, specialist agents do not have access to the knowledge base or structures owned by the AM. The hierarchical organization is convenient because it allows the AM to naturally manage any specialist agents present in its CLIPS environment.

Figure 3. WWW Client/SAIRE Server Multi-Agent Architecture. (Diagram labels: WWW Client (Netscape non-Java-enabled browser); WWW Client (Netscape with Java-enabled browser to support a dynamic interface); HTTP Server; Common Gateway Interface; SAIRE Port; SAIRE Server; SAIRE Coordinator Agent; User Interface Agent Manager; Natural Language Agent Manager; User Modeling Agent Manager; Results AM; Geographic Information System’s AM; GCMD Agent Manager (AM); DAACs Agent Manager; NASA’s GCMD; IMS WWW Gateway to DAACs; University of Michigan Geog. Name Server. Legend: sockets.)

3.6 Multi-Agent Coordination Techniques

SAIRE’s multi-agent architecture, also shown in Figure 3, depicts how a Coordinator agent provides the AM with the system’s services, such as other agents’ skill base and location. The location of each AM is the agent’s socket ID. Each rectangular box in the diagram holds an AM in a multi-agent production system (i.e., a CLIPS process). Each AM controls the activities of a group of command-driven agents under its domain. There is no direct communication between the AMs except through the coordinator.

3.6.1 Benefits of Using a Multi-Agent Coordinator Architecture. SAIRE’s multi-agent coordinator architecture offers the following benefits:
a. The skills base and requirements of all AMs may be stored at a depository managed by the coordinator, with a backup storage for fault tolerance.
b. The coordinator can manage an effective queuing discipline. By keeping a history of the requests sent to the AMs, it can decide when to clone a new AM or migrate a specialist agent to another node. The coordinator also supports SAIRE system monitoring and promotes agent learning.
c. The architecture provides a more efficient use of communication bandwidth than that experienced with direct communications among the agents, where each agent may need to maintain a model of the resources and skills of other agents.

3.6.2 Disadvantages of a Multi-Agent Coordinator Architecture. A multi-agent system architecture that employs a coordinator formalism has a centralized control scheme that can result in a single point of failure. This risk can be mitigated in SAIRE by implementing a backup Coordinator agent.

3.6.3 Strategies for Achieving Multi-Agent Coordination in SAIRE. Each AM, as well as the Coordinator agent in SAIRE, is always alive. The AM continuously executes a loop, checking for newly arriving tasks, servicing those tasks, and forwarding results to other agents. Tasks are serviced by their delegation to Specialist agents under the AM. Specialist agents remain idle until delegated a task. They then act on the task, send results to their AM, and wait for the next task to work on.

Each agent in SAIRE is modeled as a module. To make multiple modules work together in a single CLIPS environment, module focusing becomes a significant issue. To be in focus means that the module is the center of attention (i.e., CLIPS can only see the structures and knowledge base of that module). An agent environment is then implemented in the following manner. The AM is kept in focus most of the time. This allows it to perform its cycling duties of monitoring for incoming tasks and managing current tasks. Due to the hierarchical organization of the CLIPS environment, the AM can effectively monitor and control the delegation of tasks to the Specialist agents. The Specialist agents are usually out of focus. They remain idle until the AM gives them a task to work on. The Specialist agent will wake up by firing a rule when a request is received, causing the agent (i.e., module) to come into focus. This allows CLIPS to use facts asserted into the Specialist agents’ knowledge-base. The request will then be processed. The results are given to the AM by putting the AM in focus, sending the results to the AM, then taking the AM out of focus. If the Specialist agent is finished, focus is returned to the AM. The implementation is effective in creating two independent agents working to achieve a common goal. The AM is continuously cycling to communicate with the outside world, as well as managing its own environment. Simultaneously, specialist agents wait for work to be delegated by their AM.
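
The delegation cycle described above can be imitated in a few lines of Python. This is a sequential stand-in rather than CLIPS: module focus is reduced to a single `focus` attribute, and all class and method names are hypothetical.

```python
from collections import deque


class SpecialistAgent:
    """Idle until its manager hands it a task."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # callable that does the specialist's work

    def work(self, task):
        return self.handler(task)


class AgentManager:
    """Checks for tasks, delegates them, and forwards the results."""

    def __init__(self, specialists):
        self.specialists = {agent.name: agent for agent in specialists}
        self.inbox = deque()   # newly arrived tasks
        self.outbox = []       # results waiting to be forwarded
        self.focus = "AM"      # stand-in for the CLIPS module focus

    def submit(self, specialist_name, task):
        self.inbox.append((specialist_name, task))

    def cycle(self):
        """One pass of the loop; returns False when there was nothing to do."""
        if not self.inbox:
            return False
        name, task = self.inbox.popleft()
        self.focus = name      # the specialist comes into focus
        result = self.specialists[name].work(task)
        self.focus = "AM"      # focus returns to the manager
        self.outbox.append((name, result))
        return True
```

In SAIRE the manager and its specialists share one CLIPS process, so a single-threaded loop is a fair approximation of one agent group; concurrency between groups would come from running several such processes.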

3.7 Agent Migration Strategies

SAIRE employs the following procedures to support agent migration:
a. Starts a CLIPS process at the remote node.
b. Starts a System Services Agent Manager (SSAM) at the remote node.
c. Lets the SSAM start another CLIPS process and create an AM for the incoming Specialist agent.
d. Migrates the Specialist agent by providing the AM at the remote node with appropriate information to replicate the old Specialist agent.
e. Updates the user interface to inform the user that a Specialist agent has migrated to a named remote node and provides the rationale for the decision.

Both the SSAM and the new agent group are managed by the coordinator at the home node. The SSAM is required at the remote node to provide the remote node’s system services to the new agent group. Upon completion of its activity, the AM will return the Specialist agent’s results to the Results Manager through the Multi-Agent System Coordinator (MASC).

3.8 Schema-Based Reasoning

A schema-based reasoner[11] uses three types of knowledge of the problem to support adaptive reasoning: 1) strategic or domain-independent knowledge; 2) contextual-schema to represent the domain-dependent definition of the request; and 3) procedure-schema to represent the procedures for achieving a specific goal in the domain. Typically, a strategic schema identifies a problem’s attributes such as plans, goal hierarchies, and appropriate functions for achieving the main goal. A context-schema identifies specific features of a problem together with standing orders, strategies for handling exceptions, default and attention-focusing information supporting a convergence of the solution, and action selection information listing subgoals to be achieved. A procedure-schema holds the procedural knowledge or rules for achieving each subgoal specified in its corresponding context-schema.

4. SAIRE Architecture And Operations Concept

We presented a multi-agent and distributed production systems-based architecture for SAIRE in Figure 3. Each rectangle in the diagram consists of an agent group with an AM for each technical domain. Each agent group operates in a distributed CLIPS process (i.e., a CLIPS production system). This architecture supports several agents running continuously and concurrently without interfering with the actions of other agents. Concurrence of operation is supported when each agent group executes on distributed platforms. Whenever agents need to exchange information with agents residing in other agent environments, either locally or at remote nodes, they use the SSI communication protocol. Communication between agents is performed with the agent communication language described previously.

Also as shown in Figure 3, a user may submit a request in a domain-restricted natural language from a WWW client to the User Interface Agent Manager (UIAM). The UIAM accepts the message and forwards it to the nl-parser agent through the SAIRE Coordinator Agent (SCA). After parsing the sentence, and assuming there is no ambiguity in the interpretation, the nl-parser generates a request template and submits it to the SCA. The request is forwarded to the GCMD-AM, which in turn delegates it to the GCMD-client-agent or to the DAACs’ AM as appropriate. The GCMD-client-agent then identifies the source of the information to be retrieved, formulates a Structured Query Language (SQL) query from the request template, and submits the query to a client program residing in its environment. The client program then sends the SQL query to the GCMD and retrieves the Directory Interchange Format (DIF) metadata. The GCMD-client-agent then translates the DIF, formulates a response to the request, and returns the response to the SCA. The SCA then delivers the results to the Results Agent Manager (RAM). The UIAM is notified of the availability of results. The RAM agent has the tools to deliver the output in different media or formats. A different scenario would be used to retrieve information from the DAACs, where an ODL query would replace the SQL query.

The SAIRE Build 2 prototype:
a. Accepts user queries in a natural language format with 13 topics, 721 terms, and more than 2,300 words in its Earth Science dictionary.
b. Provides a mechanism for clarifying ambiguous words. This is achieved through a user-agent dialogue.
c. Provides agents collaborating to support Internet information retrieval from multiple databases in response to a single NL query.
d. Provides eight agent groups, each in a distributed multi-agent production system[5], that communicate through an agent communication language and collaborate to satisfy user requests. Eleven agents, each a complete expert system, constitute the eight agent groups.
e. Provides visual displays on the status of the multi-agent network and distributed data systems on the Internet.
f. Runs on multiple computer platforms (Sun SPARCstations, 386 and faster PCs), various Unix operating systems (Sun OS; Solaris 2.3, 2.4; and Linux), and X windows. It is accessible from Java-enabled or standard Netscape browsers at URL: http://saire.ivv.nasa.gov/saire.html

4.1 Performance Metrics for SAIRE

We are using two information retrieval metrics to measure the effectiveness of SAIRE: recall and pertinence of retrieved documents. Here, recall is the number of documents retrieved through lexical matching of keywords from a query to terms in the database, and pertinence is the proportion of reviewed documents that the user judges relevant. For ten representative requests, recall ranged from 8 to 563 documents per request. A user then submitted the same set of requests with the option to use the conceptual search mechanism. The user selected and browsed the top ten documents returned for each request and tallied the number of relevant documents. For three of the queries, all ten documents reviewed contained information appropriate to the user’s interest. For four additional queries, nine of the ten documents examined provided information highly relevant to the user. Overall, pertinence ranged from 75% to 100%. With these preliminary results, we believe that SAIRE can retrieve documents that are pertinent to a user’s interest, instead of retrieving hundreds of document titles (as is common in traditional search engines) and leaving the burden of finding a needle in a haystack to the user.
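The two metrics as defined in this section can be illustrated with a minimal sketch. The function names, the whitespace tokenization, and the sample documents are assumptions made for illustration; they are not taken from the SAIRE implementation, and the tallies below only mirror the 10-of-10 and 9-of-10 outcomes reported above.

```python
def lexical_recall(query_terms, documents):
    """Count documents containing at least one query keyword (lexical match).

    Follows this section's definition of recall as a document count,
    not the ratio used elsewhere in the IR literature.
    """
    terms = {t.lower() for t in query_terms}
    return sum(1 for doc in documents if terms & set(doc.lower().split()))


def pertinence(relevant, reviewed):
    """Proportion of reviewed documents that the user judged relevant."""
    if reviewed <= 0:
        raise ValueError("at least one document must be reviewed")
    if not 0 <= relevant <= reviewed:
        raise ValueError("relevant must lie between 0 and reviewed")
    return relevant / reviewed


# Hypothetical document titles, used only to exercise the functions.
docs = ["ozone depletion data", "sea surface temperature", "Ozone maps"]
matched = lexical_recall(["ozone"], docs)

# Hypothetical (relevant, reviewed) tallies for two requests.
scores = [pertinence(r, n) for r, n in [(10, 10), (9, 10)]]
print(matched, [f"{s:.0%}" for s in scores])
```

Keeping the two functions separate reflects the distinction drawn in the text: recall is computed from the database alone, while pertinence depends on a user's judgment of the documents actually reviewed.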

5. Related Work

A University of California, Berkeley Electronic Environmental Library[16] project is being developed under a joint National Science Foundation (NSF), Advanced Research Projects Agency (ARPA), and NASA Digital Library Initiative (DLI), in collaboration with five other institutions (University of Michigan; University of Illinois at Urbana-Champaign; University of California, Santa Barbara; Stanford University; and Carnegie Mellon University). The research goals of the project include the development of 1) fully automated indexing and intelligent retrieval techniques; 2) database technology to support electronic library applications; 3) a more effective protocol for client/server information retrieval; 4) resource discovery and distributed search algorithms; and 5) compression and communication techniques for remote sensing. The Global Change Assisted Search for Knowledge (GC-ASK)[9] is another digital library project, under development by an industry team that includes ConQuest Software, Inc. (provider of a dictionary-based natural language search engine), E-Systems (a system integrator and provider of cross-document viewing technology), WAIS Inc. (provider of an Internet publishing protocol), and Genasys II (provider of a Geographic Information System, GIS). The objectives of GC-ASK include supporting several user groups (K-12 students, researchers, government policy-makers, and the general public) and providing access to existing Federal agencies’ heterogeneous and distributed data and information sources. The University of Michigan Digital Library (UMDL)[1] project employs innovative technologies to address several technical barriers by introducing agents that can reason, plan, and coordinate the activities of other agents. While both GC-ASK and UMDL support information retrieval from large collections of published articles from government archives, UMDL also includes library collections from a few commercial publishers.
UMDL employs three classes of agents in its architecture: 1) the UIAs accept user queries, add the user’s profile, and then forward the request to a group of Mediator[6] agents; 2) the Mediator agents accept queries from the UIAs and apply the user’s profile to plan and forward the queries to Collection Interface Agents (CIAs). The Mediator agents also monitor the progress of each query, and accept and deliver query results to the UIAs; and 3) the CIAs provide interface functions (“communication wrappers”) to the search engines used by each information source (collection) to search and retrieve information. The Information Manifold[6] project provides “query-answering” algorithms and information agents to support a common front-end to structured information sources over the web. The research presents strategies for addressing decision-theoretic planning issues in information retrieval. It presents algorithms for computing query plans, modeling distributed information sources, and gathering, integrating, and presenting their contents to end users.
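The three UMDL agent classes described above form a simple request pipeline, which can be sketched as follows. This is only an illustration of the described division of labor: the class names, the `topics` profile field, and the routing rule are hypothetical and are not drawn from the UMDL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    query: str
    profile: dict = field(default_factory=dict)


class CollectionInterfaceAgent:
    """Wraps the search engine of one collection (a "communication wrapper")."""

    def __init__(self, name, topics, search):
        self.name = name
        self.topics = set(topics)
        self.search = search  # callable: query string -> list of results

    def run(self, request):
        return self.search(request.query)


class Mediator:
    """Uses the user's profile to choose collections, then gathers results."""

    def __init__(self, cias):
        self.cias = list(cias)

    def handle(self, request):
        preferred = set(request.profile.get("topics", ()))
        # Assumed routing rule: prefer collections matching the profile,
        # and fall back to all collections when nothing matches.
        chosen = [c for c in self.cias if c.topics & preferred] or self.cias
        results = []
        for cia in chosen:
            results.extend(cia.run(request))
        return results


class UserInterfaceAgent:
    """Accepts a user query, attaches the user's profile, and forwards it."""

    def __init__(self, profile, mediator):
        self.profile = profile
        self.mediator = mediator

    def ask(self, query):
        return self.mediator.handle(Request(query, self.profile))


# Hypothetical collections backed by stub search functions.
climate = CollectionInterfaceAgent("climate", {"climate"}, lambda q: [f"climate:{q}"])
oceans = CollectionInterfaceAgent("oceans", {"oceans"}, lambda q: [f"oceans:{q}"])
mediator = Mediator([climate, oceans])

profiled = UserInterfaceAgent({"topics": {"climate"}}, mediator).ask("ozone")
unprofiled = UserInterfaceAgent({}, mediator).ask("ozone")
print(profiled, unprofiled)
```

The sketch omits the progress monitoring that the Mediator agents are described as performing; it shows only how a profile attached by a UIA can influence which CIAs receive the query.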

6. Contributions to Related Work and Research Issues

SAIRE’s research focus lies in developing technologies that promote transparent access to distributed and heterogeneous Earth and Space Science data over the Internet. While a system like UMDL is responsive to the preferences of groups of users, SAIRE responds to the preferences of individual users. These two approaches are complementary because there are situations where tracking groups of users is preferred and less computationally intensive. As explained earlier, SAIRE’s GCMD-agent accepts a request from a coordinator agent, formulates an SQL query, sends the query to the GCMD database, retrieves a result, formats the result as an ACL message, and returns the result to the coordinator agent. This methodology, also known as a transduction technique[3], is very effective when access to the structures and code for the information source is unavailable. We are using the same approach to retrieve information from the DAACs and the Geographic Name Server (GNS). In situations where access to the code for the information management system is available, it will be more efficient to use a “wrapper”[3] technique to interface with the information source. This technique is used in the UMDL project. Rather than simply retrieving titles of articles and displaying them for the users, as done in GC-ASK, SAIRE aims to retrieve the information in a form that is more useful to the user. SAIRE provides such information as summaries on the topic of interest, points of contact, and locations of additional information.
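The transduction steps attributed to the GCMD-agent above (accept a request, formulate an SQL query, query the database, format the result as an ACL message, return it) can be sketched as follows. This is a minimal illustration under stated assumptions: the `directory` table, its columns, the request fields, and the KQML-style message layout are all hypothetical, and an in-memory SQLite database stands in for the GCMD database.

```python
import sqlite3


def build_query(keyword):
    """Formulate a parameterized SQL query for a keyword (assumed schema)."""
    sql = ("SELECT title, contact FROM directory "
           "WHERE keywords LIKE ? ORDER BY title")
    return sql, (f"%{keyword}%",)


def to_acl(sender, receiver, reply_to, rows):
    """Format result rows as a KQML-style reply (layout is an assumption)."""
    def quote(text):
        return '"' + text.replace('"', '\\"') + '"'

    entries = " ".join(
        f"(entry :title {quote(title)} :contact {quote(contact)})"
        for title, contact in rows
    )
    return (f"(reply :sender {sender} :receiver {receiver} "
            f":in-reply-to {reply_to} :content ({entries}))")


def handle_request(conn, request):
    """Translate a coordinator request to SQL and the result to an ACL reply."""
    sql, params = build_query(request["keyword"])
    rows = conn.execute(sql, params).fetchall()
    return to_acl("gcmd-agent", request["sender"], request["id"], rows)


# Stand-in database with hypothetical directory entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (title TEXT, keywords TEXT, contact TEXT)")
conn.executemany(
    "INSERT INTO directory VALUES (?, ?, ?)",
    [
        ("Ozone Profiles", "ozone atmosphere", "contact-a"),
        ("Sea Surface Temperature", "ocean temperature", "contact-b"),
    ],
)

message = handle_request(
    conn, {"id": "q1", "sender": "coordinator", "keyword": "ozone"}
)
print(message)
```

The agent touches the information source only through its query interface, which is why the approach works without access to the source's internal structures or code.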
The SAIRE project team is actively exploring ways to address some critical research issues, such as: 1) uncertainty regarding availability of services on the Internet (i.e., the communication network and databases); 2) dealing with stale or missing information from the database; 3) meeting users’ expectations in the face of unreliable Internet resources; and 4) developing techniques that will fully support the user’s intentions.

7. Conclusion

We have presented SAIRE, a multi-agent system supporting intelligent information retrieval from NASA and NOAA’s Earth and Space Science information repositories over the Internet. Presently, the information is available to scientists with some knowledge of information retrieval protocols. SAIRE has been developed to support both expert and novice users without any requirement for knowing the retrieval protocols. With the aid of persistent socket connections between Java applets at a WWW client and intelligent software agents at a server, we are able to provide intelligent access to remote and distributed information from multiple computer platforms. We have also demonstrated the use of natural language queries, conceptual search, and collaboration among multiple agents to support intelligent information access and retrieval.

8. Acknowledgments

The research reported in this paper is being funded by NASA’s Office of Aeronautics under the Information Infrastructure Technology and Applications (IITA) component of the High Performance Computing and Communications (HPCC) Program (URL: http://saire.ivv.nasa.gov/saire.html). We thank Dr. Nand Lal and Dr. Susan Hoban for their support. They are the Manager and Assistant Manager, respectively, for NASA/CAN, DLT Projects at Goddard Space Flight Center (GSFC) in Greenbelt, Maryland.

9. References

[1] Atkins, D. E., Birmingham, W. P., Durfee, E. H., Mullen, T., Wellman, M. P., et al. “Toward Inquiry-Based Education Through Interacting Software Agents”. IEEE Computer, pp. 69-76, May 1996.
[2] Finin, T., Weber, J., Wiederhold, G., Genesereth, M., Fritzson, R., McKay, D., McGuire, J., Pelavin, R., Shapiro, S., and Beck, C. “Draft Specification of the KQML Agent Communication Language”. [email protected]. June 15, 1993.
[3] Genesereth, M. R., and Ketchpel, S. P. “Software Agents”. Communications of the ACM, pp. 48-53, July 1994.
[4] Hsinchun Chen. “A Concept Space Approach to Addressing the Vocabulary Problem in Information Retrieval”. University of Arizona, 1995.
[5] Ishida Toru. “Parallel, Distributed and Multi-Agent Production Systems”. Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp. 416-422. June 12-14, 1995. San Francisco, California. AAAI Press.
[6] Levy, A. L., Rajaraman, A., and Ordille, J. J. “Query-Answering Algorithms for Information Agents”. Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, pp. 40-47, August 2-8, 1996.
[7] Papazoglou, M., Laufmann, S., and Sellis, T. K. “An Organizational Framework for Cooperating Intelligent Information Systems”. International Journal of Intelligent and Collaborative Information Systems, Vol. 1, No. 1, pp. 169-202, 1992.
[8] Papazoglou, M. P. “Unraveling the Semantics of Conceptual Schemas”. Communications of the ACM, Vol. 38, No. 9, pp. 80-94, September 1995.
[9] Pender, Cheri. “About Global Change Assisted Search for Knowledge (GC-ASK)”. ConQuest Software Inc., Columbia, MD 21044, [email protected].
[10] Swisher, Kara. “For Mathew Koll, Building a Software Business is a Find Art”. Washington Post, Business Section, page 11, July 31, 1995.
[11] Turner, Roy M. Adaptive Reasoning for Real-World Problems: A Schema-Based Approach. Lawrence Erlbaum Associates, Publishers. 1994.
[12] Truszkowski, W., Odubiyi, J., and Ruberton, E. “An Agent Model in a Multi-agent System Architecture for Automating Distributed Systems”. Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), page 463. June 12-14, 1995. San Francisco, California. AAAI Press.
[13] Vergara Harald. “PROTUM: A Prolog-based Tool for User Modeling”. University of Konstanz, D-78434, Konstanz, Germany, 1994.
[14] Weld, D. S., Marks, J., and Bobrow, D. G. “The Role of Intelligent Systems in the National Information Infrastructure”. AI Magazine, pp. 45-64, Fall 1995.
[15] Wiederhold, Gio. “Mediators in the Architectures of Future Information Systems”. IEEE Computer, pp. 38-49, February 1992.
[16] Wilensky, Robert. “Toward Work-Centered Digital Information Services”. IEEE Computer, pp. 37-44, May 1996.
[17] Wittig, T. (editor). “ARCHON: An Architecture for Multiagent Systems”. Ellis Horwood, Publisher. 1992.
