What is the problem - Semantic Scholar

Process-Aware Knowledge Retrieval Submitted to HICSS 35, The Digital Economy: Internet & Workflow Automation: Technical & Managerial Issues

Kurt D. Fenstermacher Management Information Systems Eller College of Business and Public Administration University of Arizona Voice: (520) 621-4016 Fax: (520) 621-2433 Email: [email protected]

HICSS ’01 — Internet & Workflow Automation: Technical & Managerial Issues

Page 2 of 23

Process-Aware Knowledge Retrieval ABSTRACT Even with modern information retrieval systems and advanced digital libraries, many people cannot find the information they need when they need it. There are many reasons why information access is difficult: users cannot track all the available information sources, few computer users are skilled in composing queries and “everyday” applications (e.g., word processors and spreadsheets) are not integrated with information access tools, are just a few of the problems. Too often, users who do search for needed information pose short, simple queries that retrieve hundreds or thousands of documents. Many information access strategies fail because they are not proactive, context-sensitive or task-specific. Process-aware retrieval, however, offers a framework that shifts the burden of information access from the user to the computer. By explicitly representing processes, and information about processes, process-aware retrieval enables the computer to make highly targeted suggestions regarding knowledge and information sources, while aiding in the completion of process steps. This papers reviews current strategies for information access, evaluating strengths and weaknesses, and then defines a process-aware framework that builds on the strengths to address the weaknesses.

Introduction If you’ve searched the Web, browsed a set of directories or updated a previously written report, you already know the frustration of accessing the information you need when you need it. Even experts in information access must carefully frame their search queries and identify promising information sources. For knowledge workers, who toil through their computers, the choice is often information feast or famine. The ubiquity of computers and their frequent use in knowledge-intensive tasks is an opportunity for changing the paradigm of information access. Instead of forcing users to monitor the availability of information sources, learn complex query languages and remember which sources and queries were helpful in the past, computers should do the job. If your workstation knew about your activities and the resources you needed, your computer could fetch the information you wanted, perhaps even before you knew you wanted it. This is the potential of process-aware knowledge retrieval — by representing processes and annotating them with information about knowledge sources, computers could better aid users by proactively suggesting context-sensitive, task-specific knowledge. Process-aware retrieval offers great promise in a range of applications from high-school students researching term papers to globally distributed workgroups tackling demanding business problems. This paper argues for the promise of process-aware retrieval, and discusses the construction and application of a process-aware framework.


Page 3 of 23

The Frustration of Information Access: An Example from the Web Suppose that you’re organizing a birthday party for a friend, and need to make dinner reservations at a local restaurant. You begin by searching restaurant review Web sites for your area, and after reading through a half-dozen reviews, select one of the restaurants — perhaps Crazy Eddie’s Steak and Sushi Bar. The review page does not list Crazy Eddie’s phone number, so you copy the name and paste it into a phone directory search site for your area. You call the listed number, and make the reservation. Next, you’ll need a map showing Crazy Eddie’s location, and driving directions from your workplace. From the phone directory search results, you copy Crazy Eddie’s address, and enter it into an online map service. With the map in hand, you’re ready to announce the party. Although making a reservation is not difficult, it remains beyond the ken of your computer. Suppose instead of a series of copy-and-paste operations, your computer would recognize when you were making a restaurant reservation and complete the process for you. After entering a restaurant search in a given area, your workstation could suggest alternative review sites with coverage of the same area. After you select a restaurant, your computer could then look up the phone number if needed, dial the restaurant for you, draw a map and print driving directions. If your computer were to complete the task for you, there are two potentially valuable benefits. First, more of the burden of this mundane task would shift from you to the computer. In a more complex process, there could be additional benefits as well — for example, you might not know all the steps in the process, or the variation of a process to follow in an exceptional case. A second benefit of having the computer complete the task is the opportunity for knowledge exchange. If relevant knowledge sources, such as restaurant review sites, were centrally indexed, then new opportunities arise to share knowledge. For example, if you wished to make a reservation at an out-of-town restaurant, you might begin with an international restaurant review site, such as the Zagat Survey’s Zagat.com (Zagat Survey, LLC, 2001). But many metropolitan areas have local online guides that often include restaurant reviews as well — how would you locate such services? A system that had seen someone else (perhaps a resident of your destination city) use a local review site could propose that site as well, in addition to your more global search. Given the ease of reserving a restaurant table, why must people still perform so much of the task themselves?


Page 4 of 23

Making restaurant reservations might seem a simple task, one that you have probably performed hundreds of times, but the familiarity of the task hides the extensive knowledge needed to do it well. In the above example, you must already know that restaurant review Web sites exist, and know how to find them. You must know how to search the review sites and how to parse the results returned. If you need information missing from a result (e.g., the restaurant’s phone number), you need to know how to use what you do know (the restaurant name, in this case) to complete the missing information (in this example, by entering the name in an online phone directory). For map information, you again need to know that online map services exist, what information they need and how to interpret their results. In the abstract, you must know the process (restaurant reservation), the tasks (restaurant selection, make reservation and generate map/directions) and knowledge sources relevant to each of the tasks and the overall process. Making reservations at a restaurant is a common task, but clearly not critical to the successful operation of most

organizations.

However,

restaurant

reservations are a simple example of a much broader class of computer-based processes that are decomposable into tasks and relevant knowledge sources. For example, a student preparing a term paper must research potential topics, identify sources, review each source, write a rough draft and revise the final paper. A salesperson on first-

Figure 1 — Associating relevant information sources with tasks

time sales call must research the customer’s needs, introduce her organization and identify products and services of interest to the new prospect. Process-aware retrieval is a framework that associates relevant knowledge sources with user activities (as shown in Figure 1), enabling process-aware system to make context-sensitive, task-specific suggestions to users. This paper proceeds by discussing other attempts to address the broader problem of delivering the right information to the right people at the right time — what might be called just-in-time information, by analogy to just-in-time manufacturing (Cheng & Podolsky, 1993). Several fields have addressed the problems of information/knowledge delivery, and although each has made important contribution, challenges remain. After reviewing the work of related fields, the importance of process is addressed, and a process-aware architecture is described. This paper will argue


Page 5 of 23

for the process proxy principle: information about the process and activity a user is performing offers the contextual information needed to more accurately retrieve relevant knowledge, as it is needed by the user. To support processaware retrieval requires an architecture that builds on existing process representations (e.g., (Workflow Management Coalition, 2001)) by including knowledge source annotations, as well as hooks for process recognition and recording.

Managing Data, Information and Knowledge Despite advances in artificial intelligence, computers are still poor thinkers; people continue to make the complex judgments that are commonplace in the knowledge economy. To make such decisions well, however, people need data, information and knowledge. And there’s the rub — as knowledge workers create and store more information in computers, it has become harder to extract just the needed information, just as it’s needed. However, knowledge-intensive processes are often complex undertakings, and require a broad range of skills. Thus, the proposed process-based architecture is a hybrid of work in several fields, although it draws primarily on work in information retrieval, workflow and work on planning (Tate, Hendler, & Drummond, 1990) from artificial intelligence. Information retrieval research (Korfhage, 1997) has primarily focused on the question: given a corpus of text documents, how can one build a system to provide fast access to relevant documents? Much of information retrieval research has revolved around implementations of the vector space model (Salton & Yang, 1975), which represents documents as high-dimensional vectors, enabling systems to be built based on linear algebraic operations. The vector space model provides quick retrieval and provides for many complex query forms — including conjunctive, disjunctive, proximity and negative queries. For example, most Web search engines rely on the vector space model to support queries over very large collections (over 1,346,966,000 indexed pages claimed by Google (Google, Inc., 2001)). Information retrieval (IR) techniques will often work when other techniques will not, because they are predicated on very few assumptions about the indexed documents. To apply IR techniques, the system assumes only a parser that will read the documents and output tokens, which will then be indexed. Wide applicability is a doubleedged sword, however, as the simplicity of indexing requires more sophisticated retrieval. In particular, to use most


Page 6 of 23

IR systems effectively, users most couch their requests in complex, multi-word queries often incorporating lengthy Boolean expressions. In past decades, when access to online information sources was through reference librarians and other information specialists, complex query formulation was not a significant barrier. However, with today’s widespread access to online information, few people have the skills needed for effective access. In short, information retrieval systems are a critical component of information access, but new interfaces are needed to shift the burden from the user back to the information system. Short queries are the norm today (Kirsch, 1998), which in turn lead to ambiguous queries. Is “bank” a financial institution or the side of a river? Without additional search terms — either bank AND finance AND deposits or bank AND river — IR systems cannot distinguish between the two interpretations. The term effective query will be used to distinguish between ambiguous queries and those that can identify relevant documents, either due to the number of terms or their combination. The parsimonious assumptions of information retrieval ensure broad applicability (at the cost of complex queries), but they make it difficult to incorporate additional information when it is available. Information about document structure can support fielded search, which enables users to link query terms to specific record fields, improving the accuracy of search results. The use of fielded search by Web search engines, which parse primarily HTML documents, is a common example of this. However, most IR systems do not provide the incorporation of rich meta-information at indexing or retrieval time. In this research then, the challenge is to use the power of information retrieval to support quick access to information, while easing the burden of posing effective queries and enabling the incorporation of rich meta-information. When a puzzled patron arrives at a library reference desk and asks for information on a topic, the librarian’s first question will often be, “Why do you want to know?” The librarian needs to establish the context of the information need to narrow the search. Often the key to this context is a combination of what the patron is doing (e.g., researching potential topics for a high school term paper) and what role the patron is playing (e.g., the author of a school term paper). This is the previously defined process proxy principle in action. The importance of process is a key aspect of computing, and the notions of process and role are central in the field of workflow management systems. A workflow management system “completely defines, manages and executes ‘workflows’ through the execution of software whose order of execution is driven by a computer representation of the workflow logic”


Page 7 of 23

(Hollingsworth, 1995). In this research, the emphasis on process and its explicit representation is shared with the field of workflow systems. Traditionally, the emphasis has been on control and coordination (Georgakopoulas, Hornick, & Sheth, 1995) of tasks, rather than leveraging processes as context indicators to support better information access. Workflow systems are an excellent base for process-oriented knowledge retrieval because they are well studied and increasingly common in organizations. The idea that process can be a proxy for context, and hence a lever for information access, has been explored in other work. This process proxy principle has several implications; different researchers have emphasized three principles in designing process-aware systems: •=

Context-aware — In the information access literature, those who build process-aware systems often label them context-aware (or context-specific) to denote that information access depends on what the user is doing and why. Such process representations range from simply the words that a user is currently reading or writing (Budzik, Hammond K., & Birnbaum, 2001) to complex knowledge representations drawn from artificial intelligence research (Abecker, Bernardi, Maus, & Wenzel, 2000; Schnurr & Staab, 2000). Generally, the label context-aware describes query formulation, and focuses on the use of the process context to generate effective queries.

•=

Task-specific — Although incorporating a notion of context is an important step, users often participate in many contexts simultaneously. Thus, in addition to generating context-specific queries, information systems must respect that users also want information when it is appropriate. In process-aware systems, information is deemed appropriate when it bears on the task at hand for the user. For example, as a Formula 1 racing fan, I would appreciate context-sensitive searches that exclude results from NASCAR, CART and the many other racing leagues operating worldwide. But I also I do not wish to be interrupted while writing this paper to be notified that the BMW/Williams racing team and Michelin have settled on the rubber compound to be used for the racecars’ tires during Sunday’s Canadian Grand Prix.

•=

Proactive — With the proliferation of online information systems, users are confronted with two new problems. First, they must know what information sources exist. Second, users must know what content is available. Past research has explored information filtering, also known as current awareness or selective dissemination of information (SDI) systems (Belkin & Croft, 1992). Filtering systems select documents of likely interest from a stream of documents. However, there has been little work done on identifying promising streams themselves.


Page 8 of 23

Highly targeted search engines that cover very narrow topics might be appropriate for users, but they must know that they exist, what content they index and how to query them. By associating knowledge sources with tasks, process-aware systems can offer a hybrid push/pull system by recommending sources and queries. Although there have been some attempts to use process representation as an aid to knowledge retrieval (e.g., in the Kabiria system (Celentano, Fugini, & Pozzi, 1995; Celentano, Fugini, & Pozzi, 1992)), few workflow systems have attempted to reuse the process information to support knowledge indexing and retrieval. In an overview of document management systems, (Zantout & Marir, 1999) argue for process representations as the key to regulating the flow of information to users — both to ensure that they are not overloaded, nor that they limited by their own ignorance of relevant sources. They argue for the use of process representation in building intelligent information agents, focusing on opportunities for proactive retrieval, but without describing the representation or its implementation. Abecker, Bernardi et al. (2000) describe a more developed approach that supports the three process-aware principles discussed above: context-aware queries, task-specific information and proactive delivery. In particular, they build representations for what they term knowledge-intensive tasks, or KITs. KITs are workflow activities annotated with information about knowledge to begin the activity, as well as processing rules and where task-generated knowledge would be useful. Although there are many similarities to the approach described in this paper, building the KIT representations is a substantial knowledge engineering challenge. Moreover, the authors do not discuss techniques for learning KIT annotations from user behavior — an important source of meta-information. Staab & Schnurr (1999) present a very similar approach to Abecker, Bernardi et al. (2000), but, as discussed by Abecker, Staab and Schnurr emphasize ontology-based retrieval. Although workflow research is an obvious home for process-based research, artificial intelligence researchers have long studied planning (Tate et al., 1990) and such research has connections to the current work. Planning research is largely driven by work on autonomous agents and tries to answer the question, given my goals and the current situation, what should I do next? Many of the ideas present in the workflow literature are seen in the planning literature as well. Planning researchers have developed semantically rich process representations that planning systems can reason about. These computer-reasonable (not simply computer-readable) representations enable planners to operate dynamically, assembling plans to meet changing goal criteria and adapting to a changing world state. Although some work has been done at the intersection of workflow and artificial intelligence (Myers & Berry, 1999a; Myers & Berry, 1999b), more investigation is needed. In related work (Berry & Drabble, 1999), planning


Page 9 of 23

techniques are employed to enable dynamic process construction for a workflow management system. For processaware retrieval, the key lessons from planning are guidelines for developing representations and the potential for generating processes dynamically at run-time.

A process-aware architecture If the potential of process-aware retrieval is clear, its implementation is not. To begin, one can ask how much process is needed to garner the benefits of process-aware retrieval? More importantly, what functions are needed to support process-aware retrieval, and what are the trade-offs in design, implementation and deployment costs? To address these questions, three levels of process support will be defined and discussed: pipelining, workflow management and process-oriented knowledge management. Pipelining is the simplest form of process orientation, in which activities’ input-output mappings are specified and a total ordering of activities is given. (The term pipelining stems from the concept of pipes in the Unix operating system (Stevens, 1992); users create often perform complex actions by piping the standard output from one process to another’s standard input.) Workflow management (with its usual definition) is an extension of pipelining that supports richer process representations and rule-based actions. Process-oriented knowledge management augments workflow’s process representation with knowledge. Each of these approaches is now discussed in further detail. In the opening restaurant example, one of the arguments made for the process-based approach is to spare users the burden of carrying information from one source (say, Zagat’s review pages) to the next (Qwest’s online phone directory). To do this, a pipelining system is sufficient. We define a pipeline system, P, to be a totally ordered set of pipes,

P = {p1 ,K, pn } . A pipe, p, is a producer, consumer pair. Producers consist of an information source,

I, and a set of extraction rules,

{r1 ,K, rn }. Each extraction rule includes a label (l) and an expression (e) to extract

the labeled information from the source I’s output. The dual of a producer is a consumer, which includes an information source, I, and an input specification, given as a set of labels, el. Thus, a pipe is a set of two information sources, IProducer and IConsumer, a set of extraction rules,

R = {(el1 , er1 ),L, (eln , ern )} and a set of input labels,

Linput = {(l1 ,L, ilm }, where {il1 ,L, ilm }∩ {el1 ,L, eln } ≠ φ .


Page 10 of 23

The pipeline designer (who might be the user as well) must specify the information produced and consumed by each source and how that information is to be extracted and submitted. Describing the extraction of unstructured text is difficult. Two common solutions are the use of pattern matching (commonly using regular expressions) or a document object model (DOM), although the latter solution is more common with more highly structured documents. Some systems, such as Ariadne (Knoblock & Minton, 1998), can aid users by inductively learning to build wrappers (which can then be joined to create pipes) from a set of Web pages, enabling users, rather than developers, to create new pipes and pipelines. The designer can then link successive sources with a pipe,1 ensuring that the needed information will be pulled from one source and handed to the next in linked set of pipes, or pipeline. Pipelines could then be assembled from other pipelines, encouraging reusability. Indeed, a library of pipelines for common tasks could be defined and users could assemble pipelines as needed. For example, a pipeline that selects a restaurant and retrieves the phone number, shown in Figure 2, could be defined as follows. First, the user must

Query user

Zagat

Find

for restaurant

restaurant

initiate the pipeline by having selected a particular Zagat-

Zagat restaurant

QwestDex

Lookup

(Phone,

reviewed restaurant. Thus, the first pipe is

Figure 2 — Restaurant phone number pipeline

the user as a producer who specifies the attributes consumed by the Zagat Web site. The first pipe’s producer is the information source, IUser, and a list of labels and extraction rules. In this case, the extraction rules are identical for all labels — ask the user, given as Ask. The labels are the needed attributes:

{( Region, Ask ), ( Neighborhood, Ask ),L, ( Food, Ask ), ( Decor, Ask ), (Service, Ask ), (Cuisine, Ask )}

1

As used here, a pipe is an instance of the Adapter design pattern (Gamma, Helm, Johnson, & Vlissides, 1995), also known as the Wrapper or Mediator pattern, but the term pipe is used to preserve the pipeline analogy.

HICSS ’01 — Internet & Workflow Automation: Technical & Managerial Issues The first pipe’s consumer is the Zagat Web site, which uses the labeled attributes to return a page.

Page 11 of 23

([a-z,A-Z,0-9]+) Listing 1 — Regular expression to extract the restaurant name

For example, a user who had specified Region = Arizona, Neighborhood = Tucson, Food = 27, Décor = 25 would arrive at the page for Janos, /Search/Details.asp?VID=1&PID=1&LID=50&LSID=10294&BRW=21&NDPH=in+Tucson&RID=22773.

The second pipe

treats the Zagat-served page as a producer, and connects to the QwestDex consumer — a Web–based form that accepts a location and business name, and produces a directory listing and phone number. To extract the restaurant name, a regular expression such as that in Listing 1 might be used. The extraction rule for the Name label would include the expression, a notation that it is a regular expression and a binding that states that the value of the rule should bind to the regular expression matcher’s %1 return value. Pipelining will work to connect Zagat’s restaurant review pages to the QwestDex online directory, and to a mapping source as well. However, pipelines have any important shortcoming — they are static. For example, Zagat’s reviews cover not only large U.S. metropolitan areas, but also Toronto, London, Paris and other non-US cities. QwestDex, however, only indexes phone numbers in the United States. A pipeline system will pull the name of user’s chosen London restaurant, but no directory information will be available through QwestDex. Worse yet, there is no conditional rule to suggest another phone directory for non-US restaurants, and no exception handling mechanism. With conditional rules and exception handling, pipelining would offer some dynamism at run-time, but the rules and exceptions would still need to be known at build-time. To offer dynamic process building, pipelines would need to add meta-information about information sources to reason about which sources would be appropriate for a given task, and assemble processes on the fly. Workflow management systems address some of the shortcomings of pipelining, by incorporating conditional rules and exception handlings. Developing a complete model of a workflow system, as done above for pipelining, is beyond the scope of this paper and has been done elsewhere. For more detail, see either (Hollingsworth, 1995) or (Leymann & Roller, 2000). Workflow systems subsume the functionality of pipelining through data mapping. Data mapping, as defined in Section 4.5.1 of (Leymann & Roller, 2000), is a description of “how the data elements of an input container are composed from data elements of the output containers”. The data


Page 12 of 23

descriptions of workflow systems, however, focus on the data connection, rather than on the semantics of the data itself. For example, Listing 2 shows an example data mapping. In addition, workflow systems can describe the input and output parameters of applications. The more expressive language of workflow management systems would enable the system to more gracefully handle the problem with pipelining. For example, a Zagat user could select a London restaurant, which would then be passed to QwestDex,. The lookup at QwestDex would fail. In a workflow system, this shortcoming could be addressed with an exception mechanism that would inform the user of the problem. Or, a conditional rule could be written that would say, “If the location is in the United States, then submit the name to QwestDex. If the location is London, then submit the name to British Telecom”. The problem is that these relationships must be known at build-time and the conditions must be stated again for each activity, which means the process developer must be aware of the necessary mappings. If instead information sources could be described semantically (but still associated with process activities), then the needed associations could be made at run-time while preserving the benefits of workflow management systems. Process-aware knowledge retrieval builds on workflow management systems to support knowledge management more broadly. The key aspects of a process-aware retrieval are 1) information source suggestions keyed to user activity, and 2) context-sensitive information access. To provide these two services, process-aware systems must recognize when a user is engaged in a known process, annotate process activities with relevant knowledge and build and maintain libraries of annotated processes.

Process recognition A process-aware knowledge retrieval engine requires extensive coordination among user applications and the engine itself. The retrieval engine must monitor user actions within applications, looking for both retrieval and indexing opportunities. For retrieval, the engine must monitor user actions, looking for opportunities to suggest relevant knowledge sources or perform follow-on steps in a process. To support such monitoring, applications must expose high-level events that the retrieval engine can process to identify useful suggestions. At a minimum, the retrieval engine must be able to automate user

DATA FROM SOURCE TO ‘Call restaurant’ MAP ‘Name’ TO ‘Restaurant’ MAP ‘Phone’ TO ‘PhoneNum’ Listing 2 — Data mapping example


Page 13 of 23

applications in line with a user-selected process. The system must infer from user actions that the user is engaged in a process, as well as the task being performed. Recognition could be as simple as a user’s selection of a well-known process form a menu of processes. Alternatively, it might be as complex as inferring the user’s goals and belief from seemingly unrelated actions. The latter, known as plan recognition in the artificial intelligence community (Kautz & Allen, 1986), is known to be computationally intractable in the general case.

Information source association One solution to the source problem is to maintain a central catalog of useful sources. Of course, this does not require a process-aware system — libraries have successfully tracked new information sources for centuries. The challenge is in supporting context-specific retrieval of information sources. Process-aware retrieval answers this question by offering a mapping between user actions and relevant knowledge sources. In the opening restaurant example, visiting any restaurant review site and entering a locale suggests that other review sites that cover the same area would be relevant. With a process-aware system, information sources relevant to follow-on steps are deemed valuable as well. Moreover, the ability to pass needed information from one step to the next (e.g., the name of the restaurant from the review site to the phone directory) enhances sharing of knowledge sources. In process-aware retrieval, unlike workflow systems, processes are the means rather than the end. The end is less burdensome access to information. Thus, at each step of the process there is an opportunity to associate relevant information sources. Workflow systems offer the means to describe what data sources are available and how to access them. In addition, workflow systems can answer the question of when to access data, based on the inclusion of a data description within an activity. However, knowing when a source is appropriate is a function the process instance, as well as the overarching process template. In the phone directory example, knowing whether to search for a phone number at QwestDex or British Telecom depends on whether the user is looking in London or Hawaii for a restaurant — an aspect of the process instance, not the process description. Through reification, a designer could simply create different processes — one for United States restaurant reservations, and one for United Kingdom reservations. Creating highly targeted processes scales poorly, however. The conditional rule approach mentioned previously has the drawback that it requires build-time knowledge. The rapid pace of information source creation and destruction argues against a strategy that requires the revision of process definitions with each change.


Page 14 of 23

If instead semantic information were available about each source, then information sources could be associated with activities on the fly. Rather than creating complex conditional expressions to handle myriad cases (or creating multiple static processes), process-level goals would be used to construct processes. For example, a user who is interested in selecting a restaurant in a U. S. location would need a U. S.-specific phone directory. A processaware system that had labeled QwestDex as just such a directory could then assemble a process that fed the results of a restaurant selection to QwestDex. In short, the conditional execution that workflow systems embed in the process description itself would be abstracted away from the process description to a meta-process level. Work in artificial intelligence planning has always treated actions in this way, beginning with early problem solvers (Newell & Simon, 1964). Moreover, planning researchers have studied abstraction as a thing in itself, beginning with Sacerdoti’s Network of Abstraction Hierarchies (NOAH) system (Sacerdoti, 1974). Later systems, such as HEARSAY II (Fennell & Lesser, 1975) and the model proposed by Hayes-Roth and Hayes-Roth (1979), have supported reasoning at multiple levels of abstraction simultaneously. Adapting these techniques to workflow applications is necessary for processing knowledge flows, and could be helpful for other kinds of workflow as well.

Process library construction Process-aware knowledge retrieval requires more complex process representation than that used in traditional workflow systems. The necessity of maintaining hierarchies of abstraction and the need to reason about which activities would be useful in service of larger goals warrants a more complex, knowledge-based data structure. Many alternatives are available, but a case-based approach has both intuitive appeal (Kolodner, 1993) and a proven track record (Watson, 1997). Case-based reasoning (CBR) can be characterized as, “Reasoning is remembering”. The central tenet is that people solve problems by recalling solutions to past similar problems, and then adapting the past solution to meet the constraints of the current problem. The core of any CBR system is the case base, where cases are indexed by the successes they achieve and the failures they avoid (Hammond, 1989). Many organizations already have processes encoded in computer systems, within their workflow systems. Thus, reusing those processes would reduce the burden on developers of process-aware systems, as well as making a more compelling case for the adoption of such systems. However, there are two outstanding questions. First, are the processes currently represented in workflow systems knowledge-intensive processes? Second, will annotation of


Page 15 of 23

existing process support the more advanced reasoning necessary? Both of these are empirical questions that are under study by the author. Constructing processes is normally an arduous task involving many iterations as conditional expressions are revised and exceptions arise during enactment and are incorporated. Although thisPrevious work in artificial intelligence has addressed In addition to hand-coded process definitions, users could create new processes by invoking a “process recording feature” similar to the macro recording feature found in many current applications.

Inter-application communication Each of the above activities — process recognition, information source association, information extraction and library construction — requires close cooperation between the knowledge retrieval and indexing framework and user applications. Tight integration can be achieved either by crafting custom process-aware application tools that communicate directly with the knowledge retrieval framework, or by using a general event communication protocol to tie together existing user applications. Although building custom software tools offers maximal flexibility, most users are reluctant to learn new applications. If the designer must take the user’s applications as given how can he support the needed integration? The two primary functions needed are event handling and program control. Modern applications are written in an event-driven style, where user actions and other program actions fire events and the application programmer writes code to “handle” events by taking the appropriate action. In addition to recognizing events, a process-driven system must also be able to control other applications. For example, a user who is responding to an emailed request for information might complete a Web search and then email the relevant pages to the questioner, along with a comment in the body of the email. If a process-driven system recognized the mail format as a question form, the engine might extract the query and submit it to a relevant, Web-based, knowledge source. Doing so, however, requires event notification of the email message and the ability to start a Web client program and control its queries. Microsoft Corporation, the publisher of most of the world’s application software, has a general protocol in place that supports both event handling and program control — the Component Object Model (COM). The Component Object Model (COM) is a binary reuse framework that supports communication among COM-compliant applications (Box, 1998). Much of the emphasis on reuse today is targeted at source-code reuse — carefully authoring complex systems using object-oriented principles so that much of the code written for one


Page 16 of 23

system can be used for another. Binary reuse instead focuses on capabilities of compiled applications. For applications running on the Microsoft Windows family of operating systems, COM provides the glue to tie applications together. If a designer is willing to limit herself to Windows, then COM offers the needed applicationsharing infrastructure.

Prototype design: Restaurant reservations Table 1 below depicts the reservation process as it is today — without a process-aware retrieval engine. Imagine instead that a process-aware engine supports this interaction. Once the user navigates to the Zagat.com site, the process engine recognizes Zagat.com as a restaurant review source and opens a second browser window with a list of similar restaurant review sources — perhaps the online City Guides served through MSN at http://local.msn.com/. In addition to suggesting relevant sources, something else has happened: a potential process has been identified, based on an information source. The system uses the result of the user action, in this case navigating to Zagat.com’s index page, as a key to index into a data structure of known information sources. As the user follows a link to a new Web page, the universal resource locator (URL) of each page is checked upon arrival. The URL of Zagat.com’s home page (http://www.zagat.com/) indexes a list of restaurant review sources. Selecting Arizona as the location has no effect, since no statewide restaurant services are known to the system.

In

other

words,

using

the

URL

of

the

Arizona

(http://www.zagat.com/Index.asp?VID=1&PID=1&LID=50&newLID=50),

returns

page an

empty list. Choosing Tucson, however, updates the suggestions to include Tucson-specific review sources, including DoTucson.com’s Dining pages and the Arizona Daily Star’s restaurant review pages, which are indexed under the

URL of Zagat.com’s Tucson page, as a key to index into a data structure of known information sources. Suppose the user now switches from Zagat to DoTucson.com and selects a restaurant. The system, recognizing not only the restaurant reservation process but also the task within the process, offers a dialog box asking whether the user would like the system to lookup the address and phone number, dial the restaurant, retrieve a map and email driving directions to other invitees. Answering yes causes the system to parse the DoTucson.com Dining result page. If the phone number is available, it is pulled from the result page; otherwise, the restaurant name is submitted to an online phone directory


Page 17 of 23

covering the search area — Tucson, Arizona, in this case. Next, the address is submitted to an online mapping service; a map is retrieved and directions are generated. There are several benefits to adding process-aware support. First, the user need only participate when human judgment is required. For example, the selection of an appropriate restaurant is a complex task in which people weigh competing factors, such as price, service, food atmosphere, occasion and many other elements. Other systems can aid users in such choices (Burke, Hammond, & Young, 1997). Selecting and copying text from one Web page to another, however, requires less skill. Second, process offers a context in which to suggest relevant information sources. One of the flaws in push technology is the difficulty in knowing enough about a user to make highly targeted recommendations. A process-based approach shifts the question from “Who are you and what are you interested in?” to “What are you doing right now?” Third, knowledge of the process can offer the contextual information needed to improve retrieval. This is the central notion of this work — that process information is a tractable proxy for extensive contextual information. As an example of the contextual use of process, let us return to the restaurant reservation example. Suppose that the user has selected the restaurant Dish from an online review site. The next step is to retrieve the phone number from an online directory. Using the local phone company’s QwestDex service, the user would normally enter “Dish” in the business name search field. The basic search form does not allow a user to enter both a business name and category, however. Entering “Dish” on the basic search form returns seven listings, six of which are for the Dish Network, a satellite television provider. The context of restaurant reservation is lost, in this case, and “Dish” is taken to many any string that contains the string “Dish”. A process-aware system, however, identifies the name “Dish” as a restaurant name, because it has been extracted from a restaurant review page, in service of making a reservation. Therefore, the context of a restaurant reservation can be used to disambiguate the term “Dish”. In this case, the process engine would submit “Dish” through the advanced search form, which supports searching by category and business name simultaneously. The system can thus enter “Restaurant” in the category field, “Dish” in the name field, and only a single result is returned. The process representation has captured the key element of the context. In this example, the gain is not dramatic because restaurant reservation is a simple process. However, the principle applies to other processes as well. An accountant preparing this year’s annual report who searches for “current assets” could have her search restricted to documents authored by the accounting department. A student researching term paper topics might


Page 18 of 23

search only general sources, written for a broad audience. In computer science for example, queries at the topic selection stage might be sent only to indexes for Communications of the ACM and IEEE’s Computer.

Technical implementation Orchestrating interaction among several different applications, using events generated in one application to control the action of another application, has application beyond building process-aware systems. Thus, the framework described could be used in other areas — computer-supported cooperative work (CSCW) and digital libraries are two other uses being investigated. The core of the system is a combination of two technologies: Microsoft’s Component Object Model (COM) and the Python programming language. COM is the means by which one application can control another and also receive events other applications, as has already been discussed above. With this framework, users can continue working with the applications they have always known. In fact, users might not even be aware that they are using enhanced applications. In a controlled environment, such as a corporate desktop, users might double-click the same icon they always have, but start a Python script in the background that in turn starts their familiar application. Existing applications can be instrumented for testing purposes or gather experimental data. Currently, work is underway to apply this approach to building virtual libraries for public and student access.

Future research Currently, the process-aware framework is being investigated as a tool for building a virtual public library. In a typical public library application, a citizen interested in learning about the candidates in an upcoming local election might be interested in information about their records with regard to important local issues. If the voter were to simply enter elections into a search engine she would retrieve information about candidates, the election process and democratic principles. A first-time voter might want information about the process of election itself, to better prepare for the casting his vote. In a university setting, a student researching a paper would likely begin with a broad search to identify a suitable topic. At this stage in the process, a detailed article with a narrow focus would not be appropriate. Thus, an undergraduate student would likely be better served by searching newspapers, magazines and even the general interest publications of professional societies, such as the Communications of the ACM or


Page 19 of 23

Archaeology, published by the Archaeological Institute of America. With a process-aware framework, voters and students could receive highly targeted assistance. Computer-based collaboration is another opportunity for additional research. Using a task, role pair to characterize users offers an opportunity to share knowledge sources transparently. Moreover, the process framework can link together the actions of distributed teams. Being able to reason about the processes, and identify gaps in teamwork or areas that require additional emphasis is another possible application. In addition to many potential applications, several important research questions are still outstanding. The verification of the process proxy principle is one of the key questions, as much of process-aware retrieval depends on it. Using process information to support information access is different from previous uses of process, and thus will likely need different — or at least augmented — process representations. Closely related to process representation is the question of associating information sources with processes. Source association is doubly important in the framework described because it serves as the key to process recognition, as well as information in its own right. As discussed in the review of related work, research in artificial intelligence planning has grappled with many of these same questions in an intelligent agent context. Incorporating planning principles into workflow process representation enables new research into reasoning about processes, and offering a new level of dynamism in workflow systems. Case-based reasoning offers particular promise in the construction and use of intelligent process libraries.

Conclusions Today’s information systems offer excellent performance in skilled hands, but are difficult to navigate for most users. Constructing complex queries, tracking myriad information sources and creating a thread that runs across multiple applications are demanding tasks that should shift from users to their computers. The lack of context in user queries is one of the key hurdles to better information access. Unfortunately, representing the complete context of every user action and goal scales poorly. By encoding process information and observing user interaction with existing application, user processes can serve as proxies for the more complete context. In many environments, the tools exist to enable users to reap the benefits of process-aware retrieval while continuing to use familiar applications.


Page 20 of 23

Process-aware retrieval offers a foundational principle for the design of information access systems that seamlessly integrate with other technologies. Drawing on representation work from artificial intelligence planning, process research from workflow and access tools and strategies from information retrieval, process-aware systems can shift the burden of access from the user to the system — where it belongs.


Page 21 of 23

References 1. Abecker, A., Bernardi, A., Maus, H., & Wenzel, C. (2000). Information support for knowledge-intensive business processes - combining workflows with document analysis and information retrieval. AAAI Spring Symposium Series: Vol. 2000. Bringing Knowledge to Business Processes Menlo Park, CA USA: AAAI Press. 2. Belkin, N. J., & Croft, W. B. (1992). Information filtering and information retrieval: two sides of the same coin? Communications of the ACM, 35(12), 29-38. 3. Berry, P. M., & Drabble, B. (1999). SWIM: An AI-Based system for Workflow Enabled Reactive Control. Workshop on Workflow and Process Management (IJCAI '99) AAAI Press. 4. Box, D. (1998). Essential COM. Reading, MA USA: Addison Wesley. 5. Budzik, J., Hammond K., & Birnbaum, L. (2001). Information access in context. Knowledge Based Systems, 14(1-2), 37-53. 6. Burke, R. D., Hammond, K. J., & Young, B. C. (1997). The FindMe Approach to Assisted Browsing. IEEE Expert, 12( 4), 32-40. 7. Celentano, A., Fugini, M. G., & Pozzi, S. (1992). Conceptual document browsing and retrieval in Kabiria. Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data (p. 3). ACM. 8. Celentano, A., Fugini, M. G., & Pozzi, S. (1995). Knowledge-based document retrieval in office environments: the Kabiria system. ACM Transactions on Information Systems, 13 (3), 237-268. 9. Cheng, T., & Podolsky, S. (1993). Just-in-Time Manufacturing - an introduction. London, England UK: Chapman and Hall. 10. Fennell, R. D., & Lesser, V. R. (1975). Parallelism in AI problem solving, : a case study of Hearsay II. Pittsburgh, PA USA: Department of Computer Science, Carnegie-Mellon University. 11. Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns : elements of reusable objectoriented software (Addison-Wesley professional computing series . Reading, Mass.: AddisonWesley. 12. Georgakopoulas, D., Hornick, M., & Sheth, A. (1995). An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure. Distributed and Parallel Databases, 3(2), 119-153. 13. Google, Inc. (2001) Google Home [Web Page]. URL http://www.google.com/ [2001, May 17]. 14. Hammond, K. J. (1989). Case-based planning : viewing planning as a memory task (Perspectives in artificial intelligence v. 1 . Boston: Academic Press. 15. Hayes-Roth, B., & Hayes-Roth, F. (1979). A Cognitive Model of Planning. Cognitive Science, 3(?), 275-310. 16. Hollingsworth, D. (1995). (Report No. WFMC-TC00-1003). 17. Kautz, H. A., & Allen, J. F. (1986). Generalized plan recognition. Proceedings of the 5th National Conference on Artificial Intelligence (pp. 32-37). Menlo Park, CA USA: AAAI Press.


Page 22 of 23

18. Kirsch, S. (1998). Infoseek's experience with searching the Internet. SIGIR Forum, 32(2), 3-7. 19. Knoblock, C., & Minton, S. (1998). The Ariadne approach to web-based information integration. IEEE Intelligent Systems, 13(5), 17-20. 20. Kolodner, J. (1993). Case-Based Reasoning (The Morgan Kaufmann Series in Representation and Reasoning . San Mateo, CA: Morgan Kaufman. 21. Korfhage, R. R. (1997). Information Storage and Retrieval (1 ed.). New York, NY: John Wiley & Sons, Inc. 22. Leymann, F., & Roller, D. (2000). Production workflow concepts and techniques. Upper Saddle River, N.J USA: Prentice Hall PTR. 23. Myers, K. L., & Berry, P. M. (1999a). At the boundary of workflow and AI. Workshop on Agent-Based Systems in the Business Context (AAAI '99) AAAI Press. 24. Myers, K. L., & Berry, P. M. (1999b). Workflow management systems: An AI perspective. Menlo Park, CA USA: Artificial Intelligence Center, SRI International. 25. Newell, A., & Simon, H. A. (1964). An example of human chess play in the light of chess playing programs. [Pittsburgh]: Carnegie Institute of Technology. 26. Sacerdoti, E. (1974). Planning in a Hierarchy of Abstraction Spaces. Artificial Intelligence, 5(?), 115-135. 27. Salton, G., & Yang, C. S. (1975). A Vector Space Model for Automatic Indexing. K. Sparck Jones, & P. Willett (pp. 273-280). San Francisco, California, U.S.A.: Morgan Kaufmann. 28. Schnurr, H.-P., & Staab, S. (2000). A Proactive Inferencing Agent for Desk Support. AAAI Spring Symposium Series: Bringing Knowledge to Business Processes Menlo Park, CA USA: AAAI Press. 29. Staab, S., & Schnurr, H.-P. (1999). Knowledge and business processes: Approaching an integration. Workshop on Knowledge Management and Organizational Memory at IJCAI '99 Menlo Park, CA USA: AAAI Press. 30. Stevens, W. R. (1992). Advanced Programming in the UNIX Environment (Professional Computing Series . Reading, MA USA: Addison Wesley. 31. Tate, A., Hendler, J., & Drummond, M. (1990). A review of ai planning techniques. in J. Allen, J. Hendler, & A. Tate Readings in Planning (pp. 26-49). Morgan Kaufmann. 32. Watson, I. (1997). Applying Case-Based Reasoning: Techniques for Enterprise Systems. San Francisco, CA USA: Morgan Kaufmann. 33. Workflow Management Coalition. (2001). Workflow Process Defnition Interface -- XML Process Defnition Language. (Report No. WFMC-TC-1025). 34. Zagat Survey, LLC. (2001) Zagat.com [Web Page]. URL http://www.zagat.com/ [2001, May 29]. 35. Zantout, H., & Marir, F. (1999). Document management systems from current capabilities towards intelligent information retrieval: an overview . International Journal of Information Management, 19(6), 471484.

Step 1 — Begin a localespecific restaurant search at Zagat.com

Page 23 of 23

Step 6 — Review the map, and request directions

Step 2 — Select a restaurant (Dish, in this example)

Step 7 — Enter starting address for driving directions

Step 3 — Copy the restaurant name (“Dish”) into the phone directory lookup

Step 8 — Paste map into email message announcing dinner reservation

Step 4 — Review the directory page, call the restaurant to make the reservation

Table 1 — Restaurant reservation process in action


Step 5 — Copy the restaurant address to an online map/directions Web site

What is the problem - Semantic Scholar

What is the problem - Semantic Scholar

Suggest Documents

What is metamorphosis? - Semantic Scholar

What is nanotechnology? - Semantic Scholar

What is political? - Semantic Scholar

What is Usability? - Semantic Scholar

What is Ada? - Semantic Scholar

What is economics - Semantic Scholar

What is Participation? - Semantic Scholar

What is information? - Semantic Scholar

What is Computation? - Semantic Scholar

What is Probability? - Semantic Scholar

What Is the Hobbit? - Semantic Scholar

What is the system failure? - Semantic Scholar

What is the brain? - Semantic Scholar

Understanding sound: so what is the problem?

What is the nature-nurture problem/issue

What is the Policy Problem? - STES

Euthanasia: What Is the Genuine Problem?

What is the Problem of Judicial Review

What is the Size of the Semantic Web? - Semantic Scholar

What Is Inflammatory Breast Cancer? - Semantic Scholar

What is an altruistic action? - Semantic Scholar

What Is Evidence-Based Practice? - Semantic Scholar

What Is Stochastic Resonance? - Semantic Scholar

What Is a Situation? - Semantic Scholar