Leveraging Web Services for Information Discovery

Doron Cohen, Michal Jacovi, Michael Herscovici, Yoelle S. Maarek, Noga Meshulam, Aya Soffer, Vladimir Soroka
IBM Research Lab in Haifa, Haifa 31 905, Israel
ph: +972 4 829 6369, email: [email protected]

ABSTRACT
In this paper, we describe a novel application of the Web Services model for end-user information discovery needs rather than for the traditional business-to-business applications. We describe a specialization of Web Services for information providers and demonstrate, through an exemplary unified information discovery console, how consumers can easily customize their favorite information sources, and obtain information from them in a passive or active but always unobtrusive manner.

Categories and Subject Descriptors H.3.4 [Systems and Software]: User profiles and alert services H.5.3 [Group and Organization Interfaces]: Web-based interaction

General Terms Management, Human Factors.

Keywords Web Services, User Interfaces, Search.

1. INTRODUCTION
Web services [12] are self-contained applications that can be published, discovered and invoked over the Web. They typically perform operations that allow businesses to collaborate on demand, without the need to surface complex APIs or to share code. Examples of classical Web services range from simple requests such as stock quotation to more complex business processes like booking complete vacation packages. The key advantages of Web services are interoperability and extensibility, thanks to the use of XML, as well as their ability to be easily combined. While there have been some efforts on human-facing Web services [20], they are still predominantly used for business-to-business (B2B) transactions.

Most content providers and information services on the Web share many characteristics with Web business applications: they are numerous, they operate over the Internet, and their data is dynamic and constantly evolving. Examples range from weather sites such as Accuweather.com, to news feeds such as Businesswire, to astrological predictions such as astrology.com. We can roughly categorize these services into two main classes: pull services, which users typically access via their browsers and query via a simple “search box” submitted to their favorite search service, and push services, which deliver subscribed information to one’s desktop via dedicated applications or tickers (see Section 5). The variety of these sources of information, and their various modes of delivery, play a significant role in the information overload syndrome. We argue here that by using a Web services platform, many information and content providers could easily be consolidated.

Let us consider this issue from an end-user’s viewpoint. On a typical day at the office, IT workers start by scanning through their e-mail inbox, answering right away the most urgent and easy-to-address issues and postponing the rest, checking their stock options via a dedicated ticker, reading local news online on their preferred newspaper Web site, and maybe paying a quick visit to their favorite discussion forums. They return later to pending e-mail that requires more work, and probably search for information that would help them craft an answer, by turning to Google for the Internet, to the company portal for intranet information, to colleagues via chat, etc. Throughout the day, users are absorbed in their mainstream work but keep getting short interruptions from their e-mail application, or from tickers that scroll constantly on their screen and catch their attention from time to time. In most cases, though, they cannot become aware of the events that occur in their various information sources without breaking from the stream of their work, switching context to each information provider's application, and consolidating everything “inside their heads”.

Imagine now a one-stop application that would serve as a broker of events arriving from various sources of interest in a push-like mode, and that would also enable information discovery on several sources at once in a pull-like mode. The user’s favorite sources of information could easily be plugged in and out and customized at will in this one application.

This need for consolidating all information discovery sources into one access point is not new and has already been identified by portals, from consumer portals such as Yahoo!, with its ability to provide personalized areas through my.yahoo.com [21], to enterprise commercial or open-source portlet servers [28], [18], [14]. However, the typical portal approach is an “intermediary” approach where the portal is the unique broker. Indeed, my.yahoo allows consumers to select from a set of “canned” services. As rich and various as they may be, they are still controlled by the portal, which decides what kind of customization they offer to end users. In the enterprise world, portal servers adopt a slightly different approach, where the service provider typically builds an end-to-end vertical solution from the basic Web service to the customized portlet, which typically interact with each other through SOAP. The portal server admin then simply lets end users select from a portfolio of portlets, and arrange them at will on their personal page. Due to the vast range of services that can be offered this way, no generic GUI artifact can be used, and portlets need to be aware of the semantics of the service. In addition, portals typically display their contents on the full Web browser page and thus occupy most of the desktop real estate.

In this paper, we propose a framework for consolidating information discovery that, unlike previous approaches, does not require an intermediary but directly accesses distributed Web services. By focusing exclusively on Web services that provide information or content with narrow semantics, generic GUI components can easily be associated and reused for many sources. Thus information providers can easily publish their services following the framework's guidelines and, instead of each building their own portlet, they can benefit from predefined GUI artifacts that can easily be arranged on a small console or integrated into various applications on the user’s desktop.

The paper is organized as follows. Section 2 introduces our Web Services-based approach for information providers and in particular presents the specifications for both providers and consumers of information. Section 3 describes the architecture of our proposed framework. Section 4 presents an embodiment of this approach: it shows how an example console application, Sheli, takes advantage of this framework and illustrates how it can be used via a few example scenarios. Section 5 discusses some related work and Section 6 concludes by summarizing the key contributions of our work.

2. APPROACH
The novelty of our approach is in defining a framework and architecture enabling diverse information sources to publish their availability and then broadcast content using a standards-based Web Services infrastructure. This is in contrast to traditional “push” services, such as PointCast [23], that use proprietary protocols, or centralized services such as Microsoft .Net alerts [17], as discussed in Section 5. The proposed framework supports the following generic publish-subscribe model:
• Sources of information publish their ability and willingness to supply certain types of information. We call an information source a content provider or provider for short.
• Information discovery applications can locate available sources of information. These applications are termed content consumers or consumers for short.
• End users can decide whether to "subscribe" to a source of information, and to what extent.
• Content providers broadcast appropriate information to the content consumer.
• Content consumers present information to end users via some user interface.

Content providers can be diverse in terms of the type of content they provide, the frequency of updates, and the specification per user. Our goal is to support the differing needs of providers in a well-defined, easy to use and rich framework. Some example providers include news agencies, weather services, shopping sites, stock quotes, last minute deals, parcel services, sports results, forums, and more. In addition to generic Web-based content providers, our architecture can support personal content providers using the same interface. These may include email applications, enterprise portals, instant messaging applications, chat rooms and more. Note that the purpose is not to replace any of these applications, but rather to provide a platform for providers to alert content consumers when a significant event occurs. A consumer receiving these alerts can then concisely display everything that may interest a user in one place, as in the case of our sample consumer application, Sheli (see Section 4).

At this point we have only discussed one aspect of information discovery, namely awareness. The complementary need, querying, is also supported in a uniform way by our proposed framework. Specifically, each provider must declare whether it supports querying. When selecting providers, users can choose whether they want to make use of the provider's query services. When a user enters a query, it is sent to the query-enabled providers that have registered with the consumer. The format of the query request as well as the result is part of the object model as described in Section 3.5.

From a user interface viewpoint, it is crucial for the console to handle the multitude of events originating from providers and display them in a concise yet meaningful way. The application must additionally provide powerful tools for customization and content source selection. The console must be light so as not to interfere with the user’s regular tasks. On the other hand, it needs to be rich enough to clearly convey events as they occur.

From an architectural and implementation viewpoint this approach poses numerous challenges. Both consumers and providers are hosted on machines that must run a Web server that supports Web services. Both consumers and providers must be able to receive and send requests. A provider sends events to the consumer, which must be prepared to catch and display them. A consumer can send queries and launch requests to the provider, which in turn must catch these requests and act upon them. A mechanism is required whereby consumers can locate available providers. Providers need to “advertise” the content they can serve in a uniform manner.

Content may be a flag indicating that an event occurred (e.g., print job is complete), a numeric value (e.g., temperature in a particular city), or textual (e.g., Dan has replied to your latest posting on the hiking forum). When selecting a provider, the user may need to qualify an event (e.g., pick the city for weather forecast, or enter the parcel number that s/he would like to track). We describe in detail how we address these issues in the next section.

3. ARCHITECTURE
In this section, we present the architecture of our framework, and explain how it addresses the issues mentioned above. Let us take a closer look at this “world” of multiple providers and consumers, as depicted in Figure 1.

Figure 1 - Multiple Providers, Multiple Consumers (Provider 1 … Provider N; Consumer 1 … Consumer M)

A consumer may be interested in data originating from many providers. A provider may reside on the same machine as the consumer, but may also reside on another machine. Each provider typically serves many consumers. The system requirements can be summarized as follows:
• Location: Each provider must have a location associated with it since consumers need to detect and locate available providers. A location must be aware of all providers that reside on it.
• Content definition and selection: A provider must have a mechanism to define and specify its functionality. A consumer requires a mechanism to select a subset of these capabilities that are of interest to it.
• Registration: Each consumer needs to keep track of its selected providers. Similarly, each provider needs to keep track of the consumers it is serving.
• Data push: A method for providers to notify consumers of new data.
• Data pull: A method for consumers to request data.

We propose to address these requirements in the context of a Web services infrastructure. We briefly recall below the key concepts and terminology behind Web services so as to facilitate the reading of the paper. Interested readers might consult the various online tutorials for more detailed information.

3.1 Introduction to Web Services
Web services are an emerging technology based on leading industry standards: XML, HTTP and SOAP. Web services can be discovered in global UDDI (Universal Description, Discovery and Integration) [27] directories and can be connected using the HTTP-based SOAP (Simple Object Access Protocol) protocol [24]. Service interfaces are described using the XML-based Web Services Description Language (WSDL) [30]. The key benefit of Web services is that they allow applications to interact without exposing their APIs. Web Services and Web Service clients can be developed in a relatively simple manner using standard tools. A typical Web Services architecture is illustrated in the Figure below.

Figure 2 - Web Services general architecture (labels: UDDI; register, lookup, bind; Web Service)

A Web Service registers itself with the central registry (UDDI) and any client can look it up there and connect to it, using the SOAP protocol.

3.2 Locating Providers – Registries
A content provider registry (or registry for short) is a small component that resides on the same computer as the provider. Its purpose is to store information pertaining to all providers available on this machine. In order to detect a provider, a consumer must know the location of the machine on which the registry resides. This is akin to knowing the URL of a Web page that a user wishes to view. If there is more than one provider on the same computer, they all publish themselves in the same registry. This is depicted in Figure 3.

Figure 3 - Publish and Lookup (labels: Registry, Publish, Lookup, C, P1, P2)

P1 and P2 are content providers residing on the same machine; both publish themselves in the same registry. C is a consumer, residing on another machine. C can detect P1 and P2 by performing a lookup in the remote registry. Publish and subscribe tasks are as simple as invoking a Web service.
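The publish and lookup interaction of Section 3.2 can be illustrated with a minimal in-memory sketch. It is not the framework's actual API: the `Registry` class, its `publish` and `lookup` methods, and the host name are hypothetical names chosen for illustration, and the Web service invocation is replaced here by a plain method call.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical per-machine registry holding the profiles published on it."""
    host: str
    profiles: dict = field(default_factory=dict)  # provider ID -> profile

    def publish(self, provider_id: str, profile: dict) -> None:
        # A provider places (or refreshes) its profile in its host registry.
        self.profiles[provider_id] = profile

    def lookup(self) -> list:
        # A consumer that knows this machine's location lists its provider IDs.
        return sorted(self.profiles)


# P1 and P2 reside on the same machine and therefore share one registry.
registry = Registry(host="provider-host.example")
registry.publish("P1", {"name": "Weather"})
registry.publish("P2", {"name": "Parcel tracking"})

# Consumer C, on another machine, would reach the registry through a Web
# service call; in this sketch the call is a local method invocation.
found = registry.lookup()
print(found)
```

The only thing the consumer needs to know in advance is where the registry lives, which mirrors the analogy with knowing the URL of a Web page.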

3.3 Content Definition – Provider Profile
A content provider declares itself by a publish operation, which consists of placing a content profile on its host machine registry. Consumers that probe the registry can learn about the functions offered by a provider from this profile. The registry can inform consumers when a new provider becomes available. Users select providers based on this profile. The provider profile consists of the following fields:
• ID – The unique identifier for this provider.
• Name – The name of this provider.
• Description – A short description of the provider.
• Supported query types – A list of the types of queries that this provider supports. Each query has a name, a description, and an indicator of whether user input is required to launch the query. The most common query type is "search".
• Launch URI – An indicator of whether and how a consumer can launch this provider. This is needed in cases where the consumer is up and running while the provider is not.
• Icon – An icon that the consumer can associate with this provider on its user interface.
• Elements – A list of content elements offered by this provider. The definition of a content element follows.

A content element consists of the following fields:
• Name – The name of this element; it can be used as a tool tip by the consumer UI.
• Description – A short description of this element; it should be descriptive enough for end users to decide whether this element is of interest to them.
• ID – A unique identifier of this element within the specific provider.
• Type – The content type. It can be one of the following:
  o Flag – Specifies a binary (yes/no) state, such as “There is a problem”, or “John has arrived”.
  o Number – Specifies quantitative values, like “Number of people currently in the lab”.
  o Choice – Specifies an integer value within a finite range.
  o Text – A line of text describing an event, like the title of an email message or of a discussion.
• Type specific fields – Each data type may have additional type-specific fields:
  o Flag
    § Yes/No icons – Icons that can be displayed on the console according to the flag state. Optional.
  o Number
    § Range – The range of numeric values for this element. Optional. If a range is specified, then the element can be displayed using a graphic element such as a gauge. Otherwise it is assumed that the actual number will be displayed.
  o Choice
    § MinValue – The minimal positive value for this choice. Optional.
    § NumChoices – The number of different choices.
    § Icons/Names – For each possible choice, an icon and associated name to be displayed on the consumer UI.
  o Text
    § NumTypes – The number of different types of text line elements.
    § Icons/Names – For each text type, an icon and name to display on the consumer console. For example, an email provider can support text elements for both email and calendar events, each with a different icon and name.
• Qualifiers – A list of qualifier definitions, if applicable for this element. These qualifiers enable users to further qualify a selected element. For instance, if the provider is a parcel tracking system, the qualifier would have a single input field, allowing the consumer to supply a parcel ID to be tracked.

We complete the definition of a provider profile by specifying the Qualifiers fields:
• Name – The name of this qualifier.
• Description – A short description of the qualifier.
• Must – A Boolean property specifying whether a consumer must provide this qualifier when subscribing to the specific content element.
• Type – Either Choice or Text. A Choice field additionally specifies:
  o NumChoices – The number of different choices.
  o Values – A list of valid choices.

The qualifier definition can be further expanded to allow more sophisticated functionality. This is left for future work.
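As an illustration of how the profile, content element and qualifier fields listed above fit together, the following sketch models them as plain data classes. The field names follow the lists in this section, but the class layout, the omission of the type-specific fields, and the `missing_qualifiers` helper are assumptions made for this example; the framework's actual object model is defined by its WSDL, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Qualifier:
    name: str
    description: str
    must: bool          # consumer must supply it when subscribing
    type: str           # "Choice" or "Text"
    values: List[str] = field(default_factory=list)  # valid choices (Choice only)


@dataclass
class ContentElement:
    id: str             # unique within the provider
    name: str
    description: str
    type: str           # "Flag", "Number", "Choice" or "Text"
    qualifiers: List[Qualifier] = field(default_factory=list)


@dataclass
class QueryType:
    name: str           # e.g. "search"
    description: str
    needs_input: bool   # whether user input is required to launch the query


@dataclass
class ProviderProfile:
    id: str
    name: str
    description: str
    query_types: List[QueryType] = field(default_factory=list)
    launch_uri: Optional[str] = None
    icon: Optional[str] = None
    elements: List[ContentElement] = field(default_factory=list)


def missing_qualifiers(element: ContentElement, supplied: dict) -> List[str]:
    """Hypothetical helper: names of mandatory qualifiers a subscription lacks."""
    return [q.name for q in element.qualifiers if q.must and q.name not in supplied]


# The parcel tracking example from the text: one mandatory Text qualifier.
parcel = ContentElement(
    id="status", name="Parcel status",
    description="Latest status of a tracked parcel", type="Text",
    qualifiers=[Qualifier("parcel_id", "Parcel ID to track", must=True, type="Text")],
)
profile = ProviderProfile(id="parcels", name="Parcel tracking",
                          description="Tracks parcels", elements=[parcel])
```

A consumer UI could call a check such as `missing_qualifiers(parcel, {})` before sending a subscription, so that the user is prompted for the parcel ID first.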

3.4 Registering for Data Push
Once a consumer has examined the profiles of available providers, a provider can be selected, along with some of the content elements it offers.

Consider the case depicted in Figure 4 below. Consumer C obtained two profiles from the registry. The two profiles were shown to the end user, and, based on the providers’ description, the user found P2 interesting and useful. The user further examined the five content elements supported by P2, and decided to register for two of them. A subscription request was sent to P2 via the registry, specifying the two content elements of interest, and supplying the address of the Web service of consumer C. Now P2 can notify C about content elements of interest via the Web service interface of C; in other words, provider P2 can push data towards consumer C.

Figure 4 - Data Push (labels: Registry with Profile 1 and Profile 2; P1, P2, C; register, notify)

We call the data that flows to the consumer in this push process a notification event. A notification event encapsulates the current data of a supported content element. Its structure depends on the type of the element. It is defined as follows:
• Provider machine – Identifies the machine on which the provider resides.
• Provider ID – Identifies the provider uniquely per machine.
• Element ID – Identifies the element of the provider.
• Type specific values – Each element type carries additional type-specific fields:
  o Flag – A Boolean value depicting whether the flag is on or off.
  o Number – A numeric value associated with the event.
  o Choice – An integer value in the choice range.
  o Text – The text associated with the event. It contains:
    § Action Code – One of ADD, REMOVE, UPDATE.
    § Text – For ADD and UPDATE only.
    § Subtype – A subtype number, for ADD and UPDATE only. Optional.
    § Icon – A specific icon, for ADD and UPDATE only. This, together with the subtype, allows dynamic addition and update of icons for text content elements. Optional.

Summarizing the capabilities introduced so far, consumers can connect to a registry, query its providers, inspect providers by their profile, select an interesting provider, and finally subscribe to “data push” for all or some of the provider capabilities. The provider will start pushing data to the consumer as data is created or modified. We now describe how the complementary information need is handled in our framework, namely the consumer's ability to pull data at will.
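The push flow described in this section (subscribe to selected elements, then receive notification events as data changes) can be sketched as follows. This is an illustrative model only: the `Provider` class, its method names and the host name are hypothetical, a Python callback stands in for the consumer's Web service address, and the action code, subtype and icon of Text events are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class NotificationEvent:
    provider_machine: str
    provider_id: str
    element_id: str
    value: Union[bool, int, float, str]  # Flag, Choice, Number or Text payload


class Provider:
    """Hypothetical provider that pushes events to subscribed consumers."""

    def __init__(self, machine: str, provider_id: str) -> None:
        self.machine = machine
        self.provider_id = provider_id
        # element ID -> consumer callbacks (stand-ins for consumer Web services)
        self.subscribers: Dict[str, List[Callable[[NotificationEvent], None]]] = {}

    def subscribe(self, element_id: str,
                  consumer: Callable[[NotificationEvent], None]) -> None:
        self.subscribers.setdefault(element_id, []).append(consumer)

    def update(self, element_id: str, value) -> None:
        # Data was created or modified: notify only consumers of this element.
        event = NotificationEvent(self.machine, self.provider_id, element_id, value)
        for consumer in self.subscribers.get(element_id, []):
            consumer(event)


received: List[NotificationEvent] = []
p2 = Provider("provider-host.example", "P2")
p2.subscribe("temperature", received.append)  # consumer C registers one element
p2.update("temperature", -3)                  # pushed to C
p2.update("humidity", 80)                     # no subscriber, nothing is pushed
```

Keeping the subscription table keyed by element ID reflects the point made above that a consumer registers for some, not necessarily all, of a provider's content elements.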

3.5 Data Pull Requests
The most classical example of an information pull request by a consumer is search. At any point in time, the user may wish to submit a search query to one or all providers whose profile indicates that search is indeed supported. Once the user has typed the search query and requested to start the search, the consumer sends a corresponding pull request to the provider. Here too, the request is sent to the registry via the standard Web services interface. The registry then forwards the request to the provider, also via the Web services interface. The provider performs the required search, and returns a response to the registry, which forwards the response to the consumer. This mechanism is relatively straightforward but might seem costly performance-wise, as it involves issuing two Web service requests and obtaining two responses in a row, rather than one of each. We believe that the penalty is nevertheless minimal considering the benefits of an interface that is both standard and secure. Our experiments indicated that the delay is negligible, mainly because the registry's request to the provider is a local one, on the same machine.

To complete the discussion of pull requests, we now define the request and response entities. The Pull Request definition allows for more than just search queries. The query request will contain the name of the query type, as obtained from the provider's profile, and, if required, also the query input. Although the predominant query type remains “search”, in the information retrieval sense of “free-text search”, the same pull request mechanism can easily be extended to other types such as “translate”, “currency conversion”, etc. A Pull Request contains the following fields:
• Name – Name of the query type such as “search”, “translate” or “convert”, as mentioned above.
• Input Text – If required, as specified in the profile query type definition.
• Subtype – The name of one of the text subtypes defined in the profile list of content elements. Only elements of the specified subtype will be returned. Optional.

The Pull Response consists of the following fields:
• Name – The name of the pull request.
• Status – The status of the pull request. It has two possible values: OK, ERROR. In the case of an ERROR value, an error message is attached.
• Items – A list of content items that match the pull request.

3.6 Further Issues
In this section, we briefly address a few of the remaining open issues. We start by challenging the design decision of placing a registry on the same machine as the one on which the provider is running. Then, we discuss how existing applications can offer provider capabilities, and thus turn into providers. Finally, we detail how a client application can become a consumer of these services.

3.6.1 Centralized vs. Distributed Registries
One differentiating element of the architecture we propose here is the constraint that the provider's registry must be located on the same machine as the provider itself. Thus, we use distributed registries rather than a centralized registry shared by many providers. By getting rid of an intermediary that would hold the centralized registry, we remove a significant bottleneck in the provision and delivery of services. In addition, providers, unlike content registries, need not be fully enabled Web services. This reduces the burden imposed on providers by allowing for a lighter infrastructure, which is particularly useful if several providers run on the same machine. More specifically, consumers interact only with registries, which are fully Web-service enabled, and never directly with the providers. The provider will respond only to requests originating from the local registry, and never to remote requests, thus allowing it to use a lighter Web services engine, without penalty in terms of reliability or security.

3.6.2 Turning Existing Applications into Providers
Consider an existing application that serves content to its users in a pull model via a Web browser, and perhaps also in a push model, via some proprietary channel. The two key requirements such an application has to fulfill in order to comply with our model are (1) to become a Web service provider and (2) to adopt our framework's WSDL definition. Neither task is trivial; therefore we reduce the effort required from the provider by offering a Web service provider adapter as part of our framework. This adapter component was developed using Apache AXIS [1], and provides the skeleton needed for complying with our model. More specifically, in terms of development effort, the provider only needs to follow these steps:
1. At initialization time, invoke the adapter register method, and pass it the provider profile.
2. Extend an abstract class, which we provide, by implementing a few abstract methods that are invoked by the framework at various points in time. The key methods are:
   • The subscribe method, which is invoked when the consumer subscribes to a push operation of a certain element.
   • The query method, which is invoked when the consumer issues a query (pull operation).

The adapter utility is already fully implemented in Java, and will soon be available in C++ for C++ environments. Providers may, of course, always decide to fully implement our framework themselves, as long as they comply with our WSDL definitions.

3.6.3 Developing Consumer User Interface
We consider here a typical human-facing consumer application, which, by definition, includes a User Interface. This UI component allows the user to (1) connect to an arbitrary registry, (2) inspect the providers found on the registry, and (3) subscribe to the selected providers. In addition, the UI needs to react to various events from registries and providers. Programmatically speaking, in Java, this requires implementing three interfaces:
• RegistryListener for being notified of the existence of new providers, or changes in the profiles of existing ones.
• ProviderStatusListener for being notified when a provider of interest becomes available.
• ProviderContentListener for content notifications.

The detailed methods of these interfaces are beyond the scope of this paper. These interfaces are already available for Java programmers, will soon be available for C++ programmers, and are part of the package that we offer. Consumers might also, like content providers, prefer to fully implement these methods themselves, as long as they comply with our WSDL definitions.
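To make the pull entities of Section 3.5 and the adapter contract of Section 3.6.2 more concrete, here is a minimal sketch written in Python rather than the framework's Java. The request and response fields follow the lists in Section 3.5, and `subscribe` and `query` are the two abstract methods named in Section 3.6.2; everything else (class names, signatures, the `ForumProvider` example and its data, the `registry_forward` function) is hypothetical and only illustrates the flow in which the registry relays a consumer's request to a local provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PullRequest:
    name: str                         # query type, e.g. "search"
    input_text: Optional[str] = None  # only if the query type requires input
    subtype: Optional[str] = None     # restrict results to one text subtype


@dataclass
class PullResponse:
    name: str
    status: str                       # "OK" or "ERROR"
    items: List[str] = field(default_factory=list)
    error_message: Optional[str] = None  # attached when status is "ERROR"


class ProviderAdapter(ABC):
    """Hypothetical stand-in for the abstract class a provider extends."""

    @abstractmethod
    def subscribe(self, element_id: str) -> None:
        """Invoked when a consumer subscribes to a push element."""

    @abstractmethod
    def query(self, request: PullRequest) -> PullResponse:
        """Invoked when a consumer issues a pull request."""


class ForumProvider(ProviderAdapter):
    # Made-up content for the example.
    POSTS = ["Hiking in the Galilee", "Best hiking boots", "Sailing basics"]

    def __init__(self) -> None:
        self.subscribed: List[str] = []

    def subscribe(self, element_id: str) -> None:
        self.subscribed.append(element_id)

    def query(self, request: PullRequest) -> PullResponse:
        if request.name != "search":
            return PullResponse(request.name, "ERROR",
                                error_message="unsupported query type")
        needle = (request.input_text or "").lower()
        hits = [post for post in self.POSTS if needle in post.lower()]
        return PullResponse(request.name, "OK", items=hits)


def registry_forward(provider: ProviderAdapter, request: PullRequest) -> PullResponse:
    # The consumer talks only to the registry, which relays the request to the
    # provider on the same machine and passes the response back.
    return provider.query(request)


forum = ForumProvider()
ok = registry_forward(forum, PullRequest("search", "hiking"))
bad = registry_forward(forum, PullRequest("translate", "hello"))
```

Because the provider is only ever called through `registry_forward` in this sketch, it never needs to accept remote requests itself, which is the property Section 3.6.1 relies on.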

4. AN EXAMPLE INFORMATION DISCOVERY CONSOLE, “SHELI”
We present here Sheli, a light information discovery console for desktop computers, as an example of a consumer application built on top of our framework. Sheli is a compact, unobtrusive sample information discovery application that receives notifications from content providers, and sends free text queries to selected Web search services. Most of the time, Sheli occupies minimal space on the user's desktop: it is a thin bar, which can be conveniently placed at one of the corners of the screen without disturbing any other windows applications.

4.1 The Sheli bar
At launch time, Sheli comes up as a thin bar showing a row of small icons and indicators, as shown in Figure 5. The Sheli bar displays concise information from each subscribed provider. Each provider is represented by a different icon (from the provider's profile) that can be followed by a number of graphic indicators, which represent the content element values last sent by that provider. By clicking on the provider's icon, the user invokes the provider's application or his/her favorite Web browser on a given URL.
• A Flag value is represented by an icon, which can be displayed in two forms, one indicating a false value, and the other indicating a true value.
• A Numeric value is represented either by a gauge indicating a relative value within a range, or by a text box that displays the exact value.
• A Choice value, among a finite set of values, is represented by its corresponding icon.

Hovering the mouse over any graphic indicator will bring up a tooltip message with a short textual representation of the element value.

Figure 5 – The Sheli Bar

Figure 5 shows a typical Sheli bar configured with a number of providers and indicators:
• Icon #1 (on the left side of Figure 5) represents my favorite instant messaging application. The indicator next to it is a Flag indicator, which turns green if my manager John goes online.
• My e-mail/calendar application is represented by icon #2. The gauges next to it show how many e-mails in my inbox remain unread, and the ratio of incomplete tasks on my to-do list. If I hover with my mouse over one of the gauges, a tooltip displaying the actual numbers will appear.
• The weather forecast service icon (#3) is followed by an icon which indicates that I should expect snow tomorrow (this is a Choice value, which can have one of a list of predefined values, e.g., sunny, cloudy, partly cloudy, showers, etc.), and a text box indicating the temperature forecast for tomorrow (-3°C in our example).
• My home automation system icon (#4) is followed by two indicators, one telling me that the burglar-alarm system is active, the other indicating that all lights are off.
• Finally, my book recommendation service is represented by icon #5. This service does not have any graphic elements.

4.2 Sheli text notification views
The Sheli bar takes care of all content element types except for text elements. When a text notification arrives, Sheli pops up another thin text line under the bar, which displays the notification, as shown in Figure 6 below.

Figure 6 – Sheli single text line view

The text is decorated with a small icon that represents the source of the event, i.e., the application icon as depicted in the bar, as well as another icon that represents the notification type. The latter is needed in order to help distinguish between different types of notifications that originate from the same provider. By default, the text line disappears after a few seconds, but it can also be pinned by clicking button #6, so that it remains on-screen indefinitely. Also, at any time, the user can choose to show or hide the text line by clicking on button #7.

The user can also choose, by clicking on button #8, to open a larger text window that shows all previous text notifications, as shown in Figure 7. The user can filter out notifications from selected providers by unchecking the appropriate checkboxes displayed on the right side of the figure.

Text notifications are clickable, the same way that application icons are clickable in the thin bar: clicking on a text notification launches the application or the Web page of the provider. The main difference, though, is that text notifications will direct the user to the specific context pertaining to the notification. For example, clicking on a new e-mail notification will open that very e-mail message within the mail application.

Figure 7 – Sheli multiple text line view

Figure 7 shows a few examples of text notifications, including a notification about a new book recommendation, a reminder about an appointment set for today, etc.

4.3 The Sheli search box

By clicking on the binoculars icon of the Sheli bar (label #9 in Figure 6), the user can dynamically extend the bar to the right to include a search box, as shown in Figure 8 below.

Figure 8 – Sheli search box

Users can then simply enter free-text queries in the search box and click on the “go” icon (right side of the figure) to direct them to the queryable content providers. The user can choose to send a query to one specific provider, a subset of providers, or all providers. Search results, grouped by provider, are shown in a window similar to the multiple line text notification view, and can be clicked to bring up the appropriate Web page or application context.
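The query dispatch just described (one provider, a subset, or all providers, with results grouped by provider) might look like the sketch below. It assumes, purely for illustration, that each queryable provider is reachable through a function that takes a query string and returns a list of result titles; in the actual framework this call would go through the provider's Web service interface.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stand-in for a provider's search operation.
SearchFn = Callable[[str], List[str]]


def federated_search(
    query: str,
    providers: Dict[str, SearchFn],
    targets: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Send the query to the selected providers and group results by provider.

    When targets is None, the query goes to all providers.
    Unknown target names are silently skipped (an assumption of this sketch).
    """
    if targets is None:
        names = list(providers)
    else:
        names = [name for name in targets if name in providers]
    return {name: providers[name](query) for name in names}


# Invented providers, for illustration only.
PROVIDERS: Dict[str, SearchFn] = {
    "news": lambda q: [f"News about {q}"],
    "books": lambda q: [f"Book on {q}", f"Guide to {q}"],
}
```

Calling `federated_search("web services", PROVIDERS)` queries both providers, while passing `targets=["books"]` restricts the query to a single one.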

4.4 The Sheli providers dialog

Sheli allows users to subscribe to new content providers and to customize the notifications it receives from them. Providers can be added and customized through the Sheli providers dialog, accessible through the Sheli menu (see #10 on the left side of Figure 6). Sheli reads an updated list of available provider profiles from the local registry (where local applications like e-mail are registered) or from any user-defined registry, and displays it in the providers dialog. The user can subscribe to any number of providers from this list. Each subscribed provider has its own tab within the providers dialog. The tab is built dynamically according to the provider's profile, and enables the user to specify the details needed for each notification type available from the provider.
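Building a tab dynamically from a provider's profile can be sketched as a simple transformation from profile to form description. The profile layout used here (a `name` plus a list of notification types, each with the `parameters` the user must fill in) is an assumed, simplified structure for illustration; it is not the profile format defined by the framework.

```python
from typing import Any, Dict, List


def build_tab(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a dialog tab description from a (hypothetical) provider profile.

    One section is created per notification type, with one input field
    per parameter the user needs to specify.
    """
    sections: List[Dict[str, Any]] = []
    for notification in profile["notifications"]:
        sections.append(
            {
                "heading": notification["type"],
                "fields": list(notification.get("parameters", [])),
            }
        )
    return {"title": profile["name"], "sections": sections}


# Invented profile, for illustration only.
WEATHER_PROFILE = {
    "name": "Weather forecast",
    "notifications": [
        {"type": "Tomorrow's conditions", "parameters": ["city"]},
        {"type": "Tomorrow's temperature", "parameters": ["city", "unit"]},
    ],
}
```

With this profile, `build_tab(WEATHER_PROFILE)` produces a tab titled `Weather forecast` with two sections, the second of which asks for both `city` and `unit`.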

5. RELATED WORK

Our work relates to two main technical fields, namely, browsers and awareness interfaces, and push technologies. Other related work was discussed as relevant throughout the paper.

As far as browsers and awareness interfaces are concerned, we have already mentioned the portal approach in the Introduction, and explained how it differs from ours. Besides the latter, several approaches exist in both the commercial and the academic worlds. Most of them, though, concentrate on alert and awareness issues, which are crucial to information workers [7,3]. Examples of such applications include Tickertech.com [26], which provides a variety of stock, sports, and weather scrolling tickers, and the more extended MSN® messenger [19], which provides a wide range of messaging and collaboration capabilities on top of Microsoft's .Net Alert technology [17]. The latter tool has a lot in common with our example application, Sheli, in terms of awareness functionality. One major difference, though, is its centralized approach, which requires users to use a single consumer application, namely the MSN messenger client, for subscribing to alerts at one unique site, and requires providers to publish themselves and route their alerts through a single alert “registry”. Our approach, on the other hand, is more open and non-centralized: any content supplier can easily develop a content provider application and register itself via UDDI. Anyone can develop a content consumer application. Finally, any content consumer application can detect provider registries via UDDI, inspect content providers and subscribe to them. Another example of a proprietary approach is the Cassius system [16], which uses notification servers [22] for unifying awareness data from several sources in one application. Cassius is again based on a proprietary custom event notification server.
In terms of “pull-based” search paradigms, one original example of a search interface that departs from the regular Web search service pages is the widely used Google toolbar [11], which resides in the user’s browser and clearly inspired our Sheli search box. Several tickers, such as WordFlash News [29], combine a search text box that directs your queries to your favorite search service with alerting mechanisms, yet most of them use a proprietary and centralized architecture.

Another topic related to our work is content personalization. This field has been widely studied, especially in the context of mobile devices. Examples include the Pocket Directory Browser [5], by the authors of this work, which uses an intermediary model, and WebViews, a system for personalized Web content for mobile devices [9], which allows users to fetch relevant information with various means of access. We believe that the Web services model can be advantageously used in this domain, by adding customized information push for connected mobile devices. We will explore the issue in future work.

If we dive deeper into the architecture, we see that the notification mechanism described in our architecture embodies a “push” approach. In the early days of the Web, push technologies raised a great deal of interest: as of 1998, there were as many as 49 providers of push channels, though many of them are out of business now [25]. The vision, at that time, that push-based information delivery would end up replacing the Web browser-based pull paradigm was obviously too far-fetched [8]. Yet, push is used in a variety of domains based, for example, upon CDF, the XML-based Channel Definition Format [4], originally proposed by Microsoft. One example of a CDF-based application is Desktop News, which provides an unobtrusive scrolling ticker [6]. However, the system architecture selected for Desktop News is centralized, hiding the inherent openness of CDF. Push technology is also used in active databases [10] and, more recently, in XML-based information repositories [2]. Hinze and Faensen argue that alerting systems, such as active database alerts, should be unified. They insist on the importance of a unified alerting mechanism, proposing a general model for alerting services [13]. Our approach defines a similar model for information dissemination rather than general alerts.

6. CONCLUSION

We have described in this paper an approach that exploits Web services for building generic consolidated information discovery applications. We have introduced (1) an open and fully distributed framework for publishing information in a unified manner, which eliminates the need for centralized services, and (2) an example consumer application, Sheli, built on top of it. Our architecture is novel in its usage of distributed UDDI registries for publishing information services, which eliminates the need for centralized repositories. We showed how information consumers and providers could take advantage of the flexibility and extensibility of Web services, as long as they all share similar semantics. We have also introduced a simple data model that would allow most information providers to turn into information discovery Web services, as defined within our framework. Our example application demonstrates that this data model is rich enough to support most of the classical search and awareness requirements. The main goal of this work is to demonstrate that a fully distributed architecture is possible not only for search services, as was demonstrated a few years ago by the now obsolete WAIS approach [15], but also for general awareness applications. We hope that the information provider community will adopt the Web services model in the near future. Should this happen, consumer applications could not only flatly aggregate information from various sources, as we have shown here, but could also benefit from intelligently combined services for richer and “smarter” information discovery.

7. ACKNOWLEDGMENTS

We are deeply grateful to Sarai Sheinwald for her constant creativity in designing new icons and her help in polishing the Sheli look and feel.

8. REFERENCES

[1] Apache Axis. http://xml.apache.org/axis/.

[2] Bonifati, A., Ceri, S. and Paraboschi, S. Pushing reactive services to XML repositories using active rules. In Proceedings of 10th World-Wide-Web Conference, 2001, 633-641.

[3] Cadiz, J.J., Gupta, A., Jancke, G., and Venolia, G.D. Sideshow: Providing Peripheral Awareness of Important Information. Microsoft Research Tech Report MSR-TR-2001-83 (2001).

[4] Channel Definition Format. http://www.w3.org/TR/NOTECDFsubmit.html (fetched on Nov 14 2002).

[5] Cohen, D., Herscovici, M., Petruschka, Y., Maarek, Y., Soffer, A. and Newbold, D. Personalized Pocket Directories for Mobile Devices. In Proceedings of WWW 2001 (Hong Kong, China, May 2001).

[6] DesktopNews. http://www.desktopnews.com/.

[7] Dourish, P. and Bly, S. Portholes: Supporting awareness in a distributed work group. In Proceedings of CHI'92 (Monterey, CA, May 1992), 541-547.

[8] Franklin, M. and Zdonik, S. Data In Your Face: Push Technology in Perspective. In Proceedings of ACM SIGMOD International Conference on Management of Data (Seattle WA, June 1998), 516-519.

[9] Freire, J., Kumar, B. and Lieuwen, D.F. WebViews: accessing personalized web content and services. In Proceedings of WWW 2001 (Hong Kong, China, May 2001), 576-586.

[10] Gehani, N.H., Jagadish, H.V. and Shmueli, O. Composite event specification in active databases: Model and implementation. In Proceedings of the 18th VLDB International Conference (Vancouver, Canada, August 1992), 327-338.

[11] Google toolbar. http://toolbar.google.com/.

[12] Haas, H. W3C Web Services Activity Statement. http://www.w3.org/2002/ws/Activity (fetched on Nov 5, 2002).

[13] Hinze, A. and Faensen, D. A unified model of internet scale alerting services. In Proceedings of ICSC'99 (Hong Kong, China, December 1999), 284-293.

[14] Jetspeed. The Apache Jakarta Project. http://jakarta.apache.org/jetspeed/site/index.html (fetched on Nov 11, 2002).

[15] Kahle, B. and Medlar, A. An information system for corporate users: Wide area information servers. Technical Report TMC-199, Thinking Machines, Inc., April 1991. Version 3.19.

[16] Kantor, M. and Redmiles, D. Creating an Infrastructure for Ubiquitous Awareness. Eighth IFIP TC 13 Conference on Human-Computer Interaction (INTERACT'2001), Tokyo, Japan.

[17] Microsoft .NET Alerts. http://www.microsoft.com/netservices/alerts/default.asp

[18] Microsoft Sharepoint. http://www.microsoft.com/sharepoint/.

[19] MSN-Messenger. http://messenger.msn.com/.

[20] Myerson, J. M. Human-facing Web services, Part 1: An introduction to Web Services Experience Language. http://www106.ibm.com/developerworks/webservices/library/wsadapt.html (fetched on Nov 11, 2002).

[21] My Yahoo. http://my.yahoo.com/.

[22] Patterson, J., Day, M., and Kucan, J. Notification servers for synchronous groupware. In Proceedings of CSCW’96 (Boston MA, November 1996), 122-139.

[23] PointCast. http://www.pointcast.com/.

[24] Simple Object Access Protocol. http://www.w3.org/TR/SOAP/.

[25] Strom, D. Push Publishing Technologies of Yesterday and Today. http://www.strom.com/places/t4a.html (fetched on Nov 14 2002).

[26] Tickertech.com. http://www.javaticker.com/.

[27] UDDI. http://www.uddi.org/.

[28] Websphere Portal Server. http://www.ibm.com/websphere/portalfamily.

[29] WordFlash. Turning information into a measurable asset. http://www.worldflash.com/CPS.pdf (fetched on Nov 14 2002).

[30] Web Services Definition Language. http://www.w3.org/TR/wsdl