Providing Personalized Mashups Within the Context of Existing Web Applications
Oscar Díaz, Sandy Pérez, and Iñaki Paz
ONEKIN Research Group, University of the Basque Country, San Sebastián, Spain
[email protected], {oscar.diaz,inaki.paz}@ehu.es http://www.onekin.org
Abstract. There is an increasing tendency for Web applications to open their data silos and make them available through APIs and RSS-based mechanisms. This permits third parties to tap into those resources, combining them in innovative ways to form so-called mashup applications. So far, most approaches strive to help users create brand-new mashup applications that are regarded as stand-alone applications. However, the fact that these applications are data driven suggests that the mashup data is frequently used to achieve higher-order goals: data is often gathered not just for its own sake but to help make decisions, and some of these decisions are conducted through Web applications. In this scenario, it would be most convenient to post the mashup data alongside the application where the decision is taken. To this end, the term "mashup personalization" is coined to describe the approach of using mashup techniques for the end user to enrich the content of existing Web applications. A proof-of-concept framework, MARGMASH, is introduced, and its operation is illustrated through a running example. Keywords: personalisation, mashup, wrapping.
1 Introduction

Personalization is the process of tailoring pages to individual users' characteristics or preferences so that the pages become meaningful to their goals. However, it is not always easy for the designer to foresee the distinct utilization contexts and goals from which the application is accessed. "No design can provide information for every situation, and no designer can include personalized information for every user" [15]. Hence, traditional approaches can be complemented by mechanisms that allow end users to add their own content once the application is already deployed. This is akin to the "do-it-yourself" principle brought by mashups, which allow the layman to combine existing data from disparate sources in innovative ways. Likewise, the term "mashup personalization" is coined to describe the process whereby recurrent users can enrich existing applications with additional data using a mashup approach.

As an example, consider an online travel agency such as Expedia.com. Based on some trip criteria, this site provides information about flights, hotels, vacation packages and the like. Very often, making the right selection requires additional data not available at the Expedia site itself.
For instance, if a user is looking for a hotel for a conference, it is most convenient to see how the Expedia hotels are located with respect to the conference venue, with all the data rendered through Yahoo! Local. By contrast, while searching for a "last-minute" trip, hotel location might not be so important compared to the weather forecast for the available places; in this case, the weather forecast becomes the crucial data. Yet another user may be looking for exotic places for which appropriate travel guides are available; here, the vacation package data can be supplemented with a list of guides taken from Amazon. In this scenario, current personalization approaches fall short, as Expedia.com cannot be expected to foresee all possible data requirements, and hence the user is forced into cumbersome, "daisy-like" roundtrips between Expedia.com and the other sites where the data is to be found. The point to stress is that this additional information largely depends on the user at hand and his hidden motivation. The user is the one who knows best why s/he is interacting with the site and what additional data is required to accomplish the task at hand.

To this end, "mashup personalization" uses mashup techniques for the end user to enrich the content of existing Web applications. This additional content comes from external sources through APIs or RSS feeds so that the user can tap on it in new, unexpected ways. Distinct tools are currently available (e.g. JackBe, Kapow, Dapper, IBM's QEDWiki, Yahoo Pipes) that strive to enable almost anyone to create the mashup they need, whenever they need it. Here, the resource to be capitalized on is the data, and the user comes up with a brand-new application: the mashup. This makes sense when the mashup is an end in itself, but frequently this data is collected to serve a higher-order goal. Frequently, you are gathering data not just for the sake of the data itself but to help you make decisions (e.g. which hotel to book, where to travel). And an increasing number of these decisions are conducted through Web applications (e.g. on-line travel agencies, on-line brokers, on-line course enrollment, on-line shopping, on-line auctions and so on). In this scenario, it would be most convenient to post the mashup data alongside the application where the decision is taken, which also provides the first feeds or inputs to enact the mashup pipe.

These ideas are realized through MARGMASH (from "MARGinal MASHup"), a tool for end users to add mashup fragments to their favorite web sites. The tool behaves as a lightweight wrapper that incorporates "marginal mashups" along the pages of the original application. Since the tool is targeted at end users, the selection of an appropriate application metaphor is most important for user adoption. To this end, pages of the original application are regarded as physical sheets where marginal mashups can be posted at specific places highlighted by MARGMASH. To facilitate user adoption and the development of the system, we opt for integrating Yahoo Pipes (http://pipes.yahoo.com/pipes/) for mashup definition into MARGMASH; Yahoo Pipes is a very intuitive and powerful tool which is catching on very quickly. Otherwise, the approach exhibits the main mashup characteristics, namely: (1) mashups only affect the content, not the navigation or rendering of the "mashuped" application; (2) mashups are set up by the user (the do-it-yourself (DIY) principle); and (3) the owner of the "mashuped" application is unaware of who is extending the application and how.
The paper starts by introducing MARGMASH through an example.
Fig. 1. Breadcrumb panel
2 MARGMASH by Example

Once logged into the MARGMASH framework, the user provides the URL of an existing Web application (e.g. www.expedia.com). Framed by MARGMASH, the user browses along Expedia until a page is reached where a mashup needs to be posted. This is indicated by clicking on the "Margmash It!" button (see figure 1). The process is supported through a breadcrumb panel.

First, the user needs to provide the visual clues (technically called annotations) on the selected page that will help to identify and classify that page. This identification should be general enough to select the right page regardless of the specific session or navigation that leads to it. Notice that in this process we are abstracting from a page-instance-based selection to a page-class-level identification of the distinct elements on the page: the page class. A page class is defined as the set of pages that describe the same type of information and have a similar page structure. Thus, the user provides the visual clues by annotating distinct page instances which belong to the same class. Figure 2 shows how this process is accomplished for a single page instance. The page instance can be the first of its class (thus, the system prompts for a class name, figure 2(a)) or it can belong to an already existing class. Next, the user identifies the visual clues by selecting the markup chunks that unequivocally singularize the pages of this class. Figure 2(b) shows how the user selects these visual clues: the eligible markups are surrounded with a border line whenever the mouse moves over them. In this case, the page is singularized by clicking on the "Hotel List View" markup. The selection is internally identified through an absolute XPath over the DOM tree of the page (pages considered here are well-formed HTML documents; documents that are not well-formed are converted through document normalization using Mozilla GECKO, accessed through the JRex APIs). However, absolute XPath expressions are very exposed to page structure changes. Improving the resilience of XPath expressions to structural changes requires the set of annotated page instances to be representative enough; this set is then the input of an induction process which results in a more change-resilient XPath (see [11,14] for details).

In the next step the user identifies which markup fragments (i.e. nodes of the underlying DOM tree) play the role of mashup anchors. The identification and location of these fragments follow a similar process to the one conducted to obtain page classes, but now at the level of fragments.
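Both steps boil down to evaluating induced XPath expressions against the page's DOM. The following sketch (ours, not MARGMASH code; the class names and XPath expressions are invented for illustration) shows the page-classification half, assuming each page class is described by one characteristic XPath:

    from typing import Optional
    from lxml import html

    # Hypothetical page classes: class name -> XPath induced from the user's annotations
    PAGE_CLASSES = {
        "HotelListView": "//*[normalize-space(text())='Hotel List View']",
        "FlightResults": "//h1[contains(text(), 'Flight results')]",
    }

    def classify_page(page_source: str) -> Optional[str]:
        """Return the page class the page belongs to, or None if it matches no class."""
        tree = html.fromstring(page_source)      # lxml tolerates non-well-formed HTML
        for class_name, xpath in PAGE_CLASSES.items():
            if tree.xpath(xpath):                # the visual clue is present
                return class_name
        return None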
Fig. 2. MARGMASH page classification
Technically, mashup anchors play a double role. First, they hint at where to place the mashup output. Second, they provide some feed data for the associated mashup; in so doing, an anchor behaves as a scraper for the contained markup from which it extracts the data. The next paragraphs describe the definition of a mashup anchor (a sketch of such a definition is given at the end of this section). The definition includes:

1. The pipe's identifier, which the anchor holds as a reference. So far, this identifier corresponds to a Yahoo pipe. By using Yahoo Pipes, we externalize pipe definition to a popular tool, hence simplifying MARGMASH and facilitating user adoption. Thus, all the definition, debugging and publication of pipes take place within Yahoo's tool; MARGMASH acts as a proxy for Yahoo's pipes, collecting the data to feed the pipe and processing the output (see next paragraphs).

2. The pipe's starting feeds. Yahoo's tool uses a pipe metaphor for pulling and merging data from different applications from some initial parameters, which are
prompted to the user. Now these parameters are automatically obtained from Expedia. To this end, "content feeds" are defined in the page classes using techniques similar to those used for page classification and anchor identification: a singularization-markup entry form pops up for MARGMASH to highlight specific markup chunks, which are then selected by the user and associated with a given entry parameter of the pipe at hand. Thus, each pipe parameter is fed by a tiny scraper program that obtains the data from the containing page.

3. The pipe's output layout, which addresses how to integrate the pipe's output into the Web application. So far, the result of the pipe can be displayed as a separate layer or inlaid in the application page at either the left, right, top or bottom position with respect to the anchor.

4. The anchor navigation mode. So far, two modes are considered: (1) automatic, i.e. as soon as the page is loaded, the mashup is enacted and its output rendered; (2) manual, i.e. the mashup is explicitly enacted by the user by clicking on the anchor.

Once the user has provided enough data, this process builds up a margmash wrapper for Expedia. Browsing this margmash application follows the Expedia site while simultaneously delivering the mashup content as a single experience. Figure 3 compares a raw page from the Expedia website and its "margmashed" counterpart.
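As announced above, the following sketch gathers the four ingredients of an anchor definition in a single structure. It is an illustration under our own naming (field names, the pipe identifier and the XPath values are invented); MARGMASH's internal representation may differ:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Dict

    class Layout(Enum):
        LAYER = "layer"       # rendered as a separate layer...
        LEFT = "left"         # ...or inlaid around the anchor
        RIGHT = "right"
        TOP = "top"
        BOTTOM = "bottom"

    class Mode(Enum):
        AUTOMATIC = "automatic"   # pipe enacted as soon as the page loads
        MANUAL = "manual"         # pipe enacted when the user clicks the anchor

    @dataclass
    class MashupAnchor:
        anchor_xpath: str              # locates the anchor fragment within the page class
        pipe_id: str                   # identifier of the Yahoo pipe held as a reference
        feed_scrapers: Dict[str, str]  # pipe parameter -> XPath scraping its value
        layout: Layout = Layout.RIGHT
        mode: Mode = Mode.MANUAL

    # Hypothetical anchor for the Expedia hotel-list page class: the hotel's city feeds
    # a weather pipe whose output is inlaid below the anchor as soon as the page loads.
    weather_anchor = MashupAnchor(
        anchor_xpath="//div[@class='hotelName']",
        pipe_id="WEATHER_PIPE_ID",
        feed_scrapers={"location": "//span[@class='hotelCity']/text()"},
        layout=Layout.BOTTOM,
        mode=Mode.AUTOMATIC,
    )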
3 The Architecture of a MARGMASH Application

This section looks inside a MARGMASH application. Figure 4 depicts an interaction diagram showing the main actors. On requesting a page load, the MARGMASH application behaves as a proxy that redirects the call to the "margmashed" application (e.g. www.expedia.com) through JRex. JRex is a Java browser component with a set of APIs for embedding Mozilla GECKO within a Java application; the aim is to facilitate Java wrapping around the required Mozilla embedding interfaces. On receiving the petition, JRex just delegates the request to the margmashed application. However, when the page is returned, the JRex APIs can be used to obtain the DOM structure counterpart of this page. Once the HTML page has been converted into a DOM structure, embedded URLs are rewritten to point to MARGMASH so that later interactions are always conducted through MARGMASH. Next, the current page is classified along the page classes identified at definition time (see previous section). If the current page belongs to a page class, its content must be enlarged with some mashups. To this end, the application first locates those markup fragments playing the role of mashup anchors.

The navigation mode of a mashup anchor can be "manual" or "automatic". In the first case, MARGMASH dynamically inlays an AJAX script into the page that is returned to the user. The appearance of the page is just the same as the one originally returned by the existing application; the only difference stems from some markups having been turned into anchors. On clicking one of these anchors, the corresponding AJAX script asynchronously enacts the associated pipe (see figure 4(b)). The pipe is executed at Yahoo's site and the output is returned to the AJAX engine. The AJAX script then takes Yahoo's response, extracts the pipe's output, and collages it into the rendered page.
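The following sketch summarizes this request/response cycle. It is an illustration on our side: the actual implementation is Java on top of JRex, whereas the sketch uses Python with the requests and lxml libraries, and the proxy endpoint and attribute filter are invented:

    import requests
    from lxml import html
    from urllib.parse import urljoin, quote

    PROXY_PREFIX = "http://localhost:8080/margmash?url="   # hypothetical proxy endpoint

    def proxy_page(target_url: str) -> str:
        response = requests.get(target_url)        # delegate the request to the wrapped site
        tree = html.fromstring(response.text)      # DOM counterpart of the returned page

        # Rewrite embedded URLs so that subsequent interactions come back through MARGMASH
        for element, attribute, link, _pos in tree.iterlinks():
            if attribute in ("href", "src", "action"):
                absolute = urljoin(target_url, link)
                element.set(attribute, PROXY_PREFIX + quote(absolute, safe=""))

        # Here the page would be classified (classify_page above) and, for each mashup
        # anchor of the matching class, either an AJAX script injected ("manual" mode)
        # or the pipe output inlaid right away ("automatic" mode).
        return html.tostring(tree, encoding="unicode")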
Fig. 3. Comparing the original application with its "margmashed" counterpart
If the anchor's mode is automatic, pipe enactment occurs right away without waiting for user interaction. In this case, the pipe's output is inlaid immediately and the user gets the mashup data without additional interactions. Notice, however, that this navigation mode does not benefit from the advantages brought by the asynchronicity of AJAX; hence, the user experience can suffer if this mode is heavily used, due to the delay in building up the page out of the pipe's output.
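To illustrate pipe enactment, the sketch below shows how the proxy could enact a pipe and obtain its items. This is an assumption on our side: it uses the public Yahoo Pipes run URL with an invented pipe identifier and parameter name, and the exact layout of the JSON response may differ:

    import requests

    def run_pipe(pipe_id: str, params: dict) -> dict:
        """Enact a Yahoo pipe with the scraped feed values and return its JSON output."""
        response = requests.get(
            "http://pipes.yahoo.com/pipes/pipe.run",
            params={"_id": pipe_id, "_render": "json", **params},
        )
        return response.json()

    # output = run_pipe("WEATHER_PIPE_ID", {"location": "San Sebastian"})
    # items = output.get("value", {}).get("items", [])   # assumed response layout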
Fig. 4. Interaction diagram
One limitation of our approach, shared with most other mashup creation tools, is that the MARGMASH application can break if the underlying application changes. If a website changes its HTML so that the current wrapper (i.e. its XPath expressions) no longer locates or classifies the right page or fragment, the MARGMASH application will not work correctly. However, advances in semantic enhancements will hopefully alleviate this problem. Another problem arises from AJAX and JavaScript code. AJAX and JavaScript code on pages is supported by matching text patterns against standard navigation mechanisms (e.g. document.location) and introducing special functions that convert the URL parameter into a correct one; thus, AJAX requests are also captured and processed. However, whenever the AJAX response is processed at the client side (e.g. as an XML document), MARGMASH might run into problems.
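As an illustration of the pattern-based rewriting just described, assignments to standard navigation properties can be wrapped as sketched below. The helper __margmash_url is hypothetical; it stands for the client-side function that maps the target URL back onto the proxy:

    import re

    # Assignments to navigation properties are rewritten so that their right-hand side
    # is passed through a (hypothetical) client-side helper __margmash_url().
    NAVIGATION_ASSIGNMENTS = [
        re.compile(r"(document\.location(?:\.href)?)\s*=(?!=)\s*([^;]+);"),
        re.compile(r"(window\.location(?:\.href)?)\s*=(?!=)\s*([^;]+);"),
    ]

    def rewrite_script(script_text: str) -> str:
        for pattern in NAVIGATION_ASSIGNMENTS:
            script_text = pattern.sub(r"\1 = __margmash_url(\2);", script_text)
        return script_text

    # e.g. 'document.location = "hotels.asp?id=7";'  becomes
    #      'document.location = __margmash_url("hotels.asp?id=7");'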
4 Related Work

This work can be framed within the distinct approaches that bring reuse to the Web. Besides the advantages brought by componentware, component-aware technologies are being fueled by the increasing difficulty of processing the disparate, web-based sources of information that we have to consult on a daily basis. As important sources of information continue to proliferate, people crave simplifying and centralizing their consumption [6].

Table 1. Dimensions for bringing reuse to the Web

  What vs. How        | Markup wrapping        | API                | Self-describing Web
  data                | Information Retrieval  | Google, Amazon     | RSS feed readers, Yahoo's pipes
  entry page          | Web clipping           | Advert Engines     | -
  whole application   | Portletization         | Portlets, Widgets  | -
Bringing componentware to the Web poses in turn two questions: what is the subject to be reused (the component model), and how is it integrated into the receiving application (the container model). Table 1 strives to pinpoint different efforts along these two dimensions. The first dimension addresses what is to be reused from a Web application. Three options can be contemplated, namely, (1) just the data, (2) the entry pages of an application, and (3) the whole Web application. On the other hand, the mechanisms available for reuse include (1) markup wrapping, (2) API-based access and (3) additional markup interfaces along the so-called self-describing Web [10]; the latter strives to make Web content available for automated agents, and RSS, ATOM, RDFa or GRDDL can be aligned with this effort. These two dimensions define a matrix within which we can frame this work.

If data is the resource to be reused, a large body of Information Retrieval work has been reported using wrapping techniques [8]. Single-page extraction wrapping systems have been leveraged to extract and integrate distinct sources of information into a single repository that can be accessed as a traditional database [2,13]. Some of these systems have become Web tools (e.g. www.dapper.net or DataMashups.com) that enable end users to republish the content they wish as RSS feeds, XML files or even Google gadgets, enabling the creation of Web pages that syndicate the content of different data sources. Another, complementary tendency is that of application owners permitting access to their data silos through Application Programming Interfaces (APIs); Amazon and Google are the best-known examples. This removes the need for information extraction tools. Even more popular is providing the Web content through RSS or Atom channels which feed-reader agents can tap to get the content out. The new crop of mashup frameworks such as Yahoo Pipes can be framed here, since they mainly integrate data coming from diverse sources. A recent review of commercial tools [7] highlights this situation when stating that "there is a clear dividing line emerging between products that only provide data integration and recombination from remote Web services and tools which provide a way in which to build a real user interface for a mashup, with a two-way flow of data from the underlying services". MashMaker [5] presents an interesting, spreadsheet-like approach to mashups. Similar to our proposal, data is integrated around an existing application. When the page is loaded, data is arranged in a tree. When a user is looking at a particular node, MashMaker automatically suggests additional data functions that they might want to apply; clicking on one of these buttons inserts a new node whose defining expression extracts data from a third-party application. These functions can be defined by the user (the so-called user-defined widgets) or suggested by the system based on previous interactions of other users with the same application.

However, data is the main but not the only asset provided by Web applications. Another important piece is the GUI itself: one can be interested in reusing not only the data but the presentation of this data as well. For instance, Web portals act as a front-end doorway to distinct applications (including Web applications) so that the user can benefit from portal commodities such as single sign-on. Hence, portal vendors offer mechanisms to include fragments of the entry page of existing Web applications [16,12].
This is referred to as Web clipping; interacting with the clipped entry page involves navigating to the original Web application.
Web clipping allows for a side-by-side visual composition of distinct Web applications in a single Web page [1,4,17]. Another related work is that of [9], where AJAX-based clips are created to be included on an authored Web page. Unlike our approach, this one is intrusive in the sense that the code of the existing application is modified by its owner. By contrast, our scenario is that of end users "extending" existing applications without the application being aware of who is extending it and how.

Finally, new approaches have recently emerged that permit reusing a whole application (i.e. data, functionality and presentation): portlets and widgets. Portlets are interactive Web mini-applications delivered through a third-party Web application (usually a portal) [3]. Portlet technology permits building portal pages by combining fragments from diverse portlets in a standalone server. Along this line, portletization efforts [11] enable the construction of portlets by wrapping an existing Web application and integrating it into a portal context. A related technology is that of gadgets. Promoted by Google, gadgets build pages by combining fragments with JavaScript on the client. As for MARGMASH, the subject to be reused includes both a complete Web application and data coming from other places, and this reuse is mainly achieved through wrapping techniques.
5 Conclusions

We have presented MARGMASH, a tool that permits end users to mash up content in the context of existing Web applications. Unlike previous approaches, the mashup is not perceived as an end in itself. Rather, the mashup is contextualized by delivering it within an existing website. In this way, the website acts as the initial data provider for the mashup input parameters as well as the container for its output. Additionally, MARGMASH capitalizes on the recent crop of mashup applications to deliver the content, taking Yahoo Pipes as an example. Although currently only Yahoo Pipes is supported, future work includes defining bindings with other frameworks so that the user can select the one that best facilitates the task at hand. However, it has yet to be demonstrated whether MARGMASH is usable by virtually anyone, with any skill level, using any browser, in any language and without any training. This will be essential for mashup tools like ours to succeed with the general public.

Acknowledgments. This work was co-supported by the Spanish Ministry of Science & Education and the European Social Fund under contract TIC2005-05610. Pérez enjoys a doctoral grant from the Basque Government under the "Researchers Training Program".
References

1. Bauer, M., Dengler, D.: InfoBeans - configuration of personalized information assistants. In: International Conference on Intelligent User Interfaces (1999)
2. Baumgartner, R., Gottlob, G., Herzog, M.: Interactively adding web service interfaces to existing web applications. In: SAINT 2004, International Symposium on Applications and the Internet (2004)
3. Bellas, F.: Standards for second-generation portals. IEEE Internet Computing 8(2), 54-60 (2004)
4. Bouras, C., Kounenis, G., Misedakis, I.: A web content manipulation technique based on page fragmentation. Journal of Network and Computer Applications 30(2), 563-585 (2007)
5. Ennals, R., Garofalakis, M.: MashMaker: mashups for the masses. In: ACM SIGMOD International Conference on Management of Data (2007)
6. Hinchcliffe, D.: Online Ajax "desktops" try to change the rules of the game (2006), published at http://blogs.zdnet.com/Hinchcliffe/?p=8
7. Hinchcliffe, D.: A bumper crop of new mashup platforms (2007), published at http://blogs.zdnet.com/Hinchcliffe/?p=111
8. Laender, A.H.F., Ribeiro-Neto, B.A., da Silva, A.S., Teixeira, J.S.: A brief survey of web data extraction tools. ACM SIGMOD Record 31(2), 84-93 (2002)
9. Lingam, S., Elbaum, S.: Supporting end-users in the creation of dependable web clips. In: International World Wide Web Conference (WWW 2007) (2007)
10. Mendelsohn, N.: The self-describing web. W3C (2007), at http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
11. Paz, I., Díaz, O.: On portletizing web applications. Submitted for publication to ACM Transactions on the Web (TWeb) (2007), summary at http://www.onekin.org/margmash/tweb07.pdf
12. Oracle: Oracle9iAS Portal Web Clipping Portlet (2003), published at http://portalcenter.oracle.com/
13. Pan, A., Viña, A.: An alternative architecture for financial data integration. Communications of the ACM 47(5), 37-40 (2004)
14. Raposo, J., Álvarez, M., Losada, J., Pan, A.: Maintaining web navigation flows for wrappers. In: Lee, J., Shim, J., Lee, S.-g., Bussler, C., Shim, S. (eds.) DEECS 2006. LNCS, vol. 4055, Springer, Heidelberg (2006)
15. Rhodes, B.J.: Margin Notes: building a contextually aware associative memory. In: International Conference on Intelligent User Interfaces (2000)
16. Smith, I.: Doing Web Clippings in under ten minutes. Technical report, Intranet Journal (March 2001), at http://www.intranetjournal.com/articles/200103/pic 03 28 01a.html
17. Tanaka, Y., Ito, K., Fujima, J.: Meme media for clipping and combining web resources. World Wide Web 9(2), 117-142 (2006)