IADIS International Conference WWW/Internet 2008
FEEDS AND MASHUPS: TOWARDS NEW WEB APPLICATIONS PARADIGMS AND DEVELOPMENT FOR INFORMATION DISTRIBUTION
Serena Pastore
INAF – Astronomical Observatory of Padova
Vicolo Osservatorio 5 – 35122 PADOVA - ITALY
ABSTRACT
The Web 2.0 trend has turned the web into a successful participation platform, thanks to rich user interfaces, developed in different languages and technologies, that hide the complexity of integrating heterogeneous systems and data. The concepts underlying such an environment include a web-oriented architecture as a subsystem of a service-oriented architecture, and content composition based on web services. The adopted technologies fall into two groups: those that make the web interface intuitive and rich, and those that help to connect multiple data and information sources. The paper describes work in progress on applying design patterns and architectural styles to web development, with reference to an example scenario of information distribution within a research institute. Building on syndication feed technologies, two approaches to news visualization are discussed, one with and one without server-side script processing, and mashups are introduced as a new genre of application.
KEYWORDS
Web 2.0, internet technologies, Mashups
1. INTRODUCTION
The Web 2.0 trend (Murugesan, S., 2007) has turned the web into a successful participation platform, thanks to rich user interfaces that hide the complexity of integrating heterogeneous systems and data. Web 2.0 is about encouraging user participation through open applications and services implemented with specific web technologies. It combines a technological approach, based on simple protocols and open standards, with a social approach that makes websites more scalable through tools for user participation in content creation, consumption, and distribution, and tools designed to produce and manage collective intelligence. On the technological side, the concepts underlying data-driven web applications are a web-oriented architecture as a subsystem of a service-oriented architecture (Erl, T., 2004), content composition through web service specifications (Cerami, E., 2002), and a semantic approach (Greaves, M., 2007) based on microformats and folksonomies, that is, the collaborative categorization of information by freely chosen keywords. No single set of technologies is used by every Web 2.0 system, but a distinction can be drawn between those that make the web interface intuitive and rich (Farrell, J. et al., 2007) (e.g., Ajax, JavaScript, CSS, DOM, XHTML, XSLT/XML, and Adobe Flash) and those that help to connect multiple data and information sources (Goh, C. et al., 2007) (e.g., XML-RPC, REST, RSS, Atom, mashups). Moreover, Web 2.0 embraces both the set of technologies used to develop and implement services (e.g., Ajax, RSS, REST, JSON) and a number of already-available online products (e.g., GMail, Flickr, YouTube).
The impulse to use such services comes from the availability of public data and information (e.g., through the web or other data service providers) and of online services whose application programming interfaces (APIs) facilitate their composition, and thus the creation of a particular web application (Zepeda, J.S., et al., 2007). The paper describes work in progress on introducing Web 2.0 design patterns and architectural styles into web development, with reference to an example scenario. The problem is the effective distribution of information among several organizational websites belonging to a research institute, and a solution can be reached by combining syndication feeds (Hammersley, B., 2005) with Web 2.0 technologies.
ISBN: 978-972-8924-68-3 © 2008 IADIS
We have analyzed the technologies, data formats, and tools needed to develop and distribute applications able both to create and publish information and to retrieve, parse, and visualize information coming from different sources. Starting from web feeds as an easy data format for describing and distributing web content, we analyze three aspects of development: the best data format for describing content within web feeds, given the several specifications, formats, and standards that have been developed (e.g., RSS, Atom, JSON); the techniques for retrieving, parsing, and visualizing the content inside a web page (e.g., through Ajax technologies); and the specific genre of web applications used to provide aggregation (i.e., mashups). Specifically, the paper describes server-side and client-side implementations of feed rendering, and it introduces the mashup as a complex web application able to aggregate content from several sources to create a value-added service.
2. THE STARTING SCENARIO: A BEST INFORMATION DISTRIBUTION
The initial problem was the institute's need to find the best way to distribute information that, although available on a website, is not being accessed by potential users. Feeds and web syndication offer new ways to manage information and communicate with users. Feeds are a successful way to describe web content retrieved from information providers, since they present information as news items organized in a structured file. Our approach has been to design and implement a web application able to compose content into a feed and to retrieve and visualize it inside the home page of each of a given set of organizational websites. The resulting web pages can thus be regarded as mashups of existing content, and the process should require minimal effort from local webmasters. Such a development involves different actors (Figure 1) that are logically and physically disjoint.
Figure 1. Components of a web feed application
There are the content providers, which may expose their content through specific methods; the site hosting the web application that aggregates and publishes the feed and then processes and renders it; and finally the client's web browser, where the application is rendered graphically and user interaction takes place. The involvement of these actors splits the design into two aspects: feed creation, where the format of the exported content affects the interoperability and usability of the feed, and the subsequent processing and rendering. Web content is normally stored in different formats on systems that rely on a database backend to store information and on specific publishing methods to display it on the web (Simpson, D., 2005). Given this heterogeneity, a feed can serve as a standard, common way to describe such content. However, several specifications and standards have been developed over time to represent feeds, such as the best-known RSS specifications in versions 1.0 (http://web.resource.org/rss/1.0) and 2.0 (http://www.rssboard.org), and the Atom standard (http://atompub.org). All of these formats are XML-based languages (Harold, E.R., 1999); each has different features from the author's point of view, but all are supported transparently by different software. Other data formats, such as JavaScript Object Notation (JSON) (http://www.json.org), are also used for content representation, taking advantage of their specific features. Feed creation may be hand-rolled, semi-automated (by inserting only content), or fully automated (by inserting markup and content). Whatever the chosen format, the information provider exports content as a feed, that is, a list of news items, by publishing the file at a single location from which it can be retrieved quickly. The rendering phase involves grabbing content from the feed, reproducing it, and republishing it on a website. The aggregator is the type of software devoted to this goal. Aggregators can be implemented as standalone clients, browser add-ins, or server-based or web-based services. A distinction may be made between client-side and server-side solutions in how they approach the data format and visualization techniques.
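The aggregation step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the software used in the paper; the two inline RSS 2.0 documents, the example.org links, and the function names `read_items` and `aggregate` are assumptions made for the example, standing in for feeds that a real aggregator would fetch over the network.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feeds standing in for documents fetched from two sites.
FEED_A = """<rss version="2.0"><channel><title>Site A</title>
<item><title>Seminar announced</title><link>http://example.org/a/1</link>
<pubDate>Mon, 06 Oct 2008 09:00:00 GMT</pubDate></item>
<item><title>New instrument</title><link>http://example.org/a/2</link>
<pubDate>Wed, 08 Oct 2008 12:30:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Site B</title>
<item><title>Public lecture</title><link>http://example.org/b/1</link>
<pubDate>Tue, 07 Oct 2008 15:00:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(rss_text):
    """Parse one RSS 2.0 document into a list of plain dictionaries."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS 2.0 dates follow RFC 822, which the email module can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def aggregate(feeds, limit=10):
    """Merge the items of several feeds, newest first."""
    items = [entry for text in feeds for entry in read_items(text)]
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items[:limit]


latest = aggregate([FEED_A, FEED_B], limit=3)
for entry in latest:
    print(entry["published"].date(), entry["source"], entry["title"])
```

Whether this logic runs on a server, in a standalone client, or in the browser is exactly the client-side versus server-side choice discussed in the text; the merging and sorting are the same in each case.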
3. DEVELOPMENT ASPECTS: DATA FORMATS AND VISUALIZATION
Feed creation requires a choice of data format. This choice influences the subsequent phases of parsing and visualization and affects performance. An aggregator must retrieve the feed from the network, so processing always implies a network load. However, performance may differ depending on whether the feed is rendered on the server side through dynamic content programming languages (e.g., PHP, Ruby) or on the client side (e.g., JavaScript). In either case, Web 2.0 technologies such as Ajax (Van der Vlist, E. et al., 2006) are preferred. Ajax is a web application model composed of several technologies focused on the asynchronous loading and presentation of content: XHTML/CSS for presentation and style, the DOM API exposed by the browser for dynamic display and interaction, asynchronous data exchange, and browser-side scripting. In practice this often means combining directly embedded code with scripting APIs or libraries. Using third-party products helps to build rich Internet applications while reducing server overhead. In our scenario, two solutions have been implemented: a server-side technique using PHP and Ajax, and a client-side approach using APIs and Ajax.
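To make the server-side option concrete, the sketch below turns already-parsed feed entries into an HTML fragment that a page could embed. It is written in Python as a stand-in for the PHP used in the paper; the function name `render_feed`, the `news-feed` identifier, and the sample entry are hypothetical.

```python
from html import escape


def render_feed(entries, container_id="news-feed"):
    """Return an HTML fragment listing each feed entry as a link."""
    rows = [
        '  <li><a href="{}">{}</a></li>'.format(
            # Escape both the URL and the title so feed content cannot break the markup.
            escape(entry["link"], quote=True),
            escape(entry["title"]),
        )
        for entry in entries
    ]
    return '<div id="{}">\n<ul>\n{}\n</ul>\n</div>'.format(
        escape(container_id, quote=True), "\n".join(rows)
    )


fragment = render_feed([
    {"title": "Seminars & talks <this week>",
     "link": "http://example.org/news?id=1&lang=en"},
])
print(fragment)
```

In a client-side solution the same loop would run in the browser against the DOM, which moves the processing cost from the web server to the visitor's machine.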
3.1 Feed XML Dialects: RSS, Atom and JSON
The feed format determines how easily a feed can be processed. XML languages describe information as structured data in a way that enables processing and interoperability among web applications. For this reason, the specifications and standards initially developed for feeds, such as RSS version 2.0 and Atom version 1.0, are based on XML. Both describe content structured with XML tags and organized in feed items, each with specific fields giving a title, link, description, or other information about the news item. However, alternatives are emerging that bring the data representation closer to programming-language objects than XML does. One of these is JSON, a lightweight data-interchange format based on a subset of the JavaScript language, which is becoming a light alternative to XML. JSON is therefore being used to describe feeds in order to overcome the differences among feed formats. A JSON feed is a JavaScript object written in a text format that describes content items using two structures: a collection of name/value pairs (realized in various languages as an object, record, hash table, etc.) and an ordered list of values (realized as an array, vector, list, or sequence). An example of the same feed represented in the different data formats is shown in Figure 2.
Figure 2. Feed format representations according to RSS specification and JSON
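As a small textual counterpart to the comparison of representations, the following Python sketch converts a single RSS 2.0 item into the equivalent JSON object of name/value pairs. The item content and the helper name `item_to_json` are invented for illustration and are not taken from the figure.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 news item.
RSS_ITEM = """<item>
  <title>Open day at the observatory</title>
  <link>http://example.org/news/42</link>
  <description>Guided visits to the telescopes.</description>
</item>"""


def item_to_json(item_xml):
    """Map each child element of an RSS item to a JSON name/value pair."""
    item = ET.fromstring(item_xml)
    record = {child.tag: (child.text or "").strip() for child in item}
    return json.dumps(record, indent=2)


as_json = item_to_json(RSS_ITEM)
print(as_json)
```

Each XML child element becomes one name/value pair; a whole feed would be an ordered list (a JSON array) of such objects.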
JSON is similar to XML in that both are self-describing (values are named), hierarchical, and can be parsed and used by many programming languages. JSON, however, has a simpler format: it is easy for humans to read and write and for machines to parse and generate, and it is programming-language independent. These features establish JSON as a new format in the Web 2.0 environment. However, in the first phase, the decision was to create feeds in the most widely used formats, RSS 2.0 and Atom 1.0.
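A fully automated feed in one of these formats can be produced directly from stored records. The sketch below builds a minimal RSS 2.0 document in Python; the function name `build_rss`, the channel metadata, and the sample news record are assumptions for the example, and a production feed would normally add further elements such as publication dates.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, news):
    """Build a minimal RSS 2.0 document from a list of news records."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for entry in news:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = entry[field]
    # ElementTree escapes special characters such as "&" when serializing.
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss(
    "Institute news",
    "http://example.org/",
    "News from the institute (sample data)",
    [{"title": "Talks & seminars",
      "link": "http://example.org/news/1",
      "description": "Programme for the coming week."}],
)
print(feed_xml)
```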
3.2 Feed Rendering: Server-side vs. Client-side Technology
Parsing and visualization may rely on specific applications developed with the most widely used web programming languages (e.g., PHP, Ruby, Java) or on APIs available from third parties. Many APIs related to Web 2.0 services have been published by major companies (e.g., Google or Yahoo). An API is an interface provided by an application that lets other programs, applications, or websites interact with it and request its data or services. APIs facilitate data exchange between applications and allow the creation of new applications, giving shape to the concept of the web as a platform. In the example scenario, several applications following the two methodologies have been tested: on the client side, the Google Ajax Feed API; on the server side, the Dynamic Drive Advanced RSS Ticker, which combines a PHP class that retrieves and parses the feed with Ajax and DHTML to display it dynamically. The resulting effect is shown on each organizational website of the research institute (see for example http://www.oapd.inaf.it or http://www.brera.inaf.it). Specifically, Google makes many APIs for its main applications available to developers. Among these, the Google Ajax Feed API has been developed to allow the aggregation of feeds and other content and, as an API, can be mashed up with other APIs. The Ajax Feed API transforms an RSS or Atom feed into an XML, JSON, or mixed format, mapping the attributes so that feeds can be accessed uniformly. Using the API requires a domain-based API key from Google; it is then possible to retrieve a feed, associate it with a JSON/XML format, and display each entry through the properties exposed by the API. Each feed is associated with the google.feeds object, and its properties can be processed by JavaScript functions.
Moreover, google.feeds provides the FeedControl class, which controls how feeds are displayed. Producing instructions for inserting the feed visualization into home pages is simple. Both solutions require modifying the <head> section of the home page to include the scripts, and then, inside the <body> section, inserting either the script calls (in the first case) or a simple dynamic feed control inside a <div> element.
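The idea of mapping RSS and Atom attributes onto one uniform structure can be illustrated independently of any particular service. The Python sketch below is not the Google Ajax Feed API and does not reproduce its interface; the function name `normalise`, the output field names, and the two sample documents are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 XML namespace

RSS_SAMPLE = """<rss version="2.0"><channel><title>News</title>
<item><title>Open day</title><link>http://example.org/1</link>
<description>Guided visits.</description></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title>
<entry><title>Open day</title><link href="http://example.org/1"/>
<summary>Guided visits.</summary></entry>
</feed>"""


def normalise(feed_xml):
    """Return feed entries as uniform dictionaries, whatever the source format."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        return [
            {"title": item.findtext("title"),
             "link": item.findtext("link"),
             "summary": item.findtext("description")}
            for item in root.iter("item")
        ]
    if root.tag == ATOM_NS + "feed":
        entries = []
        for entry in root.findall(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title"),
                # Atom stores the URL in the href attribute, not in element text.
                "link": link.get("href") if link is not None else None,
                "summary": entry.findtext(ATOM_NS + "summary"),
            })
        return entries
    raise ValueError("unsupported feed format: " + root.tag)


print(normalise(RSS_SAMPLE) == normalise(ATOM_SAMPLE))
```

Once entries share one shape, the display code no longer needs to know which format the provider chose.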
4. FEEDS WITHOUT SERVER SIDE SCRIPTS: MASHUPS AND YAHOO! PIPES
Retrieving and composing content from different sources is also the aim of web mashups. The term mashup, borrowed from the musical practice of combining several audio fragments into a single piece, refers to websites or applications that combine content from several sources in one integrated application. The technology is also used in combination with feeds to provide value-added services. Based on the concept of service composition in SOA, a mashup application takes advantage of content providers that expose their content through web protocols (REST/SOAP web services), composes that content, and is rendered graphically in the client's browser, where user interaction takes place. Mashups are implemented either with server-side dynamic content generation technologies (such as Java servlets, PHP, or ASP) or directly within the client's browser through client-side scripting (JavaScript). Among the different types of mashups, news mashups can take advantage of syndication technologies to aggregate feeds over the web. Several companies have developed tools for easy mashup creation. Examples are Yahoo! Pipes, which allows the creation of advanced RSS-based mashups; Microsoft Popfly, which allows the creation of blocks, chunks of code that wrap complex operations such as retrieving data from a website; and the Google Mashup tools, a set of tools for developers. All these tools free developers from the complexity of web service composition. To test the efficacy of this solution, we examined Yahoo! Pipes (Figure 3), a service that lets a developer take information from the web, process it, and then format it. In our scenario, we can ask Yahoo! Pipes to retrieve the feed and display it on the pages without the need for server-side scripts.
Figure 3. Mashup composition and visualization of the feed by using Yahoo Pipes.
Usually, visualizing an RSS feed requires a web server to retrieve and process it before it can be sent to the browser. Yahoo! Pipes delivers the pipe’s data as JSON, which the web browser can retrieve directly (the “server” work is done by Yahoo). Moreover, the same tool can be used to transform RSS into JSON in order to test the performance effect of such an implementation. However, mashups are a new and immature technology, and many problems remain.
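Consuming a feed that has already been converted to JSON needs no XML parsing at all, which is what makes the approach attractive for client-side code. The Python sketch below shows the idea on a sample payload; the payload layout (a `value` object holding an `items` list) and the function name `headlines` are assumptions made for illustration, not a documented response format.

```python
import json

# Hypothetical JSON rendering of a feed, as a remote service might return it.
PIPE_OUTPUT = """{
  "count": 2,
  "value": {
    "title": "Institute news",
    "items": [
      {"title": "New instrument", "link": "http://example.org/a/2"},
      {"title": "Public lecture", "link": "http://example.org/b/1"}
    ]
  }
}"""


def headlines(payload):
    """Extract (title, link) pairs from the assumed JSON layout."""
    data = json.loads(payload)
    return [(item["title"], item["link"]) for item in data["value"]["items"]]


for title, link in headlines(PIPE_OUTPUT):
    print(title, link)
```

In the browser the equivalent step is a single JSON decode followed by a loop over the items, with no server-side script between the remote service and the page.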
5. CONCLUSION
Web 2.0 promotes web experiences that encourage users to participate in sharing and enriching services, which together are creating a service architecture on the web. Web 2.0 has evolved into a wide availability of web applications exposed as APIs or web services. Moreover, the availability of such services and APIs introduces a new way of developing web applications that no longer depends on specialist knowledge of the underlying technologies. The latest technologies allow the development of rich user interfaces that hide system complexity. These APIs expose informational services on the web and take many forms of remote function invocation using standard web protocols and XML for data representation (REST, SOAP/WSDL, XML-RPC, and other approaches). Web 2.0 approaches have been studied in the scenario of improving information distribution. In particular, server-side and client-side approaches have been implemented to render feeds inside a local home page. Moreover, the mashup approach has been analyzed as a new technology for better information distribution, thanks to the availability of APIs and interfaces that help content composition and of simple editor products that simplify the work. Future work will focus mainly on mashup development.
REFERENCES
Murugesan, S., 2007. Understanding Web 2.0. In IT Professional, Vol. 9, Issue 4, July-Aug. 2007, pp. 34-41.
Erl, T., 2004. Service-Oriented Architecture: A Field Guide to Integrating XML and Web Services. Prentice Hall PTR.
Cerami, E., 2002. Web Services Essentials. O'Reilly Media, 1st edition.
Greaves, M., 2007. Semantic Web 2.0. In IEEE Intelligent Systems, IEEE Computer Society, Vol. 22, Issue 2, March/April 2007, pp. 94-96.
Farrell, J. and Nezlek, G. S., 2007. Rich Internet Applications: the Next Stage of Application Development. Proceedings of Information Technology Interfaces Conference. Cavtat, Croatia, pp. 413-418.
Goh, C. M., Lee, S. P., He, W., and Tan, P. S., 2007. Web 2.0 concepts and technologies for dynamic B2B integration. Proceedings of IEEE Conference on Emerging Technologies and Factory Automation, pp. 315-321.
Zepeda, J. S. and Chapa, S., 2007. From Desktop Applications Towards Ajax Web Applications. Proceedings of Electrical and Electronics Engineering Conference, pp. 193-196.
Hammersley, B., 2005. Developing Feeds with RSS and Atom. O'Reilly Media, Inc.
Simpson, D. L., 2005. Content for one: developing a personal content management system. Proceedings of the 33rd Annual ACM SIGUCCS Conference on User Services, pp. 338-242.
Van der Vlist, E., Ayers, D., Bruchez, E., et al., 2006. Professional Web 2.0 Programming. Wrox Professional Guides, Wrox, 1st edition.