ty of information âso critical for the whole Web application eras. ..... assessing in early Web development stages to assure the quality of delivered products.
Chapter X
Modeling Content Quality for the Web 2.0 and follow-on Applications1 Roberto Sassano University of Trento, Italy Luis Olsina National University of La Pampa, Argentina Luisa Mich University of Trento, Italy ABSTRACT The consistent modeling of quality requirements for Web sites and applications at different stages of the life cycle is still a challenge to most Web engineering researchers and practitioners. In the present chapter we propose an integrated approach to specify quality requirements to Web sites and applications. By extending the ISO 9126-1 quality views’ characteristics, we discuss how to model internal, external quality and quality in use views taking into account not only the software features but also the own characteristics of Web applications. Particularly, we thoroughly describe the modeling of the content characteristic for evaluating the quality of information –so critical for the whole Web application eras. The resulting model represents a first step towards a multi-dimensional integrated approach to evaluate Websites at different lifecycle stages.
KEY WORDS Content Characteristic, Information Quality, Quality Models, ISO/IEC 9126-1 Standards, Web 2.0.
INTRODUCTION The Web platform –as other Internet service- is approaching to the 20 years of existence. Web sites and applications (WebApps) built on top of this platform have become one of the most influential developments in the recent computing history with real impact in individuals and communities. For instance, as cited in Murugesan (2007: p.8), websites such as amazon.com, google.com, yahoo.com, myspace.org, wikipedia.org, ebay.com, youtube.com, napster.com, blogger.com and saloon.com are considered as the top ten websites that changed the world. Some of these sites have over 100 million users (yahoo.com, ebay.com, myspace.com); about 1 billion visits a day (wikipedia.org); over 1 billion searches per day (google.com). As the Web usage continues to grow, users expect WebApps to be more mature and useful with regard to the functions, services and contents delivered for their intended goals and tasks at hand. In addition, they expect these functions and contents be meaningful, accurate, suitable, usable, reliable, secure, personalised, and ultimately with perceived quality. Despite the major breakthrough in Web developments and technologies there are some Web Engineering branches still in their infancy, with lack of a wide consensus: Modeling of quality requirements for WebApps at different lifecycle stages is one of them. In this chapter we propose – based on related research literature and some ideas published in Olsina et al (2005)- an integrated approach to specify quality requirements for contents, functionalities and services to WebApps.
1
Pre-print: see final version at http://www.igi-global.com/chapter/modeling-content-quality-web-follow/39181
Websites, from the very beginning, were conceived as document-, content-oriented artifacts. Few years later, websites started to provide not only contents but also software-like functionalities and services. Since then, WebApps have at a fast pace emerged for many different sectors as e-commerce, e-learning, e-entertainment, and so forth. After that epoch named Web 1.0, a new recent era that considers a set of strategies and technologies focusing on social networking, collaboration, integration, personalization, etc. is emerging –currently called Web 2.0 and follow-ons. We argue that WebApps will continue being centered on functionalities, services and contents independently of the new technologies and collaboration strategies. However, as aforementioned a challenging issue still is how to specify and assess the quality and the quality in use of WebApps at different lifecycle stages since their intrinsic nature, even if the role of communication and relationships is increasing more and more, will continue being both content and function oriented. In the present work, by reusing and extending some ISO 9126-1 (2001) quality views’ characteristics, we discuss how to model internal, external quality, and quality in use views taking into account the previous concerns. Particularly, we thoroughly discuss the modeling of the content characteristic for evaluating the quality of information, which is not included in this standard and is so critical for the whole WebApp eras. Moreover, we underline how some specific attributes could be considered to Web 2.0 and follow-on applications. So far no integrated approach for WebApps that considers all these dimensions at the same time to different lifecycle stages has been issued as we know – not even the in-progress SQuaRE project intended to replace and to make more consistent many ISO standards related to quality models, measurement and evaluation processes (ISO 15939, 2002; ISO 14598-1, 1999). The rest of this chapter proceeds as follows. First, we give an overview of the Web eras as well as the unique intrinsic features of WebApps compared to traditional software applications. A review of the ISO quality models to specify non-functional requirements at different lifecycle stages of software products follows; besides, we also discuss what is missing in these quality models for the Web features highlighted in the previous section. Then we illustrate the proposed extension of the ISO quality models’ characteristics in order to specify the quality of information. In the sequel, we analyse our proposal in the light of related work, and, finally, we draw the main conclusions and future trends.
FEATURES OF WEB ERAS AND APPLICATIONS: AN OVERVIEW The first goal of this section is to introduce the main features of the Web evolution, but without deepening in the different dimensions of eras, while they are detailed commented in Murugesan (2007); the second goal is to outline the distinctive intrinsic features of WebApps compared to traditional software applications. The first WebApps can be grouped in the Web 1.0 era, and they can be categorized into static and dynamic; most recent WebApps can be grouped in the so-called Web 2.0 era as per O’Reilly (2005). These allow people collaborate, share and edit information online in seemingly new ways of interaction. Others applications could be grouped in the mobile Web era, where applications could offer some additional features such as personalization and contextaware capabilities and services; and the semantic Web era (Berners-Lee et al, 2001), where applications offer the automatic processing of information meaningfully. According to Murugesan (2007: p.11), at the very beginning, most websites were a collection of static HTML pages intended to deliver just information. After a while, WebApps became dynamic delivering pages created on the fly. The ability to create pages from the data stored on databases enabled Web developers to provide customized information to the visitors, in addition to more complex functionalities. However, “… these sites provided primarily one way interaction and limited user interactivity. The final users had no role in content generation and no means to access content without visiting the sites concerned”. In the last few years, a new class of Web 2.0 WebApps have emerged. Examples of these applications include social networking sites such as ‘myspace.com’, media sharing sites such as ‘youtube.com’ and collaborative authoring sites such as ‘wikipedia.org’. We can feature Web 2.0 WebApps as follows: 2.0 and follow-on applications, as we comment in the next section.
Cancelled pages are available in the full version of this document at http://www.igi-global.com/chapter/modeling-content-quality-web-follow/39181 EXTENDING THE ISO QUALITY MODELS TO INFORMATION QUALITY As aforementioned, information has added value over data, and hereafter we consider Web information as Web content, which can be textual or other media. Hence, we define content as “the capability of a Web product to deliver information which meets stated and implied needs when used under specified conditions”. Taking into account previous contributions made in the area of information quality –as we will discuss in the related work section-, we have primarily identified for the content characteristic four major sub-characteristics, which can help to evaluate information quality requirements for WebApps. This initial proposal was made in Olsina et al (2005, pp. 122-123). In the present work we contribute by redefining them (see Table 4) and also by extending and defining the sub-sub-characteristics (see Figure 2). The content sub-characteristics are: first, content accuracy, which addresses the very intrinsic nature of the information quality; second, content suitability, which addresses the contextual nature of the information quality; it emphasizes the importance of conveying the appropriate information for user-oriented tasks and goals; in other words, it highlights the quality requirement that content must be considered within the context of use and the intended audience; third, content accessibility, which emphasizes the importance of technical and representational aspects in order to make Web contents more accessible for users with various disabilities as regarded in the WAI-WCAG initiative (W3C, 1999); and lastly, legal compliance, as defined in Table 4. Table 4. Definition of the proposed Content sub-characteristics for specifying information internal and external quality requirements. Subcharacteristic Definition Content The capability of a Web product to deliver information that is correct, Accuracy credible and current. Content The capability of a Web product to deliver information with the right covSuitability erage, added value, and consistency, considering the specified user tasks and goals. Content The capability of a Web product to deliver information that is accessible Accessibility for all users (with or without disabilities) taking into account both technical and representational aspects. Content Legal The capability of the Web product to adhere to standards, conventions, and Compliance legal norms related to content as well as to intellectual property rights. In Figure 2, we define the sub-sub-characteristics for both the content accuracy and content suitability dimensions. Some of them could be just treated either as measurable attributes or as sub-dimensions to which attributes should be further associated accordingly. On the other hand, Figure 2 does not show the content accessibility sub-characteristic decomposition for instance. For space reasons we will model its sub-sub-characteristics and attributes in other related paper/works; however, the reader may surmise that the WCAG’s 14 content accessibility guidelines and the 65 checkpoints are very useful for this purpose, and could be reused completely, even the available tools???. Accessibility guidelines deal with both representational and technical aspects such as navigability and orientation, device independence transformations, equivalent alternatives to auditory and visual content, internationalization, among others. 1.
Content Accuracy 1.1. Correctness, the extent to which information is reliable in the sense of being free of errors. Note: information errors can be both syntactic and semantic; for example,
2.
semantic errors can be checked by known experts, or by peer contributors like in Wikipedia relying on the concept of ‘wisdom of crowds’. 1.2. Believability (synonym: Credibility), the extent to which the information is reputable, objective, and verifiable. 1.2.1. Authority (synonym: Reputability), the extent to which the source of the information is trustworthy. Note: it is well known that almost anyone can become a Web publisher and collaborate with content edition, even more in Web 2.0 applications. Although it is one of the Web 2.0’s great strengths also poses new evaluation challenges. 1.2.2. Objectivity, the extent to which the content (i.e., information or facts) is unbiased and impartial. 1.2.3. Verifiability (synonym: Traceability), the extent to which the owner and/or author of the content can be verified. Note: This also poses new evaluation challenges for Web 2.0 applications, because it relies often on social control mechanisms. On the other hand, we can trace versions, as in Wikipedia for instance. 1.3. Currency (synonym: Up-to-dateness), the extent to which the information can be identified as up to date. Note: Attributes of information currency like creation, posted, and revised dates can be used. Content Suitability 2.1. Value-added, the extent to which the content can be novel, beneficial, and contribute to react to a given user for the task at hand. 2.1.1. Novelty (synonym: Freshness), the extent to which the information is fresh and contributes to make new decisions for an intended user goal. 2.1.2. Beneficialness, the extent to which the information is advantageous and contributes to make new decisions for an intended user goal. Note: e.g., in marketing this attribute is related to the image of the company or organization as it is projected by the website and also to the target identified as relevant by the marketing people (Mich et al., 2003b). 2.1.3. Reactiveness, the extent to which the information is compelling and contributes to react for an intended user goal. 2.2. Coverage, the extent to which the content is appropriate, complete but also concise for the task at hand to a given user. 2.2.1. Appropriateness, the extent to which the information coverage fits to an intended user goal. 2.2.2. Completeness, the extent to which the information coverage is the sufficient amount of information to an intended user goal. Note: e.g. see the shopping cart case study performed in Olsina et al (2007b, p. 417), where five attributes for completeness and appropriateness have been measured and evaluated. 2.2.3. Conciseness, the extent to which the information coverage is compactly represented without being overwhelming. Note: e.g. to the writing for the Web heuristic, usually, shorter is better. 2.3. Consistency, the extent to which the content is consistent to the site’s piece of information or page with respect to the intended user goal.
Figure 2. Definition of the proposed Content Accuracy and Suitability sub-dimensions for specifying internal and external information quality requirements. As regard the last sub-characteristic –Content Legal Compliance- the pervasive nature of WebApps demands an increasingly attention to laws, regulations and policies. To align Web content and the uneven international regulations is a challenging issue: both cooperation with lawyers and supporting tools are needed. Also, sub-characteristics for Legal Compliance have to be defined accordingly to a given WebApp specific rules. Even if we have identified some attributes, we are not addressing this aspect in this work. In addition to the above content sub-characteristics, others to information architecture and organization could be addressed. Many of these sub-characteristics, such as global under-
standability –implemented by mechanisms that help to understand quickly the structure and content of the information space of a Web site like a table of contents, indexes, or a site map-, learnability, and also operability and attractiveness, can be related to the usability characteristic. Besides, other particular features of WebApps such as search and navigation functionalities can be specified in the functionality sub-characteristics; for example, are the basic and advanced search suitable for the end user? Or, are they tolerant of misspelled words and accurate in retrieving documents? In the same way, we can represent link and page maturity attributes, or attributes to deficiencies due to browsers’ compatibility into the reliability subcharacteristics. On the other hand, from the quality in use perspective, we have proposed to use the ISO model. However, for the satisfaction characteristic, specific (questionnaire) items for evaluating quality of content should be included. Also, for other quality in use characteristics such as effectiveness and productivity, specific user-oriented evaluation tasks that include performing actions with content and functions can be designed and tested. Finally, we have performed some preliminary studies. One of them was a quantitative evaluation for a shopping cart of a Web 1.0 app, where the content accuracy and suitability subcharacteristics have intervened –see Olsina et al (2007b). Recently, a qualitative evaluation was made with the aim of comparing two WebApps that belong to the tourism domain –see Sassano (2008). In particular, we have evaluated the content accuracy and suitability in addition to the accessibility and legal compliance sub-characteristics on opodo.co.uk, a Web 1.0 app, and on tripadvisor.com, which belong to the Web 2.0. The external quality evaluation was based on checklist considering a question for each sub-dimension of Figure 2 as well as to the content accessibility and content legal compliance sub-dimensions. Two experts have intervened in the inspection. Though we have non-conclusive evidence from this study, some initial comments and observations can be drawn about content features that distinguish Web 1.0 from Web 2.0 apps. First of all, it should be highlighted that the process of content production in Web 1.0 apps pursues rather a top-down approach, i.e., only content providers supply information to users. Conversely, in Web 2.0 apps this process becomes rather bottom-up; that is, mainly final users upload and update information. Moreover, content is submitted to a social control mechanism since users can share, edit, or comment content of other users, as happens in …/using for exmaple. blog, wiki, social network. In fact, initial observations have shown that some kind of information may be considered more accurate and suitable in ‘tripadvisor.com’ than in ‘opodo.co.uk’; particularly, information referring for instance to hotel review, location comment. In contrast, information as flight timetables, holiday price lists, etc. can be considered more accurate and suitable in ‘opodo.co.uk’. This fact is confirmed by recent companies strategies; Expedia has bought Tripadvisor to reinforce its business model with both the mechanisms of information provision. Lastly, in general terms we argue that the WebApp’s content quality does not depend on the kind of applications –whether Web 1.0 or Web 2.0; however, some kind of contents and services are more appropriate for Web 1.0 apps, while others for Web 2.0. Ultimately, we can state the content sub-characteristics we have specified for evaluation purposes can be applied to all WebApps, independently from which era they belong.
RELATED WORK The model presented in this paper, as an extension of the ISO 9126-1 quality models, has been elaborated taken into account also related researches about dimensions of data and information quality. We next remark the main information quality models, often called frameworks in the literature, developed over the last years. The reader can, however, find broader reviews for WebApps, for instance, in Knight & Burn (2005) and Parker et al (2006). It is worth mentioning that the difference in meaning between data and information –as remarked in the previous section- has often been neglected in these quality frameworks. Moreover, very often –and in some cases explicitly- these terms are used interchangeably.
One of the first studies intended to categorize data quality dimensions was made by Strong et al (1997). The focus in their work was on considering the dimensions of data quality for three user roles, i.e. data consumer, data custodian, and data producer. According to the authors, high quality data is data that is fit for use by the intended users. They developed a framework made up by four categories –intrinsic, contextual, representational, and accessibility- including 16 dimensions of data quality. Specifically, the intrinsic category indicates that information has its own quality per se. It contains four dimensions: accuracy, objectivity, believability, and reputation. The accessibility category states that information must be easily accessible but secure. It includes: accessibility, and security dimensions. The third category is contextual data quality, which indicates that information should be provided in time and in appropriate amounts. It includes: relevancy, value-added, timeliness, completeness, and amount of data. The last category is representational data quality, which focuses on format of data/information and its meaning. It includes: interpretability, ease of understanding, concise representation, and consistent representation. As a matter of fact, the Strong et al data quality framework was initially developed for traditional information systems. Nevertheless, this model has been used for WebApps too. For instance, Katerattanakul & Siau (1999) reuse the four categories and the characteristics including free-of-error webpage content, workable and relevant hyperlinks, and the navigational tools provided for accessibility. In a recent study, Caro et al (2007) have reused the Strong et al framework for modeling data quality of Web portals from the data consumer viewpoint. All these data quality frameworks neither consider different lifecycle stages of a WebApp and therefore different quality models as we propose, nor make any distinction between data and information quality. A different slightly way to model and evaluate the quality of information for a WebApp – both at page and site level- is proposed by Alexander & Tate (1999). They take into account six dimensions (criteria) such as authority, accuracy, objectivity, currency, orientation, and navigation. Authors include a checklist for a step-by-step quality evaluation to some kind of websites, namely, for the advocacy, business, informational, personal, news, and entertainment sectors. They evaluate information rather than data without considering different information quality models at different WebApp lifecycle stages. The first published study about extending the ISO 9126 model has been made by Zeist & Hendricks (1996). In a nutshell, their extended model consists of adding some subcharacteristics for each characteristic, with the aim of specifying data/information quality. Unfortunately, this study is quite limited because at that moment the ISO standard did not consider the internal, external, and quality in use views –as these were included only in the 2001 revised standard. Finally, as mentioned in the introduction, there exists an ongoing SQuaRE project that proposes harmonizing many ISO standards related to quality models, measurement and evaluation processes (ISO 15939, 2002; ISO 14598-1, 1999). According to Vaníček (2005: p.139) “these standards have not a unified terminology and do not fully reflect the current state of art in software engineering”. In his contribution he proposes a data quality model regarding the three ISO views, but these models are just for data (data as a new entity???) separated of the quality models for software functions. As the author is aware “the main problem concerning the development of new SQuaRE series of standard and also concerning the data quality standard is the enormous volume of standardisation documents … If we extend the number and span of standards, nobody will use them” Vaníček (2005: p.145). Differently of the SQuaRE approach, our aim is modeling nonfunctional requirements for WebApps’ functions, services and content, taking into account the three integrated quality models and Web lifecycle views.
FUTURE TRENDS AND CONCLUDING REMARKS While users are becoming more and more mature in the use of WebApps and tools, there is greater demand for the quality of these applications that should match real user needs in actual working environments. Hence a natural trend is the greater demand of quality planning and
assessing in early Web development stages to assure the quality of delivered products. On the other hand, most WebApps, besides the increasing support to functionalities and services will continue aiming at showing and delivering multimedia content. This basic feature stemming from the early Web 1.0 applications is currently empowered by the Web 2.0 and follow-on applications. Web 2.0 applications rely strongly on actual users sharing, collaborating and performing content tasks in real contexts of use. So other probably trend will be the greater demand for Web 2.0 application evaluations from the quality in use perspective. This could imply more pervasive user-centered testing and evaluations that is nowadays happening. As concluding remarks, in this chapter we have proposed how to specify quality requirements for functionalities, services and content for WebApps employing a minimalist and integrated approach. By reusing and extending the ISO 9126-1 quality models’ characteristics, we have discussed the need of modeling and adding the content characteristic for evaluating the quality of information. Specifically, we have argued that the internal and external quality models with the set of six characteristics, i.e. functionality, usability, reliability, efficiency, maintainability and portability, and their sub-characteristics respectively, are not sufficient to specify WebApps’ information quality requirements. As a consequence, we have proposed to include in both models the content characteristic and its sub-characteristics, i.e. content accuracy, content suitability, content accessibility, and content legal compliance. Besides, from the quality in use perspective, we have proposed to use the same ISO model. At the most, by redesigning accordingly the tasks in order to include content goals, and, by adding to the satisfaction characteristic specific (questionnaire) items for evaluating quality of content it would be enough. Ultimately, we have tried to give a quasi-standard and integral solution to the current concern which is how to identify and model WebApps’ quality and quality in use requirements at different lifecycle stages. Lastly, we did not discuss in this chapter –for space reasons- how to split the content subcharacteristics into measurable attributes or into some questionnaire items. Nor we gave clues about how different methods and techniques could be fit in for measurement and evaluation purposes. This will be discussed in an upcoming paper running a real case study as well. Acknowledgement. This work and line of research are supported by the following projects: UNLPam 09/F037 and PICTO 11-30300, from the Science and Technology Agency, Argentina. The Roberto Sassano’s scholarship to GIDIS_Web group was also funded by the University of Trento grant, Italy, in the framework of the University of Trento and National University of La Pampa agreement.
REFERENCES Alexander J. E. & Tate M. A. (1999). Web Wisdom: How to evaluate and create information quality on the web. Mahwah, NJ: Erlbaum. Berners-Lee, T. Hendler J. & Lassila O. (2001). The Semantic Web. Scientific American. Caro A., Calero C., Mendes E., & Piattini M. (2007). A Probabilistic Approach to Web Portal's Data Quality Evaluation, Proceedings of the IEEE 6th International Conference on the Quality of Information and Communications Technology, Lisbon, Portugal. pp. 143-153. Covella G. & Olsina L.; (2006). Assessing Quality in Use in a Consistent Way. In ACM proceedings, Int'l Congress on Web Engineering, (ICWE06), San Francisco, USA, pp. 1-8. ISO 14598-1. (1999). International Standard, Information technology - Software product evaluation - Part 1: General Overview. ISO 15939. (2002). Software Engineering - Software Measurement Process. ISO/IEC 9126-1. (2001). Software Engineering— Software Product Quality—Part 1: Quality Model, Int’l Org. for Standardization, Geneva.
Katerattanakul P. & Siau K. (1999). Measuring information quality of web sites: Development of an instrument. Proceedings of the 20th international conference on Information Systems. Charlotte, North Carolina, United States. pp.279–285 Knight S. & Burn J. (2005). Developing a Framework for Assessing Information Quality on the World Wide Web. Information Science Journal. Vol. 8. Mich L., Franch M., & Cilione G. (2003a). The 2QCV3Q Quality Model for the Analysis of Web site Requirements. Journal of Web Engineering, 2(1-2): pp. 105-127. Mich L., Franch M., & Gaio L. (2003b). Evaluating and Designing the Quality of Web Sites. IEEE Multimedia, (10)1: 34-43. Murugesan S. (2007). Web Application Development: Challenges and the Role of Web Engineering. Chapter in a Springer Book titled Web Engineering: Modeling and Implementing Web Applications. Rossi G., Pastor O., Schwabe D., and Olsina L. (Eds). pp. 7-32. Olsina, L.; Covella, G.; & Rossi, G.; (2005). Web Quality. Chapter in a Springer Book titled Web Engineering. E. Mendes, N. Mosley (Eds.). pp. 109–142. Olsina L; Papa F. & Molina H. (2007a). How to Measure and Evaluate Web Applications in a Consistent Way. Chapter in a Springer Book titled Web Engineering: Modeling and Implementing Web Applications. Rossi G., Pastor O., Schwabe D., Olsina L. (Eds). pp. 385–420. Olsina L.; Rossi G; Garrido A.; Distante D. & Canfora G. (2007b). Incremental Quality Improvement in Web Applications Using Web Model Refactoring. Springer LNCS 4832, IWWUA’07, M. Weske, M.S. Hacid, C. Godart (Eds.): pp. 411–422. O'Reilly T. (2005). What is Web 2.0? Design Patterns and Business Models for the Next Generation of Software. Retrieved August 27, 2008, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html Parker M., Moleshe V., De la Harpe R. & Wills G. (2006). An evaluation of Information quality frameworks for the WWW. In: 8th Conf. on WWW Applications, Bloemfontein, South Africa. Sassano R. (2008). Content quality model for Web 2.0 websites, Degree Thesis (in Italian), University of Trento, Italy. Strong D., Lee Y. & Wang R. (1997). Data Quality in context. Communications of the ACM, 40(5): 103-110. Vaníček J. (2005). Software and Data Quality. In: Proceedings conference Agricultural perspectives XIV Czech University of Agriculture Prague, September 20–21. W3C, (1999). WWW Consortium, Web Content Accessibility Guidelines 1.0. Retrieved August 27, 2008, http://www.w3.org/TR/WAI-WEBCONTENT/ Zeist R.H.J. & Hendriks P.R.H. (1996). Specifying software quality with the extended ISO model. Software Quality Management IV – Improving Quality, BCS, 145-160.