Evaluating Web Document Quality with Linguistic Variables: Combining Informative and Page Design Quality

Elena García, Miguel-Ángel Sicilia, Tomasa Calvo
Computer Science Department, University of Alcalá
Ctra. Barcelona km. 33.6, 28871 Alcalá de Henares, Madrid (Spain)
[email protected], [email protected], [email protected]
Abstract

The concept of Web document quality is multidimensional and difficult to characterize. For assessment purposes, it can be separated into two main concerns: content (or informative) quality and design quality, the latter being essentially connected to the concept of Web usability. In addition, usability assessments are usually obtained from linguistic judgements, vague guidelines or uncertain indicators, which points to the suitability of a linguistic assessment framework. In this paper, a definition for such a linguistic framework is described, along with an exploration of possible approaches to combining linguistic usability assessments with content-quality judgements.

Keywords: Usability Analysis, Fuzzy Set Theory, Web Page Quality, Linguistic Information Processing.
1 Introduction
Web documents can be considered hypermedia nodes that expose information through a purposeful interface design, for which a significant amount of widely accepted guidelines and principles have been proposed [18, 19]. Thus, the usability of Web pages is a principal quality criterion that can be considered orthogonal to informative quality, which concentrates on the quality of the contents. Many usability evaluation methods are currently used both by researchers and by practitioners [16, 3], but there is a lack of commonly accepted metrics or benchmarks, which hampers the feasibility of building systems that automate the filtering of Web pages according to their usability. A number of recent tools [11] and studies [12] focus on building usability analysis tools that use usability indicators to obtain automated assessments of Web pages. But in any case, usability evaluation is inherently subject to uncertainty and vagueness. Uncertainty comes from subjectivity in expert or user judgements, arising, for example, in inspection methods [17] or usability questionnaires [14], while vagueness is related to the approximate nature of most usability guidelines [5]. As pointed out elsewhere [10], fuzzy models can be used to approach the problem of identifying the quality of Web documents by explicitly considering linguistic assessments and imprecise feedback, but existing content-oriented frameworks [8] should be extended to also consider the "form" of the documents, that is, their properties w.r.t. usability.

Moreover, current content-presentation separation practices for Web documents recommend separating content and presentation physically, resulting in two distinct sets of artifacts. On the one hand, content is expressed in some XML (http://www.w3.org/TR/REC-xml) sub-language, following a concrete semantic schema. On the other hand, several presentations are generated for those schemas by using XSL stylesheets (http://www.w3.org/Style/XSL/). It can happen that a given high-quality XML document becomes an unusable page when transformed into a Web page by a concrete XSL stylesheet, resulting in an artifact with low usability irrespective of the informative quality of the original document. Consequently, evaluation frameworks like the one described in [8] can be extended to integrate usability considerations, resulting in a model that judges the quality of pairs (s_i, {d_j}), where s_i ∈ S represents a stylesheet, and {d_j}, d_j ∈ D, denotes the set of XML documents that are used by the stylesheet to produce the Web document.

In this paper, we approach the problem of modelling usability assessments from a linguistic perspective, assuming that these assessments are vague and uncertain, and that they would eventually be combined with content quality assessments also expressed linguistically. The rest of this paper is structured as follows. In Section 2, we discuss linguistic models of Web page design quality. In Section 3, the problem of combining such quality assessments with content-based ones is approached from the perspective of an existing linguistic modelling framework. Finally, conclusions and future research directions are provided in Section 4.
2 Linguistic Models of Web Page Design Quality
The development of a linguistic framework for Web usability assessments requires addressing several fundamental problems in which uncertainty and vagueness play a role:

(i) First, the concept of usability must be analyzed and eventually broken down into more concrete aspects that are more easily measurable.

(ii) Once the usability components have been established, a measurement instrument is required for each of them.

(iii) Then, an aggregation device is required to obtain an overall figure of usability, taking into account that usability components may not be independent but interacting.

Problems (i)-(iii) result in a wide variability of possible approaches to assess usability, depending on the measurement and aggregation instruments, and on the number and granularity of the usability sub-aspects considered. But in any case, uncertainty and imprecision arise as inherent characteristics of the process due to a number of sources. On the one hand, the concept of usability and its contributing aspects (i) are not clearly defined [21], and the relationships among sub-aspects (ii) are also subject to controversy [4]. Some recent work [20] has tried to capture such imprecise relationships through fuzzy measures, and Takagi-Sugeno approximation has been proposed elsewhere [22] as a way to model empirical fuzzy data. On the other hand, measurement of usability (ii) is always subject to imprecision, due either to linguistic judgement [6] or to vagueness in guidelines [5].

As a result of that complexity, no single model currently exists that covers points (i)-(iii) in a comprehensive way. Since our focus is on providing the highest degree of automation to assessment tools, from here on we will concentrate on a simplified model that uses a granulated version of some empirically validated metrics [12] that are easy to automate, along with a consideration of the user model used to provide personalization regarding structural properties (in the sense given in [15]). Our assessment framework can be expressed as a filtering function that assigns a value in a linguistic scale to each Web document i originating from the application of a stylesheet to one or several XML documents (or directly to conventional HTML Web pages, which can be considered as a fixed stylesheet-XML document pair), denoted as in (1):

    r_i(s, {d}, u) : S × 2^D × U → L    (1)
U denotes the user model, describing the characteristics of each user u ∈ U, and L = {l_i}, i ∈ {0, . . . , K}, is defined as a finite and totally ordered label set with odd cardinality, where l_i ≥ l_j if i ≥ j. The properties and trapezoidal form of the linguistic labels in L are the same as those described in Herrera-Viedma and Peis' linguistic framework [9]. The values r_i are obtained from our model of usability assessment in a two-step process:

(a) First, a set of necessary conditions about the page is considered.

(b) Second, page metrics are used to derive a linguistic value using the LOWA linguistic aggregation operator [7].

Step (a) is a way to filter out documents that are definitely not appropriate for the given user. It can be used as a way to integrate accessibility considerations into the framework. For example, if the user model states that the WCAG [2] "triple-A" conformance level is required, automatic tools like Bobby can be used to filter out non-conforming pairs (s_i, {d_j}).

Step (b) takes page metrics and expresses them in linguistic form according to existing statistical profiles. Statistical profiles are characterized in [13] through mean values for page clusters that belong to three categories (good, average, poor). In our current aggregation setting, an approximation for such domains is required, to fit the metrics to a linguistic scale of seven trapezoidal functions defined on [0,1] in the form (a_i, b_i, α_i, β_i), with EH = Extremely High = (.98, .99, .05, .01), VH = Very High = (.78, .92, .06, .05), H = High = (.63, .80, .05, .06), M = Medium = (.41, .59, .09, .07), L = Low = (.22, .36, .05, .06), VL = Very Low = (.1, .18, .06, .05) and EL = Extremely Low = (.01, .02, .01, .05). Currently, such an approximation is computed in a straightforward manner by mapping the [0,1] interval to an approximated interval for each metric, computed by mapping the average center to 0.5, and computing the spread of the interval as max(|good − average|, |poor − average|) · k, where k is a constant that can be adjusted to
cover a more reasonable domain of values (we use k = 4 for the following values). The resulting intervals are provided in Table 1 in the form (i1, i2, i3), indicating the extremes and the center. The TPC, GC and G% metrics have been eliminated since the centers (averages) of their statistical clusters are not ordered. Of course, this mapping results in a certain loss of information from the empirical data, but it roughly preserves the form of the original page clusters. Future work should replace the current LOWA approach with an aggregation scheme supporting non-symmetric label sets.

Table 1: Linguistic modelling of Web page metrics in [12]

Metric   Description                  i1       i2       i3
WC       Word count                   119      326.2    533.4
BT%      Body text percentage         0.5613   0.7458   0.9303
EBT%     Emphasized body text %       0.0785   0.1578   0.2372
TCC      Text positioning count       0.3      1.9      3.5
LC       Link count                   5.7      34.1     62.5
PS       Page size (K)                9.9      14.3     18.7
CC       Color count                  6        7.6      9.2
FC       Font count                   4.1      5.3      6.5
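As a concrete illustration, the value-to-label mapping just described can be sketched in Python (an illustrative sketch, not the authors' implementation; the function names `trap` and `to_label`, and the tie-breaking rule for values outside all label supports, are our own assumptions):

```python
# Sketch of the metric-to-label mapping: a raw metric value is mapped
# linearly from its Table 1 interval (i1, i2, i3) onto [0, 1] (so that the
# center i2 lands on 0.5), fuzzified with the seven trapezoidal labels, and
# assigned the label with the highest membership; out-of-range values fall
# back to the closest label, as assumed at the end of Section 2.

# Trapezoidal labels (a, b, alpha, beta): core [a, b], support [a-alpha, b+beta].
LABELS = {
    "EL": (.01, .02, .01, .05), "VL": (.10, .18, .06, .05),
    "L":  (.22, .36, .05, .06), "M":  (.41, .59, .09, .07),
    "H":  (.63, .80, .05, .06), "VH": (.78, .92, .06, .05),
    "EH": (.98, .99, .05, .01),
}

def trap(x, a, b, alpha, beta):
    """Trapezoidal membership of x in the label (a, b, alpha, beta)."""
    if a <= x <= b:
        return 1.0
    if a - alpha < x < a:
        return (x - (a - alpha)) / alpha
    if b < x < b + beta:
        return ((b + beta) - x) / beta
    return 0.0

def to_label(value, i1, i3):
    """Map a raw metric value to the linguistic label of highest membership."""
    x = min(1.0, max(0.0, (value - i1) / (i3 - i1)))  # linear map onto [0, 1]
    def score(name):
        a, b, alpha, beta = LABELS[name]
        # Break ties (e.g. values outside every support) by distance to core.
        return (trap(x, a, b, alpha, beta), -abs(x - (a + b) / 2))
    return max(LABELS, key=score)
```

For example, a word count equal to the cluster center 326.2 maps to 0.5 and therefore receives the label M.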
Then, the overall linguistic computation can be represented as the function in (2), where φ represents the LOWA operator:

    r_i = { l_0,                                          if ¬comp(U, (s_i, {d_j}))
          { φ(m_1(s_i, {d_j}), . . . , m_8(s_i, {d_j})),  otherwise              (2)
The function comp in (2) stands for a check of the necessary conditions for the document w.r.t. the current user. For example, users with vision impairments would require WCAG compliance. If one or more of such conditions are not satisfied, the overall page quality for the given user is penalized (i.e. set directly to EL). The weighting vector W for the LOWA operator in (2) must be adjusted to a degree of orness compatible with existing statistical profiles. Since each of the selected metrics is significant by itself w.r.t. the overall judged quality of the page, a
compensatory approach must be selected, i.e. orness(W) > 0.5.

In what follows, we describe a concrete instance of the framework, for illustration purposes. Let us compare two Web pages with the same content: Bertrand Russell's Nobel prize lecture. Page 1 (http://www.nobel.se/literature/laureates/1950/russell-lecture.html) provides some formatting, while Page 2 (http://www.evilmutants.com/v1.0/people/bertrand-russell/lectures/wdapi.html) is almost plain HTML. The metrics obtained from the WebTango Analysis Tool (http://webtango.ischool.washington.edu/tools/) are provided in Table 2, along with the corresponding labels according to the mapping described above.

Table 2: Example metrics for Russell's lecture pages

Metric   p1       l(p1)   p2       l(p2)
WC       5832     EL      5735     EL
BT%      0.9895   EH      0.9984   EH
EBT%     0.0224   EL      0.0008   EL
TCC      3        VH      0        EL
LC       38       M       1        EL
PS       52.5     EL      33.8     EL
CC       7        M       3        VL
FC       8        VH      3        L

Using the weighting vector {0.3, 0.25, 0.25, 0.1, 0.1, 0.1, 0.05, 0.05} for the LOWA operator (orness(W) = 0.828), the aggregation results in labels l2 (low) and l1 (very low) for p1 and p2 respectively. One of the usability defects of both pages is that the text has not been paginated, which makes it uncomfortable to read [18]. Such a simple re-arrangement would result in a good assessment of WC (e.g. VH) and, consequently, the final computed label would change to l4 (medium). Of course, more work on empirical adjustment and selection of metrics would be required to make the aggregation process better fitted from the viewpoint of usability experts.

The approach described assumes that values outside the ranges given in Table 1 are mapped to the closest linguistic label. This may not be appropriate for every possible Web metric, so a more complex mapping may eventually be required.
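The LOWA operator [7] used in step (b) and in equation (2) combines the ordered arguments recursively, two at a time, through the convex combination of labels C²{w, b_j, b_i} = l_k with k = min(K, i + round(w · (j − i))), j ≥ i. A minimal Python sketch over label indices (our illustration, not the authors' code; the explicit rounding via `int(x + 0.5)` is an assumption):

```python
K = 6  # label indices 0 (EL) .. 6 (EH)

def lowa(label_indices, weights):
    """LOWA aggregation [7] sketched under its usual recursive definition.
    `weights` is the OWA weighting vector (assumed to sum to 1)."""
    b = sorted(label_indices, reverse=True)  # order arguments descending

    def combine(b, w):
        if len(b) == 1:
            return b[0]
        rest = sum(w[1:])
        if rest == 0:                      # degenerate vector: keep the maximum
            return b[0]
        bi = combine(b[1:], [x / rest for x in w[1:]])  # aggregate the tail
        bj = b[0]                          # bj >= bi since b[0] is the maximum
        # Convex combination of two labels: k = min(K, i + round(w1 * (j - i)))
        return min(K, bi + int(w[0] * (bj - bi) + 0.5))

    return combine(b, list(weights))
```

For instance, aggregating EH (index 6) and EL (index 0) with weights (0.5, 0.5) yields index 3, i.e. the Medium label.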
3 Combining Informative and Design Quality
The combination of informative and usability-oriented quality assessments can be realized as a two-step filtering process, since design quality must not be allowed to filter out documents whose content quality is significantly better than that of the rest (except possibly in cases of significantly low usability). In other words, design quality may act as a "correction" filter that gives preference to more usable documents among sets of documents with similar degrees of content quality. This can be accomplished by an LWA-based aggregation scheme (denoted by Φ) in which the label describing page design quality is given a lower importance than that of the informative quality, as expressed in (6), where r'_i denotes the informative quality assessment of element i, computed by some scheme like [9]. The computation r'' : ℵ_t × ℵ_t → {−K, . . . , +K} operates on a set of selected documents ℵ_t for a given topic t (the topic is required to express informative quality) and determines a total order relationship for sets of documents with similar informative quality (the concept of similarity is denoted by a predicate s in expression (6)).
    Φ[(r_da, w1), (r'_da, w2)] = l_i    (3)
    Φ[(r_db, w1), (r'_db, w2)] = l_j    (4)
    Δ(r_da, r_db) = i − j               (5)

    r''(d_a, d_b) = { Δ(r_da, r_db),  if s(r'_da, r'_db)
                    { 0,              otherwise            (6)
The comparison scheme provided in (6) has as parameters the weights w1 and w2, the LOWA weights (underlying the definition of LWA), and the interpretation of "similar assessments". In the simplest case, we can take equality of labels as the concept of similar assessments, i.e. s(r'_da, r'_db) ≡ r'_da = r'_db, and a pair of weights such that w1 > w2, that is, page design quality is given more importance at this step. For example, if we take w1 = H and w2 = M, the LOWA weights as (0.7, 0.3), and LC→ = MIN(c, a), the effect of page design assessments can be illustrated by the following example: consider page p1 and page p2 on the same topic t and with the same informative quality assessment r'_p1 = r'_p2 = H, but with r_p1 = VH and r_p2 = L. The resulting Δ(r_p1, r_p2) = 3 − 2 = 1, so that p1 would be selected for page design reasons. Label distance can be used as a metric for sorting recommendations or any other kind of adaptive behavior [1].
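Under this simplest parameterization, the comparison r'' can be sketched in Python (an illustrative sketch; the LC = MIN transformation of each (importance, value) pair followed by a LOWA combination, and the rounding convention, are our assumptions about the LWA definition, so the intermediate label indices may differ from those quoted above even though the resulting Δ coincides):

```python
K = 6  # label indices 0 (EL) .. 6 (EH)

def c2(w, bj, bi):
    """LOWA convex combination of two ordered label indices (bj >= bi)."""
    return min(K, bi + int(w * (bj - bi) + 0.5))

def lwa2(imp1, val1, imp2, val2, w=0.7):
    """Two-argument LWA sketch: transform each (importance, value) pair
    with LC = MIN, then combine with LOWA weights (w, 1 - w)."""
    b1, b2 = min(imp1, val1), min(imp2, val2)
    return c2(w, max(b1, b2), min(b1, b2))

def r_second(design_a, info_a, design_b, info_b, w1=4, w2=3):
    """Sketch of r'' in (6), with w1 = H, w2 = M as importance labels and
    label equality as the similarity predicate s."""
    if info_a != info_b:                  # not similar: no design correction
        return 0
    i = lwa2(w1, design_a, w2, info_a)    # expression (3)
    j = lwa2(w1, design_b, w2, info_b)    # expression (4)
    return i - j                          # expression (5)
```

With r_p1 = VH (index 5), r_p2 = L (index 2) and r'_p1 = r'_p2 = H (index 4), `r_second(5, 4, 2, 4)` returns 1, matching Δ = 1 in the worked example.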
4 Conclusions and Future Work
A linguistic framework for the assessment of Web document quality from the perspective of page design has been presented. In addition, it has been discussed how such an assessment framework can be integrated with content-oriented perspectives of Web document quality. The described model is founded on existing empirical research on quantitative Web usability metrics, but further refinement of the aggregation scheme is needed to achieve higher degrees of fidelity to the statistical profiles. Further research is also needed to obtain richer models of usability according to problems (i)-(iii) sketched in Section 2, including aggregation schemes using non-symmetrically distributed fuzzy sets, modelling interacting criteria, and weighting schemes that allow for different strengths for each usability sub-criterion.
References

[1] P. Brusilovsky (2001). Adaptive hypermedia. User Modeling and User-Adapted Interaction, Ten Year Anniversary Issue (A. Kobsa, ed.), 11(1/2), pages 87–110.

[2] W. Chisholm, G. Vanderheiden (1999). Web Content Accessibility Guidelines 1.0. W3C Recommendation 5-May-1999.

[3] J.S. Dumas, J.C. Redish (1993). A Practical Guide to Usability Testing. Norwood, NJ: Ablex.

[4] E. Frøkjær, M. Hertzum, K. Hornbæk (2000). Measuring usability: are effectiveness, efficiency and satisfaction really correlated? In Proceedings of Human Factors in Computing Systems, pages 345–352.

[5] E. García, M.A. Sicilia, J.A. Gutiérrez (2003). On the Vague Modelling of Web Page Characteristics Regarding Usability. In: E. Menasalvas, J. Segovia, P. Szczepaniak (eds.), First International Atlantic Web Intelligence Conference, Lecture Notes in Computer Science, Vol. 2663, Springer-Verlag, Berlin Heidelberg New York, pages 199–207.

[6] E. García, M.A. Sicilia, L. González, L. Hilera (2002). Machine Learning Techniques in Usability-Evaluation Questionnaire Systems. In Proceedings of the Learning 2002 Conference, Leganés, Madrid.

[7] F. Herrera, E. Herrera-Viedma, J.L. Verdegay (1996). Direct Approach Processes in Group Decision Making Using Linguistic OWA Operators. Fuzzy Sets and Systems, 79, pages 175–190.

[8] E. Herrera-Viedma, E. Peis, J.C. Herrera, K. Anaya (2003). Evaluating the Informative Quality of Web Documents Using Fuzzy Linguistic Techniques. In Proceedings of the International Conference on Fuzzy Logic and Technology (EUSFLAT03), pages 32–37, Zittau (Germany).

[9] E. Herrera-Viedma, E. Peis (2003). Evaluating the Informative Quality of Documents in SGML Format Using Fuzzy Linguistic Techniques Based on Computing with Words. Information Processing and Management, 39(2), pages 195–213.

[10] E. Herrera-Viedma, G. Pasi (2003). Fuzzy approaches to access information on the Web: recent developments and research trends. In Proceedings of the International Conference on Fuzzy Logic and Technology (EUSFLAT03), pages 25–31, Zittau (Germany).

[11] M.Y. Ivory, M.A. Hearst (2001). The State of the Art in Automated Usability Evaluation of User Interfaces. ACM Computing Surveys, 33(4), pages 1–47.

[12] M.Y. Ivory, R. Sinha, M.A. Hearst (2001). Empirically Validated Web Page Design Metrics. In CHI 2001, ACM Conference on Human Factors in Computing Systems, CHI Letters 3(1).

[13] M.Y. Ivory (2001). An Empirical Foundation for Automated Web Interface Evaluation. Ph.D. Thesis, University of California at Berkeley.

[14] J. Kirakowski, N. Claridge, R. Whitehead (1998). Human centered measures of success in Web site design. In Proceedings of the Fourth Human Factors and the Web Meeting.

[15] L. López, M.A. Sicilia, E. García (2001). Personalization of Web Interface Structural Elements: A Learning-Scenario Case Study. In International Symposia of Computer Science, Aguascalientes, Mexico, pages 579–588.

[16] J. Nielsen (1993). Usability Engineering. Boston: Academic Press.

[17] J. Nielsen, R.L. Mack (eds.) (1994). Usability Inspection Methods. John Wiley & Sons, New York.

[18] J. Nielsen (1999). Designing Web Usability: The Practice of Simplicity. New Riders.

[19] L. Rosenfeld, P. Morville (1998). Information Architecture for the World Wide Web. O'Reilly.

[20] M.A. Sicilia, E. García, T. Calvo (2003). On the use of the Choquet integral for the aggregation of usability interface related scores. In Proceedings of the 2003 International Summer School on Aggregation Operators and their Applications, pages 159–164.

[21] M. van Welie, G. van der Veer, A. Eliëns (1999). Breaking down usability. In Proceedings of the International Conference on Human-Computer Interaction (INTERACT 99), IOS Press, pages 613–620.

[22] S.K. Wong, T.T. Nguyen, E. Chang, N. Jarayatna (2003). Usability metrics for e-learning. In Proceedings of the Workshop on Human Computer Interface for Semantic Web and Web Applications, Springer Lecture Notes in Computer Science 2889, pages 235–253.