Assessing the Quality of Academic Websites: a Case Study

Luis Olsina*, Daniela Godoy, Guillermo Lafuente
GIDIS, Department of Computer Science, Faculty of Engineering, UNLPam
* also at UNLP – Argentina
Tel-Fax: +54 2302 430497 Int. 6502; E-mail: [olsinal, godoyd, lafuente]@ing.unlpam.edu.ar

Gustavo Rossi
LIFIA, Computer School at UNLP; also at UNM and CONICET – Argentina
Tel-Fax: +54 221 4228252; E-mail: [email protected]


ABSTRACT

In this paper, a quantitative evaluation approach to assess the quality of websites, called the Web-site Quality Evaluation Method (QEM), is proposed. This prescriptive and descriptive approach might be useful to evaluate and compare quality characteristics and attributes in different phases of a Web product lifecycle. In particular, to discuss the methodology, we evaluate the level of accomplishment of required quality characteristics (such as usability, functionality, reliability, and efficiency, and derived subcharacteristics) in six typical academic sites. At the end of the evaluation process, a ranking for each selected site is obtained. Specifically, the evaluation process generates elementary, partial, and global indicators or quality preferences that can be easily analyzed, traced backward and forward, justified, and efficiently employed in decision-making activities. Hence, conclusions about the state of the art of quality in the operative phase of these sites can be drawn, and recommendations for improvement can be given. The outcomes are indicators of the percentage of fulfillment of stated quality requirements. Finally, concluding remarks and in-progress research are presented.

KEYWORDS: Web-site QEM, Evaluation, Academic Sites, Quantitative Methodology, Quality.

1. INTRODUCTION

Web artifacts have been developed at a hectic pace over the past few years. The increasing acceptance of websites raises challenges, for instance, in knowing or assessing where we stand regarding product quality, and how we can improve it. However, there is still no widely recognized quantitative methodology for evaluating the quality of Web artifacts. One of the primary goals of quantitative website evaluation and comparison is to understand the extent to which a given set of quality characteristics and attributes fulfills a set of needs and behaviors of specific audiences. In this direction, Web-site QEM can be useful in providing this understanding in an objective and sensible way.


On the one hand, even though software evaluation has existed for decades as a discipline, the quantitative and objective quality evaluation of hypermedia applications, and particularly of Web artifacts, is a rather recent and emerging issue. For instance, in the mid-nineties, website style guides and design principles and techniques emerged to assist developers in the development process (1, 2, 3), as well as lists of guidelines that authors should follow in order to make sites more efficient and accessible (4). These guidelines and techniques have shed light on essential characteristics and attributes and might help to improve the Web design and authoring process but, obviously, do not constitute evaluation methods by themselves. On the other hand, quantitative surveys (5, 6, 7, 8) and Web domain-specific evaluations are just emerging. In this direction, for example, Lohse and Spiller (7) identified and measured 32 attributes that influence store traffic and sales. In addition, Kirakowski et al. (8) have carried out important usability evaluation experiments grounded in subjective user-based questionnaires. However, to deal with the assessment and comparison of complex Web quality requirements, an integral, flexible, well-defined, and engineering-based evaluation methodology and process model is needed. It is also important to remark on the efforts made by ISO/IEC 9126 (9) in the specification of product quality characteristics at a high level of abstraction, and the in-progress efforts on the process for software evaluators (10), centered on the aforementioned standard.

Ultimately, Web-site QEM is intended to make an engineering contribution by proposing a systematic, disciplined, and quantitative approach customized to the evaluation, comparison, and analysis of the quality of complex Web artifacts. In general terms, the underlying strategy is evaluator-driven by domain experts rather than user-driven; it is objective rather than subjective; and it is quantitative and model-centered rather than qualitative and intuition-centered.

In order to show the methodology's strategies, models, products, and activities, we deal with a recently concluded case study on academic sites; see also (11, 12). We selected six internationally or regionally well-known academic sites, embracing regions of four different continents. In addition, they had been published on the Web more than three years earlier. With regard to the quality characteristics and attributes selected for assessment purposes, about a hundred and twenty were taken into account. Furthermore, we use the same high-level quality characteristics as those prescribed in the ISO 9126 standard and those reported in Annex A of IEEE Std 1061 (13). In order to effectively select quality characteristics, a particular user viewpoint should be considered. Specifically, as observed elsewhere (14, 15), in the academic domain there are mainly three different audiences within the visitor view, namely: current and prospective students, academic personnel, and research sponsors. Ultimately, the main goal of this study is to evaluate and determine the level of fulfillment of required characteristics such as usability, functionality, reliability, and efficiency, and to compare partial and global preferences. This potentially allows us to understand and draw conclusions about the state of the art of the quality of academic sites from the selected visitor's point of view. At the end of the evaluation and comparison process, a global indicator on a scale from 0 to 100% is obtained for each selected site. This cardinal rating falls into one of three categories or preference levels, namely: unsatisfactory (from 0 to 40%), marginal (from 40 to 60%), and satisfactory (from 60 to 100%). The global preference can be roughly interpreted as the degree of satisfied requirements (16).

The structure of this paper is as follows: in the next section, we present an overview of Web-site QEM's main steps. In Section 3, we put the academic case study in context, and in Section 4 we present the characteristic requirement tree. Then, in Section 5, we show some elementary criteria and elementary measurements. In Section 6, we show the process of aggregating elementary criteria to yield the global quality preference. Next, we analyze and compare partial and global outcomes. Finally, concluding remarks are presented.

2. OVERVIEW OF THE METHODOLOGY

In order to evaluate, compare, and rank the quality of websites, we apply a set of activities belonging to the proposed methodology. Fig. 1 shows a high-level view of the major phases and procedures required for quality assessment. Only the major process steps that evaluators should carry out are described next:

- The specification of goals and the user standpoint
- The definition of Web-site quality requirements
- The definition of elementary criteria and measurement procedures (also called the determination of the elementary quality preference)
- The aggregation of elementary preferences to yield the global quality preferences
- The analysis, assessment, and comparison of partial and global quality preferences

(Also, support activities such as the selection of the evaluation domain and sites to assess, evaluation planning, scheduling, metric validation, and the documentation process should be taken into account; however, for space reasons these are not described in this work.)

Fig. 1 A high-level view of the Web-site QEM process, spanning the specification of goals and the user viewpoint, Web-site domain selection, evaluation planning and scheduling, requirement definition and specification (the quality characteristics and attributes requirement tree), elementary evaluation (criterion functions F(Xi) mapping measurable attributes Ai to elementary preferences EQi), global evaluation via the aggregation process (Logic Scoring of Preference model), and the analysis of quality outcomes, documentation, and ranking of the sites Web-S1 ... Web-Sn against the unsatisfactory, marginal, and satisfactory preference bars (0–50–100%).
In the step “specifying goals and the user standpoint”, the evaluators should define and refine the goals and scope of the evaluation process. They could evaluate a Web project under development or an operative one, and could assess the quality of a set of characteristics of a component or a whole product, or compare characteristics and global preferences of selected products. The outcomes might be useful to understand, improve, control, or predict the quality of Web artifacts. On the other hand, the formulation of goals, and consequently the relative importance of quality characteristics and attributes, varies depending on different users. In general terms, and following standards (9), we abstract three views of quality: the visitor, developer, and manager views. In turn, we can decompose the visitor category into more specific classes, as will be seen later in this study.

In the “defining the Web-site quality requirements” task, the evaluators should agree on and specify the quality characteristics and attributes, grouping them in a requirement tree. In order to follow well-known standards, we use the same conceptual characteristics [also known as factors in (13)], such as usability, functionality, reliability, efficiency, portability, and maintainability. From these characteristics we derive subcharacteristics, and from these we can specify attributes with minimal overlap. With each quantifiable attribute Ai we can associate a variable Xi, which can take a real value: the measured and computed value. Therefore, regarding the domain, the specific goals, and the user standpoint (i.e. user needs and behavior), quality characteristics and attributes should be outlined in a requirement tree.
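The requirement tree just described (characteristics decomposed into subcharacteristics and finally into measurable attributes Ai, each with an associated variable Xi) can be sketched as a small recursive data structure. This is an illustrative sketch only; the class and function names are ours, not part of Web-site QEM:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A characteristic, subcharacteristic, or quantifiable attribute.

    Leaves are the measurable attributes A_i; each leaf carries a
    variable X_i, filled in during data collection."""
    code: str                               # e.g. "1.1.1.1"
    name: str                               # e.g. "Site Map"
    children: List["Node"] = field(default_factory=list)
    x: Optional[float] = None               # X_i; meaningful only for leaves

def attributes(node: Node) -> List[Node]:
    """Return the quantifiable attributes (leaves) under a node."""
    if not node.children:
        return [node]
    result = []
    for child in node.children:
        result.extend(attributes(child))
    return result

# A fragment of the Usability subtree from Fig. 3:
tree = Node("1", "Usability", [
    Node("1.1", "Global Site Understandability", [
        Node("1.1.1", "Global Organization Scheme", [
            Node("1.1.1.1", "Site Map"),
            Node("1.1.1.2", "Table of Contents"),
            Node("1.1.1.3", "Alphabetical Index"),
        ]),
    ]),
])
```

Walking the tree with `attributes(tree)` yields exactly the leaves 1.1.1.1 through 1.1.1.3, i.e. the attributes to be measured for that subcharacteristic.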

In the “defining elementary criteria and measurement procedures” activity, the evaluators should define the basis for the elementary evaluation criteria and perform the measurement and rating process. Elementary evaluation criteria state how to measure quantifiable attributes. The outcome is an elementary preference, which can be interpreted as the degree or percentage of satisfied requirement. For each variable Xi (i = 1, ..., n), it is necessary to establish an acceptable range of values and define a function, called the elementary criterion. This function maps the variable's value, computed in the empirical domain (17), into a new numerical domain representing the elementary quality preference (EQi). We can interpret the elementary quality preference EQi as the percentage of the requirement satisfied by the function value. In this sense, EQi = 0% denotes a totally unsatisfactory situation, while EQi = 100% represents a fully satisfactory situation.
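To make the mapping concrete, here is a sketch of two common elementary criterion shapes: a binary criterion for presence/absence attributes, and a clamped linear mapping for a continuous variable. The linear shape is only one possible choice; the method merely requires some mapping from Xi onto [0%, 100%]:

```python
def binary_criterion(available: bool) -> float:
    """Absolute, discrete binary criterion:
    EQ_i = 100% if the attribute is present, 0% otherwise."""
    return 100.0 if available else 0.0

def linear_criterion(x: float, x_min: float, x_max: float) -> float:
    """An illustrative continuous criterion: map X_i linearly onto
    [0%, 100%], clamping outside the acceptable range [x_min, x_max]."""
    if x <= x_min:
        return 0.0
    if x >= x_max:
        return 100.0
    return (x - x_min) / (x_max - x_min) * 100.0
```

For instance, `binary_criterion(True)` yields 100.0, and `linear_criterion(5, 0, 10)` yields 50.0, i.e. a half-satisfied requirement.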

In the task “aggregating elementary preferences and yielding the global quality preference”, the decision-makers should prepare and perform the evaluation process to obtain a global preference indicator for each selected product. For n attributes, the previous step produces n elementary quality preferences. Applying a stepwise aggregation mechanism, the elementary quality preferences can be grouped accordingly, allowing us to obtain the global quality preference. The global quality preference represents the global degree of satisfaction of all involved requirements. In turn, the global scoring falls into one of three levels of preference or quality bars (see Fig. 1), that is, unsatisfactory, marginal, and satisfactory. In the museum and academic case studies, we use a logical scoring model to yield the global quality of each site. Specifically, we use the Logic Scoring of Preference (LSP) model, grounded in continuous preference logic (18).

In the final step “analyzing, assessing, and comparing partial and global quality preferences”, the evaluators assess and compare elementary, partial and global quantitative results regarding the established goals and user standpoint. As previously said, the outcomes might be useful to understand the Web artifacts quality, and recommendations can be suggested.

In the following sections, we deal with specific aspects of both the methodology and the case study.

3. PUTTING THE ACADEMIC CASE STUDY IN CONTEXT

To prepare the academic study, six operative sites averaging four years on-line were selected. By design, the chosen sites should be typical and well known regionally and/or internationally. We considered the sites of Stanford University (US, http://www.stanford.edu), the University of Chile (http://www.uchile.cl), the National University of Singapore (http://www.nus.sg), the University of Technology, Sydney (Australia, http://www.uts.edu.au), the Polytechnic University of Catalunya (Spain, http://www.upc.es), and the University of Quebec at Montreal (Canada, http://www.uqam.ca). Fig. 2 shows a snapshot of four home pages.

The primary goal of this academic case study was to understand and compare the current level of fulfillment of essential quality characteristics and attributes, given a set of requirements with regard to the prospective and current students' viewpoint. In particular, we assessed the level of accomplishment of standardized characteristics such as usability, functionality, reliability, and efficiency, and compared partial and global preferences to analyze and draw conclusions about the state of the art of Web-site artifact quality. Important conclusions (domain-specific as well as general ones) can emerge from this study, as will be seen. In addition, recommendations to apply in the early phases of new Web projects can be considered.

On the other hand, some considerations regarding data collection should be made. This activity can be done manually, semi-automatically, or automatically. Unfortunately, most of the attribute values were collected manually and observationally because there was no other way to do it. However, automatic data collection is in many cases almost the only way to collect data for a given attribute, and it is undoubtedly the more reliable mechanism. This was the case for measuring the Dangling (broken) Links, Image Title, and Quick Access Page attributes, among others. Data collection for these attributes was automated using the SiteSweeper tool.

Besides, we should take into account that websites are artifacts that can evolve dynamically, and we always access the latest on-line version. During the data collection period (which began on January 22 and finished on February 22, 1999), we did not perceive changes in these websites that could have affected the evaluation process. Finally, this evaluation work focused mainly on each university site as a whole rather than on individual academic units such as schools, colleges, or laboratories.


Fig. 2 From the upper-left corner to the right, the home pages of the University of Chile and the University of Technology, Sydney; and from the bottom-left corner to the right, those of the Polytechnic University of Catalunya and Stanford University.

4. SPECIFYING THE QUALITY REQUIREMENT TREE

The main goal of this step is to elicit, classify, and group in a requirement tree the characteristics and attributes that might be part of a quantitative evaluation, comparison, and ranking process. The aforementioned prescribed characteristics provide a conceptual framework for quality requirements and a baseline for further decomposition.

A quality characteristic can be decomposed into multiple levels of subcharacteristics, and a subcharacteristic can in turn be refined into a set of measurable attributes.


1. Usability
   1.1 Global Site Understandability
       1.1.1 Global Organization Scheme
             1.1.1.1 Site Map
             1.1.1.2 Table of Contents
             1.1.1.3 Alphabetical Index
       1.1.2 Quality of Labeling System
       1.1.3 Student-oriented Guided Tour
       1.1.4 Image Map (Campus/Buildings)
   1.2 Feedback and Help Features
       1.2.1 Quality of Help Features
             1.2.1.1 Student-oriented Explanatory Help
             1.2.1.2 Search Help
       1.2.2 Web-site Last Update Indicator
             1.2.2.1 Global
             1.2.2.2 Scoped (per sub-site or page)
       1.2.3 Addresses Directory
             1.2.3.1 E-mail Directory
             1.2.3.2 Phone-Fax Directory
             1.2.3.3 Post mail Directory
       1.2.4 FAQ Feature
       1.2.5 Form-based Feedback
             1.2.5.1 Questionnaire Feature
             1.2.5.2 Guest Book
             1.2.5.3 Comments
   1.3 Interface and Aesthetic Features
       1.3.1 Cohesiveness by Grouping Main Control Objects
       1.3.2 Presentation Permanence and Stability of Main Controls
             1.3.2.1 Direct Controls Permanence
             1.3.2.2 Indirect Controls Permanence
             1.3.2.3 Stability
       1.3.3 Style Issues
             1.3.3.1 Link Color Style Uniformity
             1.3.3.2 Global Style Uniformity
             1.3.3.3 Global Style Guide
       1.3.4 Aesthetic Preference
   1.4 Miscellaneous Features
       1.4.1 Foreign Language Support
       1.4.2 What’s New Feature
       1.4.3 Screen Resolution Indicator
2. Functionality
   2.1 Searching and Retrieving Issues
       2.1.1 Web-site Search Mechanisms
             2.1.1.1 Scoped Search
                     2.1.1.1.1 People Search
                     2.1.1.1.2 Course Search
                     2.1.1.1.3 Academic Unit Search
             2.1.1.2 Global Search
       2.1.2 Retrieve Mechanisms
             2.1.2.1 Level of Retrieving Customization
             2.1.2.2 Level of Retrieving Feedback
   2.2 Navigation and Browsing Issues
       2.2.1 Navigability
             2.2.1.1 Orientation
                     2.2.1.1.1 Indicator of Path
                     2.2.1.1.2 Label of Current Position
             2.2.1.2 Average of Links per Page
       2.2.2 Navigational Control Objects
             2.2.2.1 Presentation Permanence and Stability of Contextual (sub-site) Controls
                     2.2.2.1.1 Contextual Controls Permanence
                     2.2.2.1.2 Contextual Controls Stability
             2.2.2.2 Level of Scrolling
                     2.2.2.2.1 Vertical Scrolling
                     2.2.2.2.2 Horizontal Scrolling
       2.2.3 Navigational Prediction
             2.2.3.1 Link Title (link with explanatory help)
             2.2.3.2 Quality of Link Phrase
   2.3 Student-oriented Domain Features
       2.3.1 Content Relevancy
             2.3.1.1 Academic Unit Information
                     2.3.1.1.1 Academic Unit Index
                     2.3.1.1.2 Academic Unit Sub-sites
             2.3.1.2 Enrollment Information
                     2.3.1.2.1 Entry Requirement Information
                     2.3.1.2.2 Form Fill/Download
             2.3.1.3 Degree Information
                     2.3.1.3.1 Degree Index
                     2.3.1.3.2 Degree Description
                     2.3.1.3.3 Degree Plan/Course Offering
                     2.3.1.3.4 Course Description
                               2.3.1.3.4.1 Comments
                               2.3.1.3.4.2 Syllabus
                               2.3.1.3.4.3 Scheduling
             2.3.1.4 Student Services Information
                     2.3.1.4.1 Services Index
                     2.3.1.4.2 Healthcare Information
                     2.3.1.4.3 Scholarship Information
                     2.3.1.4.4 Housing Information
                     2.3.1.4.5 Cultural/Sport Information
             2.3.1.5 Academic Infrastructure Information
                     2.3.1.5.1 Library Information
                     2.3.1.5.2 Laboratory Information
                     2.3.1.5.3 Research Results Information
       2.3.2 On-line Services
             2.3.2.1 Grade/Fees on-line Information
             2.3.2.2 Web Service
             2.3.2.3 FTP Service
             2.3.2.4 News Group Service
3. Reliability
   3.1 Non-deficiency
       3.1.1 Link Errors
             3.1.1.1 Dangling (broken) Links
             3.1.1.2 Invalid Links
             3.1.1.3 Unimplemented Links
       3.1.2 Miscellaneous Errors or Drawbacks
             3.1.2.1 Deficiencies or absent features due to different browsers
             3.1.2.2 Deficiencies or unexpected results (e.g. non-trapped search errors, frame problems, etc.) independent of browsers
             3.1.2.3 Destination Nodes (unexpectedly) under Construction
             3.1.2.4 Dead-end Web Nodes
4. Efficiency
   4.1 Performance
       4.1.1 Quick Access Page
   4.2 Accessibility
       4.2.1 Information Accessibility
             4.2.1.1 Support for text-only version
             4.2.1.2 Readability by deactivating Browser Image Feature
                     4.2.1.2.1 Image Title
                     4.2.1.2.2 Global Readability
       4.2.2 Window Accessibility
             4.2.2.1 Number of panes regarding frames
             4.2.2.2 Non-frame Version

Fig. 3 Quality characteristics and attributes for the academic site domain

In order to effectively select quality characteristics, we should consider different kinds of users. In the academic domain, different audiences within the visitor view were observed. In this work, we agreed on about a hundred and twenty quality characteristics and attributes for the academic domain with regard to the student viewpoint. Student-oriented, paper-based questionnaires were conducted to help us determine the relative importance of characteristics and subcharacteristics. Discussions among the involved parties took place (i.e. among three groups of students, academic personnel, and three evaluators). These gave us feedback to observe, for example, that quality characteristics such as maintainability and portability are not relevant for this audience. Fig. 3 outlines the resulting characteristics and measurable attributes.

The Usability high-level characteristic is decomposed into sub-factors such as Global Site Understandability, Feedback and Help Features, Interface and Aesthetic Features, and Miscellaneous Features. The Functionality characteristic is split up into Searching and Retrieving Issues, Navigation and Browsing Issues, and Student-oriented Domain Features. The same decomposition mechanism is applied to the Reliability and Efficiency factors. For instance, the Efficiency characteristic is decomposed into the Performance and Accessibility subcharacteristics. In turn, the Global Site Understandability subcharacteristic (within Usability) is split up into the Global Organization Scheme subcharacteristic and quantifiable attributes such as Quality of Labeling, Student-oriented Guided Tours, and Campus Image Map. However, the Global Organization Scheme subcharacteristic is still too general to be directly measurable, so attributes such as Site Map, Table of Contents, and Alphabetical Index are derived.

Focusing on the Student-oriented Domain Features characteristic (where Functionality is the super-characteristic), we have observed two main subcharacteristics, namely Content Relevancy and On-line Services. As the reader can appreciate, aspects ranging from academic unit, degree/course, enrollment, and services information to the web, ftp, and news group services provided for undergraduate and graduate students were evaluated.


Finally, regarding quantifiable attributes, we can easily see that not all attributes of a given characteristic necessarily need to exist simultaneously in a website. For instance, this is the case for the Form-based Feedback characteristic, where a Questionnaire Feature, a Guest Book, or a Comments attribute could exist as alternatives. However, in many cases modeling the simultaneity relationship among attributes and characteristics might be an essential requirement for the evaluation. For example, for the Content Relevancy characteristic, the simultaneous existence of subcharacteristics such as Academic Unit Information, Enrollment Information, and Degree Information might be mandatory.

Ultimately, it is important to stress that we utilize the Logic Scoring of Preference model in the evaluation process, which allows us to deal with simultaneity, neutrality, and replaceability relationships, taking into account weights and levels of and/or polarization. LSP uses aggregation operators based on weighted power means. These logical operators are described in Section 6.

5. PERFORMING THE ELEMENTARY EVALUATION

As previously said, with each quantifiable attribute Ai we can associate a variable Xi, which can take a real value. In addition, for each variable it is necessary to define a criterion function, called the elementary criterion function. This function models a mapping between the computed value in the empirical domain and the value in the new numerical representation, resulting in an elementary quality preference, EQi. The interpretation of the elementary preference and the scale and unit used can be drawn from Dujmovic et al. (16): “the elementary preference is interpreted as a continuous logic variable. The value 0 denotes that Xi does not satisfy the requirements, and the value 1 denotes a perfect satisfaction of requirements. The values between 0 and 1 denote a partial satisfaction of requirements. Consequently, all preferences are frequently interpreted as a percentage of satisfied requirements, and defined in the range [0, 100%]”.

On the other hand, a hierarchical and descriptive specification framework to represent the above characteristics and attributes was presented in (12). Next, for the academic study, two attributes (namely, the Table of Contents and Quick Access Page attributes) are documented following the same regular structure.

Title: Table of Contents; Code: 1.1.1.2; Type: Attribute
High-level characteristic: Usability
Super-characteristic: Global Organization Scheme
Definition / Comments: An attribute that structures the contents of the whole site, permitting navigation by means of linked text. It is usually available on the home page and emphasizes the information hierarchy so that users can become increasingly familiar with how the content is organized into subsites. In addition, it facilitates fast and direct access to subsite contents (3).
Elementary Criteria: An absolute and discrete binary criterion: we only ask whether it is available [1] or not [0].

Preference Scale: binary, mapping 0 to 0% and 1 to 100% (quality bars at 40% and 60%)
Data Collection Type: Manual, Observational

Example/s: Examples of Table of Contents availability are the NUS, UTS, Stanford, and UPC sites; the computed elementary preference is 100%. Moreover, in the subsite organization of the UTS table of contents, an audience-oriented division is clearly established (e.g., for students, for staff, and for researchers and sponsors).

Title: Quick Access Page; Code: 4.1.1; Type: Attribute
High-level characteristic: Efficiency
Super-characteristic: Performance
Definition / Comments: For the whole website, the size of each (static) page is measured considering all its graphic, tabular, and textual components. The access time to a page is specified as a function of its size and the established speed of a given communication line. We have specified a size upper limit of 35.2 Kb per page. A page of this size requires about 20 seconds to download at 14,400 bps (taken as the limit of the acceptable period of time that a user might wait).


“Users tend to become annoyed when a page takes longer than 20 seconds to load. This means it is best to limit the total of the file sizes associated with a page, including graphics, to a maximum of 30 – 45 kilobytes to assure reasonable performance for most users.”, IEEE Web Publishing guide (1), in Performance section. Elementary Criteria: It is an absolute and continuous multi-variable criterion. The formula to compute the preference is:

X = ((X1 - 0.4 X2 - 0.8 X3) / (X1 + X2 + X3)) * 100

where X1 represents the number of pages with a download time in the range 0 < t1 <= 20 seconds, X2 the number of pages with 20 < t2 <= 40 seconds, and X3 the number of pages with t3 > 40 seconds.

Preference Scale: continuous from 0% to 100% (quality bars at 40% and 60%)

Data Collection Type: Automated
Example/s: As an example, consider the UTS site, where the tool reported: “You specified a total download size limit of 35.2K bytes per page. A page this size requires about 20 seconds to download at 14.4K bps. Of the 18,872 pages on your site, 2,210 pages (12%) have a total download size that exceeds this threshold”. Applying the above formula to the values reported by the SiteSweeper 2.0 tool, the computation (16662 - 0.4 * 1850 - 0.8 * 440) / 18872 yields a preference of 82%. Remarkably, the Stanford website obtained an elementary preference of 100% (no page exceeds the 35.2K-byte threshold).
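The Quick Access Page computation can be sketched directly from the formula. The per-band page counts used below (16662 fast, 1850 medium, 440 slow) are those appearing in the computation reported in the text; note that they sum slightly above the 18,872 pages reported by the tool, so using their sum as the denominator gives approximately, not exactly, 82%:

```python
def quick_access_preference(n_fast: int, n_medium: int, n_slow: int) -> float:
    """Quick Access Page elementary preference (attribute 4.1.1).

    n_fast:   pages downloading in t <= 20 s at the reference line speed
    n_medium: pages with 20 s < t <= 40 s (penalized with factor 0.4)
    n_slow:   pages with t > 40 s (penalized with factor 0.8)
    """
    total = n_fast + n_medium + n_slow
    return (n_fast - 0.4 * n_medium - 0.8 * n_slow) / total * 100.0

# The UTS band counts as read from the text:
uts = quick_access_preference(16662, 1850, 440)       # ≈ 82%
# A site where no page exceeds the threshold scores 100%, as Stanford did:
perfect = quick_access_preference(1000, 0, 0)
```

The function name is ours; it simply packages the paper's multi-variable criterion.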

Finally, once all elementary criteria were prepared and agreed upon, and the necessary data collected, we computed the elementary quality preference for each site. Table 1 partially shows the elementary preference results after computing the corresponding criterion function for each academic site attribute. We include some elementary results for the Usability characteristic as well as for Functionality and Efficiency. Even though these are only elementary values, to which no aggregation mechanism has yet been applied and from which no global outcomes have yet been produced, important results can already be obtained. For instance, we can see that two out of six sites have not resolved the Global Organization Scheme (i.e., they have neither a Site Map, nor a Table of Contents, nor an Alphabetical Index attribute). As said elsewhere (11), when users enter a given home page for the first time, the direct or indirect availability of these attributes may help them get a quick Global Site Understandability of both the structure and the content. Likewise, attributes such as Quality of Labeling, Guided Tours, and Campus Image Map contribute to global understandability. Nonetheless, regarding the attributes of the Global Organization Scheme feature, not all of them necessarily need to exist at the same time (the replaceability relationship); a Table of Contents, an Index attribute, or a Site Map could be required, but other arrangements are possible. This is the case for the UPC, UTS, NUS, and Stanford sites, where only some attributes are present (and these sites should not be punished for the absence of the others).

Table 1 Some results of elementary quality preferences for the six academic sites (UPC, Spain; Uchile, Chile; UTS, Australia; NUS, Singapore; Stanford, USA; UQAM, Canada), covering the Usability attributes 1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.2, 1.1.3, and 1.1.4; the Functionality attributes 2.1.1.1.1, 2.1.1.1.2, 2.1.1.1.3, and 2.1.1.2; the Reliability attribute 3.1.1.1; and the Efficiency attributes 4.1.1 and 4.2.1.2.1.

6. PERFORMING THE GLOBAL EVALUATION

In this activity, we prepare the global evaluation process in order to obtain, at the end, a global quality indicator for each assessed site. In the process, the type of relationships among attributes and subcharacteristics, and the relative weights, should be agreed upon. For this purpose, given the number of intervening characteristics and attributes and the complexity of their relationships, we considered the use of a robust and sensible model, namely the logic scoring model. However, in simpler cases a merely additive scoring model can be used, where partial and global indicators are computed using the following structure:

Partial/global indicator or score = Σ (component weight * elementary indicator)
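A minimal sketch of such an additive model (the function name is ours, and the weights are assumed to be normalized):

```python
def additive_score(weights, elementary_preferences):
    """Merely additive scoring: sum of component weight * elementary
    indicator. With weights summing to 1 and preferences in [0, 100],
    the score stays in [0%, 100%]."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * eq for w, eq in zip(weights, elementary_preferences))

# Three attributes weighted 0.5, 0.25, 0.25 with preferences 100%, 60%, 0%:
score = additive_score([0.5, 0.25, 0.25], [100.0, 60.0, 0.0])  # → 65.0
```

Note that in this additive model the 0% input is fully compensated by the high-scoring inputs; this is exactly the limitation that motivates the LSP model described next.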

As in the museum case study, a Logic Scoring of Preference model was used. (A broad treatment of the LSP model and continuous logic preference (CLP) operators can be found in (16, 18).) Applying a stepwise aggregation mechanism, the elementary quality preferences can be structured accordingly to allow computing partial preferences. In turn, repeating the aggregation process, at the end we obtain the global preference. This indicator represents the global degree of satisfaction of all involved requirements. The strength of the LSP model over merely linear and additive ones resides in its power to deal with different logical relationships reflecting the stakeholders' needs, namely:

- Simultaneity, when it is perceived that two or more input preferences must be present simultaneously
- Replaceability, when it is perceived that two or more attributes can be alternated
- Neutrality, when it is perceived that two or more input preferences can be grouped independently (neither conjunctive nor disjunctive relationships)
- Symmetric relationships, when it is perceived that two or more input preferences affect the evaluation in the same logical way (though possibly with different weights)
- Asymmetric relationships, when mandatory attributes are combined with desirable or optional ones, and when sufficient attributes are combined with desirable or optional ones

Fig. 5 depicts the final aggregation structure for a) the Efficiency characteristic, and b) the high-level aggregation of characteristics that yields the global quality preference. The stepwise aggregation process follows the hierarchical structure of the requirement tree (shown in Fig. 3) from bottom to top. The major CLP operators (depicted in Fig. 4) are the arithmetic mean (A), which models the neutrality relationship, and the weak (C-), medium (CA), and strong (C+) quasi-conjunction functions, which model simultaneity relationships. Quasi-conjunction operators are flexible connectives; in addition, we can tune them to intermediate values. For instance, C-- is positioned between the A and C- operators, C-+ is between the CA and C- operators, and so on. All of these operators (except A) imply that a low quality in one input preference can never be well compensated by a high quality in another input so as to yield a high output preference.

Fig. 4 Conjunctive and disjunctive LSP operators and levels of polarization.

For example, in Fig. 5a), at the end of the aggregation process we have the subcharacteristic coded 4.1 (called Performance in the requirement tree, weighted 0.7) and the subcharacteristic 4.2 (Accessibility, weighted 0.3). These two subcharacteristics are inputs to the C-- logical function, which produces the partial preference coded 4 (Efficiency). The C-- operator does not model mandatory requirements; that is, a zero in one input does not yield a zero at the output, though it punishes the outcome. (In Table 2, the reader can corroborate a weak punishment to the Efficiency factor of the UPC University site, where input values of 75.3 and 55.41 yield a preference of 69.09 %.) Furthermore, Fig. 5b) shows the final aggregation of the characteristics coded 1, 2, 3, and 4. These are inputs to the C-+ logic operator, which does model mandatory requirements, with a medium-to-low intensity of polarization: a zero in one input will produce a global quality preference of zero. In general, the higher the level of conjunctive polarization toward the C operator, the stronger the punishment of low input preferences.
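The CLP operators are commonly implemented as weighted power means, where the exponent r sets the intensity of polarization. A minimal sketch; the exponents below are Dujmović's tabulated values for two inputs and should be treated as assumptions of this sketch, though the C-- case can be checked against the UPC Efficiency figures quoted above:

```python
# LSP aggregation as a weighted power mean on the 0-100 preference scale:
#   out = (sum(w_i * x_i**r)) ** (1/r)
# r > 0 (e.g. C--) models non-mandatory simultaneity: a zero input
# punishes but does not nullify the output. r < 0 (e.g. CA) models
# mandatory inputs: any zero input drives the output to zero.

def lsp(prefs, weights, r):
    if r < 0 and any(x == 0 for x in prefs):
        return 0.0  # limit of the power mean for negative exponents
    return sum(w * x ** r for w, x in zip(weights, prefs)) ** (1.0 / r)

# Assumed tabulated exponents for TWO inputs:
C_WEAK = 0.619   # C--, weak quasi-conjunction (non-mandatory)
C_MED = -0.72    # CA, medium quasi-conjunction (mandatory)

# UPC Efficiency: Performance 75.3 (weight 0.7) and Accessibility 55.41
# (weight 0.3) aggregated with C-- give a value close to the reported 69.09.
print(round(lsp([75.3, 55.41], [0.7, 0.3], C_WEAK), 2))

# With a mandatory operator, a zero input nullifies the result:
print(lsp([75.3, 0.0], [0.7, 0.3], C_MED))  # 0.0
```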

As with the conjunctive operators, we can also utilize the quasi-disjunction operators in ranges of intensity of polarization, as shown in Fig. 4. In this way we model replaceability relationships, where alternatives can exist; i.e., a low quality in one input preference can always be compensated by a high quality in some other input. For example, to model the Global Organization Scheme output we use a D++ operator whose inputs are Table of Contents, Global Index, and Site Map (as discussed in the previous section). These attributes are strongly replaceable with each other; that is, the presence of one of them can replace the absence of the others.
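A quasi-disjunction can be obtained as the De Morgan dual of the corresponding quasi-conjunction: aggregate the complements and complement the result. The D++/C++ pairing, the two-input case, the equal weights, and the exponent below are assumptions of this sketch:

```python
# De Morgan dual on the 0-100 scale: D(x) = 100 - C(100 - x)

def power_mean(prefs, weights, r):
    if r < 0 and any(x == 0 for x in prefs):
        return 0.0  # limit of the power mean for negative exponents
    return sum(w * x ** r for w, x in zip(weights, prefs)) ** (1.0 / r)

def quasi_disjunction(prefs, weights, r):
    return 100.0 - power_mean([100.0 - x for x in prefs], weights, r)

C_PP = -9.06  # assumed exponent of C++ (strong quasi-conjunction, two inputs)

# Strong replaceability (D++): a fully present Table of Contents can
# replace an absent Global Index -- the output stays at 100.
print(quasi_disjunction([100.0, 0.0], [0.5, 0.5], C_PP))  # 100.0
```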

Looking at Table 1 (attributes 1.1.1.1, 1.1.1.2, and 1.1.1.3) and Table 2 (subcharacteristic 1.1.1), and recalling the use of the D++ operator, these results can be corroborated.

Fig. 5 Aggregation of preferences using the LSP Model. a) Depicts the aggregation structure for Efficiency characteristic; b) Shows the aggregation structure of main characteristics.

7. ANALYZING AND COMPARING THE QUALITY OF SITES

In this task, as commented in Section 2, the evaluators analyze, assess, and compare the partial and total quantitative results with respect to the established goals and the user viewpoint. At this point, the partial and final outcomes recorded, for instance, in Tables 1 and 2; the graphics and diagrams (such as the one shown in Fig. 6, and others output by automated tools); and the schemas depicting the aggregation criteria models (as in Fig. 5) are useful sources of information for analyzing and drawing conclusions about the quality of the artifacts.


Table 2 Some results of partial and global quality preferences (%) for each University site

Characteristic and Subcharacteristics    UPC    UChile  UTS    NUS    Stanford  UQAM
1. Usability                             76.18  51.01   80.08  57.71  71.93     60.94
1.1 Global Site Understandability        77.76  43.5    98.17  75.70  83.17     42
1.1.1 Global Organization Scheme         97.91  0       99.08  96.29  99.08     0
1.2 Feedback and Help Features           70.15  44.6    73.8   75.8   69.4      68.79
1.3 Interface and Aesthetic Features     80.6   69.72   83.11  34.9   75.71     73.10
1.4 Miscellaneous Features               73     43      35     35     35        80
2. Functionality                         61.84  50.39   48.89  38.99  82.04     71.12
2.1 Searching Issues                     48.59  42.37   29.32  19.73  100       80.55
2.1.1 Web-site Search Mechanisms         48     50.84   52.17  35.12  100       82.98
2.1.1.1 Scoped Search                    24     40      44     40     100       60
2.1.2 Retrieve Mechanisms                50     25      0      0      100       75
2.2 Navigation and Browsing Issues       62.01  50.09   57.66  38.80  78.02     50.89
2.3 Student-oriented Domain Features     73.46  57.43   61.80  61.87  73.13     82.37
3. Reliability                           60.40  87.62   90.86  88.70  83.05     63.61
3.1.1 Link Errors                        50     82.51   87.05  84.03  76.16     50
4. Efficiency                            69.09  52.03   76.11  53.12  85.99     69.47
4.1 Performance                          75.3   50.46   82     51.46  100       83.44
4.2 Accessibility                        55.41  55.79   63.04  57.10  56.57     40.84
Global Quality Preference                66.91  56.55   69.61  54.46  79.76     66.05

Table 2 shows detailed results of partial and global quality preferences for each university (for space reasons, we do not include the results of all subcharacteristics). The final global preference for each site can be observed at the bottom of the table. In addition, Fig. 6 represents the final ranking. The evaluation process ranked first the Stanford University site, with 79.76 % of the global quality preference (falling into the green quality bar), and ranked last the National University of Singapore site, which obtained 54.46 % of the global preference (falling into the gray quality bar). The colored bars on the left side of the figure indicate the levels of quality preference as previously commented: satisfactory (green), marginal (gray), and unsatisfactory (red). A score within a gray bar can be interpreted as meaning that improvement actions should be considered (this is the case for the global preferences of both the Chilean and Singaporean academic sites), whereas an unsatisfactory rating level means that necessary and urgent change actions must be taken. A score within a green bar can be interpreted as a satisfactory or acceptable quality situation.
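The ranking and the three-level acceptability bars can be reproduced with simple thresholds; the 40 % and 60 % cut-offs below are assumptions of this sketch, chosen to be consistent with the global scores just discussed:

```python
# Global quality preferences taken from Table 2, and an assumed scale:
# >= 60 satisfactory (green), 40-60 marginal (gray), < 40 unsatisfactory (red).
GLOBAL = {"UPC": 66.91, "UChile": 56.55, "UTS": 69.61,
          "NUS": 54.46, "Stanford": 79.76, "UQAM": 66.05}

def quality_bar(score, low=40.0, high=60.0):
    if score >= high:
        return "satisfactory"
    return "marginal" if score >= low else "unsatisfactory"

# Rank the sites from highest to lowest global preference.
ranking = sorted(GLOBAL.items(), key=lambda kv: kv[1], reverse=True)
for site, score in ranking:
    print(f"{site:8s} {score:6.2f} {quality_bar(score)}")
# Stanford ranks first (satisfactory); NUS ranks last (marginal).
```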

Regarding the Usability characteristic, the highest score went to UTS (80.08 %) and the lowest to UChile (51.01 %). For example, the UTS site reached a preference of 98.17 % in Global Site Understandability (see the components in Fig. 3), whereas the UChile site scored zero in Global


Organization Scheme (it has neither a global index, a table of contents, nor a site map). However, considering Interface and Aesthetic Features, it is better positioned than the NUS site, which scored in the red bar with a preference of 34.9 %. Clearly, Table 2 shows that the NUS site has serious problems with Interface, Searching, and Navigation issues. (By analyzing each attribute that composes these subcharacteristics, recommendations for improvement can be made.)

Fig. 6 Final ranking for the six academic sites

Regarding the Functionality characteristic, the highest score went to Stanford (82.04 %) and the lowest to NUS (38.99 %). The Stanford site reached the outstanding score of 100 % in Searching Issues (in-site global and scoped searching and retrieval features). Nonetheless, NUS and UTS are in the gray bar for that subcharacteristic. Besides, the NUS site surprisingly has no navigation controls to return to the home page or to navigate within a subsite. On the other hand, considering Student-oriented Domain Features (so important for the evaluated audience), the scores for all the sites range from 57.43 % to 82.37 %, which are respectable figures; the best mark went to the UQAM site.

Regarding the Reliability characteristic, the highest score went to UTS (90.86 %) and the lowest to UPC (60.40 %), with all the sites falling in the green quality bar. Furthermore, considering the Efficiency characteristic, the highest score went to Stanford (85.99 %) and the lowest to UChile (52.03 %). The Stanford site reached the outstanding score of 100 % in the Performance subcharacteristic (as

was seen in Section 5); however, NUS (51.46 %) and UChile (50.46 %) should plan some performance improvements. With regard to the Accessibility subcharacteristic, the lowest preference went to UQAM (40.84 %). Unfortunately, no site provides the Text-only Version attribute. This would be useful because users may need full accessibility to the information on pages, mainly people with disabilities, or when speed is a problem (4).

Finally, we observe that the state of the art of Web-site quality on typical academic sites, from the student point of view, is rather high, but the wish list of poorly designed or absent attributes is not empty. Lastly, it is important to say that the final ranking reflects these specific requirements for this specific audience and should not be regarded as a general ranking. However, it is important to stress the objective components of this evaluation process.

8. FINAL REMARKS

The assessment of Web artifacts is a relevant lifecycle process to be undertaken by Web software engineers. We presented a quantitative methodology, a mixture of prescriptive and descriptive strategies, that can be useful to evaluate and compare quality characteristics and attributes of complex Web sites in the operative phase. Moreover, Web-site QEM can also be used in early phases of Web development projects. At the end of the evaluation process (in this study, for instance), a ranking for each selected site is obtained, reflecting the level of satisfaction of all quality requirements.

The evaluation process generates elemental, partial, and global preferences that can be easily analyzed, traced backward and forward, justified, and efficiently employed in decision-making activities. Moreover, the LSP model allows a stepwise aggregation of sophisticated criteria functions, which gives objective and reliable evaluation results for the intervening characteristics and attributes.

In order to establish the user quality requirements, we started from six prescribed characteristics, which describe software quality with minimal overlap. As stated by the ISO 9126 standard, software quality may be evaluated, in general, by the following characteristics: usability, functionality, reliability,

efficiency, portability, and maintainability. These high-level characteristics provide a conceptual foundation for further refinement and description of quality. However, the state of the art in product metrics is such that a direct measurement of these characteristics is not viable. What is possible is to assess them based on measurements of attributes at lower levels of abstraction. In this way, the evaluators can use their expertise to carry out the decomposition and assessment process. Besides, the relative importance of each characteristic in the quality requirement tree varies depending on the user view, the application domain, and the criticality of the component, among other factors. Therefore, in field studies like this one, student-oriented meetings, questionnaires, or other techniques should also be conducted to help determine the requirement tree and the relative importance of its components. Generally, Web site visitors are mainly concerned with using the site, i.e., with its searching and browsing functions, its specific user-oriented content and functionality, its reliability, its performance, its accessibility mechanisms, its feedback and aesthetic features, and so on; ultimately, they are interested in its quality of use. Maintainability and portability, on the other hand, matter rather little to this kind of user.

Regarding the present case study, the state of the art of quality on typical academic sites, from the current and prospective student standpoint, is rather high, as was observed. The evaluated websites globally satisfied between 54 % and 80 % of the specified requirements. Nevertheless, each evaluated site has at least one subcharacteristic in the red (or gray) quality bar, for which improvement actions must be planned. Thus, engineers and designers have a powerful and flexible tool to redirect their efforts toward the weaker subcharacteristics and absent attributes of a site.

Currently, we are running an evaluation project on well-known sites in the arena of e-commerce. These studies are also allowing us to strengthen the validation of our quality criteria and metrics as our experience grows.


ACKNOWLEDGMENT

This research is partially supported by the "Programa de Incentivos, del Ministerio de Cultura y Educación de la Nación, Argentina", in the research project called Metodología de Evaluación y Comparación Cuantitativa de Calidad de Artefactos Web. Thanks to Santiago Nicolau and Gustavo Lafuente, who contributed to a tool implementation.

REFERENCES

1. IEEE Web Publishing Guide, Available at: http://www.ieee.org/web/developers/style/
2. SHNEIDERMAN, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd Ed., Reading, MA: Addison Wesley, 1998.
3. ROSENFELD, L.; MORVILLE, P., Information Architecture for the World Wide Web, O'Reilly, 1998.
4. W3 Consortium, W3C Working Draft, WAI Accessibility Guidelines: Page Authoring, http://www.w3c.org/TR/WD-WAI-PAGEAUTH/, 1999.
5. NIELSEN, J., The Alertbox Column, http://www.useit.com/alertbox/
6. TILSON, R.; DONG, J.; MARTIN, S.; KIEKE, E., Factors and Principles Affecting the Usability of Four E-commerce Sites, In: 4th Conference on Human Factors & the Web, Basking Ridge, NJ, US, 1998.
7. LOHSE, G.; SPILLER, P., Electronic Shopping, CACM 41(7), 1998, 81-86.
8. KIRAKOWSKI, J.; CIERLIK, B., Measuring the Usability of Web Sites, In: Human Factors and Ergonomics Society Annual Conference, Chicago, US, 1998.
9. ISO/IEC 9126-1991(E) International Standard, Information technology - Software product evaluation - Quality characteristics and guidelines for their use, Geneva, Switzerland, 1991.
10. ISO/IEC 14598-5:1998(E) International Standard, Information technology - Software product evaluation - Part 5: Process for evaluators, Geneva, Switzerland, 1998.
11. OLSINA, L., Web-site Quantitative Evaluation and Comparison: a Case Study on Museums, In: Workshop on Software Engineering over the Internet, at Int'l Conference on Software Engineering, http://sern.cpsc.ucalgary.ca/~maurer/ICSE99WS/ICSE99WS.html, Los Angeles, US, 1999.
12. OLSINA, L.; GODOY, D.; LAFUENTE, G.J.; ROSSI, G., Quality Characteristics and Attributes for Academic Web Sites, In: WWW8 Web Engineering'99 Workshop, http://budhi.uow.edu.au/web-engineering99/web_engineering.html, Toronto, Canada, 1999.
13. IEEE Std 1061-1992, IEEE Standard for a Software Quality Metrics Methodology, NY, US, 1992.
14. LOWE, D.; WEBBY, R., The Impact Process Modeling Project, In: 1st International Workshop on Hypermedia Development, at ACM Hypertext 98, Pittsburgh, US, http://ise.ee.uts.edu.au/hypdev/, 1998.
15. OLSINA, L., Building a Web-based Information System applying the Hypermedia Flexible Process Modeling Strategy, In: 1st International Workshop on Hypermedia Development, at ACM Hypertext 98, Pittsburgh, US, http://ise.ee.uts.edu.au/hypdev/, 1998.
16. DUJMOVIC, J.J.; BAYUCAN, A., A Quantitative Method for Software Evaluation and its Application in Evaluating Windowed Environments, In: IASTED Software Engineering Conference, San Francisco, US, 1997.
17. FENTON, N.E.; PFLEEGER, S.L., Software Metrics: a Rigorous and Practical Approach, 2nd Ed., PWS Publishing Company, 1997.
18. DUJMOVIC, J.J., A Method for Evaluation and Selection of Complex Hardware and Software Systems, In: 22nd Int'l Conference for the Resource Management and Performance Evaluation of Enterprise Computing Systems, 1996, V1, 368-378.
