Pre-print version of the paper published in Software Quality Journal, 13, pp. 155-175, 2005.
Early estimation of users’ perception of Software Quality

DIMITRIS STAVRINOUDIS, Computer Engineering and Informatics Department, Patras University, Greece. [email protected]
MICHALIS XENOS, School of Sciences and Technology, Hellenic Open University, Patras, Greece; R.A. Computer Technology Institute, Patras, Greece. [email protected]
PAVLOS PEPPAS, Department of Business Administration, Patras University, Greece. [email protected]
DIMITRIS CHRISTODOULAKIS, Computer Engineering and Informatics Department, Patras University, Greece; R.A. Computer Technology Institute, Patras, Greece. [email protected]

Abstract: This paper presents a methodology for estimating users’ opinion of the quality of a software product. Users’ opinion changes with time as they progressively become more acquainted with the software product. In this paper, we study the dynamics of users’ opinion and offer a method for assessing users’ final perception, based on measurements in the early stages of product release. The paper also presents methods for collecting users’ opinion and shows, from the derived data, how their initial belief state for the quality of the product is formed. It adapts aspects of Belief Revision theory in order to present a way of estimating users’ opinion, subsequently formed after their opinion revisions, using the initial measurements and without having to conduct surveys frequently. It reports the correlation that users tend to infer among quality characteristics and represents this correlation through a set of constraints between the scores of the quality characteristics. Finally, this paper presents a fast and automated way of forming users’ new belief state for the quality of a product after examining their opinion revisions.
1. Introduction

In every definition of quality, the satisfaction of users’ expectations is emphasized (Crosby, 1979; Juran et al., 1980). In a similar manner, software quality is highly related to conforming to end-users’ requirements and fulfilling their needs. Standards such as ISO9001 (ISO/IEC 9001, 2000) and commonly applied models such as CMM (Paulk et al., 1995) emphasize measuring and evaluating the opinion of the end-users. It is self-evident that the sooner a software developer obtains an estimate of users’ perception of software quality, the easier it is to proceed towards corrective action and define market strategies. In most cases, such estimations are made by surveying users. The value of surveys is well recognised in measuring product quality characteristics as perceived by customers. As argued by Kaplan (Kaplan et al., 1995), surveys are quantifiable and therefore are not only indicators in themselves, but also allow the application of more sophisticated analysis techniques appropriate to organizations
with higher levels of quality maturity. On the other hand, surveys can be quite expensive and their results may change even during the data analysis phase, since –as will be presented in section 2– users’ opinion is subject to frequent changes. As successive surveys reveal, the weight that should be given to the customer’s opinion of the quality of a software product increases over time. Consequently, every new characteristic of the product detected by the customer, which will undoubtedly lead to the revision of his belief about the product, must be considered. Moreover, with each newly discovered feature, users become more assured of the validity of their opinion. Because of the high cost of surveys, a simpler, faster and more automated methodology must be adopted in order to estimate the revision of the opinion of user groups without needing to conduct a new survey. The goal of this paper is to present such a methodology. In the following sections, issues related to users’ opinion and how this opinion may change over time are discussed, with emphasis on the distinction between experienced and inexperienced users. Section 3 discusses the basic elements of the proposed methodology, while section 4 focuses on the methods used for collecting users’ opinion. Section 5 presents rules from Belief Revision theory and discusses how these can be adapted so as to meet the needs of the proposed methodology. Finally, conclusions are presented in section 6.
2. Issues related to users’ opinion

The measurement and evaluation of users’ opinion of software quality are essential. Each user is considered to have three different parts of knowledge related to the subject being measured: personal background, syntactic knowledge of the product and semantic knowledge of the application (Xenos et al., 1995). In detail, the personal background consists of all the user’s attributes that are not related to the software product, such as the user’s age, general education, culture, etc. Syntactic knowledge is the knowledge of existing software applications and familiarity with the use of computers in general. Finally, semantic knowledge defines how well a user knows the activity that the software application automates; in other words, it defines how well the user understands the semantics of the problem under automation. These three parts, which are not considered to contribute equally to the user’s profile, categorise users according to their experience, e.g. into experienced and inexperienced users. In order to measure users’ opinion of software quality, we focused on the user-oriented quality characteristics derived from the ISO9126 (ISO9126, 1991) standard (functionality, reliability, efficiency, usability). Nevertheless, the proposed method is applicable even if a different way of modelling the criteria is used, e.g. the FCM model (McCall et al., 1977), FURPS (Grady et al., 1987), or CUPRIMDSO (Kan, 1996). It was found that, although these characteristics are not correlated, users tend to believe that a correlation exists among them. Although this has been observed more emphatically in the case of inexperienced users than in experienced ones, it is a common mistake among both groups. In order to measure users’ opinion of a software product efficiently, surveys must be conducted at fixed time intervals.
Although such practices cannot be applied in a professional setting due to their high cost, monthly surveys were conducted for research purposes using the same sample of users and the same software products (Stavrinoudis et al., 1998). Figure 1 sums up the results of these surveys.
Figure 1 illustrates the boundaries within which the opinion of users of a typical software product, chosen as an example, varies over time. The horizontal axis represents time in monthly intervals and the vertical axis represents the user’s opinion, as measured in the surveys. The user’s opinion in each survey takes values from 0 to 10, where 10 represents the best opinion of the user-oriented quality characteristics mentioned above (functionality, reliability, efficiency and usability) and 0 the worst. The line AvgO represents the average users’ opinion of the quality of the software product, formed after the final opinion of users has been measured. The opinion of experienced users over time varies between the curves e1 and e2, whereas the opinion of inexperienced users varies between the curves i1 and i2. The spread of these curves may vary according to the software. Usually, the initial difference between e1 and e2 has a value of 3 and the initial difference between i1 and i2 has a value of 6, with the line AvgO in the middle of these differences. However, over time the differences between these curves decrease. The experienced users form an opinion of the quality of the product from the early stages of its release which is very close to their final opinion. On the contrary, the inexperienced users will form an opinion close to their final opinion only after using the software product for several months. The length of this period depends on the complexity of the product, the number and variety of functions it supports, the amount of usage (which in the case presented in figures 1 and 2 was not very frequent, as determined from the surveys that were conducted) and the conditions under which usage occurs, as well as usage of similar software products.
As noted from our experiments, this period of time usually varies from six to twelve months, when the user is experienced in the use of this specific product.

[Figure 1 here: user’s opinion (0–10) plotted against time in months (1–15); curves e1 and e2 bound the opinion of experienced users, curves i1 and i2 bound that of inexperienced users, and the line AvgO marks the average final opinion.]
Figure 1: Boundaries of users’ opinion
After a period of time, the line AvgO usually starts to decline, as user requirements increase over time. This phenomenon depends on factors such as the release of similar software products and advances in hardware. It was
also observed that when an experienced user initially gives the software a higher score than his final score (or vice versa), his opinion does not fluctuate but slowly closes the gap between the initial and the final score. Amongst inexperienced users, however, no such predictable behaviour was observed; opinion fluctuated widely between i1 and i2. Over time, the degree of fluctuation receded towards the users’ final opinion of the product quality. The differentiation of inexperienced users’ opinion over time is illustrated in the diagram of figure 2, where UO represents a typical example of the changes in a user’s opinion. The calculation of UO depends on the method of collecting users’ opinion; in our research, we focused mainly on questionnaire-based surveys, where equation (E.2), described in section 4, was used. This fluctuation results from the inexperienced user either discovering a new feature of the product that had remained unnoticed, or finally finding some aspect of the product that he had sought unsuccessfully until then, and consequently rating the product highly. Similarly, if the user uncovers a flaw in the product (whether real or perceived), he will rate the product lowly, regardless of whether the flaw could have been avoided at the production stage. Another important issue revealed by the data of the surveys is that software quality factors are not clearly perceived by inexperienced users: if they discover a characteristic indicating that the product fails in one particular factor, they consider that the product fails in all the other areas as well. Experienced users, on the contrary, clearly perceive the independent nature of these factors.
After a justifiable time period, inexperienced users become accustomed to the new features or flaws they discover in the product and, as a result, their opinion begins to lean towards the final opinion, as is the case with experienced users.

[Figure 2 here: user’s opinion (0–10) plotted against time in months (1–15); the curve UO, a typical inexperienced user’s opinion, fluctuates between the boundary curves i1 and i2 around the line AvgO.]
Figure 2: Fluctuation of inexperienced users’ opinion
3. Methodology

From the measurements of the surveys, it is obvious that over time: a) the experienced users’ opinion of the quality of the software product approaches their final opinion and b) the deviation of the inexperienced users’ opinion from their final opinion declines continuously. Thus, the more a user uses a product, the more weight must be given to his opinion. In other words, the time factor must also be taken into account for effective measurements of software quality. The purpose of the proposed methodology is not to substitute the existing techniques and methods of estimating users’ perception of software quality. On the contrary, it aims to complement them, focusing mainly on the revisions of users’ opinion. For example, software metrics can be applied in parallel and supplementary to the proposed methodology. From our previous research (Xenos et al., 1996), we found that software metrics provide an easy and inexpensive way to detect and correct possible causes of low product quality as it will be perceived by the customers. Setting up measurement programs and metric standards will help prevent failures to satisfy customers’ demand for quality. However, satisfaction of internal quality standards does not a priori guarantee success in fulfilling this demand: programs with high scores in software metrics may not receive the same acknowledgement from the customers. As a result, occasional deployment of user-oriented measurements can be used not only to test the soundness of metrics measurement programs, but even to calibrate internal metrics. Quality assurance teams must never forget that, despite what internal measurements indicate, the final judge of the quality of the produced software is the customer.
The methodology centers primarily on choosing, as the case requires, the method of collecting users’ opinion, and then on revising the belief state of a user for the quality of a program as his/her opinion changes over time. The analysis of the previous section revealed that only after a long period of time will inexperienced users form an opinion close enough to their final opinion of the quality of the product; the length of this period may surpass six months. Consequently, in order to assess the quality of a software product sufficiently in the early stages of its release, the sample of users surveyed could be restricted to experienced users. However, this policy is not always efficient or valid, since it is usually difficult to obtain a representative and adequate sample of experienced users. Furthermore, a software product may be used mostly, if not exclusively, by inexperienced users, so we have to consider their opinion as well. Additionally, in the early stages, the opinion of inexperienced users fluctuates greatly; it can be considered only if the sample of users is large enough to be representative, thus ensuring sound results. Moreover, the opinion of experienced users should be given greater weight than that of inexperienced users, even though the former are usually fewer. The values for these weights can either be set a priori (a good recommendation is 70% for experienced and 30% for inexperienced users) or formed by using a formula that takes into consideration the qualifications of users, such as equation (E.3) described in section 4. Furthermore, from the findings for individual user groups participating in the surveys, it was also observed that the larger the degree of fluctuation in a group’s opinion, the more difficult it was for its members to learn the features most relevant to their specific context.
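The a priori weighting described above amounts to a simple weighted average; a minimal sketch follows, where the 70%/30% split is the recommendation from the text, but the group averages used in the example are invented numbers:

```python
# Combine the average opinions of the two user groups with a priori weights.
# The 70%/30% recommendation comes from the text; the scores are invented.
def combined_opinion(exp_avg, inexp_avg, w_exp=0.7, w_inexp=0.3):
    """Weighted average of experienced and inexperienced users' opinions."""
    return w_exp * exp_avg + w_inexp * inexp_avg

combined_opinion(7.4, 5.8)  # 0.7 * 7.4 + 0.3 * 5.8 = 6.92
```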
The first step of this methodology is to collect the initial users’ opinion of the quality of a software program. In order to accomplish this task, one or more of the methods
described in section 4 can be used in this initial survey. From this survey, the user’s opinion of every quality characteristic will be defined and represented by a value in the interval [0,10], which allows the researcher to plot the initial belief state of this user for the quality of the software program. Over time, this belief state may change because of revisions of the user’s opinion. Every new characteristic of the software program detected by the user, which will undoubtedly lead to the revision of his belief about the product, must be considered, since the weight of his opinion of the quality of this program increases over time. However, because of the high cost of surveys, a simpler, faster and more automated way must be found to estimate the revision of the users’ opinion without needing to conduct a new survey every so often. The alternative solution we propose is the use of belief revision rules to model the users’ belief states. As previously mentioned, users, especially inexperienced ones, tend to believe that a correlation exists between quality characteristics. Moreover, users’ opinion usually changes over time. This is especially true in the case of inexperienced users, whose opinion fluctuates radically; during the first two months of the product’s use this fluctuation may surpass 25% of the value of the final opinion. Tracking such opinion changes using surveys can be problematic, since it might require applying questionnaire-based survey methods so frequently that it is neither feasible nor cost-effective. In other words, the procedure of measuring users’ perception of quality is time-consuming, especially when the results of a given survey need to be confirmed by subsequent surveys.
Using aspects of Belief Revision theory, when the user’s opinion of one specific characteristic changes, we are able to calculate his/her opinion of all the characteristics he/she thinks are related to it, without needing to evaluate his/her opinion of each one. Thus, user belief revisions can be identified by a simple phone call or brief contact with the user, and there is no need to conduct a new survey frequently. An example of characteristics that most users incorrectly tend to consider related is “suitability” (a sub-characteristic of the “functionality” characteristic) and “operability” (a sub-characteristic of the “usability” characteristic), as these characteristics are stated in the ISO9126 standard. Furthermore, as stated in ISO 9241-11 (Bevan, 1997), changes in functionality, reliability or efficiency can also have an impact on user performance and satisfaction and, therefore, an impact on usability. With this in mind, it can be understood how changes in users’ opinion of a quality characteristic often affect their opinion of many other characteristics. The above relation is represented by determining a set of constraints between the scores of the quality characteristics, which is the second step of this methodology. Each constraint defines the relation between two characteristics and takes the form Ai ∈ [h1,h2] ⇒ Aj ∈ [c1,c2], where Ai and Aj are the scores of two different quality characteristics i and j, and [h1,h2] and [c1,c2] are subintervals of [0,10]; that is, if the score of criterion i takes a value from h1 to h2, then the score of criterion j will take a value from c1 to c2. In this paper, we shall assume that such constraints expressing dependencies between quality characteristics are not probabilistic. That is, the constraint Ai ∈ [h1,h2] ⇒ Aj ∈ [c1,c2] states that whenever the value of Ai is between h1 and h2, then the value of Aj is always (i.e.
with probability 1) between c1 and c2. In future work, we shall consider ways of relaxing this assumption by extending our models with probabilistic rules. In order to diversify the users’ perceptions of the relation between the quality characteristics, different sets of constraints must be determined according to the users’ experience level. The more experienced the user, the less strict the constraints must be. In other words, users must firstly be categorised into different levels of
experience, each one corresponding to a different set of constraints. This categorisation may be achieved by using an appropriate questionnaire for measuring users’ experience. The following example represents the constraints between 3 characteristics:

A1 ∈ [0,2] ⇒ A2 ∈ [0,4]        A2 ∈ [0,2] ⇒ A1 ∈ [0,4]
A1 ∈ [4,6] ⇒ A2 ∈ [3,7]        A2 ∈ [5,7] ⇒ A1 ∈ [2,8]
A1 ∈ [7,10] ⇒ A2 ∈ [5,10]      A2 ∈ [9,10] ⇒ A1 ∈ [7,10]
A1 ∈ [0,2] ⇒ A3 ∈ [0,4]        A3 ∈ [0,1] ⇒ A1 ∈ [0,5]
A1 ∈ [8,10] ⇒ A3 ∈ [5,10]      A3 ∈ [9,10] ⇒ A1 ∈ [6,10]
A2 ∈ [0,2] ⇒ A3 ∈ [0,5]        A3 ∈ [0,2] ⇒ A2 ∈ [0,5]
A2 ∈ [5,7] ⇒ A3 ∈ [3,8]        A3 ∈ [9,10] ⇒ A2 ∈ [6,10]
A2 ∈ [9,10] ⇒ A3 ∈ [6,10]
If, for example, the current belief state of a user is CBS={A1=6, A2=4, A3=2} and his opinion for A1 changes to 7, then, according to belief revision rules, this revision must change the CBS minimally. As a result, according to the constraints, his new belief state will be NBS={A1=7, A2=5, A3=3}. In order to take these constraints into consideration, an appropriate belief revision model must be defined. The main aim of this model is to determine the new belief state of the user without having to conduct another survey. The basic philosophy of this model is illustrated in figure 3: the inputs of the model are the current belief state of the user (initial BS) and the new information about the software program that he/she has discovered, and the output is his/her new belief state (new BS).

[Figure 3 here: the initial belief state (the initial user’s scores) and the new information are fed into the model, which produces the new belief state (the new user’s scores).]
Figure 3: The basic philosophy of the model
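The minimal-change propagation in the example above can be sketched as follows. This is an illustrative implementation, not the paper’s full model: it assumes the constraint set is consistent (so the clamping loop terminates) and moves each violated consequent to the nearest endpoint of its interval, repeating until no constraint is violated. The constraints are the ones from the three-characteristic example.

```python
# Interval constraints of the example: (antecedent, [h1,h2], consequent, [c1,c2])
# meaning A_i in [h1,h2] => A_j in [c1,c2].
CONSTRAINTS = [
    ("A1", (0, 2),  "A2", (0, 4)),   ("A2", (0, 2),  "A1", (0, 4)),
    ("A1", (4, 6),  "A2", (3, 7)),   ("A2", (5, 7),  "A1", (2, 8)),
    ("A1", (7, 10), "A2", (5, 10)),  ("A2", (9, 10), "A1", (7, 10)),
    ("A1", (0, 2),  "A3", (0, 4)),   ("A3", (0, 1),  "A1", (0, 5)),
    ("A1", (8, 10), "A3", (5, 10)),  ("A3", (9, 10), "A1", (6, 10)),
    ("A2", (0, 2),  "A3", (0, 5)),   ("A3", (0, 2),  "A2", (0, 5)),
    ("A2", (5, 7),  "A3", (3, 8)),   ("A3", (9, 10), "A2", (6, 10)),
    ("A2", (9, 10), "A3", (6, 10)),
]

def revise(belief_state, characteristic, new_score):
    """Revise one score and propagate the constraints with minimal change:
    every violated consequent is moved to the nearest endpoint of its
    interval, repeating until no constraint is violated."""
    state = dict(belief_state)
    state[characteristic] = new_score
    changed = True
    while changed:
        changed = False
        for a, (h1, h2), b, (c1, c2) in CONSTRAINTS:
            if h1 <= state[a] <= h2 and not (c1 <= state[b] <= c2):
                state[b] = c1 if state[b] < c1 else c2  # nearest endpoint
                changed = True
    return state

cbs = {"A1": 6, "A2": 4, "A3": 2}
revise(cbs, "A1", 7)   # -> {'A1': 7, 'A2': 5, 'A3': 3}
```

Running the sketch on the example reproduces the NBS given in the text: raising A1 to 7 triggers A1 ∈ [7,10] ⇒ A2 ∈ [5,10], which pulls A2 up to 5, which in turn triggers A2 ∈ [5,7] ⇒ A3 ∈ [3,8] and pulls A3 up to 3.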
In order to relate the revision of quality characteristic scores to belief revision, Grove’s Systems of Spheres (Grove, 1988), one of the constructions developed within the AGM paradigm, will be used. Section 5 presents this system in more detail.
4. Collecting users’ opinion

As previously mentioned, the first step of the proposed methodology is a survey to measure the initial users’ opinion of the quality of a software program. Although during our research we preferred to conduct surveys based on questionnaires, this methodology can easily be adapted to use another method of collecting
users’ opinion that has already been developed. These methods can firstly be divided into analytic and empiric ones (Nielsen, 1993), as presented in figure 4. The analytic methods are theoretical models, rules or standards that simulate the user’s behaviour. They are mainly used during the requirements analysis phase, usually even before the development of the prototypes of a product; as a result, users’ participation is not required in these methods. On the contrary, the empiric methods, on which the proposed methodology focuses, depend on the implementation, the evaluation and the rating of a software prototype or product. This rating requires the participation of a representative sample of the end-users and/or a number of experienced evaluators of the quality of a software product. The empiric methods can be divided into experimental and inquiry ones.

[Figure 4 here: methods of collecting users’ opinion divide into analytic methods (theoretical models, rules or standards) and empiric methods; the latter split into experimental methods (performance measurement, thinking aloud protocol, user actions logging, etc.) and inquiry methods (user questionnaires, user interviews, focus groups, field observation, etc.).]
Figure 4: Collecting users’ opinion methods
The experimental methods require the participation of the end-users in a laboratory environment; the best known are the following:
• Performance measurement. A classical method of software evaluation that provides quantitative measurements of a software product’s performance while users execute predefined actions or even complete operations. The users perform these actions with only brief guidance at the beginning, so that the interaction between them and the person responsible for the survey is kept to a minimum.
• Thinking aloud protocol. This method focuses on measuring the effectiveness of a system and the user’s satisfaction. According to this method, a small number of users, usually 3 to 4, interact with the system while stating aloud their thoughts, opinions, emotions and sentiments about the system. All the above are recorded, in order to be analysed in combination with the users’ actions, which are also recorded.
• User actions logging. There are many techniques for recording the actions of users while they interact with a software product. The most common are the researcher’s notes, voice recording of the users, video recording of the users, computer logging and user logging. The researcher can use one or more of these techniques simultaneously.
The inquiry methods are concerned with examining the quality characteristics of a software product by measuring users’ opinion. According to these methods, the survey is generally conducted at the physical working place of the users, who evaluate
either an early prototype of the product or its final version. A large number of users is needed for the inquiry methods; the most popular are the following:
• User questionnaires. In this method, users are requested to express their opinion of the quality of a software product by completing a structured questionnaire, which consists of questions usually in a multiple-choice format. These questionnaires are sent to users, who answer them without any possible influence from the person who conducts the survey. Each question concerns a specific quality characteristic and has its own weight in the whole questionnaire. These weights are either equal for all characteristics or may vary in order to allow emphasis on one or more specific characteristics. In the former case, the questionnaire designer aims at an assessment of software quality equally affected by all quality characteristics. In the latter case, emphasis is placed on some specific characteristics; e.g. for educational software used by small children, the questionnaire designer may need to weigh usability considerably higher than all the other ISO9126 characteristics.
• User interviews. This is a structured method of evaluating a software product, where the researcher is in direct contact with the user. The questions of the interview follow a hierarchical structure, through which the general opinion of the product is formed first, after which more specific matters of the quality characteristics are considered.
• Focus groups. This method is a variation of the previous one, where a group of 5 to 10 users is formed under the supervision of a coordinator, who is responsible for preparing the topics of their conversation. At the end of this conversation, the coordinator gathers their conclusions on the quality of the software product.
• Field observation.
With this method, the researcher observes the users at their working place, while they are using and interacting with the software product.

In order to analyse the derived data statistically, we focused mainly on questionnaire-based surveys. We assume that all the questions have a multiple-choice format and users select predefined responses. The users were given specific instructions that the differences among the possible answers are of equal gravity, so that responses can be treated as interval-scale rather than ordinal data. An example of how responses were offered to the users is shown in figure 5. However, this statistical analysis can easily be generalized, so that it can be applied to any of the aforementioned methods of collecting users’ opinion. In order to determine a user’s opinion of the quality of a product, his/her responses to the already conducted survey must be retrieved. In the case of a structured questionnaire, the questions are clustered in groups, according to which quality characteristic they refer to.

[Figure 5 here: an 11-point response scale from 0 to 10, with verbal anchors: 0 = Poor, 2–3 = Nearly poor, 5 = Average, 7–8 = Good, 10 = Excellent.]
Figure 5: Example of how responses were offered to the users
The formula CjOi measures the opinion of a single user ‘i’ of the quality of the product, according to a software quality characteristic ‘j’. In equation (E.1) ‘m’ is the number of questions for this quality characteristic in the questionnaire, ‘Qk’ is the weight given to the question ‘k’ and ‘Vk’ is the value of the response that the user selected.

(E.1)  CjOi = [ Σ_{k=1}^{m} (Qk · Vk) ] / [ Σ_{k=1}^{m} Qk ]
The formula Oi measures the opinion of a single user ‘i’ of the quality of the product according to all the quality characteristics that are dealt with in the questionnaire. In equation (E.2) ‘n’ is the number of the different quality characteristics, ‘Cj’ is the weight given to the quality characteristic ‘j’ (by the questionnaire designer) and ‘CjOi’ is the opinion of the user for this quality characteristic.

(E.2)  Oi = [ Σ_{j=1}^{n} (Cj · CjOi) ] / [ Σ_{j=1}^{n} Cj ]
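Equations (E.1) and (E.2) amount to weighted averages and can be sketched as follows; the question weights, response values and characteristic weights below are invented examples, not data from the paper:

```python
# Sketch of equations (E.1) and (E.2): weighted averages of responses.
def characteristic_opinion(question_weights, response_values):
    """(E.1): weighted average of one user's responses Vk, weighted by Qk,
    for a single quality characteristic."""
    num = sum(q * v for q, v in zip(question_weights, response_values))
    return num / sum(question_weights)

def user_opinion(char_weights, char_opinions):
    """(E.2): weighted average of the characteristic scores CjOi,
    weighted by the designer's weights Cj."""
    num = sum(c * o for c, o in zip(char_weights, char_opinions))
    return num / sum(char_weights)

# Two usability questions weighted 2 and 1, answered 8 and 5 on the 0-10 scale:
usability = characteristic_opinion([2, 1], [8, 5])       # -> 7.0
# Four equally weighted ISO9126 characteristics:
user_opinion([1, 1, 1, 1], [usability, 6, 7, 9])         # -> 7.25
```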
Finally, in order to measure the average users’ opinion of the quality of a software product, either the QWCO (Qualifications Weighed Customer Opinion) technique, measured using the formula shown in equation (E.3), or the QWCODS (Qualifications Weighed Customer Opinion with Double Safeguards) technique, measured using the formula shown in equation (E.4), can be selected.

(E.3)  QWCO = [ Σ_{i=1}^{x} (Oi · Ei) ] / [ Σ_{i=1}^{x} Ei ]

(E.4)  QWCODS = [ Σ_{i=1}^{x} (Oi · Ei · (Si · Pi) / ST) ] / [ Σ_{i=1}^{x} (Ei · (Si · Pi) / ST) ]
The aim of these techniques is to weigh users’ opinions according to their qualifications. To achieve this, ‘Oi’ measures the normalised score of the opinion of user ‘i’, as shown in equation (E.2), ‘Ei’ measures the qualifications of user ‘i’, while ‘x’ is the number of users who participated in the survey. In order to detect errors, we use a number of safeguards embedded in the questionnaires. A safeguard is defined as a question placed inside the questionnaire so as to measure the correctness of responses. In equation (E.4) ‘Si’ is the number of safeguards that customer ‘i’ has replied to correctly, ‘ST’ is the total number of safeguards and ‘Pi’ is a boolean variable that is set to zero when even a single error has been detected by the safeguards while measuring the qualifications of customer ‘i’.
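A sketch of the QWCO and QWCODS computations under the definitions above; all user data in the examples are invented:

```python
# Sketch of equations (E.3) and (E.4): qualification-weighted average opinions.
def qwco(opinions, quals):
    """(E.3): average opinion Oi weighted by user qualifications Ei."""
    return sum(o * e for o, e in zip(opinions, quals)) / sum(quals)

def qwco_ds(opinions, quals, safeguards_correct, total_safeguards, p_flags):
    """(E.4): each user's weight Ei is scaled by the fraction Si/ST of
    safeguards answered correctly, and zeroed (Pi = 0) when an error was
    detected while measuring that user's qualifications."""
    weights = [e * (s / total_safeguards) * p
               for e, s, p in zip(quals, safeguards_correct, p_flags)]
    return sum(o * w for o, w in zip(opinions, weights)) / sum(weights)

# Two users: opinions 8 and 6, qualifications 3 and 1.
qwco([8, 6], [3, 1])                        # -> 7.5
# Same opinions, equal qualifications, second user missed one of 2 safeguards.
qwco_ds([8, 6], [2, 2], [2, 1], 2, [1, 1])  # (16 + 6) / (2 + 1)
```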
5. Using principles of Belief Revision Theory

5.1. A Review of AGM Belief Revision
In this section, we shall briefly review the main ideas and results from the area of belief revision, and in particular the ones related to the AGM paradigm. Firstly, however, we need to introduce some notation. The AGM paradigm is based on a formal logical language L, which for the purposes of this article will be assumed to be propositional1. For a set of sentences Γ of L, we denote by Cn(Γ) the set of all logical consequences of Γ, i.e. Cn(Γ) = {φ ∈ L : Γ ⊢ φ}. A theory K of L is any set of sentences of L closed under ⊢, i.e. K = Cn(K). We shall denote the set of all theories of L by TL. A theory K of L is complete iff for all sentences φ ∈ L, φ ∈ K or ¬φ ∈ K. We shall denote the set of all consistent complete theories of L by ML. In the context of belief revision, consistent complete theories often play the role of possible worlds. Following this convention, in the rest of the article we shall use the terms “possible world” (or simply “world”) and “consistent complete theory” interchangeably. For a set of sentences Γ of L, [Γ] denotes the set of all consistent complete theories of L that contain Γ. Often we shall use the notation [φ] for a sentence φ ∈ L, as an abbreviation of [{φ}]. For a theory K and a set of sentences Γ of L, we shall denote by K + Γ the logical closure of K ∪ Γ, i.e. K + Γ = Cn(K ∪ Γ). For a sentence φ ∈ L we shall often write K + φ as an abbreviation of K + {φ}. Having fixed some notation, let us now briefly review the main definitions and results of the AGM approach to belief revision. In their framework, Alchourron, Gardenfors and Makinson (Alchourron et al., 1985) represent belief states as theories of L, and the process of belief revision is modelled as a special function * over theories, called a revision function. More precisely, a revision function * is defined as a function from TL × L to TL, mapping ⟨K, φ⟩ to K ∗ φ, that satisfies the following postulates:
(K.1) K ∗ φ is a theory of L.
(K.2) φ ∈ K ∗ φ.
(K.3) K ∗ φ ⊆ K + φ.
(K.4) If ¬φ ∉ K then K + φ ⊆ K ∗ φ.
(K.5) K ∗ φ = L iff ⊢ ¬φ.
(K.6) If ⊢ φ ↔ ψ then K ∗ φ = K ∗ ψ.
(K.7) K ∗ (φ ∧ ψ) ⊆ (K ∗ φ) + ψ.
(K.8) If ¬ψ ∉ K ∗ φ then (K ∗ φ) + ψ ⊆ K ∗ (φ ∧ ψ).

The above postulates are mainly motivated by the principle of minimal change, according to which as little as possible of the original belief state K is changed in order to accommodate the new information φ. For a detailed discussion of the principle of minimal change and the motivation behind (K.1)–(K.8), refer to (Gardenfors, 1988). Apart from this axiomatic approach to belief revision, a number of explicit constructions for this process have been proposed (Alchourron et al., 1985), (Grove, 1988), (Gardenfors et al., 1988), (Peppas et al., 1995). In this article we will examine only one of these constructions, proposed by Adam Grove (Grove, 1988), which is based on systems of spheres.

1 This is not required by the AGM paradigm, but it simplifies our presentation. For a description of the minimal requirements for L, refer to (Gardenfors, 1988).
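As a quick illustration of the postulates, the following minimal semantic sketch (our own encoding, not part of the AGM formalism) represents belief states by their sets of possible worlds, sentences by predicates on worlds, and revision by the simple full-meet operator. When φ is consistent with K, postulates (K.3) and (K.4) together force K ∗ φ = K + φ, which the sketch checks directly:

```python
from itertools import product

# Worlds over two atoms p and q as (p, q) truth pairs; a belief state is
# identified with its set of worlds (a smaller world set corresponds to a
# logically stronger theory), and a sentence with a predicate on worlds.
WORLDS = set(product([True, False], repeat=2))

def models(phi):
    return {w for w in WORLDS if phi(w)}

def expand(K, phi):
    """Expansion: [K + phi] = [K] ∩ [phi]."""
    return K & models(phi)

def revise(K, phi):
    """Full-meet revision: keep the K-worlds compatible with phi if any
    exist (as (K.3)/(K.4) demand); otherwise fall back to all phi-worlds."""
    return expand(K, phi) or models(phi)

K = models(lambda w: w[0] and w[1])   # the agent believes p ∧ q

# (K.2): every world of K * phi satisfies phi.
assert revise(K, lambda w: not w[1]) <= models(lambda w: not w[1])

# (K.3)+(K.4): phi = p is consistent with K, so K * phi = K + phi.
assert revise(K, lambda w: w[0]) == expand(K, lambda w: w[0])
```

Note that, because a belief state is encoded by its world set, theory inclusion is reversed: K ∗ φ ⊆ K + φ appears above as the world set of K + φ being a subset of that of K ∗ φ.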
Let A be a subset of ML (i.e. A is a set of consistent complete theories). A system of spheres S centred on A is a collection of subsets of ML, the elements of which are called spheres, that satisfies the following conditions:

(S.1) S is totally ordered with respect to set inclusion; that is, if U, U′ ∈ S then U ⊆ U′ or U′ ⊆ U.
(S.2) The smallest sphere in S is A; that is, A ∈ S, and if U′ ∈ S then A ⊆ U′.
(S.3) ML ∈ S (and therefore ML is the largest sphere in S).
(S.4) For every φ ∈ L, if there is any sphere in S intersecting [φ] then there is also a smallest sphere in S intersecting [φ].

For a system of spheres S and a consistent sentence φ ∈ L, the smallest sphere in S intersecting [φ] is denoted CS(φ). When φ is inconsistent, CS(φ) is taken to be ML. With any system of spheres S, Grove associates a function fS : L → 2^ML defined as follows: fS(φ) = [φ] ∩ CS(φ), for every φ ∈ L. Consider now a theory K of L and let S be a system of spheres centred on [K]. Grove uses S to define constructively the process of revising K, by means of the following condition:

(S*)  K ∗ φ = ∩ fS(φ) if [φ] ≠ ∅, and K ∗ φ = L otherwise.
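The construction can be illustrated with a small, self-contained sketch (ours, not from the original paper): possible worlds over two atoms p and q are encoded as truth-value pairs, a sentence as a predicate on worlds, and a system of spheres as a list of nested world sets, innermost first. Condition (S*) then amounts to collecting the φ-worlds of the smallest sphere that contains any:

```python
from itertools import product

# Possible worlds over two atoms p and q, encoded as (p, q) truth pairs.
M_L = set(product([True, False], repeat=2))

def revise(spheres, phi):
    """Grove's condition (S*): scan the spheres from the innermost
    outwards and return the phi-worlds of the smallest sphere that
    intersects [phi]; the empty set plays the role of the absurd
    belief state L (phi inconsistent)."""
    for sphere in spheres:                   # (S.1): nested, innermost first
        f_S = {w for w in sphere if phi(w)}  # [phi] ∩ C_S(phi), when non-empty
        if f_S:
            return f_S
    return set()

# A system of spheres centred on [K], where the agent believes p ∧ q;
# the world (True, False) is deemed more plausible than the remaining ones.
spheres = [
    {(True, True)},                          # (S.2): the smallest sphere is [K]
    {(True, True), (True, False)},
    set(M_L),                                # (S.3): M_L is the largest sphere
]

# Revising by ¬q: the most plausible ¬q-world is (True, False),
# so the agent retains p while accepting ¬q.
print(revise(spheres, lambda w: not w[1]))   # {(True, False)}
```

Read with the spheres as comparative plausibility, this is exactly the minimal-change behaviour discussed below; by Grove's theorem, varying the system of spheres makes this one function scheme realise every AGM revision function.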
Let us briefly consider the intuition behind a system of spheres as a basis for constructing revision functions. Let K be a consistent theory of L taken as a belief state. A consistent complete theory of L is treated as a possible world, and a system of spheres S centred on [K] is regarded as an ordering on possible worlds, representing their comparative plausibility, given K as the current belief state. The closer a possible world is to the centre of S, the more plausible it is. With this reading of a system of spheres, condition (S*) defines the revision of K by φ to be the theory determined by the most plausible worlds satisfying φ. Grove shows that the class of functions generated from systems of spheres by means of (S*) is precisely the family of AGM revision functions (i.e. the functions satisfying the AGM postulates). Grove’s result is central to the forthcoming analysis, and it is through this result that we shall prove the connection between users’ opinion dynamics and AGM belief revision in section 5.3.

5.2. Revising Software Scores
The AGM paradigm was introduced as a general framework for studying the process of belief revision. In this article however, we are mainly interested in the more specific process of changing assessments about software packages. We shall therefore introduce in this section a new formalism tailored specifically to revising software assessments. In section 5.3 we shall compare our approach to AGM belief revision. Let A1, A2, …, An be the “factors” that represent the user’s opinion about a software package (for example, A1 may stand for functionality, A2 for reliability, etc.). Moreover, assume that the values of these factors are integers in the range [0,10].
We define a software score (or simply score) to be a vector υ = ⟨r1, r2, ..., rn⟩, where each value ri is an integer in the interval [0,10]. The intended meaning of υ = ⟨r1, r2, ..., rn⟩ is that it assigns values to the factors A1, A2, …, An (i.e. A1 = r1, A2 = r2, etc.). We shall denote the value that a vector υ assigns to the factor Aj by υ(j) (i.e. υ(j) = rj). Clearly, there are 11^n different scores. We shall denote the set of all scores by S.

Consider now a user whose assessment of the quality of a particular software package is represented by the score υ = ⟨r1, r2, ..., rn⟩. Moreover, assume that, after further interaction with the software, this user changes his belief about the value of the factor Aj and, instead of rj, he now believes that Aj has the value pj. Then the principle of minimal change dictates that this user’s new beliefs about the software should be represented by the score υ′ = ⟨r1, ..., rj−1, pj, rj+1, ..., rn⟩, which differs from the initial score υ only in the value of Aj. More formally, we shall call an ordered pair ⟨Aj, pj⟩, consisting of a factor Aj and a (new) value pj in the interval [0,10], an assignment pair. Given an assignment pair ⟨Aj, pj⟩ and an initial score υ, by υ ∘ ⟨Aj, pj⟩ we shall denote the score revision of υ by ⟨Aj, pj⟩. Hence, according to this notation, in the above example, υ ∘ ⟨Aj, pj⟩ = ⟨r1, ..., rj−1, pj, rj+1, ..., rn⟩.

Defining score revision for situations like the one above is, of course, straightforward. Things become more complicated when dependencies between factors enter the scene. In this article, we assume that dependencies take the form Ai ∈ [h1, h2] ⇒ Aj ∈ [c1, c2], where [h1, h2] and [c1, c2] are subintervals of [0,10]. We say that a score υ satisfies the above formula, which we shall call a constraint, iff h1 ≤ υ(i) ≤ h2 entails c1 ≤ υ(j) ≤ c2 (that is, if the value of Ai is in the interval [h1, h2], then the value of Aj is within the range [c1, c2]). We assume that a set C of such constraints is given a priori and is dependent upon the particular features of the software package and the characteristics of the user (or user group) under consideration. We shall say that a score υ is valid iff it satisfies all the constraints in C. We shall denote by SC the set of all valid scores.

Let us now re-examine the process of score revision in the presence of constraints. Assume that the initial score is υ = ⟨0, 0, 0, ..., 0⟩. Moreover, assume that C consists of the following constraints:

(C.1) A1 ∈ [2,5] ⇒ A2 ∈ [3,6]
(C.2) A2 ∈ [2,4] ⇒ A3 ∈ [4,7]

Let us now consider a user’s beliefs after changing the value of A1 to 3. Changing only the value of A1 results in the score υ′ = ⟨3, 0, 0, ..., 0⟩. The problem with υ′, however, is that it violates the constraints in C and consequently cannot be chosen as the user’s new score. In particular, υ′ violates constraint (C.1). To satisfy (C.1), we need to give A2 a value in the interval [3,6]. In an attempt to minimize change, let us assign A2 the value 3, thus deriving the score υ′′ = ⟨3, 3, 0, ..., 0⟩. Unfortunately, the new score υ′′ is also invalid; this time it violates the constraint (C.2). To remedy this, while at the same time keeping changes to a minimum, we set A3 to 4, thus arriving at the score u = ⟨3, 3, 4, ..., 0⟩. This time the new score u is valid (i.e. it satisfies the
constraints) and therefore appears to be the right choice for the user’s new assessment of the software. Consider, however, the following alternative scenario. Starting from the invalid score υ′ = ⟨3, 0, 0, ..., 0⟩, this time we change A2 to 5, thus generating the score u′ = ⟨3, 5, 0, ..., 0⟩. Notice that no further changes are required to u′, since constraint (C.2) no longer applies. Hence, u′ is a valid score having the right value for A1, and furthermore it is, in a certain sense, a minimal change from the initial score υ. Examining the scores u and u′ more closely, we observe that in the latter we have essentially traded an increase of A2 by 2 for a decrease of A3 by 4. Does this make u′ a smaller change to υ than u? The answer depends on the application at hand; it relates to the comparative importance of A2 and A3 for the user and the software under consideration. To deal with this ambiguity, we shall assume that the factors are listed in order of importance; that is, A1 is more important than A2, which in turn is more important than A3, and so on2. Under this assumption, u is a smaller change to υ than u′ and it is therefore the user’s new score.

More generally, let υ, u and z be three scores, with u ≠ z. Moreover, let Aj be the first factor for which u and z have different values. We shall say that u is closer to υ than z, which we denote by u <υ z, iff either |υ(j) − u(j)| < |υ(j) − z(j)|, or |υ(j) − u(j)| = |υ(j) − z(j)| and u(j) < z(j). We shall use u ≤υ z to denote the fact that either u <υ z or u = z. Notice that the binary relation ≤υ depends on the initial score υ; in fact, there is a different ≤υ for each υ ∈ S.

Lemma 5.2.1. For any score υ ∈ S, the binary relation ≤υ is a total order.

Proof. Reflexivity follows immediately from the definition of ≤υ. For transitivity, consider any three scores u, z, w ∈ S such that u ≤υ z ≤υ w. If any two of these scores are identical, then u ≤υ w trivially follows. Assume therefore that u ≠ z ≠ w and consequently u <υ z <υ w. Let Aj be the first factor at which u and z differ. Similarly, let Ak be the first factor at which z and w differ. From u <υ z it follows that either |υ(j) − u(j)| < |υ(j) − z(j)|, or |υ(j) − u(j)| = |υ(j) − z(j)| and u(j) < z(j). On the other hand, from z <υ w it follows that either |υ(k) − z(k)| < |υ(k) − w(k)|, or |υ(k) − z(k)| = |υ(k) − w(k)| and z(k) < w(k). We distinguish between three cases:
j < k, j = k and j > k. Starting with the first case, let us assume that j