visual query language, namely QBD*, or a well-known traditional textual language such as ... 2) developing a task for users to perform;. 3) measuring ...... IOS Press, (Proceedings of the Second European-Japanese Seminar on. Information ...
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof Tiziana Catarci & Giuseppe Santucci Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza”, Via Salaria, 113, 00198 Roma, Italy Tel./Fax: +39-6-49918331 Email: [catarci/santucci]@infokit.dis.uniroma1.it The importance of designing query system which are effective and easy to use has been widely recognized in the database area. Also, it is well known that the adequacy of a system can be mainly tested against actual users in a well settled experiment. However, very few such experiments have been conducted. The overall objective of our study is to measure and understand the comparative ease with which subjects can construct queries in either a novel visual query language, namely QBD*, or a well-known traditional textual language such as SQL. More specifically, we are interested in determining whether there is significant interaction between: 1) the query class and the query language type; and 2) between the type of query language and the experience of the user. The analysis of the experiment results allows us to say that the effectiveness of a query language varies depending on the classes of queries and the kinds of users. However, the result trend is generally in favor of the QBD* approach, which is based on a conceptual data model, closer to the user view of the reality than the relational model, a visual representation of such a model, more attractive and graspable than a textual list of table names, and direct manipulation commands, having a syntax much easier than the SQL one. Keywords: user interfaces, databases, visual query languages, usability tests
Tiziana Catarci & Giuseppe Santucci
2
1 Introduction The problem of easily extracting information from databases has been discussed extensively in the database community. Several query modalities representing alternatives to traditional query languages such as SQL, have been proposed. Most of them are based on the use of visual representations and direct manipulation interaction mechanisms (see (Batini et al., 1991) for a survey of visual query languages, and (Shneiderman, 1983) for the definition of direct manipulation). All of the proposed approaches emphasize, in principle, the role of the end user in the design life cycle of a query system. In addition, the importance of designing and testing interfaces for ease and effectiveness of use has become widely recognized (Nielsen, 1993; Bevan & Macleod, 1993). For example, a recent panel on user interfaces at VLDB’94 (Spaccapietra, 1994) has argued for the inclusion of usability considerations in the design of very large data bases. However, despite this growing recognition, very few empirical studies aiming at testing and validating the effectiveness of various query styles and interfaces have been conducted in the database field. Reisner (1988) reviews most of the experimental research on traditional query languages covering mainly work done on SQL and QBE. Other earlier surveys on experiments in the database field include papers by Shneiderman (1978, 1980), Thomas (1977), and a recent review by Ahlberg et. al. (1993). A common goal of such empirical studies is to provide a quantitative basis for the comparative effectiveness and ease of use of a given query language. The general approach used in most such studies includes: 1) defining precisely what one is to measure; 2) developing a task for users to perform; 3) measuring relevant parameters of user performance (i.e. error rate, time to complete a task, etc.). Measuring the ease of using a query language is itself a difficult task. It involves having to identify and control a large number of extraneous variables including, among others, the cognitive capabilities and limitations of the user. To capture at least some aspects of ease of use, experimenters have developed a number of different tasks (Reisner, 1988). The most common are query writing and query reading tasks, which are performed by investigating the relationships between database queries expressed in natural language and the same queries expressed in the query system under study. In query writing, the question is: “Given a query in natural language how easily can a user express it through the query language statements? The question for query reading is: “Given a query expressed through the query language statements can the user express the query easily in natural language?”. Two different classes
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
3
of experiments have been reported in the literature: one aiming at evaluating the ease of use of a single language, the other aiming at establishing which is the easiest to use among two or more languages. Our approach falls in the second category. We performed a series of experiments in order to compare, based on their usability, QBD*, a diagrammatic query language developed in our Department (Angelaccio Catarci & Santucci, 1990a; Angelaccio Catarci & Santucci, 1990b) and SQL. The actual experiment was preceded by a teaching period. We tried to make as many of the conditions of the experimentation of the two languages the same as we reasonably could (i.e., teaching method, exam questions, exam procedure, method of scoring). We also examined some background information about the various users in order to divide them into equivalence classes and to assign to each language the same set of classes. This is commonly done in experiments in order to minimize the probable effects of uncontrolled extraneous variables leading to a confounding of results. It is worth noting that, as far as we know, this is the first rigorous study of comparison between a visual query language and a well-known traditional language, such as SQL. The results of the experiment confirmed the superiority of the QBD* approach, which is based on a conceptual data model, closer to the user view of the reality than the relational model, a visual representation of such a model, more attractive and graspable than a textual list of table names, and direct manipulation commands, having a syntax much more easier than the SQL one. Our belief (see (Catarci et al., 1995)) that a visual approach is particularly suited for medium and naive users, and for certain types of queries, was also confirmed. The paper is organized as follows. Section 2 provides a short description of the QBD* system (we assume the reader familiar with SQL). Section 3 introduces the user and query clasification and the evaluation techniques that have been adopted in the experiment. Section 4 describes the experimental framework and comments on the numerical results. conclusions are drawn in Section 5.
Finally,
2 QBD* The QBD* system is a visual query system based on a diagrammatic representation of Entity Relationship (ER) schemata (Chen, 1976) describing underlying (relational) databases. It balances a high expressive power with a noticeable facility of use. Concerning the expressive power, it has been proved to be relationally complete; moreover it includes a significant class of recursive queries (transitive closure) and handles an extension of the relational algebra set-
Tiziana Catarci & Giuseppe Santucci
4
oriented operators that we call generalized set-oriented operators. Through those operators it is possible to compute set-oriented operations on any pair of relational tables sharing the same identifier. The ease of use has been reached through a fully graphical environment. In that environment the user interacts with the system mainly with a mouse-like device, using the keyboard only when necessary. The general structure of the QBD* query is based on the location of a distinguishing concept, called main concept, that can be seen as the entry point of one or more subqueries; these subqueries express possible navigations from the main concept to other concepts in the schema. The attributes belonging to each subquery are determined by the following strategy: the attributes of the main concept are automatically considered (unless explicitly deleted by the user), while the other ones are shown in the result only if requested by the user. The presence of a main concept associates a type with each subquery: as an example, if the main concept is the entity person, the result of the query will be a set of persons (eventually enriched with attributes coming from other concepts), no matter what kind of subsequent operations the user performs to specify it. The subqueries can be combined together by applying several set-oriented operators on two or more subqueries derived from the same entity. The structure of the involved subqueries being, in general, different, it is impossible to perform simple set-oriented operations, and a more general approach is needed (see (Santucci & Sottile, 1993) for the specific solution adopted within QBD*). Once the main concept has been selected, two different types of primitives are available for navigating in the schema. The first one allows the user to follow existing paths on the schema (see Figure 2.1); the other one (see Figure 2.2), is used for comparing two concepts by building a new path (relationship) on the schema. Comparison conditions, as well as generic conditions on the attributes, are expressed by means of a simple window mechanism including a suitable set of icons. Roughly speaking, we can say that a path on the schema corresponds to an ordered sequence of natural joins1 among the pairs constituting the path, followed by a final selection and projection. The explicit presence of relationships releases the user from the need of looking for concepts like “foreign keys” and the system can perform in the right way the join operations. On the other hand, comparing two concepts via a new relationship corresponds to a theta-join between the two involved entities, and it is up to the user the specification of the theta-condition.
1
We mean here the natural join and theta-join operators as defined in relational algebra [Codd 1972].
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
5
Figure 2.1: Schema navigation through existing paths. Queries involving transitive closure are expressed by a visual approach similar to the one in Figure 2.2b, the difference being the pre-selection of a particular icon, which signals the beginning of a recursive session. However because of standard SQL does not support recursion, this kind of queries has not been included in the experiment.
2.2 b
Figure 2.2: Schema navigation through the building of a new relationship (2.2a) on the basis of attribute comparisons (2.2b).
3 The Experiment Factors In this study, we conjecture that there is a significant relationship between the independent variables, user skill and query type, and the query language used on the time and accuracy of query construction. Therefore in the experiment, users of different skill levels will be asked to construct different types of queries. This choice reflects (and aims to validate) the current literature trend which relates the appropriateness of the visual representations to the level of the user’s skill and to the kind of queries s/he wants to perform (see, e.g., (Catarci et al., 1995;
Tiziana Catarci & Giuseppe Santucci
6
Cruz, 1992; Jarke & Vassiliou, 1985)). In this section we describe the evaluation technique we followed in the experiment, as well as the user and query classes.
3.1 Evaluation Technique Recently, a lot of emphasis has been placed on evaluation techniques, since there are several reasons to evaluate software. People involved in developing software products are interested in evaluations to assist them in making design decisions and to determine whether or not the products achieve the quality measure that must be met. Individuals, who buy the software because they want to use it, need to evaluate the software before purchasing it to check whether it answers key question about usability measures. Even if there is some similarity in the various evaluations, since they are all looking to answer whether the system adequately meets the needs of the user, the above interests are not served by a single evaluation technique. On the contrary, a variety of techniques have been proposed. The first requirements in order to design a system evaluation are to assess the evaluation purpose, what is being evaluated, who the evaluation is for and, finally, what one hopes to gain. In our case the purpose of the evaluation is to compare two different query systems (a visual one and a traditional one) in order to understand which one is the most easy to use by different user classes. In other words, we are trying to compare the usability of the two systems. Several different definitions of usability exist (Miller, 1971; Shackel & Richardson, 1991; ISO, 1991 ISO, 1993; Badre, 1993; Maissel et al., 1993). In particular, in MUSiC (Maissel et al., 1993) a very comprehensive definition of usability is given as "the extent to which a product can be used with efficiency, effectiveness and satisfaction by specific users to achieve specific goals in specific environments". From this point of view, the usability is the quality of interaction between the user and the overall system. It can be evaluated by measuring three factors: 1) effectiveness, i.e. the extent to which the intended goals of the system can be achieved; 2) efficiency, i.e. the time, the money, the mental effort spent to achieve these goals; 3) satisfaction, i.e. how much the users feel themselves comfortable using the system. It is worth noting that the usability depends from the overall system, i.e. the context, which consists of the types of users, the characteristics of the tasks, the equipment (hardware, software and materials) and the physical and organizational (e.g. the working practices) environment.
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
7
The elements that make up the quality of interaction need in general to be measured. Since the quality of interaction depends on the context of use, there is no general rule for how measures can be combined. Usually we will have to provide at least one measure for each element of the quality of interaction. In our experiment we aimed at measuring the effectiveness and the efficiency of the query systems. The effectiveness has been evaluated by relating the goals or subgoals of using the system to the accuracy and completeness with which these goals can be achieved. In our case the main goal is to extract information from the database by asking queries and the accuracy in achieving such a goal has been measured in terms of the user correctness rate when writing the queries. Measures of efficiency relate the level of effectiveness achieved at the expense of some resources, like mental and physical effort, time, financial cost, etc. In principle, we can consider both the user and the organization point of view, while for our experiment purposes we have measured only the former in terms of time spent carrying out the task. In the latter, we could measure the economic efficiency as the effectiveness in relation to the total cost, which is the labor costs of the user's time, the cost of both resources and equipment used, and the cost of any training required by the user. Moreover, other kinds of measures can be evaluated. Measures of satisfaction describe the comfort and acceptability of the overall system used by people. Some measures are related to the need to quantify learnability and flexibility, which can be assessed by measuring effectiveness, efficiency and satisfaction across a range of contexts. The learnability of a product may be measured by comparing the usability of a product handled by one user, keeping in mind his/her knowledge and experience. The flexibility of a product handled by users to carry out different tasks can be assessed by measuring usability in different contexts. The method we adopted in order to measure the effectiveness and efficiency of the systems was the so-called observational evaluation, one of the most well-known methods for evaluating usability (Macleod, 1992; Bevan & Macleod, 1993). Such a method involves real users that are observed when interacting with the system and offers a broad evaluation of usability. We may either apply the observational evaluation by direct observation or record the interaction between users and system. The recording (done by video camera) is more valuable, since it allows to store a lot of information, for examples the critical points during the interaction (when the user has to consult the manual, when and where s/he is blocked, etc.), the time a user spends to perform a task, the mistakes a user makes, and so on. However, it is also very expensive, not for setting up the equipment, but for the time required
Tiziana Catarci & Giuseppe Santucci
8
to analyze the recorded data. For this reason we preferred to rely on a very careful direct observation.
3.2 User Classes A few user classifications have been proposed in the database literature (e.g., (Catarci et al., 1995; Jarke & Vassiliou, 1985; Cuff, 1980)). Each of them identifies a certain number of features which permits the labeling of a homogeneous group of users. The number and the kinds of groups differ depending on the specific classification. However, there is at least a general agreement on the initial splitting of the users in to two large groups: those who have had a certain instruction period and have computer and database knowledge, and those who do not have specific training in computer usage and database interaction. In our experiment we call those two groups naive and expert users respectively. Our naive user is roughly characterized by the following features: 1) s/he interacts with the computer only occasionally, 2) has little, if any, training on computer usage, 3) has low tolerance for a formal query language, 4) is unfamiliar with the details of the internal organization of an information system. Usually this user does not want to spend extra time in order to learn how to interact with a query system, and finds it irritating to have to switch media, e.g., to manuals, in order to learn how to interact with the system. Moreover, the naive user wants to know where s/he is and what to do at any given moment of the interaction with the system. Notice that the naive user is very similar to Cuff’s (1980) casual users. On the other hand, expert users possess knowledge of database management systems, query and manipulation languages, etc., and often like to acquire a deep understanding of the system they are using. In order to come out with a thinner classification we individuated a third kind of users, who are in a medium position with respect to the two categories above (we call them intermediate users). They are persons having a general knowledge about computer science but no experience on databases
3.3 Query Classes For the purpose of our experiment we need to identify prototypical classes of queries. One could argue that several query classifications have been proposed in the literature (see, e.g. (Chandra 1988; Kanellakis, 1990; Catarci, 1991)). However, those classifications are all based on considering the so-called expressive power of the query language, i.e. the ability of the language to extract meaningful information from the database, and/or the complexity of computing the query answer. These kinds of classifications, while extremely valuable in
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
9
general, are inadequate for our purposes. Indeed, they are mainly based on the idea of grouping queries depending on the constructs they involve, taking as a ground a specified set of constructs, say of relational algebra. For example, queries involving only selection operations are commonly considered as belonging to a specific class, despite the kind of formula used for expressing the selection condition (e.g., with or without boolean operators). On the contrary, we need a query classification which evaluates in a different way queries belonging to the same “traditional” class, but having different intrinsic complexities in terms of, for instance, the number of involved database concepts and the kind of mutual relationships among concepts. Also, we have to consider that the query languages to be compared are based on two different data models. Indeed, QBD* is based on the Entity-Relationship model, while SQL is based on the relational model. In order to partly overcome this difference, and lay the basis for the query classification, we consider the two systems as based on a third model, the Graph Model introduced in (Catarci, Santucci & Angelaccio, 1993). It has been shown in (Catarci, Santucci & Angelaccio, 1993) that a mapping exists between the Relational model and the Graph Model, and in (Catarci & Santucci, 1992) that there is a trivial one between the ER model and the Graph Model. The Graph Model allows one to define a database D in terms of a triple , where g is a Typed Graph, c is a set of suitable Constraints, and m is the corresponding Interpretation. The schema of a database is represented in the Graph Model by the Typed Graph and the set of Constraints. The instances of a database are represented by the notion of Interpretation. The set c may be specified by using a logic-based constraint language, which is intended to be exploited by the system designer in order to specify constraints on and meaningful properties of the nodes represented in the Typed Graph. A node in a Typed Graph can be either a class-node representing a class of objects or a role-node representing a relationship between two classes, whereas an edge connects a class-node to a role-node. Moreover, class-nodes are partitioned into two subsets: the set of printable class-nodes and the set of unprintable nodes. The former represent sets of objects that are actually values of distinguished domains (integers, reals, characters, etc.), whereas the latter represent sets of objects denoted simply by object identifiers. We call path in a Typed Graph a sequence of adjacent class-nodes and role-nodes always starting and ending with a class-node. Given a path it is possible to associate with it a multivalued function which maps every object belonging to the Interpretation of the first class-node to a set of objects of the last class-node of the path. Note that the definition of path does not exclude the presence of cycles.
Tiziana Catarci & Giuseppe Santucci
10
The formal characterization of the user's queries for both systems is given in terms of suitable subgraphs of the Typed Graphs representing the database schemata (Catarci, Santucci, & Angelaccio, 1993; Catarci & Santucci, 1992). As a consequence, we lay the basis of our classification on the analysis of those features of such subgraphs which relate to the “user oriented” query characteristics. In particular, we exploit the notion of path for specifying the two coordinates the classification is based on, that is: • the maximum semantic distance of the paths involved in the query; •
the overall number of cycles of the paths involved in the query.
The concept of semantic distance is introduced for ordering the importance of user’s access paths. The semantic distance, formally introduced in (Massari, 1995), is a mathematical function modeling both the meaningfulness and the expected utility of a path for the composition of a query. The meaningfulness of a path can be regarded as the difficulty a user encounters in understanding the path meaning. The expected utility measures "how likely" the path will be used in the query. More precisely, let P be the set of well formed paths on a Typed Graph, a semantic distance function SD: P->R is a function mapping every path p ∈ P into a real number SD(p). Given a path p it is possible to derive numerical attributes, called features of the path, each one representing a characteristic useful for modeling the semantic distance function. In (Massari, 1995) several features have been introduced and motivated. The most significant are: the length of the path, the number of printable class-nodes included in the path, and the global maximum cardinality of the path, i.e. the product of all the maximum cardinalities met during the path. These three features have been also used in the present work. For the sake of simplicity, the codomain of the semantic distance function has been partitioned into four ranges, namely L, M, H, VH. In the L range, we have included those queries whose paths have a length less than or equal to one, no printable inclusion and a maximum cardinality of 1 (single valued functions). In the M range we put the paths with a length up to 3, still single valued and no printable inclusion. The H range has been derived from the M range by relaxing the constraint of single valueness of the function still maintaining no printable inclusion. Finally the VH range includes all the paths with printable inclusion. As for computing the second classification coordinate, we recall from (Massari, 1995) that each path is associated with a function. Such a function is determined by first declaring for each class-node of the path an existentially quantified variable ranging on the class-node interpretation, and then specifying a set of connection conditions between pairs of consecutive variables, each pair being forced to be included in the interpretation of a role-node.
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
11
N
Now, we define the number of cycles in a path p as a function C(p): P-> . In particular, C(p) = length(p) - number of class-nodes in the path +1, where length(p) is the length of the path p in terms of number of role-nodes included in the path. Referring to the path function, note that when the number of cycles is greater than zero it follows that there exist at least two existentially quantified independent variables ranging on the interpretation of the same classnode. As a consequence, a query whose formulation requires cyclic paths forces the user to deal with different references to the same class-node.
4 The Experiment The overall objective of our study is to measure and understand the comparative ease with which subjects can construct queries in either QBD* or SQL. More specifically, we are interested in three questions: 1) given a query in natural language, how easily can a user express it by using either QBD* or SQL; 2) given different types of queries, is there significant interaction between the query-type and the query language used; and 3) given different levels of skill is their significant interaction between skill level and query language used? The experiment is designed to determine if there is significant interaction between: 1) the query class and the query language type; and 2) between the type of query language and the experience of the user.
4.1 Participants The users involved in the experiment were undergraduate students, secretaries and professionals for a total of 104 people. They were classified into the three categories introduced in Section 3: 1) Naive users for an amount of 58 people; 2) Intermediate users for an amount of 30 people; 3) Expert users for an amount of 16 people. The subjects were assigned to one of two groups: one involved in QBD* tests, the other one in the SQL tests. Each group was composed by 26 non-experts, 15 intermediates, and 8 experts. It is worth noting that most users were naive, being the QBD* environment mainly thought for them. Nevertheless, in order to get results also on intermediate and expert users, we analyzed the performance of each category separately, as explained in Section 4.3.
Tiziana Catarci & Giuseppe Santucci
12
4.2 Procedure A pretest as well as a background questionnaire were administered to all subjects.
The
purpose of the pretest was to determine each subject's level of knowledge and understanding of the database query languages. The subjects were then assigned to one of two treatment conditions. Level of experience with query languages and pretest performance scores were used to equate the two groups. Next all subjects underwent a short training course of equal times in using the environments. The SQL course was about the relational model and the SQL language, the QBD* course was about the Entity-Relationship model and the usage of the QBD* environment. In this way the users were totally acquainted with the query language, the underlying model, and, in the QBD* case, also with the environment implementation. In this way, during the final test, they were free of concentrating exclusively on the query formulation. From a technical point of view, courses were set on a PC based environment and arranged in a self-learning fashion. Each user was provided with a floppy containing her/his course and was left free to study it. Several check points insured us of the correct learning of the subjects. After the training session, each subject was presented with a set of queries of different levels of complexity in natural language and was asked to construct these queries in one of the two query language environments. The order in which subjects were given the queries (closeuncyclic, far-uncyclic, far-cyclic, veryfar-uncyclic) was counterbalanced across subjects in order to minimize the learning effect. Basically, the close-uncyclic (CU) queries consisted of selections on single relational tables (single entities in QBD*), with different kinds of selection conditions: simple positive, simple negative, multiple in conjunction, multiple in disjunction. The far-uncyclic (FU) and veryfar-uncyclic (VFU) queries comprised join operations (path navigations in QBD*) with different join and selection conditions. Far-cyclic queries contained self-joins (paths involving twice the same concept in QBD*).
4.3 The Experiment Results Time in seconds to complete a query and accuracy of query completion were used as the two performance measures. For each group of users we report the results obtained for the corresponding classes of queries (close-uncyclic and far-uncyclic for naive users, the same plus veryfar-uncyclic queries for
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
13
intermediate users, all the four query classes for the experts). Figure 4.1 shows the results on query accuracy for both the SQL and QBD* user groups, while 4.2 is for the analogous results on time completion. SYSTEM USERS
QUERY CLASS
SQL
QBD*
NAIVE
CU
85
100
FU
89
100
CU
85
100
FU
84
100
VFU
77
95
CU
95
100
FU
99
100
VFU
88
95
FC
97
93
INTERMEDIATE
EXPERT
SQL QBD
100 90 80 70 60 50 40 30 20 10 0 CU
FU
Naive
CU
FU
VFU
Intermediate
CU
FU
VFU
FC
Expert
Figure 4.1: Accuracy of queries per user category.
4.4 Discussion The hypothesis that a graphical query language would perform better than SQL was confirmed by the experiment results.
Tiziana Catarci & Giuseppe Santucci
14
QBD* shows good results both on the accuracy and on the response time with respect to the semantic distance of the paths involved in the query expression. On the other hand, whenever queries contain cycles, expressing them using QBD* is slightly less accurate than in SQL. USERS
QUERY CLASS
NAIVE
INTERMEDIATE
EXPERT
SYSTEM SQL
QBD*
CU
76
39
FU
121
87
CU
43
31
FU
78
59
VFU
175
75
CU
38
28
FU
76
57
VFU
119
76
FC
216
125
SQL QBD
25 0
20 0
15 0
10 0
50
0 CU
Naive
FU
CU
FU
VFU
Intermediate
CU
FU
VFU
FC
Expert
Figure 4.2: Time completion per user category. The reason why QBD* users perform better when expressing queries characterized by a high semantic distance could very well be that paths in QBD* are displayed on the screen as part of the database schema, and s/he has only to follow them. Moreover, the user has a total control on the path specification, since it is always possible to backtrack and, once the path
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
15
has been completed, to change the conditions which have been previously specified for each single step. Considering the queries containing cycles, we can say that whenever a query involves two attributes belonging to the same concept the QBD* users get a bit confused. Our explanation resides mostly in the way in which the system presents the attributes to the user. Indeed, the attributes of any concept (entity or relationship), are represented as labels grouped into a form associated with the concept. As a consequence, when the query is cyclic, the user sees more copies of the same form, belonging to the different occurrences of the same concept met along the path. In order to associate each form with the right concept occurrence, the system automatically adds a number to the attribute labels. However, we noticed that, even using numbers, most users are still confused, not clearly understanding the meaning of the replicated labels. Referring to QBD*, we noticed that some veryfar-uncyclic queries involving the use of the “OR” connective between the selection conditions, gave the most of the problems to the users. This is certainly due to the fact that the system makes use of some default assumptions which can be modified by the user. In this case the default consisted in considering all the selection conditions in conjunction. This means that if the user is interested in using the “AND” connective s/he does not need to specify it, while s/he should do it for the “OR”, but s/he probably tends to forget this different treatment of the two connectives. This also happens with another system default, namely the fact that the system automatically includes the main concept attributes in the final result, while the attributes of the other concepts selected along the path have to be explicitly added. We noticed that the users tended to forget to include the attributes not belonging to the main entity in the final result. In general, the use of the default does not help the user, on the contrary, it introduces more error possibilities. On the other hand, even in case of complex queries, the users never did “conceptual” errors, such as a wrong choice of the main concept or of the path to be followed. It is interesting to note that the naive users performed surprisingly well in both languages. Moreover, referring to SQL, their accuracy was even better than the intermediate users’ one. This is probably due to the fact that the intermediate users were mainly acquainted with some types of procedural languages, then it was difficult for them to learn a declarative language such as SQL. Moreover, they felt themselves confident in solving so easy questions, as a consequence they did not put enough attention in doing it. On the other hand, the naive users were completely unaware of programming techniques, so their mind was free of influences in
Tiziana Catarci & Giuseppe Santucci
16
learning a certain kind of language. Furthermore, even expert users did small errors in simple questions. Such errors were mainly due to the need of remembering table names and using a precise syntax. This confirms our believes that even experts can profit from a more flexible query language. In summary, we can say that QBD* was superior to SQL in many respects: -
it offers a complete view of the database at a higher abstraction level, which has been particularly appreciated by the users; the syntax is extremely simple; the user does not need to remember table or attribute names, but just to choose them navigating on the schema;
-
the time needed to directly manipulate objects on the screen is less than the time used for writing SQL statements;
-
the error rate is reduced (but we understand from the experiments that we have to improve the default mechanisms); the users find QBD* more attractive, so they are not bored by working with it.
-
5 Conclusions and Future Work This paper describes an experiment aiming at estimating the advantages (or disadvantages) of using a visual query language instead of a traditional one. In order to do this we compared QBD*, a visual query system based on the direct manipulation of Entity-Relationship diagrams, against the well-known SQL language. The results of the experiment were in favor of QBD*. However, we have to note that QBD* not only adopts direct manipulation mechanisms and a diagrammatic representation of the database, but also an external data model, the Entity-Relationship model, conceptually more expressive than the relational model. This choice was done on purpose, because we believe that the model of the database information presented to the user should be close to his/her mental model of the reality of interest, and far from the database model. So, we think that user interfaces to database systems can gain from the adoption of effective external models and metaphors, expressive visual representations, friendly interaction mechanisms. We recall that the data model and its visual representation are two distinct aspects of the interface and different mappings can be established among them (see (Batini et al., 1993), (Haber, Ioannidis & Livny, 1994), and (Catarci, Costabile & Matera, 1995)), but it is surely a good choice to try to improve both of them.
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
17
Our future work along these lines will be the comparison of two visual query systems based on the same conceptual model, but proposing different visual representations, namely iconic and diagrammatic, and interaction mechanisms. We have two system prototypes and already run the experiment. At the present we are analyzing the experiment results.
Acknowledgments We are deeply grateful to Carlo Zerbo and Emanuele Mioni, who assisted us in designing and realizing the experiments. Also, we wish to thank Albert Badre for his helpful comments on a previous version of this paper.
References Ahlberg, C. Williamson, C. & Shneiderman, B. (1993), “Dynamic Queries for Information Exploration: an Implementation and Evaluation”, in “Sparks of Innovation in HumanComputer Interaction”, B. Shneiderman (ed.), Ablex Publ., Norwood, NJ, pp. 281-294. Angelaccio, M., Catarci, T. & Santucci, G. (1990a), “QBD*: A Fully Visual Query System”, Journal on Visual Languages and Computing, 1(2), 255-273. Angelaccio, M., Catarci, T. & Santucci, G. (1990b), “QBD*: A Graphical Query Language with Recursion”, IEEE Transactions on Software Engineering, 16(10), 1150-1163. Badre, A.N. (1993), “Methodological Issues for Interface Design: a User-Centered Approach”, Technical Report DI/DS - 93/01 of Dipartimento di Scienze dell’Informazione, University of Rome "La Sapienza". Batini, C., Catarci, T., Costabile, M.F. & Levialdi, S. (1991), “Visual Query Systems”, Technical Report N.04.91 of Dipartimento di Informatica e Sistemistica, University of Rome "La Sapienza". Batini, C., Catarci, T., Costabile, M.F. & Levialdi, S. (1993), “On Visual Representations for Database Query Systems”, in Proceedings of the Second International Conference Interface to Real & Virtual Worlds, EC2 Ed., Montpellier, France, pp. 273-283. Bevan, N. & Macleod, M. (1993), “Usability Assessment and Measurement”, in "The Management of Software Quality", M. Kelly (ed)., Ashgate Technical/Gower Press. Catarci, T. (1991), “On the Expressive Power of Graphical Query Languages”, in E.Knuth & L.M. Wegner (eds.), Proceedings of the 2nd IFIP W.G. 2.6 Working Conference on Visual Databases, Budapest, North Holland, pp. 404-414.
Tiziana Catarci & Giuseppe Santucci
18
Catarci, T. & Santucci, G. (1992) “Graphical Primitives for Querying Heterogeneous Databases”, in “Information Modelling and Knowledge Bases IV”, H.Kangassalo et al. (eds.), IOS Press, (Proceedings of the Second European-Japanese Seminar on Information Modelling and Knowledge Bases, Finland, 1992), pp. 106-125. Catarci, T., Santucci, G. & Angelaccio, M. (1993) “Fundamental Graphical Primitives for Visual Query Languages”, Information Systems, 18(2), 75-98. Catarci, T., Chang, S.K., Costabile, M.F., Levialdi, S. & Santucci, G. (1995) “A Visual Interface for Multiparadigmatic Access to Databases”, IEEE Transactions on Knowledge and Data Engineering, to appear. Catarci, T., Costabile, M.F. & Matera, M. (1995) “Which Metaphor for Which Database?”, in these Proceedings. Chandra, A.K. (1988) “Theory of Database Queries”, in M. Yannakakis (ed.), Proceedings of the ACM Symp. on Principles of Database Systems, pp. 1-9. Chen, P.P. (1976) “The Entity-Relationship Model toward a Unified View of Data”, ACM Transactions on Data Base Systems, 1(1), 9-36. Codd, E.F. (1972) “Relational completeness of database sub-languages”, in “Data Base Systems”, R.Rustin (ed.), Prentice Hall, Englewood Cliffs, pp. 65-98. Cruz, I.F. (1992) “DOODLE: A Visual Language for Object-Oriented Databases”, in P.C. Kanellakis (ed.), Proceedings of the ACM SIGMOD Conf. on Management of Data, pp. 71-80. Cuff, R. N. (1980) “On Casual Users”, International Journal of Man-Machine Studies, 12, 163-187. Date, C.J. (1987) “An Introduction to Database Systems - Vol.I”, Addison-Wesley Publishing Company. Haber, M.E., Ioannidis, Y.E. & Livny, M. (1994) “Foundations of Visual Metaphors for Schema Display”, Journal of Intelligent Information Systems, Special Issue on "Advances in Visual Information Management Systems", 3(3/4), 1-38. ISO (1991) “ISO 9126: Software product evaluation - Quality characteristics and guidelines for their use”. ISO (1993) “ISO DIS 9241-11: Guidelines for specifying and measuring usability”. Jarke, M.& Vassiliou, Y. (1985) “A Framework for Choosing a Database Query Language”, ACM Computing Surveys, 17(3), 313-340. P.C., Kanellakis (1990) “Elements of Relational Theory”, in “Handbook of Theoretical Computer Science”, J.van Leuween (ed.), Elsevier Science Pub, pp. 1073-1156. Macleod, M. (1992) “An Introduction to Usability Evaluation”, NPL Report DITC 102/92.
Are Visual Query Languages Easier to Use than Traditional Ones? An Experimental Proof
19
Maissel, J. ,Macleod, M., Dillon, A., Thomas, C., Rengger, R., Maguire, M., Sweeney, M. & Corcoran R. (1993) “Context guidelines handbook”, Version 2.1, National Phisical Laboratory, DITC, Teddington, UK. Massari, A. (1995) “An Iconic Query System for Large Medical Databases”, Ph.D thesis, Dipartimento di Informatica e Sistemistica, University of Rome "La Sapienza". Miller, R.B. (1971) “Human Ease of Use Criteria and their Tradeoffs”, IBM Report TR 00.2185, 12 April. Poughkeepsie, NY. Nielsen, J. (1993) “Usability Engineering”, Academic Press, San Diego, CA. Reisner, P. (1988) “Query Languages”, in “Handbook of Human-Computer Interaction”, M . Helander (ed.), Elsevier Science Publ., pp. 257-280. Santucci, G. & Sottile, P.A. “Query By Diagram: a Visual Environment for Querying Databases”, Software Practice and Experience, 23(3), 317-340. Shackel, B. & Richardson, D.J.(1991) “Human Factors for Informatics Usability”, Cambridge University Press. Shneiderman, B.(1978) ”Improving the Human Factors Aspect of Data Base Interactions”, ACM Transactions on Database Systems, 3(4), 417-439. Shneiderman, B. (1980) “Software Psychology”, Winthrop, Cambridge, Mass. Shneiderman, B. (1983) “Direct Manipulation: A Step beyond Programming Languages”, IEEE Computer, 16, 57-69. Spaccapietra, S. (1994) “User Interfaces; Who Cares?”, in J.Bocca, M.Jarke, C.Zaniolo (eds.), Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann, p.751. Thomas, J.C. (1977) “Psychological Issues in Data Base Management”, in Proceedings of the 3rd International Conference on Very Large Data Bases, Tokyo, Japan.