Query Representation and Management in a Multiparadigmatic Visual Query Environment

TIZIANA CATARCI*, SHI KUO CHANG° & GIUSEPPE SANTUCCI*

*Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria, 113 - 00198 Roma, Italy, [catarci/santucci]@infokit.dis.uniroma1.it
°Department of Computer Science, University of Pittsburgh, Pittsburgh - PA 15260, [email protected]

Abstract. We propose a framework for database querying that provides the user with several interaction paradigms based on different (i.e., form-based, diagrammatic, iconic, and hybrid) visual representations of the database. A unified model, namely the Graph Model, is used as the common underlying model, into which databases expressed in the most common data models can be easily converted. Graph Model databases can be queried by means of the multiparadigmatic interface. The semantics of the query operations is formally defined in terms of the Graphical Primitives. Such a formal approach enables the query manager to maintain the same query consistently in any representation. In the proposed multiparadigmatic environment, the user can switch from one interaction paradigm to another during query formulation, so that the most suitable query representation can be found.

Keywords: visual query, multiparadigmatic user interface, data model, query consistency

1. Introduction

Information systems have evolved continuously, so that an ever increasing amount of processed data is provided both to computer experts and to a larger community of people. To access and manage data for such complex information systems, databases are designed, created and possibly modified by professionals. However, there are several different kinds of users whose job requires them to access databases frequently to extract information. To this end, special purpose languages, called query languages, have been defined. Recently, the growth of the class of database users, including more and more non-expert and casual users, has motivated the development of easy-to-use query languages that are part of new, friendlier interfaces to databases. Visual Query Languages (VQLs) are defined as query languages essentially based on the use of visual representations to depict the domain of interest and express the related requests. As a consequence, even intrinsically non-visual information (such as that stored in traditional databases) is treated as visual, since a visual representation is proposed for it, and visual, direct-manipulation mechanisms for realizing the querying facilities are provided. Systems implementing a visual query language are called Visual Query Systems (VQSs) (a survey is in [Batini, et al. 1991a]). They include both a language to express the queries in a pictorial form and a variety of functionalities to facilitate man-machine interaction and data visualization. In recent years, many VQSs have been proposed in the literature. However, they usually adopt single predefined visual representations and interaction modalities, without allowing the user to tailor them according to her/his preferences and needs. By contrast, the presence of several paradigms, each one with different characteristics and advantages, will help both naive and experienced users in interacting with the system.
Research partially supported by the EEC under the Esprit Project 6398 "VENUS".

For instance, icons may be the best way to
evoke the objects present in the database, while relationships among them may be better expressed through the edges of a graph, and collections of instances may be easily clustered into a form. The way in which the query is expressed depends on the visual structure as well. In many existing VQSs icons can be combined following a spatial syntax [Chang, 1990], while queries on diagrammatic representations are mainly expressed by following links, and forms are often filled with prototypical values [Batini, et al. 1991b]. In this paper we aim to improve the visual querying facilities of database systems, overcoming the limitations inherent in a single visual representation and interaction mechanism, in order to address the distinct needs of the various classes of users and their different tasks. To this end, we propose a framework providing the user with a multiparadigmatic query interface, that is, an interface in which different visual representations and interaction modalities are available, together with the possibility of switching between them, so as to exploit the advantages of the different representations and reduce their drawbacks. This is profitable both for the single user, who can utilize different visual representations depending on the kind of query s/he wants to express and can change visual representation dynamically during query formulation, and for different classes of users, who are provided with interfaces suited to their skills and needs (see [Chang, et al. 1992]). Moreover, while existing VQSs are limited to databases expressed in a single model, in our approach we propose to use a common underlying model, namely the Graph Model (GM) [Catarci and Santucci, 1992; Catarci, et al. 1993], which is powerful enough to map the databases expressed in the most common data models in terms of its own databases. Graph Model DataBases (GMDBs) can be queried by means of the proposed multiparadigmatic interface.
Obviously the user may be unaware of both the nature of the original data model underlying the database s/he is accessing and the existence of a model itself. Furthermore, the selection of the appropriate interaction modality can be made according to a "user model" [Chang, et al. 1992; Catarci, et al. 1993]. The semantics of the query operations performed in the various visual representations is formally defined in terms of the Graphical Primitives (GPs) [Catarci, et al. 1993]. Such a formal approach leads to the concept of "atomic query", which is the minimal portion of a query that can be translated from one representation to another and computed by the system. In principle, the user can access the database by means of any interaction modality, independently of the original data model. However, since certain representations are more suited to certain data models and certain interaction modalities are more suited to certain query typologies, the system can help the user by suggesting the most appropriate visual representation and interaction modality. In order to allow the user to switch among the paradigms, while still obtaining a consistent result, the expressive power of the different query representations is mutually coherent. In particular, we concentrate on the class of conjunctive (select-project-join) queries, which abstract a set of requests commonly made by casual users and are therefore the queries that most users issue in most cases [Chandra and Merlin, 1977; Jarke and Vassiliou, 1985; Chandra, 1988]. To formulate such queries, the user focuses on one or more classes of objects, selects a subset of their instances and properties, and reaches other classes through shared properties. Every query representation provided in our multiparadigmatic interface allows one to express at least the conjunctive queries, without the user needing to know that s/he is expressing a query belonging to a particular class.
Note that this does not prevent us from taking into account particular visual representations and interaction paradigms that allow queries other than conjunctive ones to be expressed. Indeed, not all the visual representations have been recognized to be equally effective for representing certain types of operators. We believe that it is unfruitful to propose unnatural representations: while in principle it is possible to represent everything through every visual formalism, this does not necessarily help the user if the representation does not conform to her/his mental model. The choice of conjunctive queries is also motivated by the fact that this class is the largest one that can be effectively expressed in all kinds of visual representations.
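To make the notion of a conjunctive (select-project-join) query concrete, the following is a minimal Python sketch. The toy data, the field names (`id`, `owner_id`, `plate`) and the function `conjunctive_query` are illustrative assumptions and are not part of the framework described in this paper.

```python
# Hypothetical toy data, not taken from the paper.
employees = [
    {"id": 1, "name": "Ann", "age": 28},
    {"id": 2, "name": "Bob", "age": 45},
    {"id": 3, "name": "Eve", "age": 29},
]
cars = [
    {"plate": "RM-001", "owner_id": 1},
    {"plate": "RM-002", "owner_id": 2},
]


def conjunctive_query(employees, cars, max_age):
    """Select employees younger than max_age, join them with cars through
    the shared owner property, and project onto (name, plate)."""
    result = []
    for e in employees:                      # focus on a class of objects
        if e["age"] >= max_age:              # select a subset of its instances
            continue
        for c in cars:                       # reach another class ...
            if c["owner_id"] == e["id"]:     # ... through a shared property (join)
                result.append((e["name"], c["plate"]))  # project two properties
    return result


print(conjunctive_query(employees, cars, 30))
```

Only conjunctions of selection conditions and equality joins appear here; no negation, union, or recursion is needed, which is what keeps this class expressible in every representation.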


It is worth noting that some existing VQSs have higher expressive power than conjunctive queries (see, e.g., [Angelaccio, et al. 1990; Consens and Mendelzon, 1990; Cruz, et al. 1988; Paredaens, et al. 1991, Cruz, 1992]). However, while the main goal of the above papers is to present a single visual query language based on a specific visual representation, the main strength of the present work is to provide the user with several visual representations and the possibility of switching among them, always maintaining the query consistency. Preliminary and partial results on the proposed adaptive framework appeared in [Catarci, et al. 1992; Catarci, et al. 1993a; Catarci et al. 1993b] . In this paper we refine and extend such results, in particular by concentrating on the translation algorithms used for switching from one representation to another and on the properties such algorithms should have in order to maintain the query consistency. The paper is organized as follows. In Section 2 we describe the overall structure of our proposed framework and present several examples of user-system interaction. In Section 3 we review the definition of the Graph Model and Graphical Primitives. In Section 4 we formalize the different visual representations and interaction paradigms, and describe the translation algorithms used for switching from one representation to another, proving their consistency through the introduction of the concept of Atomic Query. Finally, in Section 5, we draw some conclusions.

2. The Multiparadigmatic Approach

The aim of this section is twofold. First, we illustrate the overall architecture of the system. Then, we describe how the user can interact with such a system, presenting the main ideas and giving several detailed examples (the formal basis underlying our approach will be described in Sections 3 and 4).

2.1 The System Architecture

In Figure 2.1 we show the architecture of the multiparadigmatic system. Such a system consists of a Visual Interface Manager, a User Model Manager, a GMDB & Query Manager, and one or more Database Management Systems. The kernel of the system consists of the three managers, which are cooperating processes. In this paper we concentrate on the Visual Interface Manager and in particular on its algorithmic aspects.

Figure 2.1: The system architecture

The Visual Interface Manager is the module that handles the different visual representations associated with the GMDBs and the corresponding interaction modalities. The system suggests the interaction modality most appropriate to the user on the basis of the stored user model. At any moment, the user has the freedom of shifting to any one of the available interaction modalities. In order to keep the query in a consistent state, the switching is permitted only when the query has an unambiguous meaning. This is constantly verified by the system, which allows the user to change representation only when her/his query is atomic (see Section 4). The check of the query atomicity is transparent to the user. Furthermore, if the user tries to change representation when it is not allowed, the system suggests several alternative ways of turning the query into an atomic query. The User Model Manager is responsible for collecting data and maintaining a knowledge-base of the user model components. The components are: the class stereotype, which is based on the user classification scheme presented in [Batini, et al. 1991a]; the user signature, i.e., a compressed history of the interaction of the user
with the system expressed in terms of the set of queries s/he formulates; and the system model, i.e., the knowledge of the user about the database content and structure. It is worth noting that the class stereotype and the user signature components taken together constitute what is usually called the individual user model in the literature. At the bottom of Figure 2.1, different databases structured according to several data models are shown. Each database is translated into a GMDB by the GMDB & Query Manager, using the mappings defined in [Catarci, et al. 1993] , available for the most common data models. The GMDBs are queried through the above interface. The semantics of the query operations is formally defined in terms of the Graphical Primitives. Finally, the GMDB & Query Manager translates a set of Graphical Primitives, forming a GM query, into a corresponding DBMS query.

2.2 User-system Interaction

It is our opinion that, in order to improve the communication with users of any type, a query system should provide multiple representations for both databases and queries performed on them, thus allowing different interaction paradigms. For instance, icons may be most suitable to evoke the objects in the database, while relationships among them may be better expressed through the edges of a graph, and collections of instances may be easily clustered into a form. This is the approach we propose in our system, in which the most appropriate interaction paradigm is automatically suggested to the user who has, at any moment, the freedom of overriding the suggested paradigm and choosing any other one. In other words, the system adapts its interface to the user, depending on the available model of that user, in which her/his interests and skills are stored. The user model is initially generated by the system and then dynamically maintained according to the history of the interactions, i.e., both queries and user reactions to system messages. The user model is particularly useful for interacting with a distributed database system, because these databases are queried by users with very different interests, either experts or non-experts, requiring information at various levels of abstraction depending on their particular motivations. The "switching" among different interaction paradigms is achieved by using several windows that are simultaneously present, thus exploiting the spatial differentiation. In the beginning, the main window contains the visual representation and the interaction paradigm that the system judges to be the most appropriate for the current user, and in the subsequent interaction the main window always displays the representation that is currently in use. The smaller windows show different visual representations and interaction paradigms (see Figure 2.2).
Figure 2.2: A sample display

It is worth noting that although we consider a minimal set of basic representations, i.e., diagrammatic, iconic, and form-based, our approach can be extended to include hybrid representations, constituted by various representations combined together. In Section 2.3 we will show detailed examples of user-system interaction. In order to do that, we first need to introduce the Graph Model, i.e., the internal model used by the system, and the available set of basic visual representations. In accordance with the aim of this section, we will proceed first through a simple example; a more formal definition of the Graph Model will be presented in Section 3.


The following example refers to a database containing information about employees, persons, cars, and relationships among them. A diagrammatic representation of such a database is shown in Figure 2.3.

Figure 2.3: The diagrammatic representation

The diagrammatic representation reflects the structure of the Graph Model: we have nodes representing classes of objects (Person, Employee, and Car), nodes representing elementary domains of interest (Integer, Date, and String), and nodes relating classes to either elementary domains (Salary, Age, Name, and Plate) or other classes (Owns). Moreover, ISA relationships are expressed through arrows (Employee ISA Person). Cardinality constraints are expressed using pairs of numbers between brackets. In particular, the cardinality constraints in Figure 2.3 say that a Person has one and only one Age, and a Car is owned by one and only one Person. We would like to point out that this representation is only an example of diagrammatic representation of the Graph Model; we could have used different and equally expressive diagrammatic representations. The same holds for the other representations we are going to introduce. The main visual operations the user can perform in the diagrammatic paradigm are the selection of nodes (in particular, a double node selection corresponds to including the node in the final result) and the drawing of edges. For instance, let us assume the user query is "list the name of all the employees owning a car". To express such a query, the user has only to select the nodes Employee, Owns, and Car and doubly select String and Name, giving rise to the diagram shown in Figure 2.4. The form-based representation of the same database is shown in Figure 2.5. In this representation the whole database is represented through a single form, where the information belonging to a class is spatially clustered, allowing a quick location of all the attributes of the same class.
On the other hand, the structure presented to the user is in some sense less rich than the diagrammatic one. For instance, the ISA relationship between Employee and Person is not available in this form (it can be found in a separate form, as described in Section 4); moreover, the inheritance property has been made explicit by duplicating the relationships Owns, Name, and Age.

Figure 2.4: The example query expressed through the diagrammatic representation

Figure 2.5: The form-based representation

The main operations the user can perform in the form-based paradigm are the selection of a header and the filling of an empty cell with a string. Let us consider again the above simple query. In order to express it in the form-based representation, the user has to select the headers Employee, Name, String, Owns and Car and write the string "P." (which stands for "print") in the empty cell under Name. Note that no ambiguity arises from the duplicated headers, because of the spatial relationships imposed by the form structure. In particular, the user selects the headers Name and Owns adjacent to Employee. In Figure 2.6 the query expressed in the form-based representation is shown.

Figure 2.6: The example query expressed through the form-based representation

Figure 2.7: The iconic representation

The iconic representation of the example database is shown in Figure 2.7. Each icon represents a node of the Graph Model, yielding a visual pattern directly corresponding to the concepts the user wants to select. Icons are
particularly effective in resembling objects of the real world having an intrinsic visual representation. On the other hand, they are less adequate when representing the relationships among objects. To overcome such a limitation we use boxes to represent links among icons. Each box is characterized by an owner icon, which identifies it. Moreover, arrows between icons represent ISA relationships. To allow the user to focus on the desired subset of concepts, the iconic representation is provided with a screen region called query area. In order to select a concept, the user has to drag the related icon into the query area; a further selection of an icon within the query area results in including the corresponding domain in the final result. In order to express the example query, the user has to drag into the query area the box corresponding to Employee, Owns, Car, Name, String and to select the pair (see Figure 2.8).

Figure 2.8: The example query expressed through the iconic representation

In the above considerations, we analyzed the different visual representations independently of one another. In Section 2.3 we will show how the system is able to switch among them, allowing the user to choose the visual paradigm that best fits her/his needs. Moreover, if there is a query in progress, the system translates it into the new paradigm, saving the user's previous work. It is worth recalling that switching paradigm while performing a query is sometimes forbidden: indeed, the switching is allowed only if the query is atomic. The precise definition of atomic query will be given in Section 4.4; here we only note that the system can alert the user with a green light when the switching is possible. If the switching is not allowed (red light), the system suggests to the user how to complete the query in order to make it atomic.

2.3 Detailed Examples of User-System Interaction

Let us consider several more complex queries.

Example 1. The query Q1 is: "List the name and the age of all the employees younger than 30 and whose salary is greater than $ 100,000". The user initially interacts with the diagrammatic paradigm as shown in Figure 2.9a, where the user has selected several nodes and expressed the first condition (younger than 30). S/he now decides to switch to the iconic paradigm. However, the system does not allow the switching (notified to the user by the upper red light of the traffic-light icon) since the query is not yet atomic. In fact, the user has not specified the structure of the result (no role-node is set to "displayed"), thus violating one of the atomicity conditions (see Section 3, Lemma 1, and the definition of atomic query in Section 4.4). Note that, on the basis of the user selections, it is impossible to determine in an automatic way the action to be performed to produce an atomic query. In particular, since the class-node Integer is displayed, some role-node linked to Integer should appear in the final result, but the system does not know which one between Age and Salary. After the system suggests the possible ways of completing the query (i.e., doubly selecting either one of the role-nodes Salary and Age or both), the user doubly selects the node Age, thus including it in the final result and making the query atomic. Note that having an atomic query also means being able to precisely determine the intensional and extensional parts of the query result. In our example, the query specified up to now retrieves the list of the ages of the employees younger than 30. Figure 2.9b shows the state of the iconic representation resulting from the switching. In Figure 2.9c the query is completed; Figure 2.9d shows the same query expressed in the form-based paradigm, which seems to be the most suitable for this kind of query (see [Batini, et al. 1991b; Costabile, et al. 1992]).
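The red-light situation of Example 1 can be illustrated with a small Python sketch. It checks only the single condition mentioned above (a displayed class-node needs at least one displayed role-node linked to it) and is not the full definition of atomic query, which is given in Section 4.4. The function name `switch_suggestions`, the edge encoding as (class-node, role-node) pairs, and the graph fragment are assumptions made for illustration.

```python
UNSELECTED, SELECTED, DISPLAYED = "unselected", "selected", "displayed"


def switch_suggestions(state, role_nodes, edges):
    """For every displayed class-node with no displayed role-node linked to
    it, return the role-nodes the user could doubly select.
    An empty dict means this one necessary condition holds."""
    suggestions = {}
    for node, node_state in state.items():
        if node in role_nodes or node_state != DISPLAYED:
            continue
        linked = sorted(r for (c, r) in edges if c == node and r in role_nodes)
        if not any(state[r] == DISPLAYED for r in linked):
            suggestions[node] = linked
    return suggestions


# Hypothetical fragment of the example database; edges are (class-node, role-node).
role_nodes = {"Age", "Salary"}
edges = {("Person", "Age"), ("Integer", "Age"),
         ("Employee", "Salary"), ("Integer", "Salary")}
state = {"Person": UNSELECTED, "Employee": SELECTED, "Integer": DISPLAYED,
         "Age": SELECTED, "Salary": UNSELECTED}

before = switch_suggestions(state, role_nodes, edges)
after = switch_suggestions({**state, "Age": DISPLAYED}, role_nodes, edges)
print("red" if before else "green", before)
print("red" if after else "green", after)
```

Returning the candidate role-nodes rather than a bare boolean mirrors the behaviour described in the example, where the system cannot choose between Age and Salary itself and leaves the decision to the user.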


Figure 2.9: The query Q1: (a) diagrammatic paradigm; (b-c) iconic paradigm; (d) form-based paradigm

Example 2. Consider the query Q2: "List the name of all the employees whose salary is greater than the salary of their managers". We extend the initial database with information about the employees' managers (see Figure 2.10).

Figure 2.10: The database of Query 2

We show the query expressed in the iconic paradigm in Figure 2.11a and in the diagrammatic paradigm in Figure 2.11b.

Figure 2.11: The Query Q2: (a) iconic paradigm; (b) diagrammatic paradigm

Example 3. In the following we show an example based on the combination of the iconic and diagrammatic paradigms. The main idea is to take advantage of both diagrams, which are effective in representing links, and icons, which are effective in resembling real world objects. This interaction scenario is based on an extension of the database introduced in Figure 2.4 with additional information on the mechanics that repair cars. Since the representation of the Typed Graph nodes is made through icons directly linked by edges, in the following we use the terms icon and node interchangeably. In addition to the visual mechanisms available in the iconic representation, several operations are at the user's disposal. In particular, the open operation allows the user to "open" an icon and to access all its characteristics, i.e., the role-nodes linked to the icon itself. Moreover, through the zoom-in (zoom-out) operation, the user causes the system to display the hierarchy tree of an icon in order to choose one of the child (parent) nodes. Finally, the find-path operation provides the user with the list of all the existing paths between a chosen pair of icons. The user query Q3 is the following: "Find the name and the address of all the mechanics that repaired one of John Smith's cars". The user drags the Person icon into the query area (Figure 2.12a). Subsequently, s/he performs a zoom-in operation on such an icon (Figure 2.12b), and selects the Mechanic icon as well.
Furthermore, s/he asks for all the paths connecting Mechanic and Person (Figure 2.12c). Finally, the user opens the icons of Person and Mechanic, specifies the selection condition, and chooses the properties s/he wants to see in the final result. This example illustrates the advantages of the hybrid approach, which can be supported by our system combining several paradigms.

Figure 2.12: Query expression in the hybrid approach: (a) the initial display; (b) zoom-in operation; (c) path-search operation; (d) open operation and condition specification
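The find-path operation used in Example 3 lists all existing paths between two icons. A minimal Python sketch of such an enumeration is given below, using a depth-first search over simple paths. The function `find_paths`, the adjacency-dictionary encoding, and the node names Repairs and Owns in the toy graph are illustrative assumptions; the paper does not specify how the operation is implemented.

```python
def find_paths(adjacency, start, goal):
    """Enumerate all simple paths (no repeated node) from start to goal
    in an undirected graph given as {node: set of neighbours}."""
    paths = []

    def extend(path):
        last = path[-1]
        if last == goal:
            paths.append(list(path))
            return
        for nxt in sorted(adjacency.get(last, ())):
            if nxt not in path:          # keep the path simple
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths


# Hypothetical extension of the example database with mechanics.
adjacency = {
    "Mechanic": {"Repairs"},
    "Repairs": {"Mechanic", "Car"},
    "Car": {"Repairs", "Owns"},
    "Owns": {"Car", "Person"},
    "Person": {"Owns"},
}

for path in find_paths(adjacency, "Mechanic", "Person"):
    print(" - ".join(path))
```

Restricting the search to simple paths keeps the list finite even when the graph contains cycles.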

3. The Graph Model and the Graphical Primitives

In this section we review the basic concepts underlying the Graph Model and the Graphical Primitives, as described in [Catarci, et al. 1993]. We first introduce the syntax and the semantics of the Graph Model in terms of Typed Graph and Interpretation. Then, we describe the basic graphical query primitives associated with the Graph Model. Finally, we introduce the result database D @ , a particular GMDB that can be automatically built from any GMDB D and contains all and only the information requested by the user by applying a set (possibly empty) of Graphical Primitives on D.


3.1. The Graph Model

The Graph Model allows us to define a GMDB D in terms of a triple <g, c, m>, where g is a Typed Graph, c is a set (possibly empty) of suitable Constraints, and m is the corresponding Interpretation. The schema of a database, i.e., its intensional part, is represented in the Graph Model by the Typed Graph and the set of Constraints. The instances of a database, i.e., its extensional part, are represented by the notion of Interpretation. A Typed Graph g is a tuple g = <N, E, ℓ1, ℓ2, f1, f2, f3>, where N is the set of nodes, divided into NC, the set of class-nodes, and NR, the set of role-nodes. Moreover, NC is partitioned into NCp, the set of printable class-nodes, and NCu, the set of unprintable class-nodes. E is the set of edges; ℓ1 and ℓ2 are the sets of node and edge labels (ℓ2 includes a special label T, corresponding to the true value); f1 and f2 are functions associating nodes and edges with labels in ℓ1 and ℓ2 respectively; finally, f3 is a total function mapping each
node to a value in {unselected, selected, displayed}. In our model several types of constraints may be specified by means of a Constraint Language, which allows for representing the basic features of the most widely used data models (see [Catarci, et al. 1993] for more details). A suitable subset is represented by the ISA and cardinality constraints. Such constraints are graphically represented in a Typed Graph as shown in Figure 2.3, i.e., an arrowhead edge for the ISA relationships, and a pair of numbers between brackets for the cardinality constraints (see footnote 1). Moreover, we indicate with circles the role-nodes, with squares the unprintable class-nodes, and with rounded rectangles the printable class-nodes. Finally, let us turn our attention to the notion of Interpretation, which is used for characterizing the instances of the database. An Interpretation for a Typed Graph g is a function mapping the printable class-nodes of g to a subset of the set of elementary printable values, the unprintable class-nodes to a subset of the set of elementary unprintable values, and the role-nodes to a subset of a set of structured objects, defined as the smallest set containing the set of elementary values and all the possible labeled tuples (of any arity). In particular, given a role-node n, its Interpretation is constituted by a set of tuples whose arity is equal to the number of class-nodes adjacent to n; each component is labeled with the label of one adjacent class-node and takes its values in the corresponding Interpretation. An example of Graph Model usage, modeling the information concerning persons, employees, and cars, is shown in Figures 2.3 and 3.1. Figure 2.3 shows the Typed Graph g with the following constraints: Employee ISA Person; ATLEAST(1,Person,Age); ATMOST(1,Person,Age); ATLEAST(1,Car,Owns); ATMOST(1,Car,Owns). Figure 3.1 shows a possible Interpretation m for g.

Figure 3.1: A possible Interpretation for the Typed Graph in Figure 2.3.
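The definitions above can be sketched as a small Python data structure. This is only one possible encoding under stated assumptions: node labels double as node identifiers (so f1 is implicit), edge labels (f2) are omitted, edges are stored as (class-node, role-node) pairs, and the reading of ATLEAST and ATMOST as bounds on how many tuples of a role-node each class instance takes part in is an interpretation of the "one and only one Age" example. The names `TypedGraph`, `participation`, `atleast` and `atmost` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TypedGraph:
    printable: set       # N_Cp, printable class-nodes
    unprintable: set     # N_Cu, unprintable class-nodes
    role_nodes: set      # N_R
    edges: set           # E, here pairs (class-node, role-node)
    state: dict = field(default_factory=dict)   # f3: node -> state

    def __post_init__(self):
        # f3 is total: every node gets a state, "unselected" by default.
        for n in self.printable | self.unprintable | self.role_nodes:
            self.state.setdefault(n, "unselected")


def participation(class_node, role_node, interp):
    """Count, for each instance of class_node, the tuples of role_node it occurs in."""
    counts = {value: 0 for value in interp[class_node]}
    for labeled_tuple in interp[role_node]:
        value = labeled_tuple[class_node]
        if value in counts:
            counts[value] += 1
    return counts


def atleast(k, class_node, role_node, interp):
    return all(n >= k for n in participation(class_node, role_node, interp).values())


def atmost(k, class_node, role_node, interp):
    return all(n <= k for n in participation(class_node, role_node, interp).values())


g = TypedGraph(printable={"Integer"}, unprintable={"Person"}, role_nodes={"Age"},
               edges={("Person", "Age"), ("Integer", "Age")})

# A toy Interpretation: role-node tuples are dicts labeled by adjacent class-nodes.
interp = {
    "Person": {"p1", "p2"},
    "Integer": {30, 41},
    "Age": [{"Person": "p1", "Integer": 30}, {"Person": "p2", "Integer": 41}],
}
print(atleast(1, "Person", "Age", interp), atmost(1, "Person", "Age", interp))
```

Labeling tuple components with class-node names, as the definition of Interpretation does, is what lets the constraint checks address a component by name instead of by position.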

Footnote 1: An equivalent textual syntax will also be used in the rest of this paper, i.e.: n1 ISA n2, where n1, n2 ∈ NC; ATLEAST(k,n1,n2), where n1 ∈ NC, n2 ∈ NR, and k ∈ Z; ATMOST(k,n1,n2), where n1 ∈ NC, n2 ∈ NR, and k ∈ Z.

3.2 Fundamental Graphical Primitives

In this subsection we recall the Graphical Primitives (GPs), first introduced in [Catarci, et al. 1993]. The main idea is to express any query-oriented user interaction with a database in terms of two simple graphical operations: the selection of a node and the drawing of a labeled edge. The former is the simplest graphical operation available
to the user, and corresponds to switching the state of a node. The latter is the linkage of two nodes by a labeled edge, and corresponds to either restricting the node Interpretations according to the rules stated in the label, or performing a set operation on them. This last case will not be discussed in the present paper, because we are concerned with select-project-join queries only. We assume that several views of a database, built by using the DUPLICATE function, may be used during query formulation. We call initial GMDB a GMDB D on which no GPs have been applied, so that the following conditions hold: 1. f3(n) = unselected for each n ∈ N; 2. f2(e) = T for each edge e ∈ E; 3. E ∩ (NR × NR) = Ø. In the rest of the paper we denote by D = <g, c, m> the database we operate on, and by D'=
