128
IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 49, NO. 2, JUNE 2006
Query by Templates: Using the Shape of Information to Search Next-Generation Databases
ARIJIT SENGUPTA AND ANDREW DILLON
Abstract: We present a user-centered database query language called QBT (Query By Templates) for user communication with databases containing complex structured data, such as data stored in the Extensible Markup Language (XML). XML has revolutionized data storage as well as representation and transfer methods in today’s internet applications. The growing popularity of XML as a language for the representation of data has enabled its use for several applications involving storage, interchange, and retrieval of data. Several textual query languages have been proposed for XML retrieval, including the World Wide Web Consortium’s (W3C) recommendation of XQuery. Native XML database systems have been implemented, all of which provide methods for user communication with the database, although most communication methods use text-based query languages or form-based interfaces. QBT, the language presented here, is one of the first attempts toward a generalized alternative language that is based on human factors of familiarity. It is ideal for documents with a simple yet highly recognizable layout (e.g., poems, dictionaries, and journals). We present the QBT language and report results from an initial usability test that shows promise for this type of interface as a generalized user–database communication method.
Index Terms: Complex structured data, Extensible Markup Language (XML), information shape, query evaluation, query languages, query processing, visual languages, XQuery.
The Extensible Markup Language (XML) is widely regarded as one of the fastest growing technologies in recent times [1]. XML has created new opportunities for using documents as a means of information representation, interchange, and retrieval. With XML, structurally rich documents can be created for the traditional purposes of reading, browsing, and printing, as well as searching and querying. Searches in word-processor documents are usually restricted to linear word or phrase searches. With the current HTML internet technology, searches continue to be limited primarily to Boolean keyword matching. However, with proper use of the metadata embedded in XML documents, users have the ability to pose queries (complex searches involving both textual content and structure) in document collections, similar to database queries. The increase in popularity of XML has led to its use in most areas of application development. The Extensible HyperText Markup Language (XHTML) has already replaced HTML as the language for the internet [3]. The next generation of the web, termed the Semantic Web, is based entirely on a series of XML-based standards [4]. This will
Manuscript received November 9, 2004; revised May 30, 2005. A. Sengupta is with the Department of Information Systems and Operations Management, Raj Soin College of Business, Wright State University, Dayton, OH 45435 USA (e-mail:
[email protected]). A. Dillon is with the School of Information, University of Texas, Austin, TX 78712 USA (e-mail:
[email protected]). IEEE DOI 10.1109/TPC.2006.875073 0361-1434/$20.00 © 2006 IEEE
necessitate efficient management of XML resources. This next-generation web will use database systems capable of handling high volumes of XML data and will need to allow users to query such information. In our view, this is the next generation of database systems—systems that will need to manage and search complex data with well-defined semantics. This leads us to the next logical question—how can users take advantage of the embedded structure in XML documents in their searches? Although most search engines provide some form of Boolean query formulation, users rarely utilize the Boolean query features in search engines. One expects that keyword searches will continue to be the most widely used type of search. However, the fact that XML documents include rich structural information will affect how users formulate their searches. The basic premise of user–database communication is deeper than simply the ability to retrieve documents via some content. Users can only communicate efficiently with databases when the communication medium provides users with the facilities for discovery as well as search using a familiar environment. Toward this goal, we generalize the Query By Example (QBE) language so that it can be applied to databases containing complex structured data [2]. QBE is suitable for relational databases since it uses tabular skeletons, which are analogous to tables in the relational model, as a means for constructing queries. In other words, the template for presenting queries in QBE is similar to the conceptual structure of the
SENGUPTA AND DILLON: QUERY BY TEMPLATES: USING THE SHAPE OF INFORMATION
data instances. We use this idea to generalize QBE for databases where each data instance, albeit complex, has a simple visual model. We base this assumption on the fact that human beings form a mental image of the result of the tasks that they intend to perform [5]. For example, users performing a search on a collection of purchase orders may not know the internal database representation of the purchase orders, but they usually know what a typical purchase order looks like. In our method, which we term QBT or Query By Templates, the basis of the interface is a visual template representing a data instance, such as (1) a sample purchase order for a sales database, (2) a small poem for a poetry database, (3) a table for a relational database, (4) a representative word definition for a dictionary database, and (5) a sample citation entry in a bibliographic database. This paper is organized as follows. First, we provide the rationale for our method using a document-oriented example and make our goals explicit in the rest of this section. We then describe some of the related work in this area to put our method in perspective. Next, we introduce the QBT concept, and we describe how queries are formulated using QBT. In the following section, we describe the implementation of QBT. Then, we describe an empirical study to demonstrate the usability of this method. Finally, we present some concluding remarks.
RATIONALE
The idea of querying using templates comes from the fact that users tend to form a “mental model” of the tasks they perform with computers, and this model guides their interactive behavior [5]. Simply described, the “mental model” of a task is an internal representation of how to perform the task in any given situation. Recent work has suggested that users of digital documents tend to exploit spatial cues in determining how information is organized and create models of layout and organization that make it possible to navigate through the information space. Levene and Wheeldon suggest that users form specific patterns of usage based on their mental models [6]. Such patterns are also found in predominantly visual data, such as images, and Malik et al. show how spatial cues for organizing information are important for users when navigating through large image datasets [7]. Furthermore, on the basis of a life history of reading and using information, people have expectations of how information should be structured, and these expectations are applied during task performance in anticipation of locating target information within a document. This has been called the “shape” property of information [8]. It is argued that most information systems fail to exploit this property. As a result, locating information is unduly difficult for users. If information truly has shape in the minds of users,
then such characteristics may be a powerful way of encouraging accurate search processes.
Let us explain this further with an example. Jane Doe was having difficulties locating a poem in a poetry database. She knew that the poet’s name was Blake, and she thought that the word “tiger” was somewhere in the first line. Using conventional search techniques, she was retrieving too many matching poems for her search. As she formulated her search, she also remembered the occurrence of the word “burning” in the first line, and with some effort, she could retrieve Blake’s poem “The Tyger.” Of course, the word tiger in this particular instance is spelled as t-y-g-e-r. One might argue correctly that Jane’s problem could be solved using a search method that can perform approximate searches. However, the goal of this research is not to design approximate search techniques. What is more important here is the fact that Jane possessed only a partial mental representation of the target poem, a common occurrence. Although her initial guess was unsuccessful, a refinement of the guess eventually resulted in a match, which is again a common occurrence. We can represent this graphically as in Fig. 1 with the conceptual image (a) and the retrieved result (b).
Our goal is to construct a search interface that can best exploit the situated and naturalistic search behavior of a typical user, where partial information of both content and structure can be combined to query a database. The methodology presented in this paper utilizes the users’ mental model of the result in the query formulation. The main challenge is the development of this seemingly abstract concept in a user-friendly manner.
Goals and Contributions
The goal of QBT is to improve on the state of the art in user–database communications in complex structured databases, such as databases for XML documents. QBT is based on human factors principles of information seeking from electronic sources.
The primary contribution of this paper is the design of a visual query language. The language is:
(1) INTUITIVE: Users feel comfortable using the method because of its association to real data.
(2) EXPRESSIVE: The language is not limited to a fixed ad hoc set of queries but can express most first-order queries.
(3) GENERALIZABLE: The language can be used for any structure, particularly those which can be visually conceptualized.
(4) EXTENSIBLE: The language is extensible to include different structures and functionalities.
(5) CLOSED: The language can provide results in the same domain as the input using the same templates.
(6) IMPLEMENTABLE: The language can be easily implemented using currently available technology.
(7) EXPLORATORY: The language encourages users to explore document structures and discover innovative ways of retrieving information.
(8) ORIGINAL: Although the concept is based on the QBE technique, the adaptations in the method create an original technique that can be applied to data of any type of structure and database backend [1].
We support the design of the language with results from an initial usability analysis, in addition to some formal results.
RELATED WORK
Looking for information in documents has been a subject of research for the last two decades. BROWSING, or “surfing,” documents is a method of information seeking wherein the user utilizes the organization of documents to reach the information he/she is looking for. This is a common method for information extraction, but its success depends upon the reader’s perseverance, visual attention, knowledge of the subject matter, the organization of the text, and related typographical factors. Carmel distinguishes three types of browsing: (1) search-oriented browsing, that is, finding and reviewing information relevant to a fixed task; (2) review browsing, that is, reviewing to integrate information in the presence of transient goals; and (3) scan-browsing, that is, scanning (and not reviewing) for integrating information with transient goals [9]. There is a significant body of research that indicates that users experience a myriad of
difficulties with electronic documents that are not apparent with paper, such as loss of reading speed [10], lower accuracy [11], and many related affective factors. Within hyperlinked documents, the problems are exacerbated since the nonlinear format of most information adds significant cognitive overhead to the task [12], [13]. The introduction of hypertext has complicated the issue even further [14] (for a complete review of this literature, see [15]).
Searching is another common method for information seeking, where the user provides a query as a set of Boolean search terms or a database query that is used to reduce the scope of browsing. A search is typically a precursor to browsing the results obtained from the search. With widespread usage of highly capable search engines such as Google, which uses the famed PageRank technique for determining relevance [16], users now have a number of options to jumpstart their information seeking process. Domain-specific search engines have also become fairly common. For example, several search engines such as CiteSeer (http://citeseer.ist.psu.edu/), DBLP (http://dblp.uni-trier.de/), and Scirus (http://www.scirus.com/) provide access to academic articles to aid in the process of citation searches. However, all such search engines employ simple web forms that primarily allow keyword searches of these repositories.
The objective of this work is to find an alternative to current search solutions that matches the users’ cognitive model of search, instead of a form-based retrieval method followed by browsing. We are going to use XML as the primary vehicle for these advanced searches since XML has the metadata representation capabilities that advanced search processes need [2].
Fig. 1. Example of a conceptual image of a search and the retrieved result.
Since keyword searches (including Boolean searches) have been accepted as a common mechanism for information retrieval, let us first explore the possibilities of performing standard keyword searches in XML [17]. The challenge in keyword searches for XML comes from the fact that the hierarchical structure may require the keyword searches to be deep within the structure. Florescu et al. propose a graph-based model and extend XML-QL with keyword search capabilities [18], [19]. They introduce the “like” operator in XML-QL that allows keyword searches within specific structural regions. Murata and Robie discuss the possibilities of integrating structured and full-text queries as well as transformation of data using queries [20]. Abiteboul et al. present a method for representing data in XML with incomplete information and a method to query the same [21]. Several other related approaches to structured document querying and views pursue the use of queries in structured documents [22]–[24]. Although keyword searches are quick, easy, and provide a common search methodology for documents, a simple keyword search typically generates too many hits. Ranking of XML search results has been researched to aid in this process [25]. Visual methods have been developed for assisting users with the search process. Baeza-Yates et al. propose a method for querying documents using both content and structure [26]. The method uses a visual query language that uses block-like structures to represent structural areas of a document. The authors note that “it is important that the users should know the document structure and hence the interface should show the structure and assist the query process in this way” [26, p. 8]. For Boolean queries without structure, interfaces similar to Venn diagrams have been proposed by Jones et al. [27], [28]. 
Jones uses VQuery, a graphical query interface application that provides an alternative query interface to textual Boolean query specifications and dynamically provides responses to refinements of queries [29]. Other related work in this area includes Query By Browsing (QBB)—a relationally complete visual query language for heterogeneous data sources that uses the desktop paradigm to provide a user interface for querying relational, object-oriented, and XML data sources [30]. Mohan and Kashyap present a visual query language called VQL capable of posing queries on object-oriented databases, using a set of “graphical primitives” along with a set of rules for combining them [31]. The graphical primitives represent object-instance-value relationships which can be combined graphically to pose complete Boolean queries. Vizla is another visual language which utilizes the procedural nature of query processing instead of a declarative nature [32]. In Vizla, answers to queries are developed by using sets of iconic representations of operators and
control constructs. Finally, Visual Knowledge Query Language (VKQL) provides an abstraction layer for user-interface driven query formulation using “a combination of direct-manipulation and form-filling techniques” [33, p. 698]. The novelty of VKQL lies in its use of the user-interface for the purpose of creation of the query, in which icons for different structural components can be selected by the user and combined by means of form-driven conditions. The most widely used search interfaces use search forms, which is a common method of searching in internet-based applications. In form-based interfaces, the user is presented with a list of searchable fields, each with an entry area that can be used to indicate the search term(s). To pose a query, the user needs to fill in the areas of the form relevant to the search. A more general form interface would have the option of specifying Boolean operations. Such query forms can be developed by application designers or automatically generated from databases or applications using methods such as XQForms, which allow designers to formally specify a form interface for any type of application [34]. Last but not least, we look at QBE—a visual language for querying relational databases [1], [35], [36]. This language has a simple interface composed of tabular skeletons representing tables in the database. Users specify queries by entering sample values (or examples) in appropriate areas of the table skeleton. These values can be either search terms or variables for the purpose of a “join” (or other operations that require variables). The idea behind QBE is that the users provide an example of outputs that they expect from the query, and the query engine looks in the database for data that match the given example. This works nicely for relational databases, primarily because the tabular structure of the database fits quite well with the tabular skeletons used in the interface. 
Although the literature reveals many novel techniques for visual as well as form-based methods for user-centered query processing, more research needs to be performed for finding a language suitable for use with XML that is formally complete yet user-friendly. A problem with some of the languages mentioned above is that they are either too specific, leading to problems when applied to different problem domains, or too general, thus failing to capture the users’ mental model of a search. The objective of this research is to propose and test an alternative solution which adheres to the users’ mental model for searching yet is generalizable enough for many applications.
QUERY BY TEMPLATES (QBT)
The current work generalizes QBE for databases containing complex structured data. In QBE, the
template for presenting queries is similar to the internal structure of the database. We use this idea to generalize QBE for any type of database in which each data instance has a simple visual template. In this generalized method, termed QBT, the basis of the interface is one or more visual templates that represent different types of data instances in the database. Any collection of data containing instances with a simple visual representation of their content can be used with QBT. Multiple templates may be necessary if different types of data are stored in the same collection. For databases that do not store information with a commonly accepted visual form, QBT can still be used by utilizing tables, forms, or other textual patterns as templates.
Notations
To provide a concrete and formal presentation of the semantics of the QBT language, we first develop some notations. Assume that we have a set of documents $D$ belonging to a consistent type, determined by a Document Type Definition (DTD) or other schema. Each document type is represented using a template $T$, which is usually a simple visual representation of the document as described above. Each object $t$ in the database is an instance of this template $T$ (written as $t \in T$). We first define a basic template expression (or simply, a basic template) as $T_E$, usually suffixed with the template type $E$. A basic template $T_E$ has the form $T_i(P, x)$, where $P$ is a path expression (e.g., an expression using the XPath language [37]), and $x$ is either (1) a constant (i.e., a query string), (2) a symbol $s$, or (3) a Boolean expression of strings using and, or, not, and parentheses. Symbols, which are analogous to variables, are used primarily to indicate joins by placing the same symbol in different basic templates. Since several templates can be used in a query, each template can be suffixed
by a number or the name of the template used. A query in QBT is presented in two stages: (1) the query stage containing a Boolean expression ($E$) of basic templates; and (2) the retrieval stage ($R$) containing a set of output specifications. The semantics of a QBT query $Q = \langle E, R \rangle$ is thus: “retrieve the $R$ components of every database instance that makes $E$ true.” As an illustration, assume that a template $T_p$ exists for the poetry example in the introductory section of this paper. The notation for the query discussed would be
\[
Q = T_p(\text{poem//poet}, \text{'Blake'}) \ \text{and}\ T_p(\text{poem//fline}, \text{'\%tiger\%'}) \ \text{and}\ T_p(\text{poem//fline}, \text{'burning'})
\]
\[
R = Q(T_p\langle \text{poem} \rangle).
\]
Note that the retrieval step of the above query simply fetches the complete poem from the template, and that the string ‘%tiger%’ represents the approximate match.
QBT: The Basic Design
At the simplest level, a QBT interface displays a template for a representative instance of the database. The user sees an example of the type of data he/she would expect to find in the database, such as a poem in a poetry database. For example, a poem template for a poetry database would contain regions corresponding to the poem’s title, stanzas, author, etc. The user specifies a query by entering examples of what he/she is searching for in the appropriate regions of the template, and the system retrieves all the database entries that match the example provided. To illustrate the design of templates, we use a simple template for a poetry database, as in Fig. 2.
Fig. 2. Simple template for poems, with its logical regions.
In this figure, we indicate a prominent logical region of the poem by circling it and labeling it with the corresponding region’s
name. In general, the QBT interface consists of a template image divided into areas corresponding to different logical regions in the database, as in Fig. 2. Depending on the layout of the regions, the templates can be of different types, which is discussed in the following sections. Flat Templates: As described in the previous section, QBT relies on the presence of a simple visual template for the instances in the database. In most cases, this template could be planar or flat. This means that all logical regions of the template can be displayed simultaneously in a two-dimensional image without overlapping, as in Fig. 2. We call these templates “flat templates,” since all searchable regions are disjoint. Flat templates usually are easy to display and use, since the structural regions can be displayed simultaneously in a plane. Embedded structure also can be displayed by showing multiple instances of some regions. For example, in Fig. 2, a template needs to include two stanzas, one to represent a stanza as a whole, and another to represent its subregions First Line and Any Line. Nested Templates: Although flat templates are easy to display and navigate, they cannot model structures with deep levels of nesting. In this case, we use nested templates, in which regions are allowed to overlap. In particular, certain regions can be completely inside other regions to represent subregions. To display embedded logical regions we use one of two methods: (1) embedded regions and (2) recursive regions. In the first method, termed embedded regions, subregions are displayed inside the parent region, thus relaxing the requirement that template regions be disjoint. This method is a simple extension of flat templates, but it makes the templates much more powerful while retaining the simplicity of flat structure. However, this method again is limited to structures with shallow nesting depths, having a
top-level region that is physically large enough to include all the nested regions without completely obscuring itself. An example of this type of nesting is shown in Fig. 3(a). The second method, termed recursive regions, is the most general method of nesting regions. In this method, a region with subregions can be recursively expanded. During traversal, the user may zoom in on a parent region to display its subregions. The magnified portion of the template can be an independent template and can be magnified subsequently to get to any additional levels of nesting. Although this method can capture any general structure, the templates have to be designed carefully so that users are not disoriented by the nested templates. Fig. 3(b) shows this method of displaying internal structures for the same poem example.
Fig. 3. Nested templates with (a) embedded regions and (b) recursive regions.
Structure Templates: Structures, particularly large ones, may become too complex to use nested templates. In these cases, it is often necessary to display the internal structure simultaneously with a template that displays the relative position of the current region. As mentioned earlier, most documents, particularly those in XML, can be thought of as having a hierarchical structure that can be visualized as a tree. Showing a template simultaneously with a hierarchy of logical regions depicting the context simplifies the nested structure visualization. An example of the structure template is shown in Fig. 4, which is a screen shot from the prototype implementation of QBT, further described in a later section of this paper.
Multiple Templates: Many queries require the use of more than one template. In relational databases, queries that derive the results from the contents of multiple tables require the constituent tables to be “joined” using a common attribute. QBE implements this by displaying skeletons for all the constituent
tables. QBT incorporates a very similar strategy. Even though “joins” in text databases are less common since the data are implicitly linked in the structure of the documents, they are still necessary and give rise to many interesting queries, specifically when the results involve multiple databases or related fragments of the same database. To express these queries, two or more templates, connected with a joining region, are displayed. We give examples of such queries in a later section of this paper.
Generalizability and Formal Stability of QBT
As stated earlier, QBT captures the mental model of users where the database objects possess an intrinsic visual representation which the users are familiar with. QBT works best when a user can relate to the contents of the database simply by looking at the interface. However, this may not be possible for all database structures. For such structures, the QBT technique can be used, although the visual aspects of QBT will need to be compromised for more textual, pattern-based templates. For instance, in the XML domain, one might consider an incomplete XML document to be a template for specifying a query that retrieves the document fragments that satisfy the template. In this case, the template is specified as a pattern which is matched by the query processing engine. When a pattern-based template is used, the concepts of nested templates using both embedded and recursive regions can be used for the purpose of specifying nested patterns, very much like nesting queries. Interestingly, for applications using objects that lack a consistent visual shape, a form can be considered as a special case of a template. The structure template, as described above, is completely general and can be used for any type of complex
Fig. 4.
structure. A fully textual equivalent of QBT is one future direction of this research.
Another positive aspect of QBT is that it is not simply a set of ad hoc query formulation constructs. The template expressions introduced here have been used to formally prove theoretical results (stated below) that show the theoretical stability of this language. Due to space limitations, these proofs are not part of this paper. The following results have been proven:
• QBT is a first-order relationally complete language capable of expressing all relational algebra/calculus queries.
• QBT is a strict subset of XQuery, and so QBT can be used as the front-end to any XQuery-capable database [38].
• QBT is closed in its domain of templates (i.e., the result of a QBT query can be represented using the same template as the query).
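To make the front-end idea in the second result more tangible, the following minimal Python sketch renders a conjunction of basic templates as an XQuery FLWOR expression. It is an illustration only and not the translation used in the omitted proofs or in the prototype: the function name `to_xquery`, the collection name `"poems"`, the element names, and the choice of `contains` for substring tests are all assumptions made for this example.

```python
def to_xquery(collection, instance, conditions, outputs=()):
    """Render a conjunction of substring conditions as an XQuery FLWOR string.

    collection: hypothetical collection name passed to fn:collection()
    instance:   element name of one database instance (e.g. "poem")
    conditions: iterable of (region_path, keyword) pairs, joined with "and"
    outputs:    region paths to return; empty means the whole instance
    """
    def quote(text):
        # XQuery string literals escape a double quote by doubling it.
        return '"' + text.replace('"', '""') + '"'

    # One existential test per basic template: some node in the region
    # must contain the keyword as a substring.
    tests = [
        f"(some $n in $d//{path} satisfies contains(string($n), {quote(word)}))"
        for path, word in conditions
    ]
    where = f" where {' and '.join(tests)}" if tests else ""
    returned = ", ".join(f"$d//{path}" for path in outputs) or "$d"
    return (
        f"for $d in collection({quote(collection)})//{instance}"
        f"{where} return ({returned})"
    )


if __name__ == "__main__":
    # The poetry query from the Notations section, minus the approximate match.
    print(to_xquery("poems", "poem", [("poet", "Blake"), ("fline", "burning")]))
```

The sketch covers only conjunctions of keyword conditions; approximate matches such as '%tiger%', joins through shared symbols, and negation would need additional handling.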
QUERYING WITH QBT
Normal keyword searches within structural regions are simple and most natural with the QBT interface. As illustrated earlier, users express their queries by indicating the keywords in the appropriate regions of the template. In this section, we show the different search constructs allowed in QBT. One can treat QBE as a special case of QBT where the templates used are table skeletons that represent tables in the database [1]. In QBE, queries are specified by entering values in the column of the example table corresponding to the search attribute. These values may be constants (i.e., strings or numbers), variables/symbols or examples, usually
Screen shot from the prototype showing the template screen and the structure screen.
differentiated from the constants by underlining, or expressions involving constants and variables combined with arithmetic and logical operators. The output of the query is specified by marking the regions that need to be presented in the output. QBT uses the same basic principle with the extension that the templates are not restricted to table skeletons but can be any visual representation of the database instances.
The primary difference between expressing queries in QBT and in QBE lies in the fact that the templates in QBE are essentially one-dimensional. QBE uses two-dimensional tables for querying, but the metadata, the attributes of the relations, appears along the horizontal axis as column headings of the tables, thus making the templates essentially linear. QBE uses the rows to specify multiple search conditions as well as logical operations between the search conditions (see examples in [1]). In QBT, the regions, or metadata, are distributed along both dimensions of the template, utilizing the whole template plane for visualizing the structure. Logical operations between regions (interregion) can be expressed by physically connecting two or more regions via a logical operator. Logical operations within regions (intraregion) can be specified using a Boolean expression within the scope of that region. In the rest of this section, we discuss how different types of queries are performed using QBT.
Simple Selections
Simple selections include searching for constant strings within the logical regions of the document; the whole document itself can be used as one region as in a standard keyword search. In QBT, such searches are performed by entering the search string in the corresponding region of the template. As a result of such a search, database instances that match all the specified search constraints are returned. In other words, the search criteria are combined using logical conjunctions.
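The default conjunction of region constraints, and the interregion operators mentioned above, can be sketched as a small expression tree. This is a hedged illustration rather than the prototype's data model: the class names `Basic`, `Not`, `And`, and `Or`, the dictionary representation of an instance, and the case-insensitive substring test are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    """A keyword entered in one region of the template."""
    region: str
    term: str


@dataclass(frozen=True)
class Not:
    operand: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Basic, Not, And, Or]


def holds(expr, instance):
    """Return True if the instance (a region -> text dict) satisfies expr."""
    if isinstance(expr, Basic):
        # Substring match; ignoring case is an assumption of this sketch.
        return expr.term.lower() in instance.get(expr.region, "").lower()
    if isinstance(expr, Not):
        return not holds(expr.operand, instance)
    if isinstance(expr, And):
        return holds(expr.left, instance) and holds(expr.right, instance)
    if isinstance(expr, Or):
        return holds(expr.left, instance) or holds(expr.right, instance)
    raise TypeError(f"unsupported expression: {expr!r}")


def simple_selection(filled_regions, instances):
    """Simple selection: every filled region must match (logical conjunction)."""
    return [
        inst for inst in instances
        if all(holds(Basic(r, t), inst) for r, t in filled_regions.items())
    ]


# Illustrative sample instances, not data from the study.
POEMS = [
    {"title": "The Tyger", "poet": "William Blake",
     "first line": "Tyger Tyger, burning bright"},
    {"title": "The Lamb", "poet": "William Blake",
     "first line": "Little Lamb who made thee"},
    {"title": "Daffodils", "poet": "William Wordsworth",
     "first line": "I wandered lonely as a cloud"},
]

if __name__ == "__main__":
    hits = simple_selection({"poet": "Blake", "first line": "burning"}, POEMS)
    print([p["title"] for p in hits])
```

Filling two regions yields an implicit conjunction, while an explicit tree such as `And(Basic("poet", "Blake"), Not(Basic("first line", "burning")))` corresponds to connecting regions through an operator in the interface.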
The result of the query, by default, consists of complete individual matching instances. However, users can mark the regions that they want returned by placing a print-marker on them to avoid complete instances
Fig. 5.
being retrieved. The simple selection method is obviously limited to one word or phrase per region. For more complex Boolean combinations of keywords, the users need to use the condition box, shown in the section on queries with complex conditions. Fig. 5(a) denotes the simple query: “Find the poem titles and poets of all the poems that have the word ‘hate’ in the title and the word ‘love’ in the first line.” Note that unlike QBE, searches are substring matches instead of exact string matches. So, entering the word “love” in the region first line locates all the poems where the first line contains the word “love.” In the case of documents, substring matches make more sense since exact searches are far less common. A side-effect of this way of matching is that the search for the word “love” will also retrieve text where “love” is a part of another word such as “beloved” or even “clover.” However, exact word boundaries can be easily specified using regular expressions. Using the notations described above, the query would be formally represented as
$$Q = T(\text{poem//title}, \text{'hate'}) \;\text{and}\; T(\text{poem//body//line}, \text{'love'})$$
$$R = Q(T\langle \text{poem//title}, \text{poem//poet} \rangle).$$
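The contrast between substring matching and word-boundary matching for the query above can be sketched as follows. This is a minimal illustration rather than the DocBase implementation: the flat `Poem` record, the sample rows, the function names, and the choice of case-insensitive matching are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Poem:
    # Hypothetical flat record; the real database is hierarchical.
    title: str
    poet: str
    first_line: str


def substring_match(text: str, term: str) -> bool:
    """Substring match: 'love' also matches 'beloved' and 'clover'."""
    return term.lower() in text.lower()


def word_match(text: str, term: str) -> bool:
    """Whole-word match using regular-expression word boundaries."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def simple_selection(poems, constraints, match=substring_match):
    """Return (title, poet) for poems that satisfy ALL region constraints."""
    return [
        (p.title, p.poet)
        for p in poems
        if all(match(getattr(p, region), term) for region, term in constraints.items())
    ]


# Made-up sample data, for illustration only.
poems = [
    Poem("Love and Hate", "Poet A", "My love is gone"),
    Poem("Hate No More", "Poet B", "O my beloved, stay"),
    Poem("A Song", "Poet C", "Love conquers all"),
]
query = {"title": "hate", "first_line": "love"}

substring_hits = simple_selection(poems, query)
word_hits = simple_selection(poems, query, match=word_match)
```

With substring matching, the second poem is returned because "beloved" contains "love"; with word boundaries it is not.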
Selections With Multiple Conditions We have just seen that if multiple conditions are specified in different regions, they are combined using logical conjunctions, implying that the results returned from the query will satisfy all the specified search conditions. If this is not desired, search conditions can be combined using the logical operators and, or, and not. Negation of individual conditions is done by placing the keyword not in front of the search string. Implementations of the interface may use some visual mechanisms to place this negation operator. Connecting various search strings using the binary operators and and or involves simply connecting the two strings using a pointing device and selecting the proper operation type for that connection. Fig. 5(b) demonstrates how this is done by expressing the query: “Find the poem titles and poets of all poems
Query formulation with QBT: (a) simple selections and (b) logically combined selections.
that either do not have the word ‘hate’ in the title or have the word ‘love’ in the first line.” Notice the introduction of the negation and the or connection. Using the notations described above, the above query is represented as
$$Q = \text{not}\; T(\text{poem//title}, \text{'hate'}) \;\text{or}\; T(\text{poem//body//line}, \text{'love'})$$
$$R = Q(T\langle \text{poem//title}, \text{poem//poet} \rangle).$$
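One way to picture the negation and the or connection is as small composable predicates over a record. The sketch below is only an illustration under stated assumptions: records are plain dictionaries, the region keys (`title`, `first_line`) and helper names are hypothetical, and the sample records are made up.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Condition = Callable[[Record], bool]


def contains(region: str, term: str) -> Condition:
    """Condition that holds when `term` occurs as a substring of the region text."""
    return lambda rec: term.lower() in rec.get(region, "").lower()


def negate(cond: Condition) -> Condition:
    """Negation of a single condition (the 'not' keyword)."""
    return lambda rec: not cond(rec)


def either(left: Condition, right: Condition) -> Condition:
    """Disjunction of two conditions (the 'or' connection)."""
    return lambda rec: left(rec) or right(rec)


# not (title contains 'hate') or (first line contains 'love')
query = either(negate(contains("title", "hate")), contains("first_line", "love"))

# Made-up sample records, for illustration only.
records = [
    {"title": "Hate", "poet": "Poet A", "first_line": "I hate the night"},
    {"title": "Hate and Love", "poet": "Poet B", "first_line": "My love, awake"},
    {"title": "Spring", "poet": "Poet C", "first_line": "The snow has gone"},
]
result = [(r["title"], r["poet"]) for r in records if query(r)]
```

Only the first record is rejected: its title contains "hate" and its first line does not contain "love".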
Note that if multiple clauses are connected using logical constructs, the order in which the expressions are evaluated depends on the direction of the arrow. However, it is possible to override this order of evaluation by placing parentheses in appropriate places in a condition box (see the section entitled "Queries With Complex Conditions"). Providing a two-dimensional visualization for a strictly ordered chain of query components connected with logical operations can be somewhat tricky. In our approach, we tried to keep the interface as simple as possible by implying conjunctive connectors when there are no arrows and explicitly specifying disjunctive or conjunctive connectors when necessary.

The algorithm to derive the logical expression from its graphical representation is very similar to a minimum spanning tree algorithm [39]. The algorithm starts at one of the nodes that has no incoming arrows. A spanning tree is then built over all the nodes that have not already been visited and are reachable from the starting node. This process is repeated until all nodes have been included, which ensures that each node is entered only once in the expression. However, it is only a heuristic method, and may or may not correspond exactly to the query the user had in mind. In order to ensure that the proper query is processed, the condition box needs to be used.

Joins and Multiple Templates
A JOIN is an operation in which multiple fragments of a database are combined together based on some common property. “Joins” are indispensable in relational databases, since the relational design involves “normalizing” a schema by breaking it into flat tabular fragments. This fragmentation necessitates combining the individual fragments together at the time of query processing using the join operation. However, in document databases, the structure is not normalized into planar fragments but is allowed to grow hierarchically, so “joins” are not required to combine fragments. However, “joins” are still useful to solve queries that require a comparison between different parts of a database or between different instances of the same database. For example, one may try to “find the pairs of poets who have at least one poem with a common title.” In this case, we need to involve two instances from the poetry database and run a query comparing the titles of the two poems.

This is achieved in QBT by using multiple templates. In the case of the above query, the same template is instantiated twice, and the join attributes are connected together. The connection can be augmented with comparison operators to specify “joins” other than “equi-joins” (“joins” that use equality on the join attributes). Once again, in the case of asymmetric comparison operations, the precedence of the operators is determined by the direction of the arrow. To keep the conceptual similarity with QBE, examples are underlined to differentiate them from constants. Note that in the current example, specifying only the equi-join on poem titles would result in retrieving all poets in the database paired with themselves. To prevent this, an additional inequality join condition needs to be specified, ensuring that both the poets are not the same (see Fig. 6).

Notice that visualizing the results of join queries may not be possible using the same template as the query itself, although this problem can be solved by specifying layout characteristics (using stylesheets, for example) to display the results. The closure of the interface is maintained by the fact that the query outputs consist of XML documents only, and so they can be displayed using the same methods used for displaying the template.

Fig. 6. Query formulation with QBT: Joins.
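The role of the extra inequality condition can be seen in a small sketch of the poet-pair query. It assumes a flattened list of (poet, title) tuples with made-up values; the function name and the `require_distinct_poets` flag are hypothetical and only stand in for the two template instances and their connections.

```python
def join_on_title(poems, require_distinct_poets=True):
    """Pair poets whose poems share a title (an equi-join on title).

    Each loop plays the role of one instance of the poem template.
    """
    pairs = set()
    for poet1, title1 in poems:        # first template instance
        for poet2, title2 in poems:    # second template instance
            if title1 != title2:       # equi-join condition on the title
                continue
            if require_distinct_poets and poet1 == poet2:
                continue               # inequality condition on the poet
            pairs.add((poet1, poet2))
    return sorted(pairs)


# Made-up sample data, for illustration only.
poems = [
    ("Poet A", "To Autumn"),
    ("Poet B", "To Autumn"),
    ("Poet A", "Song"),
    ("Poet C", "Song"),
    ("Poet B", "Elegy"),
]

equi_join_only = join_on_title(poems, require_distinct_poets=False)
with_inequality = join_on_title(poems)
```

Without the inequality, every poet is paired with themselves. With it, only genuine pairs remain, although each pair still appears in both orders; an asymmetric comparison such as `poet1 < poet2` would keep one order only.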
Queries With Complex Conditions Visualization of queries that combine conditions on more than two regions using logical operators is difficult in QBT; this problem arises due to its flatness. A more
advanced visual representation of Boolean conditions is a possibility for future extension [28]. Connecting the regions together is not always sufficient because the intended order of these operations is important. In QBE, such complex situations are expressed in a separate area from the skeletons, commonly called the CONDITION BOX. The condition box is simply a small text window in which complex conditions can be expressed using logical expressions where the order of evaluation is denoted using parentheses. The condition box can also be used to override the default precedence of operators.
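Why the order of evaluation matters can be shown with a small evaluator for condition-box style expressions. This is a sketch under assumptions, not the QBE or QBT implementation: it treats and and or as having equal precedence, evaluates them left to right, lets parentheses override that order, and does no error handling for malformed input. The condition names are hypothetical.

```python
import re


def evaluate(expression: str, values: dict) -> bool:
    """Evaluate and/or/not strictly left to right; parentheses override the order."""
    tokens = re.findall(r"\(|\)|\w+", expression)
    pos = 0

    def parse_operand() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_chain()
            pos += 1  # skip the closing ")"
            return value
        if token == "not":
            return not parse_operand()
        return values[token]

    def parse_chain() -> bool:
        nonlocal pos
        value = parse_operand()
        while pos < len(tokens) and tokens[pos] in ("and", "or"):
            operator = tokens[pos]
            pos += 1
            right = parse_operand()
            value = (value and right) if operator == "and" else (value or right)
        return value

    return parse_chain()


# A hypothetical poem: no 'hate' in the title, not by Shakespeare,
# but 'love' occurs in the first line.
poem = {
    "hate_in_title": False,
    "poet_is_shakespeare": False,
    "love_in_first_line": True,
}

default_order = "hate_in_title and poet_is_shakespeare or love_in_first_line"
grouped = "hate_in_title and (poet_is_shakespeare or love_in_first_line)"

matches_default = evaluate(default_order, poem)
matches_grouped = evaluate(grouped, poem)
```

The same poem matches the left-to-right reading but not the parenthesized one, which is the kind of difference the condition box lets the user control.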
QBT uses a similar mechanism to express complex Boolean expressions. As search strings and examples are specified, the condition box is automatically updated. The user can then insert parentheses as necessary to change the default precedence. For example, in Fig. 7, if the default precedence (left to right) is used, the query evaluates to: “Find the poem titles and poets of the poems in which either the word ‘hate’ is in the title and the poet is Shakespeare, or the word ‘love’ is in the first line.” The default condition box is shown in Fig. 7(a). However, this default can be changed to: “Find the poem titles and poets of the poems in which the word ‘hate’ is in the title, and either the poet is Shakespeare or the word ‘love’ is in the first line” [see Fig. 7(b)].

Using the notations described above, the modified query is represented as

$$Q = T(\text{poem//title}, \text{'hate'}) \;\text{and}\; \big(T(\text{poem//poet}, \text{'shakespeare'}) \;\text{or}\; T(\text{poem//body//line}, \text{'love'})\big)$$
$$R = Q(T\langle \text{poem//title}, \text{poem//poet} \rangle).$$

The condition box can also be used for specifying complex conditions involving more than two variables in an expression. In this case, QBT’s condition box has the same functionality as that of QBE. The main use of the condition box is to provide the power necessary to generalize the querying method to accommodate all types of queries supported by the formal query languages and hence, add to the expressive power of the language.

Fig. 7. Changing precedence of operations with condition boxes.

PROTOTYPE IMPLEMENTATION OF QBT
We built a prototype of the QBT interface using the Java programming language. The prototype implements most of the features described here including the embedded template, though without recursive magnification, and the structure template. We have not incorporated the condition box in this prototype; it will be added in the next phase. We also included an experimental version of an SQL language translator from the QBT query. Fig. 4 shows two parts of the screen, one showing the template screen and the other showing the structure template.

As an experiment, we used the Chadwyck-Healey English Poetry database with templates similar to those described in this paper for performing the queries. This database was chosen because it was completely marked up in SGML, a precursor to XML, and hence a potential next-generation data collection. The engine generates its output in XML, along with an XSLT stylesheet, which is displayed by the web browser. The database engine used is a locally developed XML database system called DocBase, which utilizes the Pat indexing system [40], [41]. Here we give a short description of some of the implementation challenges and solutions for this interface.

One of the biggest challenges of the implementation was to provide a mechanism for easy query formulation for novice users, as well as the possibility of structure discovery, and also a textual query processing method for advanced users. The current prototype has three components of the interface, of which only one can be viewed at a time. The QBT interface that we discussed earlier is included in the “Template screen.” The structure of the database schema is displayed in the “Structure screen,” and the equivalent DSQL query is shown in the “SQL screen.” We will briefly outline the template and structure screens. The SQL screen simply allows advanced users to interact with the system directly using DSQL queries. In the prototype, users switch back and forth between the screens using a tabbed folder selection method. The top of the interface consists of three buttons
that function like three tabs, which can be selected to activate the corresponding screen. As each search constraint is added, the current number of matches is immediately shown. The users may undo the addition of the last search constraint using a back button. When a particular screen is selected, the tab corresponding to that screen gets dimmed, highlighting the current selection and also indicating to the user that he/she may switch to one of the other two screens. The bottom of the screen has two buttons for submitting the query for evaluation and clearing the current query, much like the buttons found in most HTML forms. In addition, the bottom of the screen also includes options for selecting the number of matches that the system should retrieve at a time and for selecting the region that should be displayed as the default result. For expert users, an expert mode is available that provides more Boolean querying facilities. The center of the displayed region contains the main query interface. This is the part that the user may change using the buttons at the top of the screen. By default, the system displays the template screen at start-up. The Template Screen The template screen consists of a template image in the background. As the user moves the mouse across the template, the position of the mouse activates the underlying region. This highlights the region on the template and displays the name of the region on the status bar. A mouse click on the activated region brings up an expression builder for that region. The expression builder consists of at least one entry area for inputting one or more search terms. It also includes a check-box for indicating negation on that region. When checked, the semantics of the search expression in the target region are negated. 
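The behavior of the template screen (pointer position activates a region, and a click opens an expression builder with a negation check-box) can be sketched as below. The prototype itself was written in Java and its internals are not described here, so everything in this sketch is an assumption: the rectangular regions, the pixel coordinates, the class and function names, and the rule that the smallest enclosing region wins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str    # logical region, e.g. "title"
    x: int       # left edge in template-image pixels
    y: int       # top edge
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def region_at(regions: List[Region], px: int, py: int) -> Optional[Region]:
    """Return the smallest region under the pointer, so nested regions win."""
    hits = [r for r in regions if r.contains(px, py)]
    return min(hits, key=lambda r: r.width * r.height, default=None)


@dataclass
class RegionExpression:
    """State of a hypothetical expression builder for one region."""
    terms: str
    negated: bool = False  # the negation check-box

    def render(self, region: str) -> str:
        clause = f"T({region}, '{self.terms}')"
        return f"not {clause}" if self.negated else clause


# Hypothetical layout of a poem template (coordinates are made up).
regions = [
    Region("poem", 0, 0, 400, 600),
    Region("title", 20, 20, 360, 40),
    Region("poet", 20, 70, 360, 30),
    Region("body", 20, 120, 360, 460),
]

active = region_at(regions, 50, 30)          # pointer over the title
clause = RegionExpression("hate", negated=True).render(active.name)
```

The name of the region returned by `region_at` is what a status bar would display, and the rendered clause is what a condition box could show for that region.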
Currently, the expression needs to be explicitly included in the entry area, but a full implementation will have a graphical expression builder that users can use to build Boolean combinations of keywords. A screen capture for the template screen is shown in Fig. 4. The Structure Screen The structure screen displays the hierarchical structure of the database. This screen displays the same query as in the templates, by associating a search condition with the corresponding region in the hierarchical display. The structure can be expanded and collapsed by the user as a means for traversing the document structure, similar to traversing file system structures in a file browsing interface. Ideally, the structure should be displayed together with the template, with the current region highlighted in both the template and the structure to give the user an idea of the context. In the current implementation, navigation of the structure needs to be manually performed by the user.
The structure screen has two parts: the left half of the screen displays the structure of the database, and the right half of the screen displays the query corresponding to the current region highlighted in the structure. The user can change the query by modifying the query text in this section. The condition box is a part of the screen, although it is not implemented in the current prototype. If the user is formulating a query solely using the structure screen, the condition box is the only way to specify Boolean combinations of the individual query fragments corresponding to each region. A screen capture for the structure screen is shown in Fig. 4.
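A rough model of the structure screen is a tree whose nodes can be expanded or collapsed, with search conditions attached to region paths. The sketch below is illustrative only: the schema (poem, title, poet, body, stanza, line), the `//` path separator, and the function names are assumptions rather than details of the prototype.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False


def visible_rows(
    node: Node, conditions: Dict[str, str], prefix: str = "", depth: int = 0
) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, path, condition) for each row currently shown in the tree.

    Children of a collapsed node are hidden, as in a file browser.
    """
    path = f"{prefix}//{node.name}" if prefix else node.name
    yield depth, path, conditions.get(path, "")
    if node.expanded:
        for child in node.children:
            yield from visible_rows(child, conditions, path, depth + 1)


# Hypothetical schema for the poetry database.
schema = Node(
    "poem",
    [
        Node("title"),
        Node("poet"),
        Node("body", [Node("stanza", [Node("line")])]),
    ],
    expanded=True,
)

# The query fragment entered for one region, keyed by its path.
conditions = {"poem//title": "hate"}

rows = list(visible_rows(schema, conditions))
```

Expanding `body` (setting `expanded = True` on that node) adds the `poem//body//stanza` row on the next call, which mirrors the manual navigation described above.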
USABILITY ANALYSIS
Sufficient evidence exists in the literature to show that not all users seek information in the same manner. Marchionini summarizes several studies that show that the processes of users applying “retrieval systems to information-seeking problems reinforce the general theory of user-centered information-seeking” [42, p. 32]. This general theory argues for direct observation of users interacting with a system in order to assess its value and to suggest a redesign. Loeber and Cristea argue from their observations that users find information in two ways: (1) internal searching, or recalling stored information from memory, and (2) external searching, or collecting information from online or offline outside sources [43]. The primary model of QBT as presented here is based on a combination of the two forms, where a partial internal view is developed by the user which is then completed by the external search. Stein et al. argue similarly that “users initially have vague information needs: they know they need some information but often cannot specify it” [44, p. 134].

To determine this new interface’s potential for assisting real users, we conducted a usability evaluation with 20 participants, comparing their performance on QBT versus another existing interface. Usability has been defined under ISO 9241, Part 11 as the extent to which specific users performing specific tasks in a given context are effective, efficient, and satisfied. These criteria are generally operationalized in terms of task completion, speed of performance, and general ratings of effect (see, e.g., [15]). For comparison purposes, we employed a form-based interface (see Fig. 8). Our goal was to determine if QBT led to different user performance characteristics than a standard interface. Furthermore, we tested both novice users and experts, since it is clear from the literature that these user types often have different needs.
Since QBT is a radical departure from traditional search interfaces, it is conceivable that it has different effects on different user types.
Method Users: Participants were chosen from the graduate students of the English Department of Indiana University. Expertise level was determined by stated familiarity with the Chadwyck-Healey collection and knowledge of database searching. Participants classified as novices had very limited knowledge of advanced search systems, although everyone was familiar with standard keyword search interfaces such as those provided by internet search engines. Participants classified as experts were familiar with advanced Boolean search techniques and were frequent users of electronic resources in the library and on the web. All participants, for copyright reasons, had to have an affiliation with Indiana University. Tasks: Ten tasks (see Appendix A) were designed to test a range of features of the search interfaces. These varied in difficulty but ensured that the participants were required to explore features. The tasks were chosen to cover all the different querying aspects of QBT implemented in the prototype. The first, and easiest, query was primarily meant for the subjects to get acquainted with the system. The other queries ranged from simple searches involving a single clause in a field to complex searches involving up to four clauses combined together. Note that the QBT interface had no restrictions on the number of clauses that could be specified, but the form interface was limited to only four clauses. The tenth task was open-ended; participants were asked to search for something of their own interest, which could be used
Fig. 8.
to guide future task selection for later tests. The first and the tenth tasks were not used for the analysis purposes. Test Design: A repeated measures, counterbalanced test design was employed with the interface type (QBT or Form) and the user type (Expert or Novice) being set as independent variables. The dependent variables were time on task (efficiency), accuracy of answers (effectiveness), and survey responses (satisfaction). Further satisfaction data were gathered through an interview after the test. The participants’ interactions were timed and recorded automatically by the server and the query engine executing the queries. The server also kept a detailed log of the actions and queries of all participants. Procedure: After pilot testing the method on four participants, ten novice and ten expert searchers were recruited and randomly assigned to either the QBT (Fig. 5) or the standard interface (Fig. 8) with experience counter-balanced across conditions. The subjects were introduced to the experiment and the target interface. Once the participants reported that they were ready, they were given the experimental queries and asked to perform all tasks sequentially and to note down the number of matches returned by the database for every query performed, including the titles of the top five poems returned by their search, after visually inspecting the correctness of their results. Upon completing the query set, participants completed a set of survey questions. They were
Form implementation of the query interface used in the usability analysis.
subsequently asked to verbally describe their feelings and general reactions regarding the functionality and appropriateness of the system they used. These interviews were transcribed and used as part of a qualitative analysis as a diagnostic tool for determining potential areas of improvement in the prototype as well as the methodology.

Results
For each of the dependent variables, we performed a multivariate ANOVA test, with a 0.05 significance level [45]. The quantitative results for the dependent variables are summarized below.

Accuracy: We took the accuracy measures by evaluating the answers to each question on a 0–5 scale, with 5 indicating a perfect answer, and 0 indicating an unattempted or a completely wrong answer. Partially incorrect answers were given a value within the range of 1 to 4, inclusive, based on the type of mistake made. A set of guidelines for grading was established, and two graders independently graded the queries (with high intergrader reliability). The average of the grades was used for the analysis. Tasks 2 and 4 had a standard deviation value of 0, since all users had correct answers for these tasks. For the rest of the tasks, the cumulative effect of expertise or interface was nonsignificant at the p < 0.05 level (F(1, 16) = 0.021, p = 0.886 for interface effects, F(1, 16) = 0.00, p = 1.000 for expertise effects, and F(1, 16) = 1.045, p = 0.322 for their interaction). This suggests that there are no detectable differences in accuracy between users performing on QBT or Forms.

Efficiency: For the efficiency measure, we used the time (in seconds) between submissions of two successive queries. Aggregate analysis over Tasks 2 through 9 shows that experts were significantly more efficient than the novices on both types of interfaces (F(1, 16) = 19.703, p < 0.001). This confirms the validity of the expert manipulation performed.
However, there was no significant difference in efficiency between the two interfaces (F(1, 16) = 0.644, p = 0.434) and no interaction (F(1, 16) = 0.202, p = 0.659). Univariate tests of significance on individual tasks, however, show significant effects of interface for Task 7 (F(1, 16) = 16.385, p = 0.001), where users of the form interface were significantly more efficient than QBT users. Upon observation of the subjects’ actions, we discovered that this task required the users to switch to a different screen for the QBT interface. Unfortunately, most of the users needed some time to understand the necessity for this action, hence causing the delay. This is one area for improvement of the QBT design.

Satisfaction: For the satisfaction measure, the users were asked to grade different aspects of each interface on a 5-point Likert scale. The satisfaction data were collected after all the tasks were performed and were not calculated on a task-by-task basis.
An ANOVA on the data demonstrates a significant effect of the interface on satisfaction at the p < 0.05 level (F(1, 16) = 7.53, p = 0.014), although there was no significant effect of expertise on satisfaction (F(1, 16) = 0.471, p = 0.503). This suggests that users preferred the QBT interface.

Qualitative Analysis
Several different types of qualitative data were collected during the course of this study. First, the users were individually monitored, both electronically, via logs of their actions, and by the experiment coordinator, and any difficulties or questions were noted. Next, they were asked to provide some oral feedback on their feelings for the system, including both positive and negative experiences. They were also asked to answer some survey questions related to usability of the method. The primary purpose of the qualitative data was to investigate the decision making behaviors of users to evaluate the appropriateness of the methodology (a strategy inspired by [46]). Although a complete verbal protocol approach was not employed because of the distraction to the subjects, the final oral debriefings were recorded and transcribed. Some of the user comments and the implications toward the method and implementation are shown in Table I.

Table I shows only a small selection of the subject comments, questions, and observations gathered during the study. The comments are useful as a diagnostic tool for understanding the limitations of the method, although as the data demonstrate, most of the issues the users had were with the prototype implementation and not with the QBT methodology. A more complete implementation might potentially result in different empirical analysis outcomes, which is a future extension of this research.
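The F(1, 16) statistics reported in the Results can be read against the design as follows. This is a sketch under an assumption, namely that each of the 20 participants used a single interface and contributed one aggregate score per measure, giving a 2 × 2 layout (interface by expertise). The symbols $a$, $b$, and $N$ are introduced here for illustration only.

$$
\begin{aligned}
df_{\text{interface}} &= a - 1 = 2 - 1 = 1, \\
df_{\text{expertise}} &= b - 1 = 2 - 1 = 1, \\
df_{\text{interaction}} &= (a - 1)(b - 1) = 1, \\
df_{\text{error}} &= N - ab = 20 - 4 = 16.
\end{aligned}
$$

Under that assumption, each main effect and the interaction is tested with 1 numerator and 16 denominator degrees of freedom, which matches the reported F(1, 16) values.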
Limitations and Discussion The small number of subjects can be considered a limitation of this study, and hence the significant and nonsignificant results will be used primarily as a starting point toward a more intensive empirical analysis. However, the development of the study was based on the theory of mental models, with the premise that an interface based on users’ mental models would be more satisfying than a generic interface. This theory can be generalized for a potentially larger scale usability analysis. The result of this analysis also shows ample potential. Quantitatively, the data indicate no significant effect of the interface or expertise on accuracy. However, the data suggest that the experts were significantly more efficient than the novices, and the users of the QBT interface were significantly more satisfied than with the form-based interface. Although the nonsignificance of accuracy can be considered a drawback of this study, it does demonstrate that a more complex user interface does not affect users’ performance, but can improve their satisfaction. Further, the qualitative data gathered
provided several insights on potential improvements to the implementation. Overall, the user behavior during the study, the significant difference for satisfaction between the users of the two interfaces, and the qualitative information gathered from this study does show tremendous potential for the mental-model-driven QBT interface. Most users of the QBT system felt that it was refreshingly different from the form-driven interfaces so commonly used in today’s search systems.
CONCLUSION AND FUTURE WORK We presented QBT as a language based on the human factors of information seeking. The theory of information seeking shows that users seeking information develop a partial internal image of the target data which they want to complete using an external search; this is exactly the idea we wanted to capture in a language. Preliminary usability analysis shows some highly encouraging results and confirms the intuition that experts would be more efficient regardless of the artifact used, yet novices would be able to reach a high level of accuracy given enough time. Our comparison shows no significant differences in accuracy or efficiency between the interfaces,
although the data suggest significantly higher user satisfaction with the QBT interface. We believe that the most innovative aspect of QBT is its direct relationship to the internal structure of the database or the “mental model” of users thinking about that structure [5]. Forms always look the same, whether the underlying database is a poem, a dictionary, a quotation collection, or even a relational database. However, templates can be custom-designed for different types of databases. Moreover, templates use the principle of familiarity, which is demonstrated to work well for novice users [47]. The other interesting aspect of QBT is the fact that it can be adapted for all types of structures, and even forms can be considered as a special case of templates. Given the theoretical, formal, and experimental results, we strongly believe that QBT provides a route to the user–database communication method of the next-generation database systems. There are several future directions of this research. First, the implementation of QBT is in an early developmental stage and has substantial potential for improvement. For example, with highly complex hierarchies, the focus can be concentrated in the
TABLE I SUBJECT COMMENTS AND THEIR IMPLICATIONS
regions of interest using advanced methods like differential magnification [48]. The experiment we performed clearly indicated some of the ways in which it could be improved. Once XQuery execution engines are widely and commercially available, we can use the equivalence property to convert QBT queries into XQuery for direct execution against any
XML structure. An extensive empirical analysis with a much larger group of users will be needed for further validation of this methodology. We believe that the idea behind QBT will give us a starting point for query interfaces in future database systems, which will involve large amounts of XML data.
APPENDIX
TASKS USED IN THE USABILITY TESTING
(1) Find the poems written by “Shakespeare.”
(2) How many poems were written in the Middle English Period (MEP)?
(3) Find all the poems written in the Early 19th Century period (C19A) that have the word “burning” in the first line.
(4) Find the poems that have the word “hate” in the title and the word “love” in the first line.
(5) Find the poems not written by “Hemans” that have the word “wreck” somewhere in a stanza.
(6) Find the poems written during the Early 18th Century (C18A) which have the word “love” in the collection title, as well as in the poem title, but not in the first line.
(7) Find the poems that have the phrase “expostulation and reply” anywhere in the body of the poem.
(8) Find the poems written by Keats that do not have the word “mortal” in any of the stanzas.
(9) Find the poems written by Shakespeare that have the phrase “to be or not to be” somewhere in the poem body.
(10) Write a query of your own from your interest in poems, and indicate the number of matches you found for that query.
ACKNOWLEDGMENT This work was supported in part by Wright State University Research Challenge Funds, in part by US Department of Education award P200A502367, and in part by a National Science Foundation (NSF) Research and Infrastructure Grant award NSF CDA-9303189. The authors are very thankful to the subjects of their usability analysis for their participation and cooperation. They are also grateful to the LETRS (Library Electronic Text Resource Service) subdivision of Indiana University Library and especially to the ex-directors R. Ellis and M. Day for allowing them to use the Chadwyck-Healey Database for this research. They would also like to thank Prof. D. Van Gucht, K. Duffy, and the reviewers for their careful reading and useful comments.
REFERENCES
[1] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. (2004, Feb.) Extensible Markup Language (XML) 1.0. W3C Recommendation. [Online]. Available: http://www.w3.org/TR/2004/REC-xml-20040204/
[2] M. M. Zloof, “Query by example: A database language,” IBM Syst. J., vol. 16, no. 4, pp. 324–343, 1977.
[3] (2002, Aug.) HTML Working Group 2002. XHTML 1.0 the Extensible HyperText Markup Language. W3C Recommendation. [Online]. Available: http://www.w3.org/TR/xhtml1/
[4] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” Sci. Amer., vol. 284, no. 5, pp. 34–43, 2001.
[5] P. Booth, An Introduction to Human-Computer Interaction. Hove, East Sussex, UK: Lawrence Erlbaum Associates Pub., 1989.
[6] M. Levene and R. Wheeldon, “Navigating the world-wide-web,” in Web Dynamics, M. Levene and A. Poulovassilis, Eds. Heidelberg, Germany: Springer-Verlag, 2004, pp. 117–152.
[7] J. Malik, D. Forsyth, M. Fleck, H. Greenspan, T. Leung, C. Carson, S. Belongie, and C. Bregler, “Finding objects in image databases by grouping,” in Proc. Int. Conf. Image Processing (ICIP-96) Special Session on Images in Digital Libraries, 1996, pp. 761–764.
[8] A. Dillon, “Spatial semantics: How users derive shape from information space,” J. Amer. Soc. Inf. Sci., vol. 51, no. 6, pp. 521–528, 2000.
[9] E. Carmel, S. Crawford, and H. Chen, “Browsing in hypertext: A cognitive study,” IEEE Trans. Syst., Man, Cybern., vol. 22, no. 5, pp. 865–884, Sep./Oct. 1992.
[10] J. Gould, L. Alfaro, V. Barnes, R. Finn, N. Grischkowsky, and A. Minuto, “Reading is slower from CRT displays than from paper: Attempts to isolate a single variable explanation,” Human Factors, vol. 29, no. 3, pp. 269–299, 1987.
[11] A. Creed, I. Dennis, and S. Newstead, “Proof-reading on VDUs,” Behav. Inf. Technol., vol. 6, no. 1, pp. 3–13, 1987.
[12] T. Nelson, Literary Machines Version 87.1. Sausalito, CA: Mindful Press, 1987.
[13] B. Shneiderman and G. Kearsley, Hypertext Hands-on! An Introduction to a New Way of Organizing and Accessing Information. Reading, MA: Addison-Wesley, 1989.
[14] M. Lehto, W. Zhu, and B. Carpenter, “The relative effectiveness of hypertext and text,” Int. J. Human-Computer Interaction, vol. 7, no. 4, pp. 293–313, 1995.
[15] A. Dillon, Designing Usable Electronic Text: Ergonomic Aspects of Human Information Usage, 2nd ed. Boca Raton, FL: CRC Press, 2004.
[16] S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Comput. Netw. ISDN Syst., vol. 30, no. 1–7, pp. 107–117, 1998.
[17] G. Salton, “Developments in automatic text retrieval,” Science, vol. 253, pp. 974–980, 1991.
[18] D. Florescu, I. Manolescu, and D. Kossmann. (2000) Integrating keyword search into XML query processing. Proc. 9th World Wide Web Conf. (WWW9) [Online]. Available: http://www9.org/w9cdrom/324/324.html
[19] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. (1998, Aug.) XML-QL: A Query Language for XML. W3C. [Online]. Available: http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/
[20] M. Murata and J. Robie. (1998) Observations on structured document query languages. Proc. W3C Conf. Query Languages (QL98) [Online]. Available: http://www.w3.org/TandS/QL/QL98/pp/murata-san.html
[21] S. Abiteboul, L. Segoufin, and V. Vianu, “Representing and querying XML with incomplete information,” in Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. Principles Database Systems, Santa Barbara, CA, 2001, pp. 150–161.
[22] S. Abiteboul, S. Cluet, and T. Milo, “A logical view of structured files,” VLDB J., vol. 7, no. 2, pp. 96–114, 1998.
[23] S. Abiteboul and V. Vianu, “Regular path queries with constraints,” in Proc. ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, 1997, pp. 122–133.
[24] S. Abiteboul and C. Beeri, “The power of languages for the manipulation of complex values,” VLDB J., vol. 4, no. 4, pp. 727–794, 1995.
[25] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram, “XRANK: Ranked keyword search over XML documents,” in Proc. 2003 ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2003), 2003, pp. 16–27.
[26] R. A. Baeza-Yates, J. Vegas, G. Navarro, and P. de la Fuente, “A model and a visual query language for structured text,” in 5th South American Symp. String Processing and Information Retrieval (SPIRE’98), 1998, pp. 7–13.
[27] B. Grünbaum, “The construction of Venn diagrams,” Coll. Math. J., vol. 15, pp. 238–247, 1984.
[28] S. Jones, S. McInnes, and M. Staveley, “A graphical user interface for Boolean query specification,” Int. J. Digital Libraries, vol. 2, no. 2/3, pp. 207–223, 1999.
[29] S. Jones, “Graphical query specification and dynamic result previews for a digital library,” in Proc. ACM UIST’98: 11th Annu. Symp. User Interface Software and Technology, 1998, pp. 143–151.
[30] S. Polyviou, G. Samaras, and P. Evripidou, “A relationally complete visual query language for heterogeneous data sources and pervasive querying,” in Proc. IEEE Int. Conf. Data Engineering, 2005, pp. 471–482.
[31] L. Mohan and R. L. Kashyap, “A visual query language for graphical interaction with schema-intensive databases,” IEEE Trans. Knowl. Data Eng., vol. 5, no. 5, pp. 843–858, Oct. 1993.
[32] A. T. Berztiss, “The query language Vizla,” IEEE Trans. Knowl. Data Eng., vol. 5, no. 5, pp. 813–825, Oct. 1993.
[33] K. Siau, H. Chan, and K. Tran, “Visual knowledge query language,” IEICE Trans. Inf. Syst., vol. E75D, no. 5, pp. 697–703, 1992.
[34] M. Petropoulos, V. Vassalos, and Y. Papakonstantinou, “XML query forms (XQForms): Declarative specification of XML query interfaces,” in Proc. 10th World Wide Web Conf. (WWW10), 2001, pp. 642–651.
[35] R. Krishnamurthy, S. P. Morgan, and M. M. Zloof, “Query-by-example: Operations on piecewise continuous data (extended abstract),” in Proc. 9th Int. Conf. Very Large Data Bases, 1983, pp. 305–308.
[36] M. M. Zloof, “QBE/OBE: A language for office and business automation,” IEEE Comput., vol. 14, no. 5, pp. 13–22, 1981.
[37] J. Clark and S. DeRose. (1999, Nov.) “XML path language XPath Version 1.0,” Working Draft. W3C Recommendation. [Online]. Available: http://www.w3.org/TR/xpath
[38] D. Chamberlin, J. Clark, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu. (2001, Jun.) “XQuery 1.0: An XML Query Language,” Working Draft. W3C Recommendation. [Online]. Available: http://www.w3.org/TR/xquery
[39] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1989.
[40] A. Sengupta, “Toward the union of databases and document management: The design of DocBase,” in Proc. 9th Int. Conf. Management of Data, Databases for the Millennium 2000 (COMAD’98), 1998, pp. 88–109.
[41] Open Text 5.0, Technical Manual. Waterloo, ON, Canada: Open Text Corp., 1994.
[42] G. Marchionini, Information Seeking in Electronic Environments. Cambridge, UK: Cambridge Univ. Press, 1997, vol. 9, Cambridge Series on Human-Computer Interaction.
[43] S. Loeber and A. Cristea, “A WWW information seeking process model,” Educat. Technol. Soc., vol. 6, no. 3, pp. 43–52, 2003.
[44] A. Stein, J. Gulla, and U. Thiel, “User-tailored planning of mixed initiative information seeking dialogues,” Special Issue on Computational Models for Mixed-Initiative Interaction, User Modeling and User-Adapted Interaction, vol. 8, no. 1-2, pp. 133–166, 1999.
[45] T. H. Wonnacott and R. J. Wonnacott, Introductory Statistics. New York: Wiley, 1990.
[46] R. Krishnan, X. Li, D. Steier, and L. Zhao, “On heterogeneous database retrieval: A cognitively guided approach,” Inf. Syst. Res., vol. 12, no. 3, pp. 286–301, 2001.
[47] D. Norman, The Design of Everyday Things. New York: Doubleday Currency, 1990.
[48] T. A. Keahey and E. L. Robertson, “Techniques for nonlinear magnification transformations,” in Proc. IEEE Information Visualization Symp., 1996, pp. 38–45.
Arijit Sengupta received the Ph.D. degree in Computer Science from Indiana University, Bloomington. He is Assistant Professor of Information Systems at the Raj Soin College of Business, Wright State University, Dayton, OH. His research areas are in databases and XML, specifically in modeling, query languages, data mining, and HCI.
Andrew Dillon received the Ph.D. degree from Loughborough University of Technology, Leicestershire, UK. He is Dean of the School of Information and Professor of Information, Psychology, and Information, Risk and Operations Management at the University of Texas, Austin. His research is in the area of human response to information technology.