Proceedings of the 40th Hawaii International Conference on System Sciences - 2007
Fully-automatic generation of user interfaces for multiple devices from a high-level model based on communicative acts

Jürgen Falb, Roman Popp, Vienna Univ. of Technology, Institute of Computer Technology, A–1040 Vienna, Austria, {falb, popp}@ict.tuwien.ac.at

Thomas Röck, Helmut Jelinek, Siemens Austria PSE, A–1210 Vienna, Austria, {thomas.roeck, helmut.jelinek}@siemens.com

Edin Arnautovic, Hermann Kaindl, Vienna Univ. of Technology, Institute of Computer Technology, A–1040 Vienna, Austria, {arnautovic, kaindl}@ict.tuwien.ac.at
Abstract
The problems involved in the development of user interfaces become even more severe through the ubiquitous use of a variety of devices such as PCs, mobile phones and PDAs. Each of these devices has its own specifics that require a special user interface. Therefore, we developed and implemented an approach to generate user interfaces for multiple devices fully automatically from a high-level model. In contrast to previous approaches focusing on abstracting the user interface per se, we make use of speech act theory from the philosophy of language for the specification of desired intentions in interactions. Our new approach of using communicative acts in high-level models of user interfaces allows their creation with less technical knowledge, since such models are easier to provide than user-interface code in a usual programming language. From one such high-level model, multiple user interfaces for diverse devices are rendered fully automatically using a number of heuristics. A generated user interface for a PDA is already in real-world use and its usability was informally evaluated as good.
1 Introduction
Developing user interfaces (UIs) is hard, error-prone and expensive. In particular, UI design requires special expertise. Today, user interfaces especially designed for a variety of devices have to be provided quickly and cheaply; programming them by hand takes too long and is too expensive for this purpose. Therefore, it is highly desirable to have
• means for high-level modeling that does not necessarily require experts for UI design, and
• automated generation of UI code, which should be very cost-efficient as compared to manual programming.

The underlying idea behind both our modeling and the generation is the use of communicative acts, which allows the specification of user interfaces on a high-level interaction basis (instead of, e.g., a screen or widget basis). They are derived from speech act theory and express intentions in the sense of desired effects on the environment. This makes it “natural” for humans to express interactions in this form. We have fully implemented this approach in a tool named sConCAF (Siemens CONtext aware Content Assembling Framework), which is already in preliminary industrial use for real-world applications supporting multiple diverse devices. As a running example, let us use a small excerpt from a unique and new hypermedia guide that has been put into operation by the Kunsthistorisches Museum (KHM) in Vienna. It had been created with our approach and tools.

The remainder of this paper is organized in the following manner. First, we give an overview of the state of the art in terms of related work, and sketch what communicative acts are and where they come from. Then we present and explain our high-level UI models based on communicative acts, and how the automatic generation of user interfaces from such models works according to our approach. We also show examples of automatically generated UIs for multiple devices.
Finally, we summarize the subjective evaluations by KHM visitors who used a user interface generated with our approach.
2 State of the Art
The costly and time-consuming design and implementation of user interfaces, together with the great variety of user devices, have led to several approaches and systems. So, let us first present the state of the art in model-based UI specification and automated generation of user interfaces for multiple devices.

Declarative languages are used for the description of device-independent user interfaces with abstractly defined UI structures [1, 8, 16]. All this work on languages is important but mostly on a low semantic level.

Browne et al. [2] utilize three kinds of UI models: presentation models for the appearance of user interfaces in terms of their widgets, application models representing which parts (functions and data) of applications are accessible from the user interface, and dialogue models representing end-user interactions. While this work utilizes descriptions of interactions represented by dialogue models, it does not model intentions involved in the interactions as in our work based on communicative acts.

Mobi-D is a model-based interface development environment [15]. In order to present different views of a UI, Mobi-D utilizes several declarative models at different abstraction levels. This approach is similar to ours regarding its emphasis on declarative models. However, the communicative acts used in our approach allow a higher-level specification that also facilitates automated UI synthesis, while Mobi-D addresses design support.

An advanced approach to specifying multi-device user interfaces based on task models is presented in [10]. The basic approach is to start from tasks to be supported by the application and to separately generate user interfaces for different devices according to specific device characteristics. While this approach seems to be of great help for user interface developers, it is still widget-oriented, employing abstract descriptions of the UI elements as the basic abstraction. Also, several of the transformations between models have to be done manually. In contrast, our approach understands user interfaces as a form of communication and involves intentions of the interactions between user and machine in the user interface specification.

Florins and Vanderdonckt [6] also address the problem of creating UIs for different devices. They propose the development of one user interface, called the “root interface”, designed for the least constrained platform, and applying some transformations (called degradations) to get the interfaces for the more constrained ones. More constrained means in this context a lower-resolution screen, fewer widgets available, etc. Still, this unique approach does not seem to have been automated yet.

The work of Eisenstein et al. [4] is in line with ours concerning the importance of high-level UI models for multi-device user interfaces. However, it still relies more on designer assistance than our approach to automatic UI generation.

Nylander and Bylund [14] describe user-service interaction in a modality- and device-independent way. Their approach focuses on the data transferred, while our specifications include intentions of communication.

The personal universal controller of Nichols et al. [11] is an approach to generate intermediary user interfaces for complex appliances. This approach models mainly the structure of information based on state variables, commands, their dependencies and grouping, which is used to derive the interface. The decision whether information is modeled as a variable or a command primarily determines the widget selection, which is not always suitable for all diverse devices. In contrast, our approach mainly uses the intention of the information conveyed to select widgets.

The work on Smart Templates [12] has its focus on improving automatically generated interfaces for different devices. These parameterized templates allow automatic interface generators to create interfaces that are consistent and more usable. They include design conventions of particular devices for facilitating appropriate rendering for a particular device.
3 Communicative Acts
By investigating human language communication, philosophers observed that language is not only used to describe something or to make some statement but also to do something with intention, that is, to act. The most significant examples are so-called performatives, expressions such as “I nominate John to be President.”, “I sentence you to ten years imprisonment.” or “I promise to pay you back.”. In these expressions, the action that the sentence describes (nominating, sentencing, promising) is performed by the sentence itself; the speech is the act it effects.

Early and seminal work on speech acts was done by Searle [17]. In this essay, Searle claims that “speaking a language is performing speech acts, acts such as making statements, giving commands, asking questions, making promises and so on”. In this work, speech acts are the basic units of language communication, and not tokens, words or sentences as previously believed. Each speech act can be represented in the form F(P), where F is the illocutionary force (intention) and P its propositional content. For example, the speech acts

Sam smokes habitually.
Does Sam smoke habitually?
Sam, smoke habitually!
have the same proposition P (Sam smoking habitually) but different illocutionary forces F: making an assertion, asking a question and giving an order. Speech acts can be further characterized through:
• Degree of strength, which describes the intensity and importance of the speech act (e.g., a command and a polite request have more or less the same illocutionary force but different strength);
• Propositional content conditions, which describe conditions on the content (e.g., the content has to be about the future);
• Preparatory condition (e.g., the receiver must be able to perform the action).

Since speech act theory provides a formal and clean view of communication, computer scientists have found speech acts very useful for describing communication and interaction also apart from speech or natural language. To emphasize their general applicability, the notion of a communicative act is used in this context. Such communicative acts have been successfully used in several applications: inter-agent communication in the Knowledge Query and Manipulation Language (KQML) [5] and the FIPA Agent Communication Language (ACL) [7], and electronic commerce and information systems [9, 13]. Due to their rather formal interaction description, communicative acts have also been applied in the development of security protocols [3]. There was much less focus, however, on specifying the content of communicative acts in these languages.

Analogously to human-human communication, human-computer interaction can also be viewed as enacting communicative acts. For this reason and because of the previous uses in other areas, we make use of communicative acts in interaction design specifications and automated generation of user interfaces. However, we do not use them for handling speech or language.

Since different kinds of speech acts exist, various classifications have already been made. Usually, a distinction is made between Assertion, Directive and Commissive, where a further specialization of Directive is, e.g., Question. Such a classification actually represents the various intentions in the sense that a speech or communicative act classified as, e.g., a question has the desired effect of getting an answer. The communicative acts simply abstract from the corresponding speech acts in not relying on speech or natural language.
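To make the F(P) view concrete, the following minimal Java sketch shows one possible object model for communicative acts; it is purely illustrative, and all class, field and method names are our own assumptions rather than part of the sConCAF implementation.

public final class CommunicativeAct {

    // The illocutionary force F, following the classification mentioned above
    // (Question as a specialization of Directive is flattened into the enum).
    public enum Force { ASSERTION, DIRECTIVE, QUESTION, COMMISSIVE }

    private final Force force;          // F: the intention of the act
    private final Object content;       // P: the propositional content
    private final int degreeOfStrength; // e.g., polite request vs. command

    public CommunicativeAct(Force force, Object content, int degreeOfStrength) {
        this.force = force;
        this.content = content;
        this.degreeOfStrength = degreeOfStrength;
    }

    public Force force() { return force; }
    public Object content() { return content; }
    public int degreeOfStrength() { return degreeOfStrength; }

    @Override
    public String toString() {
        // Renders the act in the F(P) notation used above.
        return force + "(" + content + ")";
    }

    public static void main(String[] args) {
        Object p = "Sam smokes habitually";
        // The same proposition P under different illocutionary forces F:
        System.out.println(new CommunicativeAct(Force.ASSERTION, p, 1));
        System.out.println(new CommunicativeAct(Force.QUESTION, p, 1));
        System.out.println(new CommunicativeAct(Force.DIRECTIVE, p, 3));
    }
}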
Figure 1. The metamodel of high-level UI models in UML.
4 High-Level UI Models Based on Communicative Acts
Based on communicative acts, we specify high-level UI models. What such models should look like in our approach is defined in a metamodel. It captures three main concepts used for modeling as well as their relations:

• the intention, captured by a communicative act,
• the propositional content, modeled by use of an ontology language, and
• the set of interaction sequences, modeled with a finite state machine.

Figure 1 illustrates the metamodel in the form of a UML class diagram containing metaclasses and their relations. (At the time of this writing, the specification of UML is available at http://www.omg.org.) The central concept here is the communicative act. As stated above, it carries an intention of the interaction: question, assertion, etc. For the purpose of this paper, we simplify the classification into four specific communicative acts: Closed Question, Open Question, Informing and Response. The first two are specializations of the Question communicative act and the other two are specializations of the Assertion communicative act. Response emphasizes that the content relates to a prior Question, whereas Informing intends to convey independent information or information based on the Response. An instantiation of a Closed Question and an Informing communicative act in our running example is shown on the upper right side of Figure 2. The actual specification is represented in OWL (Web Ontology Language) [18], since it enables us to import existing domain ontologies and link their entities easily with communicative acts. The upper right part in Figure 2 specifies a Closed Question communicative act named RoomSelection, which provides a list of adjacent rooms the visitor can select from and go to. In addition, the figure specifies an Informing communicative act named ExhibitInformation, which conveys
information on the currently selected exhibit with normal priority (degree of strength) to the museum visitor.

Figure 2. Example of a specification of communicative acts and UI domain objects in OWL.

Our approach actually covers more than the few communicative acts used in the running example. For example, Request communicative acts express that the receiver has to act upon the request. E.g., requests from the system to the user can be used to generate modal confirmation dialogues, and requests in the other direction can initiate the execution of system functionality.

According to the form of speech acts, the interaction intention as specified by the type of the communicative act and the propositional content are separated. In the metamodel, the content is represented by UI domain classes and UI domain properties. We prefixed the content entities with ‘UI’ to indicate that the classes and properties model the domain according to the structure presented to the user, which should correspond to the user’s point of view and need not be identical to the internal system representation of the domain. Since we are using an ontology language (OWL) for domain modeling, properties are also “first class citizens” and therefore shown separately in the metamodel. Treating classes and properties equally allows us to refer to both independently from a communicative act. This also has practical advantages, such as allowing a question about some property per se. For example, for the UI domain property maritalState related to a UI domain class Person, a question about the percentage of married persons can be asked. This is clearly different from a question about a person’s marital status. Furthermore, the content of a communicative act can be the class or property itself instead of an instance of a class or property (instances of properties are usually equivalent to statements in ontology languages). This results in a discourse about the class or property itself, e.g., talking about what makes up a person in the particular domain/context. Regarding the transformation into user interfaces, a typical
application of such a meta-discourse is the generation of help documents.

Figure 2 shows on the left side the definition of the UI domain class Exhibit and the UI domain property room. The class definition states that each exhibit must have exactly one room to which it belongs, and the room property definition specifies that a room is defined by the Room class and that it is applicable to exhibits and floors. The UI domain classes and properties also relate to the actual application logic. Thus, they have to be connected technically to the application interface (e.g., Enterprise Java Beans or CORBA). This is done by implementing technology-specific templates, which describe how UI domain classes and properties get associated with application classes and data.

One or more communicative acts are contained in one User Interface Unit (UIU), shown on the lower right side of Figure 2. The UIU is a basic unit presented to the user, providing choices of different interaction sequences based on the contained communicative acts. However, it is still an abstract entity: it can be mapped to one or several concrete user interface screens according to concrete device profiles.

The set of interaction sequences is modeled with a state machine where each state can have multiple ingoing and outgoing transitions representing segments of the interaction sequences. Each state also implicitly defines and fulfills an associated communicative act’s preparatory condition. After every state transition, the UIU connected to this state is presented to the user. More precisely, this happens while entering this state. In fact, the machine performs actions as specified in the communicative acts: e.g., it presents a question about some domain object, i.e., it asks. When the user interacts with the machine through the user interface, e.g., answering a question, a corresponding communicative act will be generated, leading to the execution of its action and a subsequent state transition. The transition conditions between the states are defined in a simple expression language specified by us.
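As an illustration of what generated UI domain class implementations might look like (step 1 of the generation process described in the next section), the following Java sketch models the Exhibit and Room classes from the running example; all fields and methods are assumptions made for this sketch, not the code actually generated by sConCAF.

import java.util.Objects;

class Room {
    private final String name;
    Room(String name) { this.name = Objects.requireNonNull(name); }
    String getName() { return name; }
}

class Exhibit {
    private final String title;
    private Room room; // cardinality "exactly one": never null after construction

    Exhibit(String title, Room room) {
        this.title = Objects.requireNonNull(title);
        this.room = Objects.requireNonNull(room, "each exhibit belongs to exactly one room");
    }

    String getTitle() { return title; }
    Room getRoom() { return room; }

    void setRoom(Room room) {
        // A generated setter would also propagate the change to the
        // application logic (e.g., via an EJB facade); omitted here.
        this.room = Objects.requireNonNull(room);
    }
}

class UiDomainSketch {
    public static void main(String[] args) {
        // Room name chosen arbitrarily for the sketch.
        Exhibit hippo = new Exhibit("Hippopotamus", new Room("Room X"));
        System.out.println(hippo.getTitle() + " is shown in " + hippo.getRoom().getName());
    }
}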
Figure 3. Example of a state machine.

Figure 4. Conceptual Architecture.
This language is along the lines of Java expressions for conditions. Due to historical reasons we did not use a language like OCL (Object Constraint Language), but this may change in the future.

Figure 3 shows a small selected part of the state machine consisting of four states (StartState, LoginState, ExhibitionState, and ExhibitState). While in the StartState, the user gets a choice of available services via Closed Question communicative acts. The transition conditions are made up of communicative acts, UI domain objects, method calls, and logical operators. In our example, all outgoing transitions of the StartState have a simple condition checking the existence of a selected UI domain object (e.g., Service.exhibition), independently of any communicative act. If the specified object is part of the content of any communicative act issued by the user via the user interface, the communication proceeds along the corresponding transition. The outgoing transition of the ExhibitionState checks for a particular communicative act: if the user utters an ExhibitInformation communicative act of the type Response via the user interface, the communication continues in the ExhibitState. Another possibility for transition conditions, shown in the outgoing transitions of the LoginState, is to test the result of data updates in the application logic. In this case, if the username and password set on a particular user object are correct, the communication proceeds to the StartState; otherwise the system issues further Open Question communicative acts for logging in.

A specification of a high-level UI model according to this metamodel provides the essence of a user interface to be generated. For its concrete rendering, as sketched below, additional device profiles and style guides are used.
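Since the concrete expression language is not reproduced in this paper, the following Java sketch merely illustrates the idea of evaluating transition conditions over the communicative acts issued by the user; the record types and the two example conditions are assumptions made for illustration.

import java.util.List;
import java.util.function.Predicate;

class TransitionConditionSketch {

    // A communicative act reduced to what the conditions below inspect.
    record Act(String name, String type, Object content) {}

    // A transition: a condition over the acts issued by the user, plus a target state.
    record Transition(Predicate<List<Act>> condition, String targetState) {}

    public static void main(String[] args) {
        // Condition of the StartState: some act issued by the user contains
        // the selected UI domain object Service.exhibition.
        Transition toExhibitionState = new Transition(
                acts -> acts.stream().anyMatch(a -> "Service.exhibition".equals(a.content())),
                "ExhibitionState");

        // Condition of the ExhibitionState: the user uttered an
        // ExhibitInformation communicative act of the type Response.
        Transition toExhibitState = new Transition(
                acts -> acts.stream().anyMatch(
                        a -> a.name().equals("ExhibitInformation") && a.type().equals("Response")),
                "ExhibitState");

        List<Act> issued = List.of(new Act("RoomSelection", "Response", "Service.exhibition"));
        System.out.println(toExhibitionState.condition().test(issued)); // true
        System.out.println(toExhibitState.condition().test(issued));    // false
    }
}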
5 How The Automatic Generation of User Interfaces Works

First, we provide an overview of how we get from a user interface model to a concrete user interface. Then we describe the actual user interface rendering and show how it is facilitated by communicative acts. Finally, we describe the rendering rules and the additional information sources: device profiles, user preferences, style guides and heuristics.
5.1 UI Generation Process
From a specified interaction design model according to our metamodel, our generator tool automatically generates concrete user interfaces for diverse devices. The generation process, sketched in Figure 4, is divided into the following four steps, which generate a UI implementation corresponding to the Model-View-Controller (MVC) pattern:

1. Generation of the UI domain class implementations together with their binding to the actual application functionality,
2. Generation of the finite state machine implementation,
3. Assembly of the UI domain information and the communicative acts according to each state, and
4. Rendering of the concrete user interface based on the complete interaction design model.

In the first three steps, we transform the specification into code by applying a typical code generation process that takes an instantiation of our metamodel as input and applies templates to it. This also provides a mechanism to adapt the UI domain class implementation to the application logic technology (e.g., J2EE or CORBA) by choosing or writing an appropriate template.

The fourth step, the rendering process, is based on the complete interaction design model and is guided by device profiles, user preferences, application-specific style guides and some heuristics. The device profiles contain device-specific constraints regarding the user interface capabilities
like the screen size. The user preferences and application-specific style guides influence the layout, selection and rendering of widgets. The core of the rendering process is a set of heuristics that mainly guide the widget selection based on, for example, the number, types and properties of communicative acts and UI domain objects. This combination covers a wide range of possible selections of widgets. E.g., a high degree of strength of a communicative act can render a dialogue modal or a label in bold face. The output of the UI generation process is a user interface specifically rendered for each particular device.

In our running example, an HTML render engine reads the user interface specification defined in OWL, partly shown in Figure 2. The render engine first analyzes the UIU and renders the overall dialogue and each communicative act, resulting in the output shown in Figures 5 and 6, generated automatically for a desktop PC and a PDA, respectively.

Figure 5. Output on a PC.
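The following Java fragment is a rough sketch (ours, not the sConCAF renderer) of how such a heuristic could combine a communicative act with device information; the record types, the threshold for a "high" degree of strength and the HTML output are assumptions.

class RenderingHeuristicSketch {

    // Device characteristics relevant for rendering; the fields are assumptions.
    record DeviceProfile(int screenWidth, int screenHeight, boolean supportsModalDialogs) {}

    // A communicative act reduced to what this sketch needs.
    record Act(String kind, String label, int degreeOfStrength) {}

    /** Returns a crude HTML fragment for one communicative act. */
    static String render(Act act, DeviceProfile profile) {
        // Heuristic mentioned in the text: a high degree of strength can
        // render a label in bold face (the threshold 2 is arbitrary here).
        String label = act.degreeOfStrength() > 2 ? "<b>" + act.label() + "</b>" : act.label();
        // The device profile constrains the result, e.g., small screens get a
        // more compact style (a stand-in for pagination and widget constraints).
        String cssClass = profile.screenWidth() < 320 ? act.kind() + " compact" : act.kind();
        return "<div class=\"" + cssClass + "\">" + label + "</div>";
    }

    public static void main(String[] args) {
        DeviceProfile pda = new DeviceProfile(240, 320, false);
        System.out.println(render(new Act("Informing", "Hippopotamus", 3), pda));
    }
}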
5.2 Rendering of Communicative Acts
The rendering of the concrete user interface is mainly based on the intentions conveyed by the communicative acts. A screen of a user interface is typically compiled from multiple communicative acts. The overall structure of the screen is based on the UIU, which contains multiple communicative acts to be presented to the user at the same time. Their order also defines the order of the content on the user interface, if not specified explicitly. For each block on the user interface defined by a communicative act, the render engine selects the widgets on the basis of the communicative acts themselves and their contents. Different kinds of communicative acts lead to the selection of different classes of widgets. The rendering effects of the communicative acts used in the running example are as follows:

Closed Question: The intention of a Closed Question communicative act is getting information from a predefined list of values, which can be specified either intensionally or extensionally. The Closed Question leads
to a presentation of a list of UI domain objects in the form of radio buttons, check boxes, menus, tabs, and so on, from which the user can select to answer the question. In the case of specifying the list intensionally by constraining the data type of a UI domain property, the Closed Question will be rendered to spin boxes or input boxes based on the data type and the available widgets on the device, and thus allows the user to enter information within a particular value range (e.g., numbers between 0 and 100). In our running example, the RoomSelection closed question is rendered to an image map showing the current room, and each adjacent room is rendered as a selectable icon next to the corresponding exit, as shown on the right in Figure 5.

Open Question: The intention of an Open Question communicative act is gathering information without restricting it to a predefined set of possible values. Thus, the render engine uses input widgets like text areas or input boxes to allow the user to provide information for particular UI domain objects or properties.

Informing: The intention of the Informing communicative act is to provide new, unknown facts to the receiver of the message. There is no intention that the receiver acts upon receiving the message. Thus, the render engine chooses some output widgets depending on the content. In a concrete interface, this can result in text, image, audio or video for a single UI domain object, or in a table or list if the communicative act contains a list of UI domain objects of the same type. For example, the ExhibitInformation communicative act presents the current exhibit (a hippopotamus) selected by the user. Therefore, the render engine generates HTML by using grouping and heading tags for text, depending on the type of the object property, and image tags referring to device-specific prepared images for graphical properties, as shown in the left part of Figure 5 and the right part of Figure 6.

Figure 6. Output on a PDA.
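As a summary of these rendering effects, the following Java sketch maps the kind of communicative act to a class of widgets; it is a simplified illustration, and the enum values and returned widget descriptions are our own assumptions.

class WidgetSelectionSketch {

    enum ActKind { CLOSED_QUESTION, OPEN_QUESTION, INFORMING }

    // Maps a kind of communicative act to a class of widgets, as described
    // in the text; the returned strings only name the widget class.
    static String selectWidgetClass(ActKind kind, boolean extensionalValueList) {
        switch (kind) {
            case CLOSED_QUESTION:
                // Extensional list of values -> selection widgets;
                // intensional (data-type constrained) -> bounded input widgets.
                return extensionalValueList
                        ? "selection widget (radio buttons, check boxes, menu, image map)"
                        : "bounded input widget (spin box, input box with value range)";
            case OPEN_QUESTION:
                return "free input widget (text area, input box)";
            case INFORMING:
            default:
                return "output widget (text, image, audio/video, table or list)";
        }
    }

    public static void main(String[] args) {
        System.out.println(selectWidgetClass(ActKind.CLOSED_QUESTION, true));
        System.out.println(selectWidgetClass(ActKind.INFORMING, false));
    }
}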
Other types of communicative acts can result in different rendering or system effects. E.g., Request communicative acts by the system should be rendered as modal dialogues requiring the user to act and confirm by, e.g., ‘OK’ or ‘next’ buttons. A request for action could be a modal dialogue that says: “Please enter CD into drive D”, whereas a request for information could be a modal dialogue that prompts for username and password. A request by the user is enabled by rendering a widget that expresses the initiation of an action, like a button or menu.

The widget selection is based on the type of communicative act, the content type, and properties like degree of strength. This combination covers a wide range of possible selections of widgets (e.g., a high degree of strength can render a dialogue modal or a label in bold face). Different types of communicative acts need not call for different widgets. They can also result in the same widget selection but with different interaction flows. E.g., in an online store, ordering books can be achieved by a Closed Question of books or an Offer of books. Both will present a list of books, and the Closed Question can result in asking again if the desired book is sold out, whereas the Offer obliges the system to buy the book.

Another possibility to improve the rendering is to consider sequences of communicative acts. Assume that the system asks a question, expects an answer from the user and provides a possibility for an inquiry (a subordinate question); this results in rendering the subordinate communication in a separate dialogue. For example, if the system asks a Closed Question and the user has the possibility to either answer the question directly by selecting a particular value or to ask for more information, then a query for additional information results in rendering the succeeding Response communicative act within a separate help dialogue. The designer has to specify this rule in the style guide.
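The subordinate-question rule can be pictured with a small Java sketch; the data model (an act optionally carrying a subordinate inquiry) is hypothetical and only serves to illustrate where such a style-guide rule would take effect.

import java.util.Optional;

class SubordinateDialogueSketch {

    // A communicative act that may carry a subordinate inquiry; hypothetical.
    record Act(String kind, String content, Optional<Act> subordinate) {}

    // Acts that come with a subordinate question get an extra help dialogue
    // for the succeeding Response, as described in the text.
    static String renderTarget(Act act) {
        if (act.subordinate().isPresent()) {
            return "main screen + separate help dialogue for: " + act.subordinate().get().content();
        }
        return "main screen";
    }

    public static void main(String[] args) {
        Act moreInfo = new Act("Response", "details about the exhibit", Optional.empty());
        Act question = new Act("ClosedQuestion", "select a room", Optional.of(moreInfo));
        System.out.println(renderTarget(question));
    }
}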
5.3 Parameterizing the Rendering Process for Multiple Devices
This general rendering process is further parameterized and constrained to generate user interfaces for multiple devices. The transformation of the communicative acts into a concrete user interface is guided by the use of device profiles, user preferences, style guides, and some heuristics. Optionally, layout templates may be used as well. Device profiles contain device characteristics like screen size, available media players, and supported interface toolkits (e.g., HTML versions). User preferences mainly represent information common to multiple applications, like language or preferred color settings, and the style guides contain default rules that state, for example, that navigational communicative acts have to be rendered as a menu. The optional layout
templates contain a positioning scheme for communicative acts. In our running example, all communicative acts are placed onto one page for a PC as well as for a PDA, as shown in Figures 5 and 6. The different order of the communicative acts in the two screenshots results from different layout templates for PDA and PC. While the page layout is influenced by our templates, the widget selection is not.

Furthermore, communicative acts provide good hints for pagination of interfaces, since they are self-contained and can be serialized, as seen in human-human communication. This is especially important for devices with limited screen size like PDAs. The device profiles constrain the widget selection and give rendering hints, for example, on the page breaking mechanisms. Furthermore, device profiles are also used to select an appropriate content format. For example, if the content is available as audio, image and text, the device profiles define which formats can be displayed and can initiate format conversions, e.g., resizing images, transforming images from PNG format to JPEG format, and deciding which media player is available and should be included into the generated user interface. In the example, the rendering of audio content is done differently on PDA and PC. On a PDA, the audio content is rendered as a link if no embeddable player is available (this is the case for the emulator used for generating the screenshot) and as a Flash® application if a Flash® plug-in is installed on the client. On a PC, the audio content is rendered by embedding the appropriate media player.

If the render engine finds user preferences or style guides applicable to a particular communicative act or domain object, these are used to assign the specified layout information to the user interface widgets or to guide the widget selection. In our running example, user preferences define, e.g., the language. Style guides are used for left alignment of the exhibit’s text on a PDA and justified alignment on a PC.

Finally, the renderer uses rules and heuristics for generating the implementation. For example, the number of communicative acts and UI domain objects influences the splitting of the UI into multiple pages according to the screen size, and the properties of communicative acts provide hints for styles, like rendering information in bold face if the degree of strength is high. After the compilation of the user interface according to the rules above, the render engine applies additional style information to each widget or group of widgets as defined in the user preferences and application-specific style guides. In our running example, we specify the style guides in the form of Cascading Style Sheets (CSS). E.g., we specify a left text alignment on a PDA and a justified text alignment on a PC for the exhibit’s text, for better readability.
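To illustrate the device-profile-driven selection of a content format, here is a small Java sketch of the audio case described above; the profile fields and the decision order are assumptions made for this illustration.

import java.util.Set;

class FormatSelectionSketch {

    // The fields stand in for the device capabilities mentioned in the text
    // (available media players, installed plug-ins); they are assumptions.
    record DeviceProfile(Set<String> embeddablePlayers, boolean flashPluginInstalled) {}

    static String renderAudio(DeviceProfile profile) {
        if (!profile.embeddablePlayers().isEmpty()) {
            // PC case in the running example: embed an appropriate media player.
            return "embedded player: " + profile.embeddablePlayers().iterator().next();
        }
        if (profile.flashPluginInstalled()) {
            return "Flash application"; // PDA with a Flash plug-in installed
        }
        return "plain link to the audio file"; // e.g., the PDA emulator without a player
    }

    public static void main(String[] args) {
        // The player name below is an arbitrary example.
        System.out.println(renderAudio(new DeviceProfile(Set.of("some media player"), false)));
        System.out.println(renderAudio(new DeviceProfile(Set.of(), true)));
        System.out.println(renderAudio(new DeviceProfile(Set.of(), false)));
    }
}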
6 Subjective Evaluation
We have interviewed four designers, who provided us with subjective evaluations of the approach and the tools implementing it. They do not have very strong formal education or experience in building user interfaces. After a short training period, all designers grasped the modeling concepts and were able to design a user interface for an application using our approach and tools. They state that the design process is intuitive and enables a fast and convenient design of a user interface for diverse devices. The designers also stated that they need little technical knowledge to model the user interface. More technically demanding are, of course, the linking of the UI domain classes to the application logic, and defining device profiles and style guides. These tasks remain to be performed by experts of software and UI technology. Once available, however, device profiles and style guides can be reused among applications.

For an informal subjective evaluation of the generated UIs, let us focus on the application that we also use in this paper as a running example, the museum guide in the KHM. Before it was installed there, of course, our attempt was to find as many usability problems as possible, through usability inspections and a few usability tests. These problems have been resolved mostly by changes in the model. In this industrially funded application, we could not afford to prepare a hand-crafted UI to compare with. The generated UI is installed in the museum and used there. It is this installation that real users evaluated informally through answers to a few standardized closed questions and free comments. Our point is simply to show that a generated UI can be good enough in terms of usability problems and even be judged well by end users, while usability is said to be a major problem for generated UIs.

The evaluation through the museum users is restricted to the UI for PDAs only, since there is still a copyright issue pending about selling CDs with the PC version. For collecting the subjective opinions of visitors of the KHM that used this application rather than the audio guide (or none at all), every such subject was asked immediately upon returning the rented PDA to fill in a subjective questionnaire. (A few subjects used their own PDAs and still filled in the questionnaire.) This questionnaire consists of a few closed questions, where the respondent is asked to select an answer from a choice of alternative replies, with additional space for optional comments in free text. We use a multi-point rating scale, more precisely a Likert scale. Let us focus here on a selection of two questions that are directly relevant to the usability of the generated UI, ignoring here the quality of the information contained. (MMG is the acronym used there for “multimedia guide”, the official but not very accurate name of this application, which is rather a hypermedia guide.)
1. “How did you like the MMG?”
2. “Was the MMG easy to handle?”

The scale for all questions is (very good, good, medium, less good, not good). We have a sample of subjective evaluations by 363 subjects available. They have all been using the same version of the MMG. Now let us summarize the answers to these questions in the questionnaire. Figure 7 shows the frequency distributions of the respective answers to both questions. (For the Likert scale used, no statistics like mean or standard deviation are allowed according to measurement theory; so, we simply show the frequency distributions.)

Figure 7. Distributions of answers.

1. A clear majority of the subjects replied to the first question with very good or good. Although a portion of visitors did not find the MMG so good, there is some empirical evidence that it has some appeal to museum visitors.
2. A majority of the subjects replied to the second question with very good or good, and a clear majority with very good, good or medium. While the handling was rated less good than the appeal, there is still some empirical evidence that the handling is at least acceptable to many visitors.

With regard to the generated UI itself, a usability problem has been noted with the embedding of the audio player. This problem is due to the generation of the UI implementation. It did not arise because of any inherent issue in the rendering, but because of technical problems of embedding the audio player on a PDA. Once they are solved, this solution will be reused automatically in the future. Other issues are inherent to the device: a PDA is small and has a small screen, both hands are required for its use,
and it is not ideally suited for elderly people for several reasons. So, we do not think that such an application should replace human guides or audio guides completely. It can rather provide added value to some people, who will probably become more numerous in the future with the more widespread use of such devices. A very interesting and positive comment was about the possibility of virtually revisiting museum highlights through the MMG at lunch, which was highly appreciated. One subject noted that (s)he had never had such a great museum guide.

Overall, the subjective opinions so far are encouraging and provide some empirical evidence that our approach of generating UIs works in principle. In general, the usability of the generated UIs was assessed informally as good. It is clear, however, that expert UI designers and implementers can still provide better user interfaces, especially from an artistic perspective or in the details. This situation is reminiscent of code generated by compilers, which can often be “beaten” by experienced humans programming directly on the level of machine code.
7 Discussion
Much less technical knowledge about UI design and implementation is needed in our approach than usual. The focus is on what is to be communicated and with which intention. How it is communicated through a graphical user interface is determined by the render engine, based on information inferred from the interaction design specification as described in detail above.

Our goal is to let domain experts themselves provide user interfaces through intuitive high-level specifications. We think that communicative acts are very helpful here. Domain knowledge in the form of simple object models should also not pose too many problems. However, the state machines currently used for navigating the interaction sequences require some special training and technical understanding. In fact, they are more in the spirit of UI design than interaction design. Discourse models may be a major step forward in this regard, since they contain fewer technicalities.

User errors can be handled appropriately already in the current implementation. Since the render engine has the definitions of all properties available, a check of the domain of each property can be done automatically and related errors can be handled by the render engine itself. Other errors must be handled by the application. This requires additional communicative acts in the model and possibly additional states for communication about the errors. Information about the success of enacting communicative acts is used for selecting the transition to the next state.

The generality of our approach inherently depends on the expressive power of communicative acts. These abstract
from the corresponding speech acts in the sense that they do not rely on speech or natural language. Speech acts are presumably able to model all spoken (and probably also all written) communication among humans. So, it seems that our approach should, in principle, be able to model all communication between a computer and a human user through a graphical user interface. The expressive power of communicative acts in general is most likely not less than the expressive power of the usual approaches to task modeling. Our implemented approach also covers, e.g., browsing tasks, where Assertion is specialized to Informing. Since we actually employ communicative acts rather than the more restricted speech acts, the use of other media like audio or video is covered as well. In fact, our hypermedia guide in a museum makes much use of multimedia.

While the rendering in our approach is generally based on many heuristics represented procedurally in the generator itself, our style guides and device profiles provide a declarative way of representing important information, especially for dealing with diverse devices. The purpose of Smart Templates [12] is similar. They include design conventions of particular devices for facilitating appropriate rendering for a particular device. While we separate this information, Smart Templates may well be more advanced. This approach does, however, not include modeling the essence of the UI per se, which is the primary focus of the work that we present in this paper. In effect, these approaches are complementary.

Generating UIs based on an abstract definition of the user interface and in combination with knowledge of the capabilities of the target display was recently listed as a system challenge for ubiquitous and pervasive computing [19]. We think that our implemented approach is a major step in this direction.
8 Conclusion
In this paper, we present a new approach to modeling and generating user interfaces. The high-level models include communicative acts, UI domain objects and a finite state machine. From such high-level models, user interfaces can be generated fully automatically. This makes our approach particularly suitable for interfaces of multiple devices, since from one high-level model several user interfaces for diverse devices can be generated efficiently.

The key innovation of this work is the use of communicative acts (derived from speech act theory) for the specification of intentions to be conveyed through the user interface. This gives the high-level UI models that include communicative acts more semantic content. It also makes the approach more understandable to humans, since it is closer to their own way of communicating and abstracts from technical details. So, it is possible to provide such
specifications with little technical knowledge of building concrete user interfaces.

Especially the intentions made explicit in the models through communicative acts facilitate the fully automatic generation of real user interfaces that express the desired intentions. For example, it makes a difference whether the machine primarily presents information or asks the user about something. Since certain communicative acts describe their desired effect on the environment device-independently, it is possible to use the best set of widgets available on each platform to achieve this effect. Even some style information can be derived from communicative act attributes like degree of strength, e.g., to render content as a warning or a modal dialogue.

Our approach is fully implemented and already in preliminary industrial use. A unique and new hypermedia guide created with our approach and tools is in operation at the Kunsthistorisches Museum Vienna. In summary, our new approach of using communicative acts in high-level specifications of user interfaces allows the fully automatic generation of several user interfaces for multiple and diverse devices. New devices can be supported easily by modeling their specifics once, for all previous and future applications.
References

[1] M. Abrams and C. Phanouriou. UIML: An XML language for building device-independent user interfaces. In Proceedings of XML 99, 1999.
[2] T. Browne, D. Dávila, S. Rugaber, and K. Stirewalt. Using declarative descriptions to model user interfaces with MASTERMIND. In F. Paterno and P. Palanque, editors, Formal Methods in Human-Computer Interaction. Springer-Verlag, 1997.
[3] P. M. Dung and P. M. Thang. Stepwise development of security protocols: a speech act-oriented approach. In FMSE '04: Proceedings of the 2004 ACM Workshop on Formal Methods in Security Engineering, pages 33–44, New York, NY, USA, 2004. ACM Press.
[4] J. Eisenstein, J. Vanderdonckt, and A. Puerta. Applying model-based techniques to the development of UIs for mobile computers. In IUI '01: Proceedings of the 6th International Conference on Intelligent User Interfaces, pages 69–76, New York, NY, USA, 2001. ACM Press.
[5] T. Finin, R. Fritzson, D. McKay, and R. McEntire. KQML as an agent communication language. In CIKM '94: Proceedings of the Third International Conference on Information and Knowledge Management, pages 456–463, New York, NY, USA, 1994. ACM Press.
[6] M. Florins and J. Vanderdonckt. Graceful degradation of user interfaces as a design method for multiplatform systems. In IUI '04: Proceedings of the 9th International Conference on Intelligent User Interfaces, pages 140–147, New York, NY, USA, 2004. ACM Press.
[7] Foundation for Intelligent Physical Agents. FIPA communicative act library specification. Technical report, Foundation for Intelligent Physical Agents, www.fipa.org, 2002.
[8] XForms Working Group. XForms - the next generation of web forms. http://www.w3.org/MarkUp/Forms/, 2005.
[9] S. O. Kimbrough and S. A. Moore. On automated message processing in electronic commerce and work support systems: speech act theory and expressive felicity. ACM Transactions on Information Systems, 15(4):321–367, 1997.
[10] G. Mori, F. Paterno, and C. Santoro. Design and development of multidevice user interfaces through multiple logical descriptions. IEEE Transactions on Software Engineering, 30(8):507–520, August 2004.
[11] J. Nichols, B. A. Myers, M. Higgins, J. Hughes, T. K. Harris, R. Rosenfeld, and M. Pignol. Generating remote control interfaces for complex appliances. In UIST '02: Proceedings of the 15th Annual ACM Symposium on User Interface Software and Technology, pages 161–170, New York, NY, USA, 2002. ACM Press.
[12] J. Nichols, B. A. Myers, and K. Litwack. Improving automatic interface generation with smart templates. In IUI '04: Proceedings of the 9th International Conference on Intelligent User Interfaces, pages 286–288, New York, NY, USA, 2004. ACM Press.
[13] M. Nowostawski, D. Carter, S. Cranefield, and M. Purvis. Communicative acts and interaction protocols in a distributed information system. In AAMAS '03: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1082–1083, New York, NY, USA, 2003. ACM Press.
[14] S. Nylander and M. Bylund. The Ubiquitous Interactor: universal access to mobile services. In Proceedings of HCI International (HCII '03), 2003.
[15] A. Puerta. A model-based interface development environment. IEEE Software, 14(4):40–47, 1997.
[16] A. Puerta and J. Eisenstein. XIML: a common representation for interaction data. In IUI '02: Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 214–215, New York, NY, USA, 2002. ACM Press.
[17] J. R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, England, 1969.
[18] W3C. OWL Web Ontology Language reference. http://www.w3.org/2004/OWL/, 2004.
[19] R. Want and T. Pering. System challenges for ubiquitous & pervasive computing. In ICSE '05: Proceedings of the 27th International Conference on Software Engineering, pages 9–14, New York, NY, USA, 2005. ACM Press.