Appl Intell (2007) 26:111–124 DOI 10.1007/s10489-006-0012-4
MyMap: Generating personalized tourist descriptions Berardina De Carolis · Giovanni Cozzolongo · Sebastiano Pizzutilo · Vincenzo Silvestri
Published online: 30 January 2007
© Springer Science + Business Media, LLC 2007

Abstract When visiting cities as tourists, most users intend to explore the area looking for interesting things to see or for information about places, events, and so on. To inform user choice, an adaptive information system should provide contextual information, information clustering, and comparative presentation of objects of potential interest in the area where the user is located. To this aim, we developed a system, called MyMap, able to generate personalized presentation of objects of interest starting from an annotated city map. MyMap combines context and user modeling with natural language generation for suggesting to the user what could be interesting to see and do using as interaction metaphor an annotated tourist map. An evaluation study has shown that the quality of the generated description is adequate compared with human-written descriptions.

Keywords Context adaptation · User model · Natural language generation
B. De Carolis · G. Cozzolongo · S. Pizzutilo · V. Silvestri
Dipartimento di Informatica, Università di Bari, 70126 via Orabona, Bari, Italy

1 Introduction
Providing personalized presentation of information has been one of the main goals of research on adaptive systems: features such as user interests, background knowledge, and preferences were considered to determine, at the same time, the information to be included in the message and its “surface” realization [1–3]. This research field has some deep roots in research on adaptive explanation and adaptive presentation in intelligent systems [4, 5], and to achieve this aim, it combines user modeling and natural language generation techniques. In this first generation of user-adapted systems the adaptation process concerned mainly “static” features of the user: background knowledge, age, task interests, sex, etc.

With the evolution of devices (PDA, mobile phones, car-computers, etc.), network connections (GSM, GPRS, UMTS, WLAN, Bluetooth, etc.), and localization technologies (GPS) for interacting with information services, users can access these services potentially anywhere and at any time [6]. Combining mobile and ubiquitous computing is emerging as a new paradigm with the goal of providing computing and communication services all the time, everywhere, transparently and invisibly to the user, using personal user devices in combination with devices embedded in the surrounding physical environment. The implementation of this paradigm requires not only advances from the technological point of view (i.e., wireless network technologies and devices), but also the development of infrastructures supporting the intelligence of the environments and the discovery and identification of ubiquitous computing applications and services.

However, the mobility of the user, and therefore the possibility of interacting with services anywhere and at any time, requires taking into account new personalization factors related to the context in which the interaction takes place [1, 3]. In this case, the main goal of an adaptive information system is to deliver targeted information to users when they need it, where they need it, and how they need it, that is, in a form suited to users’ situational interests and to the technological context [7]. Context awareness thus becomes a key feature for ensuring an appropriate response of the application to user requests. According to many researchers on ubiquitous computing [8, 9], user location, current activity, emotional state,
interaction device(s), time of day, and weather conditions seem to be relevant to personalization. Information about the location, time, and weather, for instance, can be used to contextualize service requests and presentation of results through context-sensitive generation of natural language texts [10, 11], presentation of graphical maps [12], and highlighting of objects available in the surroundings. The activity of users, their emotional state, and the input/output capabilities of the device they are employing may influence the way the information is accessed and presented (for instance, browsing in a large information space, searching for some specific data, or receiving fast and well-focused “hints”). A generic application that aims to achieve this objective requires the implementation of the following capabilities.
• accessing the description of the current context in order to understand the situation of the user (location, activity, device, etc.);
• modeling the situational interests of the user in order to use these data to personalize the selection and presentation of information [7, 13];
• accessing the description of the domain data in order to select objects of interest and use their representation for generating related information presentation; and
• generating information presentation accordingly [14, 15].
In this paper, we present a solution to the personalization of information presentation that combines the use of XML annotation for domain knowledge representation, Mobile User Profiles (MUPs) for managing contextualized user preferences and interests, a media-independent content planner for deciding the content and structure of the presentation, and a context-sensitive surface generator for rendering it at the surface level according to the device used.

To show how the system works, we use the tourist domain as an example. Indeed, as mobile phones and other portable devices become more advanced, tourism is one obvious application area for mobile information systems. In particular, the Lancaster GUIDE system [16] and other systems based on mobile devices [17–19] are examples of applications in this field. When people visit cities as tourists, most of them intend to explore the area and to find interesting things to see or information about places, objects, events, and so on. Most of the time, tourists do not make very detailed and specific plans [20], “so that they can take advantage of changing circumstances,” and moreover, when choosing where to go and what to see, they tend to “pick an area with more than one potential facility.” According to these findings, it would be useful to support the user choice with contextual information presentation, information clustering, and comparative presentation of objects of potential interest in the same area.
The paper is structured as follows: after the brief overview of related work in the next section, Section 3 illustrates the system architecture. In Section 4, we focus on the process of generating personalized descriptions of places of interest using an annotated town map. In particular, we describe the structure of the map annotation scheme, the role of the MUP, and the generation steps necessary to produce personalized descriptions. Section 5 reports the main results of the first evaluation of the system. Finally, conclusions and future work are discussed in Section 6.
2 Related work
As emphasized in Section 1, the “context” is a relevant factor for personalizing information in the tourist domain. According to Dey and Abowd [8], context can be defined as “any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves.”

One of the first examples of a context-aware tourist system is Cyberguide [17]. This system was developed as a prototype of a location-aware application. It was tested at the GVU center of the Georgia Institute of Technology and was used to help visitors during their monthly open-house visits. It can recognize the position of a visitor and what he or she is looking at during these guided tours. The system can answer frequently asked questions raised by users during these visits. Cyberguide is also able to remember where the visitors have already been and what they have already seen, and it can update the user situation, using this information to provide suggestions about what to see next. As Cyberguide is one of the first applications in this category, it has been used mainly to define some guidelines for developing context-aware systems. One of the most relevant results of the evaluation of the system concerns its ability to know the user’s interests, so as to provide adapted suggestions. This is one of the aspects that we considered when beginning the design and implementation of MyMap.

GUIDE is a map-based system able to provide information about the town of Lancaster and its monuments. Visitors can define a path to follow during their exploration of the town using the map as a guide. There is a general overview of the town and detailed maps of local areas. It can be used to send text messages to other visitors, in this way implementing a kind of social dimension.
GUIDE communicates with users in a friendly way, using cartoons. As this system was designed to be used in a limited space, such as a museum, a small town, etc., it does not use GPS to calculate the position of users. To avoid problems deriving from difficulties in receiving the GPS signal in built-up zones, GUIDE uses a positioning system based on message transmission.
Fig. 1 MyMap architecture
A newer approach to the problem is proposed by Deep Map, which is a long-term research framework that aims at building the prototype of an intelligent next-generation spatial information system. Deep Map works as a mobile guide and as a web-based planning tool. The difference between Deep Map and our system is that the former addresses the problem of data storage using GIS and a multimedia database, so it can provide not only two-dimensional (2D) navigation but also 3D and even 4D navigation facilities to answer questions such as “What was here before?” Another issue addressed is speech recognition, to realize a more natural and intuitive human-computer interaction. The system can provide personalized planning, but it does not include any component for modeling the user and therefore it is not able to provide adapted text descriptions of the places visited.
3 System architecture
To discuss the system design choices, let us consider the following situation: A user is traveling for business purposes. He or she is in the center of a town and requires information about a place using a personal mobile device. He or she wants to know what is going on in that area. In this case, the user is “immersed” in the environment and is presumed to be looking for “context-sensitive” information.

One of the most common ways for tourists to request information about places of interest in a particular town or to plan their visit is to use a map. Thus, the use of an electronic map seems to be one of the best ways to support tourists in their visit without changing the way they usually perform this task. However, if this map is only a graphical representation of the town or place of interest, it cannot be “explained” to the user by an automatic system. To generate user- and context-adapted information about places of interest, the map has to be “annotated” so as to define a correspondence between graphical objects and metadata understandable by the system that must generate a presentation of the information.

With this aim, we have developed the system MyMap, whose interface is based on a map and which is able to provide, proactively or on request, explanations and suggestions about “what to do” and “where to go.” MyMap, starting from an XML representation of domain knowledge, decides which information to provide and how to present it either after an explicit user request or proactively in the presence of interesting objects, events, and so on. As outlined in Fig. 1, the domain knowledge is stored as a result of an annotation of a graphical map made by a human tourist guide. To make this task easy for a person who may not be an expert in using a computer, the graphical map is annotated with the Inote tool [21]. Then this domain knowledge is used by the Presentation Generator Module to generate natural language explanations and suggestions according to the “user in context” features. The adaptation is possible by taking into account two knowledge sources: (a) the MUP Manager, which maintains a context-aware profile
of the user’s interests, preferences and goals, and (b) information about the current context, such as the time of day and weather. In this paper we do not discuss information filtering, context detection, or proactivity issues, but we focus on the process of generating adaptive information presentation while interacting with the city map. Below we present in more detail the methods employed to implement the system.

3.1 Understanding the map
Understanding a map means extracting and describing objects of particular interest with their descriptive features. Data annotation is a typical solution to achieve this goal. Since we do not use automatic image feature extraction techniques, the description of the map components, their attributes, and the relationships among them is achieved using metadata. In this case, the map image is annotated in a modality-independent way using XML as the markup language to encapsulate tourist information. Since the people who annotate the map are experts in guiding tourists and in providing information about places of interest, rather than in computing, we collect these metadata with the Java tool Inote [21], which is freely available on-line and provides a user-friendly way of annotating images. Inote allows textual annotations to be attached to an image and stored in an XML file. Inote’s markup language is very general and may be applied to every kind of image [12]. With Inote
• a region of interest, a part of the image, called the “<overlay>”, can be identified;
• each <overlay> may contain some objects of interest denoted “<detail>”;
• each <detail> may have attributes;
• each attribute is denoted an “<annotation>” and may be named; and
• a <text> may be associated with every annotation of every detail in order to add the description of that attribute.

To tailor Inote to map description, we have defined a parser able to interpret the tags according to the following ad hoc semantics (illustrated in Fig. 2). A map region has some “General Properties” that identify it: the name of the town, the described area, its coordinates, and so on. In a wide region it is possible to identify some areas of interest, denoted “overlays.” The main information content of each overlay consists of a list of details, which correspond to the category of places of tourist interest (eating places, art, nature, and so on). Each place of interest is described by a
set of attributes (type, position, etc.) denoted an annotation. Each attribute value is described by a text tag. Below is an example of structure generated by Inote following this scheme.

bari-zone4
    eating
        type: restaurant
        name: Le Mura
        coordinates: 41° 06′ 14.800″ N, 16° 45′ 57.013″ E
        view: harbor
        wheelchair accessibility: yes
        ...
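As an illustration of how a parser might read such an annotation, the following Python sketch rebuilds part of the “Le Mura” example as XML and extracts the attributes of each place. The element names follow the scheme described above (overlay, detail, annotation, text), but the `title` attribute, the exact nesting, and the function `parse_places` are assumptions made for this sketch; the actual Inote file format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the annotation: the nesting and the
# "title" attribute are assumptions, not the documented Inote format.
ANNOTATION_XML = """
<overlay title="bari-zone4">
  <detail title="eating">
    <annotation title="type"><text>restaurant</text></annotation>
    <annotation title="name"><text>Le Mura</text></annotation>
    <annotation title="view"><text>harbor</text></annotation>
    <annotation title="wheelchair accessibility"><text>yes</text></annotation>
  </detail>
</overlay>
"""


def parse_places(xml_source):
    """Return one dict per <detail>, holding its area, category and attributes."""
    overlay = ET.fromstring(xml_source.strip())
    places = []
    for detail in overlay.iter("detail"):
        place = {"area": overlay.get("title"), "category": detail.get("title")}
        for annotation in detail.findall("annotation"):
            # Each named annotation becomes a key; its <text> child is the value.
            value = annotation.findtext("text", default="")
            place[annotation.get("title")] = value.strip()
        places.append(place)
    return places


places = parse_places(ANNOTATION_XML)
print(places[0]["name"], "-", places[0]["view"])
```

Flattening each place into a plain dictionary keeps the later selection and generation steps independent of the annotation tool.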
The annotation in this example regards an eating facility: a restaurant called “Le Mura.” After this part concerning identification data, a set of attributes is listed: its coordinates, the type of view (on the harbor in this case), and accessibility information.

3.2 Contextual personalization
Mobile personalization can be defined as the process of modeling contextual user information which is used to deliver appropriate contents and services tailored to the user’s needs. In a mobile access to tourist information services, the user is “living” the environment. This means that users are presumed to look for “context-sensitive” information about, for instance, a monument or a cheap restaurant that is close to where they are and matches their preferences. In this case, showing the place on the map would help in locating exactly where the described object is with respect to the users’ position. In this scenario, as far as adaptation is concerned, we can distinguish two categories of factors that should be considered in the design and implementation of a contextual user model.
• Factors that persist over time or evolve slowly. These long-term factors are related to the user’s background knowledge, cognitive capacity, experience, personality, job, gender, age, interests about families of topics, and so on. They may be acquired through standard stereotype-triggering methods [22]. Starting from the activated stereotype, an individual User Profile can be built and updated dynamically along interaction sessions.
• Factors related to a particular interaction session or situation. These short-term factors depend on the context in which the user is and moves [23].

Fig. 2 Illustration of map annotation scheme
Context awareness regards three main points: the environment, the task, and the device employed [24].
• The environment in which the application operates is identified by the physical and social surroundings of the user, his or her emotional state, the location, the activities going on in the environment, and other external properties.
• The task describes the activity the user is performing or, more generally, the user’s behavior and habits.
• The device can be a PC, a handheld device, a mobile phone, a TV, a watch, and so on, or a combination of these.
Once the information to be considered in the user model has been defined, another issue that must be taken into account concerns the approach to user model management.

3.3 User model management
The user modeling component is crucial for the personalization of services and, therefore, for the presentation of results. The need to allow the user to be free to interact with smart services everywhere and continuously in time causes obvious changes in the way the user modeling component must be designed and developed. Indeed, this task has to be thought of not as a part of a single stand-alone system, but as an independent component able to provide its services to other entities that require them. Possible solutions to these problems are a centralized, a distributed, and a mobile approach [25].

The centralized approach allows the storage of information about a user in a way that is independent of the application domain or of the interaction environment. The main advantage of this approach is to keep information about the user in one place accessible by all applications, avoiding redundancy and fragmentation and providing more secure and complete cross-application information about the user. The main disadvantage is related to the need for a network connection and to the centralization of computation.

In the distributed solution, user information is stored in different servers related to different applications that maintain information about users. In this way the management of the user model is delegated to an application specialized for a specific domain, which stores only the information relevant in a related situation. However, redundancy, incompleteness, and possible lack of consistency of user information (different applications that store different values for the same user preference) may arise.

The mobile approach seems to be very promising in ubiquitous computing scenarios. In this case the user always “brings” the user profile with him- or herself on a personal device, and the user modeling component has the responsibility of sharing and updating the profile when needed. This approach presents several advantages: information about the user is always available and updated and can be accessed in a wireless and quite transparent way, avoiding problems related to consistency of the model, since there is always one single profile for each user. Moreover, storing the user model on the user side and communicating needed personal information to the application when and if needed promotes efficacy and privacy. However, the mobile approach has to deal with the following problems.
i. We cannot assume that the user will be in possession of a handheld device (e.g., the interaction could potentially happen using a “key card”), and this type of device still presents hardware-related limits (capacity, computational speed, battery, etc.).
ii. Since this component models the user in different situations and when interacting with different applications, it is important to express interests and preferences in a context-dependent way (e.g., “I like eating typical food when I’m abroad”).
iii.
To share its content with environments that can use these data for personalization purposes, it is important to preserve the semantics of data, following the Semantic Web approach [13] and using an ontology-based approach to represent user and context information.

As far as the first problem is concerned, we can structure the UM architecture as shown in Fig. 3. In our architecture, the user profile is stored on the user side and communicates the needed personal information to the application. In particular, when the user interacts with a ubiquitous information service, the Device UM component asks a Remote UM for the data that are relevant to personalize the interaction for that particular service and passes them to the application. The Remote UM component has been introduced to overcome the location problem of the “user side”; it can be on an external server, in the user’s car, in a wheelchair, or in the user’s handheld device, according to the computational power of the device. As the user expresses preferences, the user model is updated, and if some preferences are changed, or a different one is directly or indirectly expressed, the new preference is inserted in the MUP. The preferences inferred by observing the user are updated using a usage model [26].

Fig. 3 UM architecture (labels: Application, Remote UM, User trusted server, Ontology, User Device UM, User Device UM – Application Interaction)

As far as representation is concerned, besides considering static long-term user features (age, gender, job, general interests, and so on), it is necessary to handle information about more dynamic “user in context” features. Instead of defining a new ontology and language for describing mobile user profiles, we decided to adopt the UbisWorld [7] language as the user model ontology of our modeling component. In this way we have a unified language able to integrate user features and data with situational statements. Furthermore, it supports privacy settings, allowing us to manage privacy issues, which are very important in situated interactions. This language allows representation of all concepts related to the user by means of the UserOL ontology [7], and annotation of these concepts with situational statements. The situational statements may be transferred to an environment only if the owner/user allows this, according to privacy settings.

An example of a situational statement is the following. fast-food tourist info
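To make the idea concrete, a situational statement can be pictured as a record that pairs a preference with the situation in which it holds, a confidence level, and a privacy setting. The Python sketch below is only an illustration: the field names, the `shareable_with` set, and the helper `statements_for` are hypothetical and do not reproduce the actual UbisWorld or UserOL syntax.

```python
from dataclasses import dataclass, field


@dataclass
class SituationalStatement:
    subject: str                 # whose preference this is
    predicate: str               # e.g. "prefers"
    obj: str                     # e.g. "fast-food"
    situation: dict              # context in which the statement holds
    confidence: str = "medium"   # "low", "medium" or "high"
    # Privacy setting (assumed form): applications allowed to read the statement.
    shareable_with: set = field(default_factory=set)


def statements_for(profile, application):
    """Return only the statements the owner allows this application to see."""
    return [s for s in profile if application in s.shareable_with]


profile = [
    SituationalStatement(
        "user", "prefers", "fast-food",
        {"time_of_day": "lunch time", "reason_of_travel": "business"},
        confidence="high", shareable_with={"tourist info"},
    ),
    # No application is listed here, so this statement is never transferred.
    SituationalStatement("user", "prefers", "typical food", {"location": "abroad"}),
]

visible = statements_for(profile, "tourist info")
print([s.obj for s in visible])
```

Keeping the privacy setting inside each statement means the filter can run on the user's device before anything is sent to an application.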
User preferences, interests, etc., are collected in two ways:
• using a graphical interface (Fig. 4) in which the user can explicitly insert his or her preferences and related privacy settings regarding particular domains; and
• deriving other information (e.g., temporary interests) from user actions or from another knowledge base (e.g., user schedules, agenda, etc. [23, 27]).
Fig. 4 MUP management interface
User feedback and actions in the digital and real world may produce changes in the user model. The MUP manager observes the user actions: when new information about the user can be inferred, it updates or adds a new slot in the MUP and sets the “confidence” attribute of that slot to an appropriate value, calculated as the weighted average of all the user actions having an impact on that slot. The confidence attribute may be set to low, medium, or high.
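A minimal sketch of such a confidence computation is given below. It assumes that every observed action contributes an evidence value in [0, 1] together with a weight, and that the weighted average is mapped to the three levels with cut-offs at 1/3 and 2/3. The evidence scale, the weights, the cut-offs, and the function name `confidence_level` are all assumptions; the paper does not specify them.

```python
def confidence_level(actions, low_cut=1 / 3, high_cut=2 / 3):
    """Map the weighted average of (evidence, weight) pairs to a level.

    `actions` is a list of (evidence, weight) pairs, evidence in [0, 1].
    The cut-offs are illustrative assumptions.
    """
    total_weight = sum(weight for _, weight in actions)
    if total_weight == 0:
        return "low"  # nothing observed yet
    score = sum(evidence * weight for evidence, weight in actions) / total_weight
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Two explicit choices (weight 3) supporting the slot, one passive
# observation (weight 1) contradicting it: (3 + 3 + 0) / 7 is about 0.86.
actions = [(1.0, 3.0), (1.0, 3.0), (0.0, 1.0)]
print(confidence_level(actions))
```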
4 Generating context-sensitive information
The architecture of the Presentation Generator component is based on the pipeline model of Natural Language Generation (NLG) systems [28]. Based on the user’s implicit or explicit input, the system interprets the input request in terms of the communicative goal(s) to be achieved in a selected domain (tourist information in this case). According to the selected communicative goal, the Presentation Generator component plans what to communicate to the user and decides how to render it according to the interaction context. In this case, situational user preferences play an important role in adapting the description of objects to the situation. As has been shown in previous research on language generation (e.g., Refs. [11, 12] and [14]), user-related information can be used to constrain the generator’s decisions and to improve the effectiveness of the generated text. Such information is useful at any stage of the generation process:
i. for the selection of relevant knowledge;
ii. for the organization of information presentation (the organization strategies or plans can have preconditions dependent on user information); and
iii. for the surface realization (by using words which depend on the context).
Below we examine in more detail how these three steps have been designed and implemented in our system.

4.1 Selecting relevant knowledge
Let us consider the following example: suppose the user is traveling for business reasons and, during the lunch break, he or she is visiting the center of the town. Information about places of interest close to where the user is will be emphasized on the interactive map running on the personal device.
In this case, the Presentation Generator will ask the MUP manager to select situational statements where “time of day = lunch time,” “reason of travel = business purposes,” and user “location = town-center.” In the set of selected statements, the one with the highest confidence value will be chosen. The MUP Manager will infer that, in the described context, the user prefers to eat something fast but in a place with a nice view of the town center. Then, according to this preference, the Presentation Generator will select, in the XML description of the map, all places (<detail>) of category “eating” that are “fast-foods” and whose coordinates show that the place is relatively close to the user position (within 500 m). Moreover, the system will check for other features matching the presumed user preferences (e.g., view = “historical center”). Then a new XML structure containing the selected places will be generated to be used for the presentation. Selected items are then ordered on the basis of the number of matched user features. As the user moves, the map is updated, as well as the context information.
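The selection and ordering just described can be sketched as follows. This is an illustrative approximation rather than the MyMap implementation: places are assumed to be dictionaries with decimal `lat` and `lon` fields, the distance is computed with the haversine formula, and the names `distance_m`, `select_places`, and the sample data are hypothetical.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def select_places(places, user_pos, category, place_type, preferences, max_dist=500.0):
    """Keep nearby places of the requested kind, ordered by matched preferences."""
    selected = []
    for place in places:
        if place["category"] != category or place["type"] != place_type:
            continue
        if distance_m(*user_pos, place["lat"], place["lon"]) > max_dist:
            continue
        matches = sum(1 for key, value in preferences.items() if place.get(key) == value)
        selected.append((matches, place))
    # Stable sort: more matched features first.
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [place for _, place in selected]


# Hypothetical data: positions and attributes are made up for the example.
user_pos = (41.1280, 16.8690)
places = [
    {"name": "Place B", "category": "eating", "type": "fast-food",
     "lat": 41.1283, "lon": 16.8692, "view": "harbor"},
    {"name": "Place A", "category": "eating", "type": "fast-food",
     "lat": 41.1285, "lon": 16.8695, "view": "historical center"},
    {"name": "Place C", "category": "eating", "type": "fast-food",
     "lat": 41.2000, "lon": 16.8690, "view": "historical center"},  # about 8 km away
    {"name": "Place D", "category": "eating", "type": "restaurant",
     "lat": 41.1281, "lon": 16.8691, "view": "historical center"},
]

names = [p["name"] for p in select_places(
    places, user_pos, "eating", "fast-food", {"view": "historical center"})]
print(names)
```

Counting matched features, rather than requiring all of them, keeps places that only partly fit the presumed preferences in the list, ranked lower.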
4.2 Organizing the information presentation
There are several computational approaches to planning “what to say” when presenting information. Important milestones in this research field are the introduction of text schemata [29], Rhetorical Structure Theory (RST) as formulated by Mann and Thompson [30], and an operationalization of RST by the application of a traditional top-down planner [31] to discourse planning [15].

One of the first text generators that adopted a schema-based approach to defining discourse structure was TEXT [29]. The system used schemata, predefined representations of stereotypical paragraph structures, as templates to decide the content and order of sentences in a text. Coherence was achieved by the correct nesting and filling of a schema. Though schemata remain a clear and simple method of generating multisentence texts today, they cannot represent the purpose of each utterance in the text. Without such information, the system cannot replan any portion of its text. Moreover, if the communication was not successful, a schema-based system cannot justify the decisions it made in composing the text. This shortcoming cripples any system that must be able to assemble a text dynamically and to reason about it, such as interactive explanation generators or presentation generators. Thus, a method for planning text structure dynamically is needed.

A theory of text structure useful for text generation is RST. It postulates a set of relations that hold between portions of English text. The relations are used recursively, relating ever-smaller segments of adjacent text, down to the single-clause level; RST assumes that a paragraph is coherent if all its parts can eventually be made to fit into one overarching relation. Most relations have a characteristic English cue word or phrase which informs the hearer how to relate the adjacent clauses.
Most relations contain two parts: a Nucleus, the central part of a sentence; and a Satellite, the ancillary qualifying material. For example, the Elaboration relation schema is as follows.

RELATION NAME: ELABORATION
CONSTRAINTS ON NUCLEUS: none
CONSTRAINTS ON SATELLITE: none
CONSTRAINTS ON THE NUCLEUS AND SATELLITE COMBINATION: The Satellite clause presents additional details about the situation or some element of subject matter which is presented or inferable from the Nucleus clause in one of the following ways (Nucleus listed first) [set :: member; abstract :: instance; whole :: part; process :: step; object :: attribute; generalization :: specific].
EFFECT: The reader recognizes the situation presented in the Satellite as providing additional detail for the Nucleus.
LOCUS OF THE EFFECT: Nucleus and Satellite

Planning, however, is a heavy computational task. Considering the need for dealing with real-time interaction on a small device, our approach is based on the idea of using a library of noninstantiated plan-recipes (a kind of schema with applicability conditions and achieved goals) expressed in an XML-based markup language: DPML (Discourse Plan Markup Language [32]). DPML is a markup language for specifying the structure of a discourse plan based on RST: a discourse plan is identified by its name; its main components are nodes, each identified by a name. Attributes of nodes describe the communicative goal and the rhetorical elements: the role of the node in the rhetorical relation (RR) associated with its father (nucleus or satellite) and the RR name. For instance, Fig. 5 illustrates a tree representing the noninstantiated plan for achieving the communicative goal of describing an object. The nodes indicated by a bullet are nuclei; the others are satellites. The DPML description of the n2 node in Fig. 5 provides information about (i) the node goal (Describe (Gen Features,obj1)), (ii) the role in the RR relation holding between the children nodes departing from it (nucleus), and (iii) the RR name (Elaboration Member Set).
Then, for describing the general features of an object in the map, the system has to inform about the object name and the category to which it belongs.

Fig. 5 A DPML structure for achieving the goal of describing an object

The XML-based annotation of the discourse plan is motivated by two factors.
a. In this way, it is possible to build a library of standard explanation plans, which can be instantiated when needed and can be used by different applications in several contexts.
b. XML can be easily transformed through XSLT into another text language (for instance, HTML) or into another scripting language (for instance, one driving a TTS). In this way the adaptation to different contexts and devices is favored.

Once a communicative goal has been selected, explicitly as a consequence of a user request or implicitly triggered by the context, the Information Presenter selects from this library the plan that best suits the current situation. The generic plan is, thus, instantiated by filling the slots of its leaves with data in the XML-domain-file that is associated with the map to be described. In this prototype we consider the following types of communicative goals:
• Describe(S, U, x), where S is the System, U is the user, and x is a single object to be described. Triggering this goal will produce, at the surface level, sentences such as, “La Locanda di Federico is a fast-food restaurant located 100 m right, in Piazza Ferrarese. It offers a nice view of the historical center. . . .”
• DescribeArea(S, U, list_of(zi)), where list_of(zi) represents a list of objects of interest belonging to different categories. An example of a sentence that will be generated when this goal is activated is, “In this area there are 3 restaurants, 2 churches and one exhibition to visit.”
• Describe(S, U, list_of(yi)), where list_of(yi) represents a set of objects of interest of the same type (e.g., restaurants) to be described. This goal, when triggered, will be used to generate sentences of this type: “In this area there are 3 restaurants that you may like: Vini e Cucine, Le Travi, La Locanda di Federico.”
Considering the latter example, the Presentation Generation will select the plan which corresponds to the goal Describe(S, U, list_of(yi)) for listing the eating facilities matching the user preferences. Then it will instantiate the parameters of the plan with the selected data (fast-food restaurants close to where the user is, with a nice view and open at the current time).
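The selection and instantiation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the MyMap implementation: the dictionary-based plan library, the applicability conditions, the slot names, and the sample data (positions, opening status) are all hypothetical; only the restaurant names and the idea of filling leaf slots from the domain file come from the text.

```python
# Hypothetical plan library. The goal names echo the paper; the applicability
# conditions, slot names, and sample data are assumptions for illustration.
PLAN_LIBRARY = [
    {"goal": "Describe", "applies": lambda objs: len(objs) == 1,
     "slots": ["name", "type", "rel_position", "view"]},
    {"goal": "DescribeList", "applies": lambda objs: len(objs) > 1,
     "slots": ["name", "type", "rel_position"]},
]

# Stand-in for the XML domain file associated with the map (invented values).
DOMAIN = [
    {"name": "La Locanda di Federico", "type": "fast-food restaurant",
     "rel_position": "100 m right", "view": "historical center", "open": True},
    {"name": "Vini e Cucine", "type": "restaurant",
     "rel_position": "250 m left", "view": None, "open": True},
    {"name": "Le Travi", "type": "restaurant",
     "rel_position": "400 m ahead", "view": None, "open": False},
]


def select_plan(goal, objects):
    """Return the first recipe whose goal matches and whose condition holds."""
    for recipe in PLAN_LIBRARY:
        if recipe["goal"] == goal and recipe["applies"](objects):
            return recipe
    raise LookupError(f"no plan for goal {goal!r}")


def instantiate(recipe, objects):
    """Fill the leaf slots of the generic plan with data for each object."""
    leaves = [{slot: obj[slot] for slot in recipe["slots"]} for obj in objects]
    return {"goal": recipe["goal"], "leaves": leaves}


# Context filter: keep only the places that are open at the current time.
open_places = [obj for obj in DOMAIN if obj["open"]]
plan = instantiate(select_plan("DescribeList", open_places), open_places)
print(plan)
```

Because the recipes stay noninstantiated until a goal is triggered, the same library entry can serve any map whose domain file provides the expected slots.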
Fig. 5 (plan tree and DPML annotation of node n2):
Describe(obj1) RR:ElabGenSpec
  n2: Describe(GenFeatures, obj1) RR:ElabMemberSet
    InformName
    InformType
  Describe(SpecFeatures, obj1) RR:Joint
    InformRel_position
    InformView
    …
<node name="n2" goal="Describe(Gen_Features,obj1)" role="nucleus" RR="ElabMemberSet">
A small portion of the XML-Instantiated-Plan that was generated for describing some eating facilities in the area is as follows:
...
This plan first presents general information about the existence of open fast-food places, then it lists them, describing their main features in detail.
4.3 Rendering the map object description
Adaptation of layout (visible/audible) should support alternative forms of how to present the content, navigational links, or presentation as a whole. A natural choice of transformation technology, especially when considering standard initiatives, is XSL transformation (XSLT) in combination with DOM (Document Object Model) programming. XSLT is an effective way to produce output in the form of HTML or any other target language. Rule-based stylesheets form the essence of the XSLT language and provide a suitable basis for the introduced adaptation mechanism. The surface generation task of our system is then very simple.
a. Starting from the instantiated plan, the appropriate template is applied. This process is mainly driven by the type of communicative goal and by the RRs between portions of the plan.
b. The plan is explored in a depth-first way; for each node, a linguistic marker is placed between the text spans that derive from its children, according to the RR that links them.
For instance, the description, “There are 3 fast foods in this town area” in Fig. 6, is obtained from a template for the Describe(Ag, U, list_of(yi)), where the Ordinal Sequence RR relates the description of the single objects in the list. We defined the template’s structure after an analysis of a corpus of town-map web sites. At present, we generate the descriptions in HTML; however, our approach is general enough to produce descriptions in different formats and, therefore, for different interaction modalities [33]. In the example in Fig. 6, the Information Presenter will display to the user a web page structured as follows.
i. On the left side the portion of the map of the town area where the user is located and the graphical indications (icons denoting different categories of objects) about places of interest are displayed.
ii. On the right side a description of those objects is provided.
iii. At the bottom, when the user selects one of the objects in the list, a detailed description of the selected object is displayed.
The user may access the same information directly by clicking on the icons on the map. Looking in more detail at the proposed information could be considered as positive feedback in building usage models. However, while this is important in the case of nonmobile information systems, when the user is moving in a real space this is not enough. In this case, the digital action should be reinforced by the action in the real world: going to that place.
Describe-Set(where_to_eat, area1) RR:Elaboration
  Inform(existence(restaurants))
  Describe(set, restaurants) RR:OrdinalSequence
    Describe(attr, restaurants) RR:Joint
      InformName
      InformType
      InformRel_position
      InformView
“There are 3 restaurants in this area: Le Mura, typical restaurant, 100 mt. from where you are, Via Venezia (La Muraglia) close to Mercantile Square. Nice view on the harbor, …. La Locanda di Federico, typical hosteria, 400 mt. from you, accessible with a wheelchair; Argana, a restaurant offering typical food from Marocco, 200 mt. from you.”
Fig. 6 List of eating places in the area of town where the user is located
We are still working on this issue since it is important to consider contextual events that may discourage the user from eating in that place (e.g., the restaurant is full). At present, we ask users to deal directly with this kind of feedback.
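The depth-first surface generation of steps a and b in this section can be sketched as follows. This is a simplified illustration, not the XSLT templates used by MyMap: the nested-dictionary plan encoding and the marker chosen for each RR (colon, semicolon, comma) are assumptions, and the leaf texts are adapted from Fig. 6.

```python
# Assumed linguistic markers, one per rhetorical relation (RR).
MARKERS = {"Elaboration": ": ", "OrdinalSequence": "; ", "Joint": ", "}


def realize(node):
    """Depth-first realization: join the children's text spans with the
    marker associated with the RR that links them."""
    if "text" in node:  # leaf: already-instantiated text span
        return node["text"]
    spans = [realize(child) for child in node["children"]]
    return MARKERS[node["RR"]].join(spans)


# Instantiated plan, hand-built here in the shape of the tree in Fig. 6.
plan = {
    "RR": "Elaboration",
    "children": [
        {"text": "There are 2 restaurants in this area"},
        {"RR": "OrdinalSequence", "children": [
            {"RR": "Joint", "children": [
                {"text": "Le Mura"},
                {"text": "typical restaurant"},
                {"text": "100 mt. from where you are"},
            ]},
            {"RR": "Joint", "children": [
                {"text": "La Locanda di Federico"},
                {"text": "typical hosteria"},
                {"text": "400 mt. from you"},
            ]},
        ]},
    ],
}

description = realize(plan) + "."
print(description)
```

Swapping the `MARKERS` table for one that emits HTML tags or speech markup would change the output format without touching the plan, which is the point of separating the plan from its rendering.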
5 Evaluation
Many NLG systems have been created to produce texts in various application areas [3, 5, 14, 15]. However, only a few of them have been evaluated [34–37]. As in many of these evaluation studies, in this phase of our work we are more concerned with studying the feasibility of the proposed approach and methods employed than with evaluating the effectiveness of the generated description. At this stage we have performed only an evaluation of the generated text against the descriptions present in a Bari tourist guide, and the results show a good level of similarity. However, this does not show any evidence that contextual information provision is more effective than noncontextual provision. This will be the aim of our future user studies. After this study, if there is evidence that contextual information provision is effective, we will concentrate on the generation of comparative descriptions of places of interest in the same area. Since this phase of the project focused on the adaptive generation of descriptions of objects and places of interest on a map starting from data annotation, we designed an evaluation study to assess the quality of the generated text. We used an “intrinsic” evaluation method [36]. In this type of approach human judges are asked to rate the quality of the text in terms of accuracy, fluency, and correctness.
Fig. 7 MyMap running on an iPAQ handheld
The experiment was set up as follows. We used a between-subjects design in which two groups of 11 Italian people coming from towns other than Bari were asked to interact with MyMap. The first group, denoted Group A, was offered human-edited descriptions that we found in the Michelin guide. The second group, denoted Group B, was asked, first, to state their preferences, through the MUP interface, regarding eating places and artistic monuments. Then each of them was asked to look at the description of three different places,
Table 1 Comparison of MyMap generated text versus the Michelin guide

MyMap XML annotation:
eating type restaurant
name Lo sprofondo
address Corso Vittorio Emanuele 111
telephone 0805213697
closing days from 9 to 20 August and Sunday, in July–August closed also Saturday lunchtime
average-price 30 euro
style rustic
food type traditional seafood pizzeria
eating outside summer
facilities wheelchair accessibility smoking dogs credit cards
location central downtown close to the historical center

MyMap generated description*:
Lo Sprofondo is a restaurant located 200 m north in Corso Vittorio Emanuele. In particular, it is a rustic style restaurant located in central downtown close to the historical center of Bari. Since you prefer to eat traditional food when you are in another town and in this period it is possible to eat outside, you will probably find it interesting. Moreover, it is a nonsmoking restaurant and the average price is lower than your budget limit.

Michelin guide:
LO SPROFONDO Corso Vittorio Emanuele 111 (BA) Tel. 080/5213697
Closed: from 9 to 20 August and Sunday, in July–August closed also Saturday lunchtime.
Meal prices: menu 27/35
Cuisine: typical and seafood
Comments: Nice rustic tone, in this central restaurant, with a wood stove in the evening pizzeria and traditional dishes; pleasant outside in summer.
Facilities: Restaurant and pizzeria

*Description generated for a user with the following features:
• Food preference: traditional when in other towns
• Smoking: no
• Budget limit restaurant: 40 when traveling for business
belonging to the above-mentioned categories, on the map (Table 1). The evaluators were asked to answer a set of questions (Fig. 8), giving a score from 1 (bad/low) to 5 (good/high). The graph at the bottom shows the ratings for the different questions, where the first column indicates Group A’s average rating and the second column indicates Group B’s average rating. As the “Overall” columns summarize, on average the generated text was evaluated positively, and as shown, the provided content was evaluated as more adequate for the users’ preferences and interests by Group B. In terms of acceptability and usefulness of the description, the MyMap generated description was as good as those found at human-edited web sites. The results of the t-test of description usefulness, coherence, and adequacy show a p value of about 0.4 for human-edited versus generated. These results show that on average the difference between the two sets is not significant. Thus, we may deduce that the users regarded the quality and usefulness of the generated descriptions as being almost equal to those of the human-edited texts.
6 Conclusions and future work
In this paper, we have described the prototype of a system able to generate context-sensitive descriptions of objects of interest present on a map. Even though we have selected mobile tourism as the application domain in which to test our approach, the system architecture and employed methods are general enough to be applied to other domains. Moreover, the use of XML content modeling and domain-independent generation methods allows the system to deal with the adaptation of the presentation modality. In this way, the provided information can be easily adapted to different devices and to the needs of users with disabilities.
Fig. 8 Results of the evaluation process

                              Group A (Mean)   Group B (Mean)   t-test (α = 0.05)
Overall                       3.04             3.19             0.30
Coherence and organization    3.22             3.11             0.39
Usefulness                    3.11             3.22             0.40
Adequacy                      2.78             3.22             0.39

[Bar chart: Group A and Group B mean ratings on a scale from 1.00 to 5.00 for Overall, Coherence and organization, Usefulness, and Adequacy]
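The reading of Fig. 8 given in the evaluation section can be checked with a few lines of arithmetic. The sketch below assumes that the values in Fig. 8 pair up as (Group A mean, Group B mean, t-test p value) for each of the four questions; the variable and function names are illustrative. It only compares the reported p values with the 0.05 threshold and does not recompute the t-test, since the individual ratings are not reported.

```python
ALPHA = 0.05

# question: (Group A mean, Group B mean, reported p value), as read from Fig. 8
RESULTS = {
    "Overall": (3.04, 3.19, 0.30),
    "Coherence and organization": (3.22, 3.11, 0.39),
    "Usefulness": (3.11, 3.22, 0.40),
    "Adequacy": (2.78, 3.22, 0.39),
}


def summarize(results, alpha=ALPHA):
    """Return (question, B minus A, significant?) for each row."""
    return [(question, round(b - a, 2), p < alpha)
            for question, (a, b, p) in results.items()]


summary = summarize(RESULTS)
for question, diff, significant in summary:
    print(f"{question}: B - A = {diff:+.2f}, significant at {ALPHA}: {significant}")
```

Under this reading, none of the four differences reaches significance, and the largest gap in favor of the generated text is on adequacy.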
The system has been implemented in Java- and XML-related technologies. We did the testing on an iPAQ h5550 and an h3970 without GPS (Fig. 7). We simulated the user location with an interface for managing context features. In the first evaluation phase of the system, we studied the feasibility of the described approach. In particular, we performed an evaluation of the generated text versus the descriptions present in a Bari tourist guide, and the results showed a good level of similarity. The aim of our future studies is to test and evaluate the effectiveness of the generated description. If contextual information provision proves effective, we will concentrate on the generation of comparative descriptions of places of interest in the same area.
As a further development we would like to investigate the impact of social and engaging aspects of tourism on the effectiveness and usability of an “intelligent electronic map,” since tourism itself is a social experience that most people share with others. Social aspects of tourism range from group planning to sharing of the experience during and after the visit. Users will be able to learn if there are other people with the same interests near them, or just friends, as in instant messaging systems. They can annotate maps with personal information, such as “I met a friend here” or “This restaurant is very bad,” and share them with other people. In this view, in addition to the proactive, context-aware, and goal-oriented support, MyMap will enhance the sense of presence by adding to the intelligent map a social space in which users may communicate and share their experiences with other tourists who are currently there physically, tourists who have been there at another time, and friends or family members at home.
Acknowledgments Research described in this paper is an extension of the work we performed in the scope of the ARIANNA project. We wish to thank those who cooperated in implementing the prototype described here: especially, Gloria De Salve and Isabella Liotino. In particular, we thank Fiorella de Rosis for her useful comments on this work.
References
1. Ardissono L, Goy A, Petrone G, Segnan M, Torasso P (2002) Ubiquitous user assistance in a tourist information server. In: Lecture Notes in Computer Science no. 2347. 2nd International Conference on Adaptive Hypermedia and Adaptive Web Based Systems (AH2002), Malaga. Springer Verlag, New York, pp 14–23
2. Brusilovsky P (1996) Methods and techniques of adaptive hypermedia. UMUAI 6(2–3):87–129
3. Wilkinson R, Lu S, Paradis F, Paris C, Wan S, Wu M (2000) Generating personal travel guides from discourse plans. In: Brusilovsky P, Stock O, Strapparava C (eds) Adaptive hypermedia and adaptive web-based systems international conference, AH 2000, Trento, Italy, August. Proceedings LNCS 1892, pp 392 ff
4. Paris CL (1988) Tailoring object descriptions to a user’s level of expertise. Comput Linguist 14(3):64–78
5. De Carolis B, de Rosis F, Grasso F, Rossiello A (1996) Generating explanations about drug prescription addressed to different recipients. AI Med 8 (in press)
6. Weiser M (1991) The computer for the 21st century. Sci Am
7. UbisWorld; available at: http://www.u2m.org
8. Dey AK, Abowd GD (1999) Toward a better understanding of context and context-awareness. GVU Technical Report GIT-GVU-99-22. College of Computing, Georgia Institute of Technology, Atlanta
9. Baldauf M, Dustdar S, Rosenberg F (2006) A survey on context-aware systems. Int J Ad Hoc Ubiq Comput (in press)
10. Geldof S, Van de Velde W (1997) Context-sensitive hypertext generation. In: Working notes of the AAAI 97 Spring Symposium Workshop on Natural Language Processing for the Web, Stanford University, Stanford, CA, pp 54–61
11. Paris C (1993) User modeling in text generation. Pinter, London/New York
12. De Salve G, De Carolis B, de Rosis F, Andreoli C, De Cicco ML (2000) Image descriptions from annotated knowledge sources. IMPACTS in NLG, Dagstuhl
13. Sinner A, Kleemann T, von Hessling A (2004) Semantic user profiles and their applications in a mobile environment. In: Artificial Intelligence in Mobile Systems
14. De Carolis B, de Rosis F, Pizzutilo S (1997) Generating user-adapted hypermedia from discourse plans. In: Fifth Congress of the Italian Association of Artificial Intelligence (AI*IA 97), Rome
15. Moore J, Paris C (1993) Planning text for advisory dialogues: capturing intentional and rhetorical information. Comput Linguist 19(4):651–694
16. Cheverst K, Davies N, Mitchell K et al (2000) Developing context-aware electronic tourist guide: some issues and experiences. In: Proceedings of CHI’2000, the Netherlands, pp 17–24
17. Abowd GD, Atkeson CG, Hong JI, Long S, Kooper R, Pinkerton M (1997) Cyberguide: a mobile context-aware tour guide. Wireless Networks 3(5):421–433
18. Malaka R, Zipf A (2000) Deep Map: challenging IT research in the framework of a tourist information system. In: Fesenmaier DR, Klein S, Buhalis D (eds) Information and communication technologies in tourism. Springer-Verlag, New York, pp 15–27
19. Pan B, Fesenmaier DR (2000) A typology of tourism related web sites: its theoretical background and implications. In: Fesenmaier DR, Klein S, Buhalis D (eds) Information and communication technologies in tourism 2000. Springer-Verlag, New York, pp 381–396
20. Brown B, Chalmers M (2003) Tourism and mobile technology. In: Kuutti K, Karsten EH (eds) Proceedings of the eighth European conference on computer supported cooperative work, Helsinki, Finland, 14–18 September 2003. Kluwer Academic Press, New York
21. Inote: Image Annotation Tool. http://jefferson.village.edu/iath/inote.html
22. Rich E (1979) User modeling via stereotypes. Int J Cognit Sci 3:329–354
23. Cavalluzzi A, De Carolis B, Pizzutilo S, Cozzolongo G (2004) Interacting with embodied agents in public environments. AVI 2004:240–243
24. Dey AK, Abowd GD (2003) Support for adapting applications and interfaces to context. In: Seffah A, Javahery H (eds) Multiple user interfaces: engineering and application frameworks. John Wiley and Sons, New York
25. Kobsa A (2001) Generic user modeling systems. UMUAI 11(1–2):49–63
26. Fink J, Kobsa A (2002) User modeling in personalized city tours. AI Rev 18(1):33–74
27. Cozzolongo G, De Carolis B, Pizzutilo S (2004) Supporting personalized interaction in public spaces. In: Baus J, Kray C, Porzel R (eds) Proceedings of the artificial intelligence in mobile systems 2004, Nottingham, UK
28. Reiter E, Dale R (2000) Building natural language generation systems. Cambridge University Press, Cambridge
29. McKeown K (1985) Text generation: using discourse strategies and focus constraints to generate natural language text. Cambridge University Press, Cambridge
30. Mann WC, Matthiessen CMIM, Thompson S (1989) Rhetorical structure theory and text analysis. ISI Research Report 89-242
31. Sacerdoti E (1974) Planning in a hierarchy of abstraction spaces. AI 5(2):115–135
32. De Carolis B, Pelachaud C, Poggi I, Steedman M (2004) APML, a mark-up language for believable behavior generation. In: Prendinger H, Ishizuka M (eds) Life-like characters. Tools, affective functions and applications. Springer, New York
33. Klante P, Krösche J, Boll S (2004) AccesSights: a multimodal location-aware mobile tourist information system. ICCHP 2004:287–294
34. Colineau N, Paris C, Vander Linden K (2002) An evaluation of procedural instructional text. In: Proceedings of the International Natural Language Generation Conference (INLG) 2002, New York, pp 128–135
35. Callaway C, Lester J (2001) Evaluating the effects of natural language generation techniques on reader satisfaction. In: Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society, Edinburgh, UK
36. Hartley A, Scott D, Kruijff-Korbayouva I, Sharoff S, Teich E, Sokolova L, Staykova K, Dochev D, Cmajrek M, Hana J (2000) Evaluation of the final prototype. Technical report, Brighton University, Brighton, UK
37. Miliaev N, Cawsey A, Michaelson G (2003) Applied NLG system evaluation: FlexyCAT. 9th European Workshop on Natural Language Generation (in conjunction with EACL 2003), Budapest, Hungary
38. Hovy E (1988) Generating natural language under pragmatic constraints. Lawrence Erlbaum Associates, Hillsdale, NJ
Berardina De Carolis is Assistant Professor and Researcher at the Department of Computer Science, University of Bari, Bari, Italy. Her main interests lie in the field of intelligent interfaces: in particular, user modeling, adaptive interfaces, and natural language generation. She is involved in several research projects concerning ambient intelligence, ubiquitous computing, and human-robot interaction.
Giovanni Cozzolongo graduated in computer science in 2003. Since December 2003 he has been a temporary researcher at the Department of Computer Science, University of Bari, where he collaborates with the Intelligent Interface research group; in 2005 he became a Ph.D. candidate. His research interests are group modeling, ambient intelligence, multiagent systems, and human-computer interaction.
Sebastiano Pizzutilo is Associate Professor at the University of Bari. Currently he is teaching Computer Architecture and Distributed Systems courses in the Curriculum of Informatics, Department of Computer Science. He is also participating in several national and European Economic Community research projects on human-computer interaction and DCE. He is author of several publications, and at present his research interests include DCE methods and technologies, formal methods for evaluating user-adapted interfaces, agent theories, and languages.
Vincenzo Silvestri is a master’s student at the Department of Computer Science, University of Bari. He works as a programmer for the Intelligent Interface research group. His research interests are human-computer interaction, object-oriented programming, and distributed systems.