
Increasing Performances and Personalization in the Interaction with a Call Center System

Federica Cena

Ilaria Torre

M3Lab CSP Via Livorno 60, 10144 Torino (It) +39 011 4815700

Computer Science Department University of Torino Corso Svizzera 185 – 10149 Torino (It) +39 011 6706827


ABSTRACT This paper describes the innovative combination of speech recognition and personalized response generation with the adaptive routing of calls to the operator who best fits the caller's features. The project aims at supporting the user incrementally, starting from personalized automatic support and moving to proficient human support when needed. In particular, the paper shows the adaptive workflow of the answering process and focuses on the principles for providing the personalized speech response.

Categories and Subject Descriptors H.5.2 [Information Interfaces And Presentation (e.g., HCI)]: User Interfaces - Interaction styles (e.g., commands, menus, forms, direct manipulation), Voice I/O; J.7 [Computers In Other Systems]: Consumer products.

General Terms Human Factors

Keywords Adaptation, VUI (voice user interface), speech recognition, automatic response, call routing management, call center.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright is held by the author/owner(s). IUI’04, Jan. 13–16, 2004, Madeira, Funchal, Portugal. ACM 1-58113-815-6/04/0001

1. INTRODUCTION Interacting with a Call Center system is a common experience. The organizational and technological transformations in this field over the last few years have produced a non-homogeneous scenario in which advanced systems based on automatic speech recognition and automatic speech generation coexist with systems based on touch-tone technology and pre-recorded answers, and with traditional systems based on human desktop operators. The reason mainly concerns two categories of problems: costs and quality of interaction. Clearly, each of the above solutions suffers from at least one of these problems: costs mainly affect the solution based on operators, while the lack of quality of the interaction affects all the self-service solutions, such as touch-tone and automatic speech recognition (ASR). The latter in particular, self-service with voice recognition (i.e. the Interactive Voice Response systems, IVRs, which aim at creating a human-like context of interaction, and thus at being deployed for simple inquiries as well as for more complex requests), shows several problems: i) standardized and cold answers for predefined categories of questions; ii) inability to address the different needs of different users (e.g. novice users, unsatisfied customers, high value customers, etc.); iii) high probability of misunderstandings; iv) failure to manage complex inquiries and special cases, and to face unexpected situations. In our prototype we have provided some solutions to these problems. In particular, for the first group of problems, from i) to iii), we think that a significant improvement can be achieved by personalizing the automatic voice responses, adapting the dialog to the user’s characteristics. For example, a first way to mitigate problem i) is to modify the formulation and style of sentences according to user features. Problem ii) is very complex, as it requires defining and identifying the categories of users with special needs and specific features and modifying the kind of interaction accordingly. Examples of contributions to this problem are the personalization of the VUI by adding hints to the system prompts, adapting the level of proactiveness of the answer, varying its level of kindness, etc. Furthermore, a way to decrease the risk of misunderstandings through personalization is to take into account language, social expressions and conventions, education level and knowledge of the domain.
Moving to the last problem, namely the failure of IVRs to manage complex inquiries and special cases and to face unexpected situations, personalization cannot be an effective instrument and, in our view, no automatic solution is possible. This makes it necessary to switch the response infrastructure and to route the call to a human operator. However, this solution brings difficulties of its own: the user can experience the switch to an operator as a system failure and as a waste of time, since (s)he has to repeat her/his question, and the switch can disorient the user, especially if there is a change in the way answers are built. The solution we have experimented with to address this problem can be labeled as an “adaptive context switch”, which aims at integrating the two phases of the answering process and is accomplished by routing the call to the operator who best fits the specific caller, according to some rules and reasoning heuristics. In summary, the goal of this prototype can be described as the attempt to combine the advantages of flexible self-service with those of employing human agents, increasing the quality of both and supporting the user incrementally.

2. USER PERSPECTIVE OF AN INBOUND CALL In this section we show an example of interaction, seen from the user’s point of view, which was the starting point of the project (our prototype concerns the Call Center of a bank; thus the examples, the adaptation rules, the features of the user models and also the grammars for automatic speech recognition are specific to this application domain). Mr Rossi is a middle-aged plumber living in the north-east of Italy. He is a middle value but loyal customer of the bank. He dials the free number of the Call Center of his bank from his mobile. He is not very familiar with ATMs and often calls to ask for information about his account. When he calls, he is welcomed by the automatic response system: ‘Good morning Mr Rossi. As usual I need to know your access code or your keyword’. ‘Maccheroni’. ‘Very kind Mr Rossi. Do you want to know your account balance?’ ‘No, I want to know my..eh.. left money for this month’. ‘Do you want to know the available cash on your account?’ ‘No, I told you the left money for this month’. ‘Thanks Mr Rossi, an operator is going to answer you’. [A middle-aged voice from northern Italy] ‘Good morning Mr Rossi, do you want the available cash limit for your bank card for this month?’ ‘Yes’. Etc… ‘Come and visit us, we remind you that your bond will expire on the 21st of this month’. The above example of interaction will probably look very simple and easy; indeed, looking natural to the user’s eyes is precisely the objective of the project. However, typically, the more natural the interaction looks, the greater the need for reasoning by the system. In the next section we present the components that are involved in the interaction and the architecture of the system.

3. SYSTEM PERSPECTIVE OF AN INBOUND CALL The core of our application is the Response Manager (RM), a software agent toward which the Communication server routes the traffic. It controls the flow of the call and the dialog among the modules which manage: i) the Voice User Interface (VUI), composed of the Automatic Speech Recognition engine (ASR), in charge of understanding natural language user input, and the Response Generation agent (RGA), in charge of prompting, providing menus and answering the user; ii) the routing of the call, accomplished by the Routing agent (RA). Whenever a call is received, the system checks the calling number. If it is not recognized, the user is treated as a new one; otherwise, there are two options: the caller is a customer or (s)he is using the phone of another customer. To manage this second possibility, in order to avoid wrong forms of personalization and for security reasons, the system asks for the pin code or the keyword. Then it compares the answer with the recorded pin code or keyword, and thus it can authenticate the caller with a low margin of error. Consequently, there are two types of dialog interactions: standard, for non-customer callers (new callers or callers whose voice does not match the prerecorded one), and personalized, for known customers, based on her/his model. In the example of interaction above, the number is recognized. Consequently the system loads the model of the caller and the Response Generation Agent, through an inferential engine, exploits the information concerning the user’s name, age, number of previous accesses and the last request(s) to produce a first form of personalization. In particular, it uses age for the style (formal or informal) of the welcome formula, and the number of accesses and the education level (if present in the user model) for estimating the level of experience of the caller in the use of the application. In the example, the user is a middle-aged man with a secondary school degree who usually calls to ask for information about his account, so he has enough experience with the application and does not need any additional help. Instead, in the case of new users, the system provides examples and explanations regarding the use of the application. All the decisions about the next action to perform are the responsibility of the Response Manager, which, in this phase of the example, requests the Response Generation Agent (RGA) to ask the user for his pin code or keyword. In order to make the interaction similar to a human conversation between people who know each other, the answers may include elements that depend on the previous interactions. Moreover, the dialog of the system is personified using the first person singular. The Text-To-Speech Engine, inside the Response Generation Agent, generates the final output, which, in the example, corresponds to “as usual I need”.
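The caller identification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the `UserModel` fields, the `CUSTOMERS` store, the phone number and the `select_dialog` function are all hypothetical names introduced here, and the secret is compared as plain text, whereas the real system works on recognized speech.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserModel:
    name: str
    pin_code: str
    keyword: str


# Hypothetical customer store keyed by calling number (illustrative data only).
CUSTOMERS = {"+390000000001": UserModel("Rossi", "4821", "maccheroni")}


def select_dialog(calling_number: str, spoken_secret: Optional[str]) -> str:
    """Return 'personalized' for an authenticated customer, else 'standard'."""
    model = CUSTOMERS.get(calling_number)
    if model is None:
        # Unknown number: the caller is treated as a new user.
        return "standard"
    if spoken_secret is None:
        # Known number but no pin code or keyword given: do not personalize.
        return "standard"
    answer = spoken_secret.strip().lower()
    if answer in (model.pin_code, model.keyword.lower()):
        # The caller is the customer who owns the number.
        return "personalized"
    # Someone else may be using the customer's phone.
    return "standard"
```

For instance, `select_dialog("+390000000001", "Maccheroni")` selects the personalized dialog, while an unknown number or a wrong keyword falls back to the standard one.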
Then, the RGA should ask the user to state his question but, given that the user is not a new one, and in order to save time and to vary the interaction over time, it modifies the question taking into account the caller's last requests, and thus tries asking “Do you want to know your account balance?”. Unfortunately the answer is “no” and Mr Rossi asks his question. When a user asks her/his question, the sentence is analyzed by the Speech Recognition Engine (using a keyword spotting technique) and evaluated by the Response Manager. If the request is understood and identified as a simple one, the IVR is in charge of supplying the requested service and the Response Generation Agent produces the answer according to the features of the caller. If the request is classified as a complex one, the RM switches the call to the Routing Agent. If the Speech Recognition Engine does not recognize the request, the Response Generation Agent tries to interpret the question and proposes its result to the caller. In case of another wrong interpretation, the decision whether to route the call to an operator or to ask the user to repeat the question depends on several factors: the number of times the user has repeated the question, the way in which the user has repeated it and the kind of her/his previous interactions. In the case of Mr Rossi, the call is routed to an operator because the second time he repeated his question exactly as the first time (the system estimates there is little chance he will change it the third time) and because a high number of previous interactions ended with the switch of the call to an operator. In any case, to avoid the impression of a system failure, the Response Generation Agent answers as if it had understood the question: “Thanks Mr Rossi, an operator is going to answer you”. Immediately, the Routing Agent is activated to perform the adaptive switch of the call.
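The decision between asking the caller to repeat and routing to an operator can be illustrated with a small sketch. The three factors come from the description above; everything else is an assumption made for illustration: the `CallState` structure, the word-overlap test used to detect a repeated formulation, and the thresholds (`max_attempts`, `routed_ratio`, `threshold`) are hypothetical and are not taken from the prototype's rule base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallState:
    attempts: List[str] = field(default_factory=list)  # text of each attempt
    past_calls: int = 0          # previous calls by this customer
    past_calls_routed: int = 0   # of those, how many ended with an operator


def is_repetition(first: str, second: str, threshold: float = 0.8) -> bool:
    """True if two attempts share most of their words (Jaccard overlap)."""
    a, b = set(first.lower().split()), set(second.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


def should_route_to_operator(state: CallState,
                             max_attempts: int = 3,
                             routed_ratio: float = 0.5) -> bool:
    """Decide whether to stop asking and hand the call to the Routing Agent."""
    # Factor 1: number of times the question has been asked.
    if len(state.attempts) >= max_attempts:
        return True
    if len(state.attempts) < 2:
        return False
    # Factor 2: the way the question was repeated (same wording or not).
    repeated = is_repetition(state.attempts[-2], state.attempts[-1])
    # Factor 3: share of previous interactions that ended with an operator.
    history = (state.past_calls_routed / state.past_calls
               if state.past_calls else 0.0)
    return repeated and history >= routed_ratio
```

With these assumed thresholds, a caller like Mr Rossi, who repeats the same wording and whose past calls mostly ended with an operator, is routed after the second attempt, whereas a caller who reformulates the question is asked once more.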
In order to create a context of interaction homogeneous with the first one, without annoying the user with a further question and avoiding the perception of a system failure, before answering, the selected operator listens to the recording of the request that was not recognized by the automatic speech recognition (typically a human operator has no problem understanding it) and can thus provide the answer immediately. Finally, at the end of the call, another set of rules is exploited for different purposes (alerts, requests for information, loyalty rewards, etc.). The system, taking into account the user's value, loyalty, cost and information about her/his account and operations, loads on the operator’s screen some hints or alerts to be proposed to the caller. In the example, Mr Rossi is notified of the expiration date of his bond.

4. SKETCH OF THE PERSONALIZATION METHODS This section focuses on the criteria used by the Response Generation Agent to carry out a personalized interaction with the caller; given the space constraints, we only mention those used for the adaptive routing. As can be seen from the previous description, to accomplish the task of VUI adaptation, the Response Generation Agent needs to know the model of the caller (user model) and the description of the categories of sentences, so as to apply the rules that match user features with response features. Regarding the first point, the system stores a user model for each caller, building and updating it on the basis of the Customer DB of the bank. The model is structured as a set of dimensions. Both the Response Generation agent and the Routing agent access the same user model, but use different dimensions. Some of those used by the Response Generation agent are: age, experience in the use of the application (which basically depends on the number of calls), knowledge of the domain (which is deduced with secondary inferences from the user’s school level, job and kind of questions), satisfaction (inferred from the lack of complaints and problems during the previous interactions), cost (which is related to the time taken away from other calls and to a monetary cost if the number is free for the caller), etc. Regarding the description of the sentences, each phrase (phrase pattern) is classified according to its content and to a set of other parameters (formalism level, complexity, explanation by example, courtesy formula, etc.), each one evaluated on a Likert scale (from 0 to 3). For example, the sentence “on your account, the money remained are...” has score 0 for formalism, 0 for complexity, 1 for explanations, etc. The objective of the adaptation rules is to estimate, for each descriptive parameter, the score that best fits the user features.
In this way it is then possible to select the sentence whose parameter scores have the shortest distance from the optimal ones, given the specific category of content. First of all, the values of the user’s features are converted into a Likert scale from 0 to 3; then a set of rules defines the correspondences between user features and descriptive parameters and assigns each correspondence a specific weight. Note that each parameter can be associated with one or more user features, with different weights (for example, in our knowledge base, given the marketing directions, the choice to use courtesy formulas depends on the customer's satisfaction with weight k=4 and on her/his cost with k=2, on a scale from 1 to 5). Finally, another set of rules computes the score for each parameter as a combination of the scores obtained for that parameter from the different user features. Then the value is normalized and the RGA selects the sentences whose parameter scores have the shortest distance from the optimal ones.
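The selection procedure just described can be sketched in a few lines. The text does not specify how the weighted scores are combined or which distance is used, so this sketch assumes a weighted mean (which keeps the result on the 0 to 3 scale) and a Euclidean distance. Only the courtesy weights (k=4 for satisfaction, k=2 for cost) come from the text; the formalism rule, the candidate identifiers and all function names are hypothetical.

```python
import math

# Weights k (scale 1 to 5) linking each sentence parameter to user features.
RULES = {
    "courtesy": {"satisfaction": 4, "cost": 2},  # weights given in the text
    "formalism": {"age": 3},                     # hypothetical rule
}


def optimal_scores(user):
    """Assumed combination: weighted mean of the user's feature scores (0-3)."""
    return {
        param: sum(k * user[feat] for feat, k in weights.items())
        / sum(weights.values())
        for param, weights in RULES.items()
    }


def distance(sentence_scores, target):
    """Assumed metric: Euclidean distance over the descriptive parameters."""
    return math.sqrt(sum((sentence_scores[p] - target[p]) ** 2 for p in target))


def select_sentence(candidates, user):
    """Pick the candidate whose scores are closest to the optimal ones."""
    target = optimal_scores(user)
    return min(candidates, key=lambda c: distance(c["scores"], target))


# Candidates for one content category, each scored on the 0-3 scale.
candidates = [
    {"id": "balance_plain", "scores": {"courtesy": 0, "formalism": 0}},
    {"id": "balance_polite", "scores": {"courtesy": 2, "formalism": 2}},
    {"id": "balance_ceremonious", "scores": {"courtesy": 3, "formalism": 3}},
]
user = {"satisfaction": 3, "cost": 0, "age": 2}  # features already on 0-3
chosen = select_sentence(candidates, user)
```

Here the optimal courtesy score is (4·3 + 2·0) / (4 + 2) = 2 and the optimal formalism score is 2, so the candidate `balance_polite` is selected. A different combination rule or metric would change only `optimal_scores` and `distance`.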

For the adaptive routing of calls, the basic principles are similar, but in this case the result of the selection is not a sentence but an operator to whom the call is routed, and we need the user model and the models of all the operators (with dimensions such as skill level, communicative ability, rate of positively ended calls, etc.). Then a set of rules, based on heuristics similar to those sketched for the VUI personalization, is applied (see [2] for more details).

5. CONCLUSION In our view, the main contribution of the project is to merge different technologies in order to improve the overall quality of the interaction with the user. Thus, the relevance of the application does not depend on any single component we exploited (each of which uses well-established techniques and technologies, namely automatic speech recognition and call routing), but lies in the attempt to manage them in a user-centered workflow, using adaptivity to personalize each phase and especially the continuity between phases. Furthermore, from a user modeling point of view, the innovative contribution also concerns the specific field of application. For these reasons, our goal was to develop modules (user modeling agents and personalization rules for the VUI and for the adaptive routing) which could be integrated, in an open architecture, into commercial CTIs, suitably configured and defined (routing algorithms, grammars). Implementation remarks. For the prototype, we implemented our agents on the Cisco platform (Customer Response Application v3), based on IP networks and a Java environment. Other components are: ICM (Cisco Intelligent Contact Manager), which routes the call; the Nuance TTS server, which converts text into voice; the Nuance ASR server, which contains the voice recognition engine (based on the GSL language); and the JESS shell, used to implement the routing agent. As the next step, we are going to systematically test our application with different users in a real environment. In particular, we plan to evaluate a set of parameters comparing users performing the same task in environments where the answer is managed by i) our application, ii) the IVR component alone, iii) an operator.

6. ACKNOWLEDGMENTS Our thanks to Delos spa for providing us with the Cisco platform and the Nuance server, allowing us to develop the prototype.

7. REFERENCES
[1] Bocklund, L., Bengtson, D.: Call Center Technology Demystified, CallCenter Press ICMI, Maryland, 2002.
[2] Cena, F., Torre, I.: Adaptive Management of the Answering Process for a Call Center System, in Proceedings of HCITALY Symposium, Turin, 2003.
[3] Halpern, E.: Human Factors and Voice Application, in VoiceXML Review, Vol. 1, 2001.
[4] Kobsa, A., Koenemann, J., Pohl, W.: Personalized Hypermedia Presentation Techniques for Improving Online Customer Relationship, in The Knowledge Engineering Review, 2001, pp. 111-155.
[5] Stentiford, F.W.M., Popay, P.A.: The Design and Evaluation of Dialogues for Interactive Voice Response Services, in BT Technology Journal, Vol. 17, N. 1, 1999.