
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY 5, 23–37, 2002 © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Data-Driven Methodology for Evaluating and Optimizing Call Center IVRs∗

BERNHARD SUHM AND PAT PETERSON
BBN Technologies, Speech and Language Processing, 70 Fawcett Street, Cambridge, MA 02138, USA

Received May 29, 2001; Revised August 22, 2001

Abstract. Usability of many call center IVRs (Interactive Voice Response systems) is dismal. Callers dislike touch-tone IVRs and seek agent assistance at the first opportunity. However, because of high agent costs, call center managers continue to seek automation with IVRs. The challenge for call centers is providing user-friendly, yet cost-efficient, customer service. This article describes a comprehensive methodology for usability re-engineering of telephone voice user interfaces based on detailed call center assessment and call flow redesign. At the core of our methodology is a data-driven IVR assessment, in which we analyze end-to-end recordings of thousands of calls to evaluate IVR cost effectiveness and usability. Because agent time is the major cost driver in call center operations, we quantify cost-effectiveness in terms of agent time saved by automation in the IVR. We identify usability problems by carefully inspecting user-path diagrams, a visual representation of the sequence of events of thousands of calls as they flow through the IVR. Such an IVR assessment leads directly into call-flow redesign. Assessment insights lead to specific suggestions on how to improve a call-flow design. In addition, the assessment enables us to estimate the cost savings of a new design, thus providing the necessary business justification. We illustrate our IVR usability and re-engineering methodology with examples from large commercial call centers, demonstrating how the staged process maximizes the payback for the call center while minimizing risk.

Keywords: call centers, interactive voice response, assessment, user-centric design, usability re-engineering, cost-benefit analysis

1. Introduction

Many companies have enthusiastically adopted touch-tone interactive voice response systems (IVRs), introduced more than two decades ago, to increase customer service efficiency. Consumers, on the other hand, who have dealt with many IVRs that are difficult to use, attempt to bypass the automated system and prefer to speak to live agents. This paradox can be explained by considering that, on the one hand, financial pressures force call centers to cut operating costs, and, on the other hand, IVR usability and its impact on call center operations are poorly understood. While the demise of touch-tone IVRs has been predicted (Tatchell, 1996), they are still widespread. Recently, with the maturation of speech recognition technology, speech-enabled IVRs have begun to replace touch-tone IVRs in some domains, but they come with their own limitations, such as poor recognition accuracy under noisy conditions, lack of privacy, and high upfront investment costs. Meanwhile, despite the acknowledged dismal usability of many deployed telephone voice user interfaces,1 call centers find it difficult to diagnose usability problems and to redesign call flows to optimize both customer satisfaction and business benefit.

∗ The tools and processes described in this paper are the subject of pending patents.



Decision makers in call centers lack adequate information. Standard IVR performance reports do not capture information on usability and lack sufficient detail. Call center managers are often misled into believing that the existing IVR is performing well. Even if they recognize that something is wrong, they cannot identify the specific problems, much less how to remedy them. Without an understanding of the value of usability and its impact on the business, IVR usability engineering is rarely taken seriously.

Usability design and re-engineering know-how for IVRs ranges from style guides for touch-tone IVRs (Halstead-Nussloch, 1989) to comprehensive collections of best practices for IVR design (Resnick and Virzi, 1995), which also cover state-of-the-art speech-enabled IVRs (Balentine and Morgan, 1999). While applying best practices can often improve IVR usability, the measurement and validation of those improvements can be quite difficult, especially in the context of a production IVR with many different tasks and a wide range of callers. When IVR design methods yield different, plausible designs, it is often impossible to decide which design works best just by applying guidelines without some form of empirical evaluation.

1.1. Evaluation of Telephone Voice User Interfaces

A search of the literature reveals little research on the basic design problems in IVRs, such as prompting and menu design in touch-tone IVRs. Some research identified guidelines for menu and form styles, including their respective usability characteristics. Few standard usability evaluation methods have been applied to IVRs (Edwards et al., 1997; Delogu et al., 1998). These methods have also been applied successfully to the evaluation of research speech user interfaces, commonly believed to be the next generation of IVRs (Yankelovich et al., 1995; Bennacef et al., 1996; Walker et al., 1997). However, standard usability tests, measuring task completion times and rates in a laboratory study, are not practical for complex call center IVRs that offer many tasks and have to accommodate a wide range of users. Usability walkthroughs (Nielsen, 1993), on the other hand, while fast and inexpensive, do not provide any quantitative data and may miss subtle usability problems.

Instead of considering usability measures, call center managers commonly evaluate IVRs using reports generated by various system components. Such reports typically contain measures such as “IVR utilization” and IVR/agent “average handling time.” IVR utilization (or “IVR take-rate”) is commonly defined as the difference between the percentage of callers entering the IVR and the percentage leaving the IVR to talk to a live agent. While often interpreted as the success rate for serving callers in an automated fashion, IVR take-rate is a poor measure of IVR automation, because callers hanging up in the IVR may not have received any useful information. In several large call centers we have seen that the majority of callers hanging up in a touch-tone IVR have actually received no useful information and therefore have not been served. For example, based on standard IVR reports, one call center believed that its touch-tone IVR served more than 30% of the callers in the automated system. Our detailed analyses revealed that only 1.6% of all callers were actually served, and almost 20% hung up without receiving any useful information.

1.2. Speech: The Next Generation of Telephone Voice User Interfaces

While there is no doubt that existing touch-tone IVRs are inadequate, it is unclear what will actually become the new generation of IVRs. Multimodal interfaces (e.g., Gibbon et al., 2000) are expected to improve interface usability in many areas. Conceivably, overcoming the limitation of voice prompts for system output and DTMF for user input would improve IVR usability. Not surprisingly, a study showed that users prefer a telephone augmented with a display over two variations of a standard, voice-only touch-tone telephone (Roberts and Engelbeck, 1989). After several successful commercial deployments, speech-enabled IVRs are being marketed as the next generation of IVRs, citing studies that suggest callers generally prefer speech-enabled over touch-tone IVRs (Nuance, 2000; Bers et al., 2001). However, other studies showed that users may actually prefer touch-tone interaction over speech for certain tasks (Fay, 1994). Certain applications may be more conducive to touch-tone interaction, allowing users to complete their jobs faster using a touch-tone IVR. Furthermore, voice recognition comes with inherent limitations, such as lack of privacy and deterioration of recognition accuracy in noisy environments or for speakers with foreign accents. Under such circumstances, users may want to switch to touch-tone interaction even if they otherwise prefer speech on the specific application.


Clearly, “multimodal” IVRs show promise to become the next generation of IVRs. Multimodal in this context means at least speech and touch-tone interaction, and possibly other modalities, such as a (small) display. With further maturation of speech recognition technology, speech-enabled IVRs will be able to conduct increasingly complex dialogs with callers. Several companies have developed speech-enabled IVRs that can process responses to open-ended prompts (Gorin et al., 1996; Lee et al., 2000). Such natural language call routing eliminates the need for layered menus, which are unavoidable when using either touch-tone or spoken keyword menu designs for more than just a few routing destinations. Layered menus are one of the main causes of poor usability in existing IVRs.

1.3. A Novel IVR Assessment Methodology

To improve our ability to conduct effective IVR usability engineering and to advance research towards the next generation of IVRs, this article presents a methodology for IVR usability evaluation and redesign. At the core of the methodology is a novel evaluation measure that combines objective usability and cost-effectiveness of IVRs in a single measure. This novel measure also overcomes the flaws of standard IVR performance measures. By quantifying the benefit of an IVR, we can accurately estimate the cost-savings potential for call-flow redesign, and usability practitioners can compare alternative IVR designs objectively. The article also presents a number of usability analyses that identify specific IVR usability problems and lead to concrete suggestions for how to improve the design. Although our assessment does capture the customer experience, this article does not discuss subjective usability explicitly. We believe that standard methods for evaluating subjective usability, such as questionnaires and surveys, are adequate for telephone voice user interfaces.

Our assessment methodology includes a process for collecting detailed usability data from commercially deployed IVRs and tools for processing the data efficiently. In an IVR assessment, we record thousands of live calls in an unobtrusive fashion, and we apply automated tools to determine the complete IVR event sequence for each call.

Section 2 describes our methods for collecting data from thousands of live calls and efficiently processing that data into a database of event traces for complete calls. We have developed algorithms that infer the complete IVR sequence from call recordings, and we employ human transcribers to annotate significant events in agent-caller dialogs.

Section 3 describes our IVR assessment analyses, which evaluate cost-effectiveness and usability of IVRs. We compile the event sequences of thousands of calls into a single number, total IVR benefit, that quantifies both cost effectiveness and objective usability (Suhm and Peterson, 2001). To identify usability problems, we inspect user-path diagrams. User-path diagrams visually represent the complete path of thousands of calls through the IVR.

Section 4 illustrates how to apply our IVR assessment methodology to call-flow redesign, using case studies from several large commercial call centers. We illustrate how the assessment analyses allow us to identify usability problems and to generate concrete suggestions for call-flow redesign. We describe how we quantitatively compare alternative call-flow designs using our comparative IVR analysis. Comparative IVR analyses enable us to identify the most effective design possible and project the benefit and cost-savings for a redesign. Section 4 concludes by introducing natural language call routing as the ultimate remedy for curing the touch-tone menu blues, and Section 5 closes this article with a summary.

2. Data Capture from Commercial IVRs

The only complete records of user and system behavior in IVRs are complete calls. Therefore, a comprehensive usability assessment of IVRs must be based on end-to-end recordings of calls. A call typically begins in a dialog with an automated (IVR) system, called the IVR-caller dialog, which may be followed by a dialog with a live agent, called the agent-caller dialog. This section describes methods for collecting end-to-end data for calls, including both IVR-caller and agent-caller dialogs. The following subsections describe how we transform call recordings into a complete sequence of events (or an event trace), to make processing efficient. This includes automatic methods to infer the complete IVR event sequence and annotation to capture significant events in the agent-caller dialog.

2.1. End-to-End Recording and Complete Call Event Trace

Calls can be recorded end-to-end either on-site or off-site. For on-site recording, standard recording equipment can be employed. For example, if the IVR is hosted on a PC, incoming calls could be recorded on the PC using an appropriate hardware card. This recording procedure is attractive for research purposes. In complex call centers, however, on-site end-to-end recording is difficult because a call is typically handled by more than one piece of equipment. Furthermore, a call may be transferred to remote sites. This happens often in large call centers that handle specific types of calls with specialized agent queues (called “skill-based routing”). Frequently, such specialist agent queues are distributed across several geographically disparate call centers. Therefore, calls that are handled initially in one location frequently must be transferred to another location. In such situations, off-site recording may be the only way to record calls end-to-end.

Recordings of complete calls represent a large amount of data that is difficult to analyze in its raw form. To make the analysis of call data efficient, we transform the recordings into a trace of significant events for each call. Significant events in the IVR-caller dialog include system prompts and caller input, either touch-tone or speech. In the agent-caller dialog, we look at events such as exchanges of various kinds of information (e.g., account numbers, dollar amounts), description of the reason for the call (e.g., question about a bill, inquiry into flight schedules), and completion of transactions (e.g., making a payment arrangement or flight reservation). While most of our IVR assessment analyses are based on this event trace, the ability to switch between a call recording and its representation as an event sequence is crucial throughout the analysis process.

The following two subsections describe how we extract the complete call event trace from end-to-end recordings. We begin with a method to obtain the event trace for the IVR-caller dialog, called IVR analysis. This method can be applied to both touch-tone and speech-enabled IVRs. We then outline how we obtain an event sequence for the agent-caller dialog.

2.2. Automatic Analysis of IVR-Caller Dialogs

The preferred method for capturing the IVR event sequence would be an event log that is generated by the IVR. However, the reports that current IVR platforms generate are generally inadequate and inaccurate. They are inadequate because they typically are based on “peg counts”, which indicate how many times a prompt or menu was visited overall, but provide no information on specific calls. Peg counts are unable to identify even the most basic usability problems, such as callers getting trapped in “voice mail jail” or “touch-tone hell”. To illustrate IVR report inaccuracy, consider the reporting of hang-ups in the IVR segment of a call. Without knowing whether specific calls actually accomplished anything in the IVR, IVR reports frequently count all calls that hang up in the IVR segment as “resolved” (or caller self-serve), regardless of whether or not the caller obtained any useful information or accomplished anything in the IVR. While IVR logging can be customized to include event traces for calls, the IVR code would have to be modified to write to an event log at appropriate states in the call. Generating such code is error-prone and intrusive to call center operations. To obtain a complete event trace independent of IVR reports, we have developed a method (which we call IVR analysis) that infers the complete IVR event sequence from the call recording alone.

2.2.1. Touch-Tone IVR Events. Our IVR analysis employs three main tools to capture the event sequence for the IVR-caller dialog: a prompt detector, a DTMF detector, and a prompt inference tool. First, we use a commercially available DTMF detector to detect touch-tones. Next, our prompt detector recognizes important known prompts in recordings. Finally, whenever the IVR is so complex that detection of all prompts would be impractical, we employ a prompt inference tool to infer the complete prompt sequence efficiently.

An additional, crucial step is to determine the exit condition from the IVR-caller dialog. The exit condition indicates whether the call ended in the IVR with a hang-up or was transferred to an agent. We detect IVR transfer prompts, such as “Please wait for the next available representative,” to determine whether a call was transferred to an agent. If the prompt detector detects the transfer prompt, we infer that the call was transferred to an agent. Otherwise, we assume that the caller hung up.
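The exit-condition rule can be sketched in a few lines of Python. This is only an illustration under assumed names: the (kind, label) event tuples, the label transfer_to_agent, and the function infer_exit_condition are hypothetical and are not taken from the tools described in this article.

```python
# Hypothetical event trace: a list of (kind, label) tuples in call order,
# e.g. ("prompt", "main_menu") or ("dtmf", "2").
# Assumed labels that a prompt detector might assign to transfer prompts such as
# "Please wait for the next available representative."
TRANSFER_PROMPTS = {"transfer_to_agent"}


def infer_exit_condition(events):
    """Return "transfer" if any transfer prompt was detected, else "hangup"."""
    for kind, label in events:
        if kind == "prompt" and label in TRANSFER_PROMPTS:
            return "transfer"
    # No transfer prompt found: assume the caller hung up in the IVR.
    return "hangup"


# Example traces (made up for illustration).
transferred = [("prompt", "main_menu"), ("dtmf", "0"), ("prompt", "transfer_to_agent")]
abandoned = [("prompt", "main_menu"), ("dtmf", "3"), ("prompt", "billing_menu")]
print(infer_exit_condition(transferred), infer_exit_condition(abandoned))
```

A rule of this kind only looks at what the prompt detector found in the IVR segment, so it cannot tell whether an agent was actually reached.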
This method of inferring the IVR exit condition fails when the caller hangs up during the hold time, before reaching an agent. However, such cases can be corrected during the annotation analysis, which we describe below in Section 2.3.

2.2.2. Speech-Enabled IVR Events. We follow a similar process to capture events in speech-enabled IVRs, with the following modifications. First, the analysis must rely on prompt detection to disambiguate the event sequence after any speech input from the caller. Unlike touch-tone IVRs, where we can recognize touch-tone input by the caller reliably, recognition of user speech input is error-prone. Therefore, the state transition after speech input cannot be inferred reliably from the speech alone. Second, to evaluate speech recognition performance, all segments of a recording that contain user speech must be identified and annotated with the sequence of words that was actually spoken. Speech segments can be identified using speech detection algorithms, which are in the public domain. Then, since speech recognizers are prone to recognition errors, the true sequence of words in those spoken segments must be annotated manually, using human transcribers.

2.3. Annotation of Agent-Caller Dialogs

Our annotation analysis captures the sequence of significant events for anything that follows the IVR-caller dialog, i.e., waiting on hold and agent-caller dialogs. Significant events include start of the agent-caller dialog, the reason for the call (and topics discussed), exchanges of information between caller and agent, and completion of transactions. In addition, the annotation analysis may characterize the call as a whole according to certain attributes, such as the degree to which the call was resolved and agent courtesy. We currently employ human transcribers to perform these annotations, either based on end-to-end recordings or, if recordings of the caller-agent dialog are not available, by annotating in real time while listening to a call.

Annotating the reason for calls in randomly selected agent-caller dialogs allows us to estimate their frequency distribution. The distribution of the reasons for calls (referred to as call types in the remainder of this article) is a first and crucial step towards understanding why customers are calling a call center, but it is frequently not available in commercial call centers. Call centers sometimes infer call type distributions based on peg counts of IVR sections, i.e., based on how often callers access certain IVR sections. However, these distributions are inaccurate because callers can bypass the IVR completely by transferring to a live agent, and callers who do cooperate frequently make wrong choices in the IVR, thus routing themselves to the wrong IVR section. In our experience, only 35% to 75% of all callers get to the right place using touch-tone menus. Instead of relying on inaccurate IVR reports, we infer the call type distribution from the caller-agent dialog, provided that the call was served by an agent. If callers are fully served in the IVR for certain call types, we adjust the call-type distribution accordingly. Table 1 shows such a call-type distribution from one of our case studies.

Table 1. Call type distribution example.

Call type                               % Calls
Sales                                        24
Establish new account                        17
Payment information and arrangements         11
Billing questions                            10
Repair                                        7
Other                                        31

We use the call type distribution to identify IVR usability problems and to estimate upper bounds on IVR automation. The call type distribution is also valuable during call-flow redesign, since frequently-asked questions should be offered near the top of touch-tone menus. Section 4.1 will describe these applications of call type distribution in IVR usability re-engineering in more detail.

Figure 1 illustrates the process for capturing the IVR event trace using IVR analysis and call annotation, followed by its application to IVR usability re-engineering. The data capture phase can be viewed as building a call database; each record of the database represents the complete event trace of one specific call. The IVR analysis determines the events of the IVR-caller dialog, and annotation determines significant events of the caller-agent dialog. Based on this call database, in the analysis phase we evaluate usability and cost-effectiveness of IVRs comprehensively by analyzing the call event traces in various ways.

The following section describes our IVR assessment analyses. In particular, we introduce user-path diagrams as an effective diagnostic tool that visualizes traffic and levels of call resolution. We also describe our automation analysis, which quantifies the benefit of an IVR to both the caller and the call center in a single number, called total IVR benefit. Such an assessment leads to IVR usability re-engineering: usability problems are identified, alternative designs can be compared quantitatively, and the re-engineering cost can be justified (by quantifying the improvement opportunity).
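To make the notion of a call database more concrete, the following Python sketch shows one possible shape for a call record and how a call type distribution could be estimated from annotated calls. The class CallRecord, its field names, and the function call_type_distribution are assumptions made for illustration; the article does not specify a schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CallRecord:
    """One database record: the complete event trace of one call (hypothetical schema)."""
    call_id: str
    ivr_events: List[str] = field(default_factory=list)    # from IVR analysis
    agent_events: List[str] = field(default_factory=list)  # from annotation
    call_type: Optional[str] = None  # annotated reason for the call, if any


def call_type_distribution(calls: List[CallRecord]) -> Dict[str, float]:
    """Percentage of annotated calls per call type; unannotated calls are skipped."""
    counts = Counter(c.call_type for c in calls if c.call_type is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {call_type: 100.0 * n / total for call_type, n in counts.items()}


# Made-up sample: four annotated calls and one call without annotation.
sample = [
    CallRecord("c1", call_type="Sales"),
    CallRecord("c2", call_type="Sales"),
    CallRecord("c3", call_type="Sales"),
    CallRecord("c4", call_type="Repair"),
    CallRecord("c5"),
]
print(call_type_distribution(sample))
```

The sketch covers only calls whose reason was annotated in the agent-caller dialog; the adjustment for call types that are fully served in the IVR is not modeled here.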

Figure 1. Overview of our IVR assessment and redesign methodology.

3. Evaluating Cost-Effectiveness and Usability of IVRs

The IVR usability evaluation methodology presented in this section analyzes call event traces to identify IVR usability problems and to quantify the benefit of an IVR, both to the user and to the call center. To quantify benefit in a single number, the first subsection describes total IVR benefit as a measure that combines (objective) usability and cost-effectiveness of an IVR. Total IVR benefit is calculated using our IVR automation analysis. To measure IVR automation, we apply the common usability measures of task completion rates and task completion time in a form that is more suitable for evaluating production IVRs. Beyond quantifying benefit of an existing IVR, total IVR benefit allows us to measure the potential for improvement by estimating upper bounds on total IVR benefit based on annotations of caller-agent dialogs. This step is crucial to obtain the necessary business justification for call-flow usability re-engineering.

IVR automation analysis typically reveals general problem areas of an IVR, for example, whether callers get to the right place, whether callers can be identified efficiently, and whether callers succeed in obtaining useful information. To identify specific usability problems, we employ user-path diagrams, an application of state-transition diagrams to the evaluation of IVRs. User-path diagrams allow a usability practitioner to identify specific usability problems and generate concrete improvement suggestions. In our experience with evaluating large commercial call centers, user-path diagrams have proven to be a very useful diagnostic tool.

Beyond evaluating objective usability, a comprehensive usability evaluation methodology must also address subjective usability, especially in call centers, where delivering superior customer service is very important. Subjective usability sometimes even outweighs objective performance. However, we believe that standard methods for evaluating subjective usability, such as surveys and questionnaires, are adequate for quantifying customer satisfaction in call centers. Methods for evaluating subjective usability of IVRs, therefore, are not discussed further in this article.

3.1. Evaluating IVR Cost Effectiveness

Evaluating call center IVRs is difficult. Evaluation criteria from the caller’s point of view (usability) and from the call center’s point of view (cost-effectiveness) appear difficult to reconcile. Existing evaluation methods are inadequate and address either usability or cost-effectiveness in isolation. As mentioned earlier, standard reports generated by IVR hardware are inaccurate and do not report usability measures. Methods to evaluate subjective usability exist, but they do not quantify the cost for the call center. Common laboratory usability evaluations, using task-based measures in controlled experiments on a few tasks, are impractical for complex call center IVRs, which can offer many different functions (tasks). We therefore introduce the total IVR benefit as a single measure that combines IVR usability and cost-effectiveness.

3.1.1. Total IVR Benefit. How can we quantify both usability and cost effectiveness of a telephone voice user interface? On the one hand, callers want to accomplish their goals quickly and easily over the phone. Therefore, objective usability can be quantified by the standard measures of task completion rates and times. On the other hand, agent time dominates the cost in most call centers. The ratio between cost of agents and all other costs, such as telecommunications time, IVR hardware and software, and facilities charges, is at least 4 : 1. Therefore, we quantify cost-effectiveness of a telephone IVR in terms of agent time. We define the total IVR benefit as the agent time that is saved by the IVR, compared to handling the complete call by live agents.

An IVR “saves” agent time whenever it performs tasks successfully that otherwise would have to be performed by an agent. Tasks that typically can be performed within an IVR include identifying the caller, providing information to the caller, performing transactions, and routing the caller to specialized agents. In some cases, completing these tasks successfully may resolve the call so that the caller hangs up without any assistance from an agent. We refer to such calls as self-serve or full automation. It is important to note, however, that even if a call is not fully automated, the IVR can still provide significant savings through partial automation.

Table 2 shows typical agent-time savings across categories of “automatable” tasks, i.e., tasks that can be performed within an IVR. These savings can be derived from benchmark assumptions or measured in annotated agent-caller dialogs.

Table 2. Typical agent-time savings for automated tasks.

Automated task                Saved agent seconds
Caller identification                          15
Routing                                        40
Useful information                             40
Completion of transactions                     40

While the emphasis in this context is on cost, we note that IVR automation rates correspond to sub-task completion rates. Hence, IVR automation is a more differentiated version of the standard task-completion usability measure, and total IVR benefit thus combines cost-effectiveness with task completion. The key to our IVR evaluation methodology is the measurement of cost-effectiveness in terms of agent time saved at the task level, by first quantifying IVR automation and then calculating an overall benefit measure, as described next.

3.1.2. Quantifying IVR Automation. Total IVR benefit could be measured directly by timing the length of agent-caller dialogs. But because agent time varies widely, the length of thousands of agent-caller dialogs would have to be measured, which currently requires manual annotation of calls.
Furthermore, it is impossible to obtain unbiased data from commercial call centers, because many factors may have a significant impact on caller behavior and agent handling time. We therefore have developed a method to estimate total IVR benefit based on call event-sequence data, called IVR automation analysis.

As the first step in IVR automation analysis, we define tasks that can be automated in the IVR, as shown in Table 2. Typically, the completion of a task can be associated with reaching a certain state in the IVR. Thus, the set of completed tasks can be inferred directly from the event sequence data for a call, using a simple lookup table that documents which IVR states correspond to the completion of which tasks.

We make one important exception to the assumption that IVR states indicate successful task completion. Specifically, we do not assume that routing decisions made in the IVR are necessarily correct. Rather, we look at subsequent agent-caller interactions to determine, based on the annotated reason for a call, whether the call was correctly routed or misrouted to an agent. Calls that misroute to specialists usually need to be transferred somewhere else and, therefore, incur a cost equal to the time it takes the specialist to reroute the call, which can be thought of as a negative routing benefit.

Given the definition of tasks that can be completed within an IVR, we characterize each call according to distinct combinations of automated tasks, which we refer to as call profiles. Given a set of calls with their event sequence data, we annotate every call with its set of completed tasks and use the pattern of completed tasks to accumulate counts for each call profile. The call traffic into an IVR is thus partitioned into a set of call profiles, each representing a distinct pattern of automation.

Table 3. IVR automation analysis, with two agent categories (“specialist”, “floor”).

                                      Traffic         Agent seconds saved per          Benefit
                                                      automation category              [agent secs]
Call profile                          Calls  % Calls  Account  Routing  Info delivery  One call   Net
Fully-automated calls                   307      5.6       15       40             40        95   5.3
Transfers to specialist with readout     99      1.8       15       40             40        95   1.7
Transfers to agent with readout         101      1.8       15        -             40        55   1.0
Transfers to specialist with ID         641     11.6       15       40              -        55   6.4
Transfers to specialist, no ID          545      9.9        -       40              -        40   3.9
Transfers to agent with ID              471      8.5       15        -              -        15   1.3
Transfers to agent, no ID              2927     52.9        -        -              -         -     -
Abandons                                439      7.9        -        -              -         -     -
Total                                  5530    100.0      29%      29%             9%            19.6
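As a cross-check of Table 3, the automation rates and the total IVR benefit can be recomputed from the per-profile call counts. The Python sketch below is an illustration only: the data layout and the function names automation_rates and total_ivr_benefit are assumptions, and the set of automated tasks for each profile is read off the rows of Table 3.

```python
# Agent seconds saved per automated task, as used in Table 3.
SAVINGS = {"account": 15, "routing": 40, "info": 40}

# (call profile, number of calls, tasks automated in the IVR), read from Table 3.
PROFILES = [
    ("Fully-automated calls", 307, {"account", "routing", "info"}),
    ("Transfers to specialist with readout", 99, {"account", "routing", "info"}),
    ("Transfers to agent with readout", 101, {"account", "info"}),
    ("Transfers to specialist with ID", 641, {"account", "routing"}),
    ("Transfers to specialist, no ID", 545, {"routing"}),
    ("Transfers to agent with ID", 471, {"account"}),
    ("Transfers to agent, no ID", 2927, set()),
    ("Abandons", 439, set()),
]


def automation_rates(profiles, savings):
    """Fraction of all calls in which each automatable task was completed."""
    total_calls = sum(n for _, n, _ in profiles)
    return {
        task: sum(n for _, n, tasks in profiles if task in tasks) / total_calls
        for task in savings
    }


def total_ivr_benefit(profiles, savings):
    """Average agent seconds saved per call, summed over all call profiles."""
    total_calls = sum(n for _, n, _ in profiles)
    saved = sum(n * sum(savings[t] for t in tasks) for _, n, tasks in profiles)
    return saved / total_calls


rates = automation_rates(PROFILES, SAVINGS)
print({task: round(100 * r) for task, r in rates.items()})
print(round(total_ivr_benefit(PROFILES, SAVINGS), 1))
```

Working from raw call counts rather than the rounded percentages avoids accumulating rounding error; misrouted calls, which the text treats as a negative routing benefit, are not modeled in this sketch.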

Automation rates are defined as the percentage of automation achieved over all calls for each automatable task. This percentage can be calculated simply by adding the percentages of all call profiles that include the specific automatable task.

Table 3 shows an example IVR automation analysis, in which we distinguish two agent types, “specialist” and “floor.” The left column lists the call profiles. The next two columns (labeled “Traffic”) show the breakdown of the total data set, which consists of 5530 calls, into the various profiles. For example, 5.6% of the calls were fully automated, and 7.9% of the calls were abandoned without the caller getting anything done. Then, the three “Automation” columns show the automation categories for each profile. This analysis is based on three automation categories: capture of the caller’s account number, routing, and delivery of information. In each “Automation Category” column we enter the associated agent time savings from Table 2 for those call profiles in which that automation component was achieved. For example, the profile “Transfers to agent with readout” achieved capture of the account number and automated delivery of information. The bottom row in Table 3, for the three “Automation” columns, shows the automation rates by category: 29% capture of account number, 29% routing, and 9% information delivery.

For each call profile, the saved agent time over all calls into the center (shown as the last column in Table 3) is the product of the total agent time saved for one call and the corresponding percentage of traffic. For example, the call profile “Transfers to specialist with ID” saves 55 seconds of agent time, because the call was transferred to the right place (routing automation), and the caller was identified (account number automation). Since 11.6% of all calls fit this profile, the net saving of agent time is estimated as 11.6% times 55 seconds, which equals 6.4 agent seconds over all calls to the center. The total IVR benefit, then, is the sum of the net IVR benefits for all call profiles. For the example in Table 3, our analysis estimates a total IVR benefit of 19.6 agent seconds saved, shown in the bottom right corner cell of Table 3. In other words, we estimate that this IVR shortens, on average, the agent handling time for every call by 19.6 seconds.

3.2. Evaluating IVR Usability

Evaluating usability typically encompasses quantifying usability, identifying usability problems, and evaluating subjective usability factors. Our assessment methodology currently quantifies usability and provides methods to identify usability problems, but we do not (yet) formally evaluate subjective usability factors, such as user satisfaction.

Common usability measures include task completion rates and task completion times. Our IVR automation analysis provides task completion rates in a form that is suitable to the problem of evaluating telephone user interfaces. The automation analysis can also be used to quantify usability of telephone user interfaces. Specifically, low automation rates point to usability problems. In the example above, the low success rate for capturing account numbers (only 29% of all callers) reveals a severe shortcoming and usability problem in this call flow.

In addition to IVR automation analysis, we have developed a number of other tools for evaluating usability. In this article, we describe user-path diagrams as a diagnostic tool for identifying IVR usability problems, and as an analytic tool for estimating the impact of design changes.

User-path diagrams visualize user behavior in the IVR by representing event sequence data as a tree, similar to state-transition diagrams. State-transition diagrams have been applied to many engineering problems, including user interface design (Parnas, 1969). Applied to visualizing user behavior in IVRs, state-transition diagrams can visualize the paths of many users through an IVR, hence the name user-path diagram. To manage the complexity of user-path trees, we cluster individual IVR states into sub-dialogs, such as ID entry or menu selection. Such sub-dialogs may encompass many IVR events and multiple IVR-caller interactions in the captured sequence data. The nodes of the tree correspond to IVR states, arcs correspond to state transitions, and leaves correspond to end conditions of calls. Each node and leaf is marked with the percentage of all calls that reached the node or leaf. In addition, arcs may be marked with the user input that causes the corresponding state transition, such as pressing a certain touch-tone key in response to a prompt.

We found it helpful to distinguish at least three end conditions. “Self-serve” refers to calls that are resolved in the IVR, i.e., where the customer completes the call in the IVR, without talking to a live agent. “To agent” refers to calls that transfer to an agent. “Abandon” refers to calls where the caller hangs up, either in the IVR without obtaining any useful information, or on hold before reaching a live agent. If the call center operates with distinct categories of agents, the “to agent” category is typically broken down into various subcategories, each representing a distinct routing destination from an operational point of view.

Figure 2 shows an excerpt from a user-path diagram. Rectangular boxes represent IVR states, arrows represent call traffic, and circles indicate places where calls leave the IVR. In this example, 82% of all callers make it past the opening menu to a state that prompts the callers to key in their account number, called “ID Entry”. In this figure, 8.5% of all callers abandon the call while attempting to provide their account number, shown as an arrow to the right. On the other hand, 63.9% of all callers enter their account number successfully and reach the main menu. At the main menu, 28.5% of the callers select an option that routes them to a specialist agent, while 0.8% route themselves to a general (floor) agent, and 1.7% abandon the call.

Figure 2. Excerpt from a user-path diagram.

We identify usability problems by inspecting user-path diagrams. Usability problems are found by looking at those areas of the call flow that receive little or no caller traffic or that have high rates of abandoned calls or transfers to an agent. In Fig. 2, for example, the state cluster named “ALT ID Entry” receives 9.6% of all calls, but 86% of these calls either are abandoned or are transferred to a floor agent, and the account number is correctly entered in only 14%. Obviously, this part of the IVR is ineffective. Section 4.1 presents a more detailed example of how to identify IVR usability problems by inspecting user-path diagrams.
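The bookkeeping behind a user-path diagram can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes each call has already been reduced to a list of sub-dialog names followed by one end condition, and the five sample paths, the function names, and the 60% threshold are hypothetical.

```python
from collections import defaultdict

# End conditions that count against a state when calls leave the IVR from it.
FAILED_ENDS = {"abandon", "to agent"}


def node_percentages(paths):
    """Percentage of all calls that reach each node or leaf of the path tree.

    A node is identified by the tuple of states leading to it from the root.
    """
    reach = defaultdict(int)
    for path in paths:
        for depth in range(1, len(path) + 1):
            reach[tuple(path[:depth])] += 1
    return {node: 100.0 * n / len(paths) for node, n in reach.items()}


def exit_rates(paths):
    """Per state: percentage of arriving calls that abandon or go to an agent there."""
    arrivals, exits = defaultdict(int), defaultdict(int)
    for path in paths:
        *states, end = path
        for depth in range(1, len(states) + 1):
            arrivals[tuple(states[:depth])] += 1
        if end in FAILED_ENDS:
            exits[tuple(states)] += 1
    return {node: 100.0 * exits[node] / n for node, n in arrivals.items()}


# Hypothetical event sequences (illustrative data only).
paths = [
    ["Opening Menu", "ID Entry", "Main Menu", "self-serve"],
    ["Opening Menu", "ID Entry", "Main Menu", "to agent"],
    ["Opening Menu", "ID Entry", "abandon"],
    ["Opening Menu", "ALT ID Entry", "abandon"],
    ["Opening Menu", "ALT ID Entry", "to agent"],
]
tree = node_percentages(paths)
flagged = {node: rate for node, rate in exit_rates(paths).items() if rate >= 60.0}
```

With these sample paths, only the “ALT ID Entry” node is flagged, which mirrors the kind of finding read off Fig. 2 by eye. On real data the threshold would be a judgment call, and areas that receive little or no traffic would need a separate check against `tree`.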

4.

Application to IVR Redesign

Using case studies from several large call centers, this section illustrates our IVR assessment methodology and its application to call-flow redesign. The first two subsections elaborate on how we employ user-path diagrams and call-type distributions to identify IVR usability problems, and how this leads to quick hit and call-flow redesign recommendations. The next subsection (4.3) presents our comparative IVR analysis: by applying the same assessment methodology that is used to identify usability problems, alternative designs can be compared quantitatively. Instead of relying on more or less educated guesses, comparative IVR analysis allows us to determine the overall best design. This section closes by demonstrating how our assessment methodology also leads to building a business case, empowering call center managers to make informed decisions between touch-tone re-engineering and speech-enabling their IVR.

4.1.

Identifying IVR Usability Problems

This section demonstrates how to identify and quantify IVR usability problems by analyzing user-path diagrams and call-type distributions. The call center in this example serves many functions, including sales, billing questions, and repair services. The user-path diagram in Fig. 3 shows the first two menu layers in detail, but abbreviates the provision of automated information as “Automated Billing Information” and “Automated Fulfillment”.

Figure 3. Identifying IVR usability problems by inspecting a user-path diagram.


Visual inspection of this user-path diagram reveals the following IVR usability problems, identified in Fig. 3.

(1) About 30% of calls either are abandoned or are transferred “cold” to an agent at the main menu. This traffic represents the callers who attempt to bail out of the IVR at the first opportunity. While we can empathize with such callers, they are likely to be transferred to the wrong agent, who then has to transfer the caller, which means a second period of waiting on hold for the correct agent.
(2) While 18% of callers choose “other billing question,” only 3% actually find the billing IVR on this alternative path, and 15% bail out to an agent, after spending more than 1 minute in the IVR without having received or provided any useful information.
(3) The billing IVR achieves very little automation, because only 5% of all callers find “Automated Billing Information”. Only 3% of the callers obtain automated information in the billing IVR. By contrast, a standard IVR report for this call center would indicate a 19% IVR take rate, which really just means that 19% of all callers hung up in the IVR. The IVR report would not reveal that fewer than one in six such callers (3% overall) actually obtained useful information!

Many call centers commit the mistake of inferring the call-type distribution from IVR reports. In our example, IVR peg counts would indicate that 10% of callers reach the billing IVR, but this does not mean that 10% of all incoming calls are about billing questions, because many callers may not find the billing IVR! In conjunction with knowledge of the true call-type distribution, presented in Table 1 in Section 2.3, the following additional issues become obvious.

(4) 21% of callers call about billing-related questions, but only 10% of the callers find the billing IVR.
(5) 24% of the calls should be handled by a sales representative (as indicated in the call-type distribution), but only 6% of the callers are transferred to a sales representative out of the IVR.

Anecdotally, our IVR assessment in this case study also revealed that hardly anyone was choosing a specific automated fulfillment. To offer this service, the company had recently purchased a speech-enabled system that essentially was wasted, as our assessment showed. An early assessment might have prevented this waste.
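Issues (4) and (5) come from setting the true call-type distribution against the traffic that the IVR actually delivers to each destination. The arithmetic can be sketched as below, using only the four percentages quoted above. The function and field names are hypothetical, and the ratio is only an upper bound, because some callers who reach a destination may not belong there.

```python
def routing_gaps(true_share, ivr_share):
    """Compare, per call type, the share of callers with that need against
    the share of callers the IVR delivers to the matching destination.

    Both inputs are percentages of all incoming calls.
    """
    gaps = {}
    for call_type, needed in true_share.items():
        reached = ivr_share.get(call_type, 0.0)
        gaps[call_type] = {
            # Percentage points of all calls that never reach the destination.
            "missed_points": needed - reached,
            # Upper bound on the fraction of such callers who are served.
            "reach_ratio": reached / needed,
        }
    return gaps


# Percentages quoted in issues (4) and (5).
true_share = {"billing": 21.0, "sales": 24.0}
ivr_share = {"billing": 10.0, "sales": 6.0}
gaps = routing_gaps(true_share, ivr_share)
```

On these figures, at most about half of the billing callers and at most a quarter of the sales callers reach the intended destination through the IVR.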


Based on our assessments of call centers across several industries, we have identified the following common IVR usability problems:

• Excessive complexity: many IVR functions are underused because customers get confused early in the call.
• Caller identification difficulties: IVR scripting that attempts to identify the caller is frequently too difficult, preventing many callers from reaching the parts of the IVR that deliver automated customer service. Even with effective use of Automatic Number Identification (ANI), the success rate may be low because customers call from phones other than the one registered with their account.
• Confusing touch-tone menus: menu wording is often based on call center operations terminology and may not reflect how the customers think about their reason for the call. The customers make selections that do not lead to self-service and instead require assistance from an agent.

4.2.

Assessment Quick Hits and Re-engineering

In the course of evaluating existing telephone user interfaces, we usually observe a number of usability problems in enough detail to diagnose the problems and recommend solutions. We refer to solutions that are clear-cut and non-controversial as quick hits. When the diagnosis is clear but the solution is not, we frequently recommend call-flow reengineering, where alternative designs are tested side-by-side for efficacy.

We have encountered many obvious quick hits, some more than once. For example, in a few cases we observed a suspiciously large proportion of callers being bumped out of a touch-tone numeric entry task because of numbers that had too few digits. By listening to call recordings, we realized that callers were struggling to enter long digit strings, and that they were cut off before completing their entry. The solution was to increase digit timeout parameters. Another quick hit concerns call flows that unnecessarily invite callers to bail out early, thus effectively bypassing what the automated system is intended to achieve. We refer to this problem as call-flow “leakage.” Frequently, simple wording changes can encourage callers to make a conscious selection at least at the first menu, which may deliver very significant benefits by getting them transferred to the right destination.

Figure 4. Comparative IVR analysis example.

Call-flow reengineering is called for in cases where the best design is not obvious. An important example is in touch-tone menus. Our detailed routing analysis can identify specific menus that are not effective. While simple wording changes may help, identifying wording that works best is difficult, and a comprehensive solution frequently involves changes to the menu structure. In the example of Fig. 3, our analysis revealed that routing to billing and sales does not work. However, it is not obvious how to merge the two alternative paths to the billing IVR, which confuse many callers. In such cases, we develop several alternative designs and quantitatively compare them by applying our assessment methodology, which we describe next.
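Such a quantitative comparison rests on the total IVR benefit of Section 3.1: each call profile's traffic share times the agent seconds it saves, summed over all profiles. The sketch below illustrates that roll-up and the difference between two designs. Only the 11.6% share and the 55 seconds are taken from the text; the two design tables, their profile names, and all of their numbers are hypothetical.

```python
def total_ivr_benefit(profiles):
    """Total IVR benefit in agent seconds per call: the sum over call
    profiles of traffic share times agent seconds saved for one call."""
    return sum(share * seconds for share, seconds in profiles.values())


# The one profile quoted in Section 3.1: 11.6% of calls, 55 agent seconds each.
quoted = {"transfers to specialist with ID": (0.116, 55.0)}
quoted_benefit = total_ivr_benefit(quoted)

# Hypothetical profile tables for two designs (illustrative numbers only).
design_a = {
    "transfer with ID": (0.10, 55.0),
    "transfer without ID": (0.55, 40.0),
    "misrouted to specialist": (0.05, -20.0),  # negative routing benefit
    "abandoned": (0.30, 0.0),
}
design_b = {
    "transfer with ID": (0.35, 55.0),
    "transfer without ID": (0.35, 40.0),
    "misrouted to specialist": (0.05, -20.0),
    "abandoned": (0.25, 0.0),
}

benefit_gain = total_ivr_benefit(design_b) - total_ivr_benefit(design_a)
```

The quoted profile contributes about 6.4 agent seconds, matching the figure in Section 3.1. A positive `benefit_gain` would favor the second design, although whether a given difference is meaningful depends on the number of calls observed.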

4.3.

Comparative IVR Design

As part of the reengineering process, we typically evaluate alternative designs side-by-side with real traffic. We call this process comparative IVR analysis. For each design we measure automation rates and calculate IVR benefit. Differences in automation rates indicate which IVR design is better for each automatable task. For overall comparisons, differences in total IVR benefit reveal which design is superior on the whole. Comparative IVR analysis can thus validate that a new IVR design is indeed better, and furthermore, it can quantify the cost savings.

In another case study, IVR automation analysis (see Table 3) revealed that the baseline call flow was ineffective in capturing the caller’s account number. The baseline call flow, shown as “Touch-tone Design A” in Fig. 4, required that callers first make a selection at the main menu. Only for specific choices were callers asked to provide their account number; callers bailing out to an agent at the main menu were not even asked to enter their account number. We suggested asking callers for their account number before presenting them the choices of the main menu. To determine whether this design (shown as “Touch-tone Design B” in Fig. 4) was superior, we exposed both designs to thousands of live calls and conducted a comparative IVR analysis. The increase in total IVR benefit, by nine agent seconds, as seen in Fig. 4, proved that Design B was indeed much more effective than Design A. The improvement was due to increases in successful capture of account numbers (+29%), delivery of information (+11%), and improved routing (+5%). This figure also shows the automation analysis for our speech-enabled natural language call router, which is discussed in more detail in the following subsection (4.4).

For a statistical treatment of such comparisons, standard tests on the difference of proportions can be applied to differences in automation rates and call profile rates, after an adequate Bonferroni adjustment to the significance level. In our case study, even increases of 1.5% in any automation rate are significant (z = 3.68; p < 0.01). Hence, the increases in all three automation categories reported above were significant. The statistical treatment of total IVR benefit is more complex. An analysis that we cannot present here due to space limitations shows that a difference of more than one agent second is significant (p < 0.05). Hence, the nine agent-second increase in benefit of the redesigned IVR is highly significant.

4.4.

Natural Language Call Routing

Touch-tone menus are inherently limited in their ability to get callers to the right destination. Touch-tone menus force callers to match the reason for their call with just one of a few options, which are often expressed using call center jargon. Moreover, as menu complexity increases, IVR usage decreases because callers become frustrated, and routing mistakes increase because of caller confusion. In a case study, our end-to-end analysis of calls showed that 25% of all calls were routed to specialists, but less than 80% of these (20% overall) went to the correct specialist. Callers misrouted themselves at one of four menu layers because they could not determine which touch-tone option best matched their question. When a mistake is made, the opportunity to automate the call or to save agent time is lost because the caller hangs up out of frustration or times out to a customer service agent.

Natural language call routing helps to solve these problems by cutting through the tangle of call-flow options and letting callers state their purpose in their own words. We recently conducted a trial of the BBN Call Director in a large call center. Results show that natural language call routing delivers significant improvements over touch-tone menus, both in terms of customer satisfaction and agent labor savings. Of more than 10,000 callers who experienced the BBN Call Director and had a preference, an overwhelming 82% said that they preferred describing their problem with words to navigating touch-tone menus. Most said that they preferred it because it was easier, more natural, and more efficient to use than touch-tone menus.

Our assessment analyses from the trial showed that the natural language call router provided benefit even beyond the improved touch-tone design, as shown in Fig. 4. Overall, the number of successful routes in the IVR increased by a factor of three over the original touch-tone system. After accounting for the part of the gain that could be attributed to call-flow redesign, the speech-enabled call router increased IVR benefit by an additional nine agent seconds, thus effectively doubling the total IVR benefit compared to the baseline.

Figure 5. Benefit projection example.

4.5.

Benefit Projections

Due to the cost of IVR changes in large call centers, the redesign of telephone user interfaces must be justified with a business case. Our IVR automation analysis and benefit calculation can provide the necessary business justification for IVR redesign because the cost savings of the redesigned IVR can be estimated. Based on an automation analysis of the existing IVR and knowledge of usability problems, we can derive bounds for improvements in the various automation categories. From these bounds, we can project total IVR benefit to determine upper limits on annual cost savings, which are then used to justify reengineering effort. Our re-engineering methodology, which is based on evaluating designs with real callers, eventually produces very tight benefit projections. In the example below, the numbers for reengineered touch-tone and speech-enabled systems are based on comparative evaluations with real callers.

Figure 5 compares four IVR designs: an initial touch-tone baseline; a quick hit touch-tone design; a reengineered design representing a practical upper limit for touch-tone; and a speech-enabled design that uses BBN’s Call Director natural language call router. The height of the columns indicates the total IVR benefit. The first two columns represent the redesign described in Section 4.3 above. With further changes to the touch-tone menus, we could realize an additional five agent seconds of benefit (see the third column in Fig. 5), but that is probably close to the limit of what can be done with a purely touch-tone interface. In contrast, we projected the benefit of re-engineering with speech to be ten agent seconds beyond the quick-hit redesign, which is represented by the second column.

Such projections of IVR benefit can be translated easily into cost savings using basic call center cost parameters, such as total call volume and agent cost. In this example, the increase in IVR benefit from the baseline to the “quick-hit” modified call flow corresponds to annual agent cost savings of more than $1M.

5.

Summary and Conclusions

Telephone voice user interfaces, an important class of human-computer interfaces, have been neglected by researchers in the field of human-computer interaction. Usability evaluation and engineering methods for IVRs are not well developed. Decision-makers in call centers, under strong financial pressures, strive to cut costs without being able to assess the significant impact of usability on customer satisfaction and the financial bottom line. To remedy this situation, we have presented an IVR assessment methodology that evaluates both cost-effectiveness and usability. Moving beyond previous laboratory studies of research spoken dialog systems, which evaluate only task completion rate and time, our methodology allows practitioners to evaluate IVR usability in the field in a systematic and comprehensive fashion.

An evaluation of a telephone voice user interface must be based on thousands of end-to-end calls. Calls must be recorded in their entirety to capture the complete user experience, and thousands of calls are necessary to obtain statistical significance in the analyses. We have presented methods to analyze such large amounts of audio data efficiently. Our analysis transforms gigabytes of audio data into detailed event traces. For the IVR section, the event sequence is captured in a fully automated procedure, while manual transcription is necessary to annotate events in agent-caller dialogs.

We described several tools for IVR evaluation and usability re-engineering, including IVR automation analysis, user-path diagrams, and comparative IVR analyses. These tools enable IVR usability practitioners to solve the tough problems in IVR redesign. By identifying IVR usability problems and comparing alternative designs, an IVR assessment can tell call center managers very specifically what is wrong with their existing IVR and how to improve it. By quantifying the improvement opportunity and measuring potential cost savings, we justify the cost of call-flow re-engineering and help call center managers to prioritize the use of their limited resources. An assessment typically delivers immediate cost savings with quick-hit recommendations, with payback periods of much less than a year.

Our methodology of quantifying IVR automation and benefit is far superior to standard IVR reports. In particular, we have shown that the standard measure of “IVR take rate” can mislead call center managers into believing that their IVR is quite effective, while IVR usability may in effect be very poor. We have presented total IVR benefit as an accurate, quantifiable measure that combines objective usability and cost-effectiveness. We recommend adoption of total IVR benefit as the standard benchmark for IVR performance.

The dependence of the analysis of agent-caller dialogs on human annotators significantly impacts the cost of an assessment. In the future, we hope that audio mining technology will lower costs of transcription analysis and allow call center managers to monitor their performance in a fully automated fashion.

Our methodology currently does not formally evaluate user satisfaction or any other subjective usability measure. While the impact of user satisfaction on customer attrition can be large, most managers of call centers focus on operational savings and ignore user satisfaction, because it is difficult to quantify. We believe that standard methods developed in the human factors community are sufficient to evaluate user satisfaction of telephone voice user interfaces. Some of these methods, such as expert walkthroughs and surveys in the evaluation phase, and usability tests or focus groups in the redesign phase, are complementary to our data-driven assessment. With each method having its own strengths and weaknesses, a combination of complementary methods can be powerful, bringing in the user perspective in various ways throughout the entire evaluation and design process.

As the ultimate cure for the touch-tone menu blues, this article referred to natural language call routing.


Natural language call routing avoids menus by allowing callers to describe problems in their own words. While this technology has been investigated for many years, only recent breakthroughs have increased the accuracy of natural language routing to levels that are far superior to those of touch-tone menus. With such superior technology and a solid IVR evaluation and reengineering methodology, we are poised to make large improvements in the usability of telephone voice user interfaces.

Acknowledgments

The assessment methodology presented in this paper was developed over four years of research and consulting for several large call centers. The authors gratefully acknowledge the contribution of all members of the Call Director team at BBN Technologies, past and present. Sincere thanks also to Dan McCarthy for his comments and proofreading the article.

Note

1. For simplicity, this article uses the term “IVR” often instead of the more correct, yet longer term “telephone voice user interface”. Technically, the latter refers to a class of human-computer interfaces (and may be more intuitive to readers with a human-computer interaction background), while the former refers to a specific instance of such an interface (and should be very familiar to most readers with a background in call centers). We apologize to the readers for any confusion this may cause.

References

Balentine, B. and Morgan, D.P. (1999). How to Build a Speech Recognition Application. San Ramon, CA: Enterprise Integration Group.
Bennacef, S., Devillers, L., Rosset, S., and Lamel, L. (1996). Dialog in the RAILTEL telephone-based system. International Conference on Spoken Language Systems (ICSLP). Philadelphia, PA: IEEE, Vol. 1, pp. 550–553.
Delogu, C., Di Carlo, A., Rotundi, P., and Satori, D. (1998). A comparison between DTMF and ASR IVR services through objective and subjective evaluation. Interactive Voice Technology for Telecommunications Applications (IVTTA). Italy: IEEE, pp. 145–150.
Edwards, K., Quinn, K., Dalziel, P.B., and Jack, M.A. (1997). Evaluating commercial speech recognition and DTMF technology for automated telephone banking services. IEEE Colloquium on Advances in Interactive Voice Technologies for Telecommunication Services, pp. 1–6.
Fay, D. (1994). User acceptance of automatic speech recognition in telephone services. International Conference on Spoken Language Systems (ICSLP). Yokohama, Japan: IEEE, Vol. 3, pp. 1303–1306.
Gibbon, D., Mertens, I., and Moore, R. (Eds.). (2000). Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology, and Product Evaluation. Dordrecht, Netherlands: Kluwer Academic Publishers, pp. 102–203.
Gorin, A., Parker, B., Sachs, R., and Wilpon, J. (1996). How may I help you? Interactive Voice Technology for Telecommunications Applications (IVTTA). Italy: IEEE, pp. 57–60.
Halstead-Nussloch, R. (1989). The design of phone-based interfaces for consumers. International Conference for Human Factors in Computing Systems (CHI). New York: ACM, Vol. 1, pp. 347–352.
Lee, C.H., Carpenter, B., Chou, W., Chu-Carroll, J., Reichl, W., Saad, A., and Zhou, Q. (2000). On natural language call routing. Speech Communication, 31:309–320.
Nielsen, J. (1993). Usability Engineering. Morristown, NJ: AP Professional.
Nuance. (2000). 2000 Speech User Scorecard. Menlo Park, CA: Nuance Communications.
Parnas, D.L. (1969). On the use of transition diagrams in the design of a user interface of interactive computer systems. Proceedings of ACM Conference. pp. 379–385.
Resnick, P. and Virzi, R.A. (1995). Relief from the audio interface blues: expanding the spectrum of menu, list, and form styles. Transactions on Computer-Human Interaction, 2:145–176.
Roberts, T.L. and Engelbeck, G. (1989). The effects of device technology on the usability of advanced telephone functions. International Conference on Human Factors in Computing Systems (CHI). New York: ACM, Vol. 1, pp. 331–338.
Suhm, B. and Peterson, P. (2001). Evaluating commercial touch-tone and speech-enabled telephone voice user interfaces using a single measure. International Conference on Human Factors in Computing Systems (CHI). Seattle, WA: ACM, Vol. 2, pp. 129–130.
Tatchell, G.R. (1996). Problems with the existing telephony customer interface: The pending eclipse of touch-tone and dial-tone. International Conference on Human Factors in Computing Systems (CHI). Vancouver, BC: ACM, Vol. 2, pp. 242–243.
Walker, M.A., Litman, D., Kamm, C., and Abella, A. (1997). PARADISE: A framework for evaluating spoken dialogue agents. 35th Annual Meeting of the Association of Computational Linguistics. Madrid: Morgan Kaufmann, pp. 271–280.
Yankelovich, N., Levow, G.-A., and Marx, M. (1995). Designing SpeechActs: Issues in speech user interfaces. International Conference on Human Factors in Computing Systems (CHI). Denver, CO: ACM, Vol. 1, pp. 369–376.