EXTRACTING CALL-REASON SEGMENTS FROM CONTACT CENTER DIALOGS BY USING AUTOMATICALLY ACQUIRED BOUNDARY EXPRESSIONS

FUKUTOMI Takaaki, KOBASHIKAWA Satoshi, ASAMI Taichi, SHINOZAKI Tsubasa, MASATAKI Hirokazu and TAKAHASHI Satoshi

NTT Cyber Space Laboratories, NTT Corporation

ICASSP 2011. 978-1-4577-0539-7/11/$26.00 ©2011 IEEE

ABSTRACT

To improve the performance of call-reason analysis at contact centers, we introduce a novel method to extract call-reason segments from dialogs. It is based on two characteristics of contact center conversations: 1) customers state their requests at the beginning of the call; 2) agents tend to use typical phrases at the end of the call-reason segment. Our proposal acquires these typical phrases automatically from stored speech data and extracts the call-reason segment precisely by detecting them. Experiments show that the method significantly improves the performance of call-reason information retrieval, since it allows the search scope to be limited to the call-reason segments of calls.

Index Terms: spoken document segmentation, contact center, speech dialog, speech recognition, data mining

1. INTRODUCTION

In recent years, many contact centers have been recording dialogs between their customers and agents and processing the records for quality analysis, data mining, auditing and agent training [1][2]. Analyzing contact center dialogs is an effective way to acquire insights for improving contact center operations and business processes. Analyzing customers' call-reasons is especially useful for trend analysis, new product development and so on, because call-reasons are a direct window on the customer and his requests, complaints, and queries. Most customers call the contact center with some reason in mind, and identifying these call-reasons is necessary for analyzing trends in specific call-reasons. A few techniques have been presented to identify call-reasons in contact center dialogs [3][4].
However, these techniques were designed to suit particular training data or predefined call-reason categories; a technique flexible enough to suit the analyst's purpose is necessary. Using keywords relevant to the desired call-reasons is useful for searching the ASR (automatic speech recognition) transcripts of dialogs to extract the calls that suit the analyst's purpose [5]. However, contact center dialogs often contain topics not relevant to the desired call-reason. Therefore, it is
necessary to limit the search scope to just the call-reason segments to enhance search performance [6][7].

Past research indicates that the beginning segment of a call is critical for call-reason analysis [8][9]: customers call with some purpose in mind, so it is natural that they state their call-reason at the beginning. We use the term call-reason segment for this initial segment. Call-reason segment length varies significantly with the customer and the subject matter. Calls tend to follow a typical pattern within each domain [10][11][12], and agents are likely to handle calls in a similar manner. We observed that at the completion of the call-reason, agents uttered one of a few common phrases.

In this paper, we propose a novel method to extract call-reason segments by detecting typical phrases at the end of the call-reason segment. Our proposal uses two characteristic features of contact center dialogs to extract typical phrases: 1) call-reason segments frequently appear at the beginning of a call; 2) agents use typical phrases at the end of call-reason segments. Since call-reasons mostly appear at the beginning of a call, we define as typical phrases those phrases whose frequencies in the beginning part of calls differ significantly from their frequencies in the rest of the calls. The typical phrases are extracted automatically from stored speech data, independently of the contact center domain. In the following sections, we describe the call-reason extraction method using typical phrases at the end of call-reasons in detail, and show its robustness to speech recognition errors and its utility for call-reason analysis through experiments.

2. PROPOSED METHOD

This section describes the method for extracting call-reason segments using typical phrases at the end of the call-reason segment (Fig. 1).

2.1. Automatic acquisition of typical phrases at the call-reason boundary

The typical contact center conversation starts with the agent's greeting and is followed by the customer's opening statement.
[Fig. 1 diagram: (a) preprocessing to extract characteristic expressions from stored speech data -- speech recognition of calls (a-1: split each transcript into the beginning part, U_begin, and the remainder, U_other), typical phrase extraction from high-df beginning utterances (a-2), document frequency calculation df(w_i, U_begin) (a-3) and df(w_i, U_other) (a-4), and extraction of characteristic expressions W_boundary = {w_1, w_2, w_3, ..., w_n} by chi-square test (a-5); (b) detection of the call-reason segment in a new call -- greeting detection (b-1) followed by typical phrase search with the extracted expressions (b-2).]

Fig. 1. Procedure for extracting the call-reason segment with automatically acquired characteristic expressions at the boundary of the call-reason segment
We define this opening statement as the beginning of the call-reason segment. A manual investigation of 113 dialogs in one contact center domain confirmed that in more than 90 percent of the dialogs the call-reason segment started just after the agent's greeting; our definition is therefore reasonable.

The focus of this section is a method for detecting the last utterance of the call-reason segment. As noted in Section 1, once the call-reason segment is completed, the agent usually responds with one of a few typical phrases. We use these typical phrases to delineate the end of the call-reason segment. However, phrases that are not typical of the end of call-reasons can also occur. To prevent such unrelated phrases from causing mis-extraction, we restrict the search space to the first X percent of each call (Fig. 1 a-1), based on the fact that call-reasons are stated early on, as mentioned above.

Typical phrases are identified by the document frequency of utterances. For an utterance u_i, this frequency is

df(u_i) = \frac{1}{N_i} \sum_{j=1}^{N_i} df(w_j) \quad (1)

where N_i is the number of words contained in utterance u_i and df(w_j) is the document frequency of word w_j. The set of utterances with high document frequency, the typical phrases U_{begin}^{high} = {u_1, u_2, ...}, is taken to be the top Y percent of phrases (Fig. 1 a-2).

Some of the extracted phrases express the same meaning in different ways, like "I understand" and "I see"; word order may also differ, and recognition errors are included in the phrases. To suppress these within-phrase differences and recognition errors, we restrict the typical phrases to those words whose document frequencies in the beginning part of the call differ significantly from their frequencies in the remainder of the call. The document frequency of a word in U_{begin}^{high} is calculated as

df(w_i \in U_{begin}^{high}) = \frac{|w_i \in U_{begin}^{high}|}{|U_{begin}^{high}|} \quad (2)

where |w_i \in U_{begin}^{high}| is the number of utterances in U_{begin}^{high} that contain word w_i and |U_{begin}^{high}| is the number of utterances in the set U_{begin}^{high} (Fig. 1 a-3). The document frequencies of words included in the utterances in the remainder of the call are given by

df(w_i \in U_{other}) = \frac{|w_i \in U_{other}|}{|U_{other}|} \quad (3)

where U_{other} = {u_1, u_2, ...} is the set of utterances in the remainder of the call (Fig. 1 a-4). Finally, we obtain the characteristic set of words in typical phrases, W_{boundary} = {w_1, w_2, ...}, by using the chi-square test to compare the document frequencies of the words in the different parts of the call (Fig. 1 a-5). By representing typical phrases as sets of words, we can remove word-order differences, wording differences, and commonly mis-recognized words from the phrases.

2.2. Extraction of the call-reason segment

The call-reason segment starts just after the agent's greeting, so we identify the beginning of the call-reason by detecting the agent's first statement (Fig. 1 b-1). This section discusses how the set of characteristic words W_{boundary} is used to extract the completion of the call-reason (Fig. 1 b-2). The utterance that contains the greatest number of words in W_{boundary} is taken to indicate the end of the call-reason; when several utterances contain the same (maximum) number of words in W_{boundary}, the first such utterance is taken to indicate the end of the call-reason.

3. EXPERIMENTS

We conducted two experiments to evaluate our technique: the first evaluated its accuracy in extracting call-reason segments, and the second evaluated its contribution to call-reason analysis performance.

3.1. Dialog data set

A total of 945 dialogs captured by two contact centers were used as the data set; each call was automatically transcribed by ASR. 513 dialogs were on billing guidance (billing) and 432 on general guidance (general). 400 of the 513 billing dialogs (393 of the 432 general dialogs) were used to extract the characteristic words. The call-reason segments of the remaining dialogs (billing: 113, general: 39) were tagged both automatically (by our proposal) and manually; these serve as the ground-truth data set for performance evaluation.

3.2. Evaluation measures for call-reason segment extraction performance and its contribution to call-reason analysis

The proposed method emphasizes extraction of the utterance indicating the end of the call-reason. We use extraction accuracy EA and extraction rate ER to represent the performance of end-utterance extraction as follows:
EA = \frac{|\text{extracted utterances} \cap \text{correct utterances}|}{|\text{extracted utterances}|} \quad (4)

ER = \frac{|\text{extracted utterances} \cap \text{correct utterances}|}{|\text{correct utterances}|} \quad (5)

where correct utterances are the set of utterances at the end of the call-reason segments as tagged manually, and extracted utterances are the set of utterances extracted by our proposal. We used these evaluation criteria to measure how precisely our proposal detects the end of the call-reason.
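A sketch of how EA and ER might be computed, assuming one hypothesized end utterance per call and the evaluation margin described with Fig. 2; the data layout and function name are our assumptions, not from the paper:

```python
def extraction_scores(extracted, correct, margin=0):
    """Extraction accuracy EA (Eq. 4) and extraction rate ER (Eq. 5).

    extracted and correct map a call id to the index of the utterance
    taken to end the call-reason segment; an extraction counts as a hit
    when it falls within +/- margin utterances of the manual tag.
    """
    hits = sum(1 for call, idx in extracted.items()
               if call in correct and abs(idx - correct[call]) <= margin)
    ea = hits / len(extracted) if extracted else 0.0
    er = hits / len(correct) if correct else 0.0
    return ea, er
```

With a margin of 0 only exact matches count; widening the margin reproduces the relaxed scoring used for the dashed and dotted curves in Fig. 2.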
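For completeness, the detection rule being evaluated (Section 2.2: the utterance containing the most W_boundary words ends the segment, with ties broken by taking the earliest) can be sketched as follows; the whitespace tokenization and names are illustrative assumptions:

```python
def detect_end_utterance(utterances, boundary_words):
    """Return the index of the utterance taken to end the call-reason
    segment: the one containing the most words from W_boundary, the
    earliest utterance winning ties (Section 2.2). Returns None when
    no utterance contains any boundary word."""
    best_idx, best_count = None, 0
    for i, utt in enumerate(utterances):
        count = sum(1 for w in utt.split() if w in boundary_words)
        if count > best_count:  # strict '>' keeps the earliest on ties
            best_idx, best_count = i, count
    return best_idx
```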
Fig. 2. Performance of extracting the completion of the call-reason segment

The extraction scores above do not by themselves measure call-reason analysis performance. Consequently, we set an IR (information retrieval) task over the spoken dialogs to evaluate the effect on call-reason analysis, since in practice IR techniques such as query search are what is mostly applied to find calls that discuss specific call-reasons. The queries for the search task were chosen from the nouns present in the correct call-reason segments, so they can be said to be closely related to the call-reasons; we employed query retrieval precision and recall as the evaluation metrics. Improvements in call-reason analysis were confirmed by comparing the following four search-scope conditions: all utterances (entire call); the K utterances at the beginning of the calls (baseline) [8]; the utterances contained in the extracted call-reason segments (proposed); and the manually transcribed utterances contained in the extracted call-reason segments (proposed w/ transcription).

3.3. Results and discussion

The characteristic words at the end of the call-reason segments were automatically acquired from the data set presented in Section 3.1. In this experiment, the typical phrases were acquired from the utterances in the beginning part (X = 20%) of each call, and the characteristic words in typical phrases were extracted from the utterances with high document frequency (top Y = 10%). The significance level of the chi-square test was set to 0.05. Some of the extracted characteristic words represent understanding of the customer's request, such as "Certainly" and "I understand" [in Japanese]; the others are common terms at topic transitions in contact center dialogs, such as "may I have", "may I ask" and "customer number" [in Japanese]. Sixteen and twelve words were acquired for the billing and general task data sets, respectively.
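To make the acquisition procedure of Section 2.1 concrete, here is a minimal sketch of Eq. (1), the top-Y-percent selection, and the chi-square filtering (Fig. 1 a-2 to a-5). The whitespace tokenization, the per-word 2x2 contingency table (utterances containing vs. not containing the word, in U_begin^high vs. U_other), and all names are our assumptions, since the paper does not spell out the exact form of the test:

```python
def utterance_df(utterance, word_df):
    """Document frequency of an utterance: the mean document
    frequency of its words, as in Eq. (1)."""
    words = utterance.split()
    return sum(word_df.get(w, 0.0) for w in words) / len(words)

def typical_phrases(beginning_utterances, word_df, top_percent=10):
    """Keep the top-Y-percent of beginning-part utterances by
    utterance document frequency (Fig. 1 a-2)."""
    ranked = sorted(beginning_utterances,
                    key=lambda u: utterance_df(u, word_df), reverse=True)
    return ranked[:max(1, len(ranked) * top_percent // 100)]

def chi_square(count_begin, n_begin, count_other, n_other):
    """Chi-square statistic over a 2x2 table: utterances containing /
    not containing a word, in U_begin^high vs. U_other."""
    table = [[count_begin, n_begin - count_begin],
             [count_other, n_other - count_other]]
    total = n_begin + n_other
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def boundary_words(begin_counts, n_begin, other_counts, n_other,
                   threshold=3.841):  # 0.05 critical value, 1 d.o.f.
    """Select W_boundary: words whose document frequencies differ
    significantly between the two parts of the call (Fig. 1 a-5)."""
    return {w for w, c in begin_counts.items()
            if chi_square(c, n_begin, other_counts.get(w, 0), n_other)
            > threshold}
```

A word such as "certainly" that is frequent in the high-df beginning utterances but rare elsewhere clears the 3.841 threshold and enters W_boundary, while uniformly common words are rejected, matching the interpretation offered for the precision results in Table 1.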
The extraction quality in the billing task is shown in Fig. 2. The x-axis and the y-axis represent extraction rate and extraction accuracy, respectively. The plots are based on the following two
parameters: M_s, the least number of characteristic expressions that the utterance at the end of the call-reason must include; and the margin, under which an utterance is evaluated as correct if it appears within one or two utterances before or after the correct one (the dashed and dotted lines). For comparison, we also plot the result of a simple extraction method that takes the K-th utterance as the end utterance of the call-reason segment; its best result (K = 15, margin of 2 utterances) is plotted as a black square. Our proposal clearly outperforms the simple method in extracting the completion of the call-reason. The results also indicate that the accuracy of extracting the utterance at the end of the call-reason rises with the number of characteristic expressions contained in the extracted utterance.

Table 1. Comparison of retrieval performance: mean precision, mean recall and mean F-measure

task    | transcript set              | Precision | Recall | F-measure
billing | entire call                 |   .233    |  .739  |   .354
billing | baseline (first 15)         |   .614    |  .498  |   .550
billing | Proposed                    |   .714    |  .660  |   .686
billing | Proposed (w/ transcription) |   .703    |  .897  |   .789
general | entire call                 |   .363    |  .693  |   .474
general | baseline (first 20)         |   .740    |  .423  |   .538
general | Proposed                    |   .717    |  .512  |   .597
general | Proposed (w/ transcription) |   .720    |  .730  |   .725

Table 1 shows retrieval performance. As noted above, the dialogs used for evaluation differ from the dialogs used to extract the characteristic words (unsupervised). The baseline approach uses the first K utterances of the calls to retrieve the call-reasons; K was set to 15 utterances in the billing task and 20 in the general task, since these values yielded the best F-measure in the experiments. The results indicate that the proposed approach outperforms the baseline in both tasks in terms of F-measure, and our method achieved roughly twice the precision of entire-call search. The effect of recognition errors on performance can also be observed: in precision, our proposal on ASR transcripts matches the performance on manual transcriptions, with no degradation observed. One interpretation of this result is that the chi-square test works well to prevent commonly appearing words from being extracted incorrectly.

4. CONCLUSION

In this paper, we proposed a call-reason segment extraction technique that uses characteristic words to delineate the call-reason segment. It automatically extracts the characteristic words for segment extraction from stored speech dialogs, so no manual adjustment is needed to extract the call-reason segment. In experiments, our proposal showed performance superior to the entire-call search approach on the call-reason search task. In addition, it can be applied to various domains of contact center dialogs, since it achieved high performance on these two different contact center tasks. Furthermore, the experiments on contact center dialogs suggest that the proposal is robust to speech recognition errors.

5. ACKNOWLEDGEMENTS

The authors would like to thank Mr. Hirohito INAGAKI of NTT Cyber Space Laboratories and all those who helped us with their advice. Without them, we could not have pursued this work.

6. REFERENCES
[1] Pang-Ning Tan, et al., "Textual Data Mining of Service Center Call Records", in Proceedings of ACM SIGKDD, pp. 417-423, 2000.
[2] L. V. Subramaniam, et al., "Business Intelligence from Voice of Customer", in Proceedings of the IEEE International Conference on Data Engineering, pp. 1391-1402, 2009.
[3] M. Tung, et al., "Call-Type Classification and Unsupervised Training for the Call Center Domain", in Automatic Speech Recognition and Understanding, pp. 204-208, 2003.
[4] S. Busemann, et al., "Message Classification in the Call Center", in Proceedings of ANLP, pp. 158-165, 2000.
[5] J. Mamou, et al., "Spoken Document Retrieval from Call-Center Conversations", in Proceedings of ACM SIGIR, pp. 51-58, 2006.
[6] H. Takeuchi, et al., "Automatic Identification of Important Segments and Expressions for Mining of Business-Oriented Conversations at Contact Centers", in Proceedings of EMNLP, pp. 458-467, 2007.
[7] G. Mishne, et al., "Automatic Analysis of Call-center Conversations", in Proceedings of ACM, pp. 453-459, 2005.
[8] Y. Park, et al., "Low-Cost Call Type Classification for Contact Center Calls Using Partial", in Proceedings of INTERSPEECH, pp. 2739-2742, 2009.
[9] Y. Park, "Automatic Call Section Segmentation for Contact-Center Calls", in Proceedings of ACM, pp. 117-126, 2007.
[10] K. Kummamuru, et al., "Unsupervised Segmentation of Conversational Transcripts", Statistical Analysis and Data Mining, Vol. 2, Issue 4, pp. 231-245, 2009.
[11] H. Takeuchi, et al., "A Conversation-Mining System for Gathering Insights to Improve Agent Productivity", in Proceedings of the 4th IEEE International Conference on Enterprise Computing, E-Commerce, and E-Services, pp. 465-468, 2007.
[12] S. Roy, et al., "Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions", in Proceedings of COLING, pp. 737-744, 2006.