Real Time Speaking Rate Monitoring System - IEEE Xplore

Real Time Speaking Rate Monitoring System Meghna Abhishek Pandharipande TCS Innovation Labs - Mumbai Tata Consultancy Services Limited Yantra Park, Thane (West), India - 400601 Email: [email protected]

Abstract—The rate at which we speak has a bearing on how comprehensible our speech is, and it has gained importance in recent times with mushrooming call center operations. An optimal speaking rate is one that is neither too fast nor too slow. Fast speech makes a conversation unintelligible, while slow speech makes it boring. Speaking rate varies with the emotional state of the speaker, but it is influenced more specifically by regional and cultural factors. With the growth in call centers, especially those that cater to a different geography, it has become important to have systems that can continuously, and in real time, monitor the speaking rate of a call center agent while s/he is in conversation. Constant monitoring helps the agent speak at the desired rate with a customer in a different geography. In this paper, we develop a speaking rate monitoring system that can assist the agent to speak at an optimal speaking rate.

Keywords-Speaking Rate; Customer Satisfaction Index; nonlinguistic feature

Sunil Kumar Kopparapu TCS Innovation Labs - Mumbai Tata Consultancy Services Limited Yantra Park, Thane (West), India - 400601 Email: [email protected]

I. INTRODUCTION

With mushrooming service industries, the need for human-driven call centers is becoming widespread. A company's call center usually caters to the needs of its customers and helps them resolve queries about the services the company provides. It is important that call center agents know the company well, understand the customers' queries, and articulate and communicate the answers efficiently, especially because the conversation takes place over the telephone and not face to face. The agent becomes the representative of the company, and it is important that the agent address the customer's query in the best possible manner.

Customer Satisfaction Index (CSI) is a metric often used to measure the performance of an agent dealing with a customer on the telephone. Measurement of CSI [1] derives its importance from the strong link between customer satisfaction and customer retention, which is very important for any company. The customer's perception of the quality of a product or service usually determines its success in the market. Since the face of the company is the agent who converses with the customer, CSI is seen as a key performance indicator within a business. While the performance of the agent has several attributes, as discussed in [2], one of the major attributes is the manner in which the agent interacts with the customer, and speaking rate tops the list. According to some studies, emotional dependence is one of the factors that helps build extreme customer loyalty. Emotional dependence includes speaking rate, neutral accent, and cultural lingo.

Research [3] shows that the average English speaking rate is between 130 and 200 words per minute (WPM). This wide range covers 90% of the English-speaking population. The same source suggests that a rate of 130 to 145 WPM may be required for complex material, that a rate between 145 and 175 WPM is optimal for material of average complexity, and that many listeners can accommodate over 175 WPM for simple material. Listeners can be lost to boredom, lost to complexity, or fully engaged in a conversation, depending on the speaker's ability to deliver each type of material at the optimal rate for each listener [3]. This makes it important that the agent speak at the desired rate throughout the conversation with the customer.

While it is easy for a machine to maintain the speaking rate at the desired level, it is hard for a human agent. Accent and cultural lingo can be trained and cultivated, whereas speaking rate is normally driven by the situation and can be overlooked, especially when predetermined text is to be spoken. Monitoring the speaking rate of an agent in real time is therefore essential and can improve CSI by several notches. In this paper, we propose a real time speaking rate monitor that helps the agent maintain an optimal speaking rate.

The rest of the paper is organized as follows. Section II discusses speaking rate, Section III describes our construction of a real time speaking rate monitoring system, Section IV presents results and discussion, and Section V concludes.

II. SPEAKING RATE

In general, speaking rate is measured as the number of spoken words per minute (WPM). Measuring speaking rate involves identifying a feature of speech and then calculating the rate by counting the occurrences of that feature per second [4].
There is a general debate on which feature to use, and the argument persists between the syllable and the phoneme. Some insist that the syllable is the right unit, while others object that the universal relevance of the syllable has not been established and that phonemes may be better candidates. However, [5] showed that the syllable rate correlates better with the perceptual speaking rate than the phone rate does: the correlation is 0.81 with the syllable as the feature, against 0.73 with the phoneme. Pfitzinger [5] also showed that, for German, the speaking rates calculated in terms of syllables and phonemes have a correlation of 0.6 for normal rate speech. The level of correlation is higher [4] for languages with a simple CV syllable structure than for languages that allow more consonant cluster complexity, and at fast speaking rates language-dependent strategies may also have an influence [6].

Without getting into the debate of which speech feature to use, we focus on the syllable as the unit of speech for computing the speaking rate. The algorithm to detect syllable nuclei in spoken speech is described in [7]. It identifies syllables as a function of the intensity of the spoken voice and the voicedness of the speech. It also identifies the pauses in the speech so that the number of syllables per unit time can be determined more accurately. In our implementation of the speaking rate monitoring system, we use the algorithm specified in [7] with very minor modifications. Once the speaking rate is computed in syllables per second (sps), we can compute the speaking rate in WPM using a conversion factor of $\gamma = 1.5$ as suggested by Yaruss [8], namely,

$$SR_{WPS} = \gamma \times SR_{sps} \times 60 \qquad (1)$$

where $SR_{WPS}$ is the speaking rate in words per minute, $SR_{sps}$ is the number of syllables per second, and $\gamma$ is the conversion factor between syllable and word.

Fig. 2. Speaking Rate Monitor Agent Interface
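The conversion in (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not taken from the paper. The first function follows (1) exactly as printed. The second covers an alternative reading, which is an assumption of this sketch and not something the paper states, in which the factor 1.5 is interpreted as syllables per word, so the syllable rate is divided by it.

```python
GAMMA = 1.5  # conversion factor quoted in the text, attributed to [8]


def wpm_as_printed(sr_sps: float, gamma: float = GAMMA) -> float:
    """Equation (1) as printed: gamma * SR_sps * 60."""
    if sr_sps < 0:
        raise ValueError("syllable rate must be non-negative")
    return gamma * sr_sps * 60.0


def wpm_if_gamma_is_syllables_per_word(sr_sps: float, gamma: float = GAMMA) -> float:
    """Alternative reading (assumption): gamma is syllables per word,
    so words per minute = SR_sps * 60 / gamma."""
    if sr_sps < 0:
        raise ValueError("syllable rate must be non-negative")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sr_sps * 60.0 / gamma
```

Which reading applies depends on how the conversion factor is defined in [8]; the two functions are kept separate so that the choice stays explicit.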

III. BUILDING A REAL TIME SPEAKING RATE MONITORING SYSTEM

Fig. 1 shows a typical call center scenario. A customer who needs a service calls the company's call center over a telephone channel and is connected to one of the call center agents. The agent typically has a desktop that pops up relevant information about the customer, which the agent can use to communicate with the customer and resolve the customer's queries over the telephone.

Fig. 1. Typical Call Center Scenario.

The proposed real time speaking rate monitoring (SRM) system resides on the agent's desktop and taps the speech of the agent as s/he communicates with the customer. By design, the SRM system does not access or process the speech of the customer, because the information communicated by the customer is confidential and can include credit card numbers, passwords, etc. While the call between the agent and the customer is in progress on the communication channel, the agent's speech is simultaneously passed, in real time, to the SRM system on the desktop. The SRM extracts the syllables in the agent's speech and conveys the speaking rate to the agent on his/her desktop. The speaking rate is updated at a predefined interval (1 second in our implementation), as shown in Fig. 2.

However, in actual operation on the floor of the call center, the agent interface (see Fig. 2) turned out to be more of a distraction than an aid for maintaining the speaking rate. Discussions with agents on the floor concluded that non-distracting feedback was the preferred option. Subsequently, an icon was provided on the taskbar that changes its color depending on the speaking rate of the agent. Additionally, the icon pops up a message such as "Please speak slow" if the agent consistently speaks faster than the desired speaking rate for more than a predetermined amount of time (30 seconds in our implementation). A screenshot of the speaking rate monitor on the taskbar of an agent's desktop is shown in Fig. 3.

Fig. 3. Speaking Rate Monitor on Taskbar: (a) Indicator Speak Slower; (b) Indicator Speak Faster.

IV. RESULTS AND DISCUSSION

The implemented algorithm [7] was tested for accuracy. Fig. 4 shows the syllables detected in the spoken speech /The first one believed in faith, he thought/¹. The text "The first one believed in faith, he thought" has 8 syllables as counted using [9]. It can be observed that the number of syllables identified in the speech (8 syllables marked in Fig. 4) is exactly equal to the number of syllables in the text, namely 8.

Fig. 4. Identified 8 syllables in /The first one believed in faith, he thought/ using the algorithm in [7]

For a broader test of accuracy, we chose a short English paragraph and asked 10 members of the lab to speak it at three different speaking rates. The algorithm was used to compute the number of syllables in each of the 30 recordings. The number of syllables identified by the algorithm was within 10% of the actual number of syllables [9] present in the text.

The speaking rate $SR_{sps}$ is computed by counting the number of syllables (say $N$) in the spoken speech, as seen in Fig. 4, per unit time. If $T_{total}$ is the total time taken to speak and $T_{silent}$ is the duration for which no speech was detected (see Fig. 4), then

$$SR_{sps} = \frac{N}{T_{total} - T_{silent}}$$
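As a minimal sketch of the formula above, the following Python function computes the syllable rate from a syllable count and two durations. The function and parameter names are hypothetical. It assumes the syllable count and the silent duration have already been produced by a detector such as the one in [7], which is not reproduced here, and the error handling for a window with no speech is a design choice of this sketch.

```python
def speaking_rate_sps(n_syllables: int, t_total: float, t_silent: float) -> float:
    """Syllables per second over the time actually spent speaking.

    n_syllables: number of detected syllables (N)
    t_total:     total duration of the analyzed speech, in seconds
    t_silent:    duration with no speech detected, in seconds
    """
    if n_syllables < 0 or t_total < 0 or t_silent < 0:
        raise ValueError("inputs must be non-negative")
    speaking_time = t_total - t_silent
    if speaking_time <= 0:
        # No speech in the window, so the rate is undefined.
        raise ValueError("no speaking time in the analyzed window")
    return n_syllables / speaking_time
```

For example, 8 syllables in a 3 second window containing 1 second of silence give 8 / (3 - 1) = 4 syllables per second.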

$SR_{WPS}$ is calculated from $SR_{sps}$ using (1).

There were not many challenges in building a speaking rate monitoring system; the challenges were in deploying and tuning it. One challenge was to obtain visible space on an already crowded agent desktop without distracting the agent. A series of discussions with several agents clarified that an additional icon on the taskbar with occasional pop-ups would help the agents maintain their speaking rate at the desired level without unnecessarily distracting them from their main task of speaking to the customer.

The other major challenge was to provide the real time aspect of the monitoring system. To enable this, the speaking rate had to be computed not only accurately but also quickly. We experimented with several durations of the speech signal and finally converged on analyzing a 5 second speech sample every 1 second, which gave the desired accuracy. Note that the longer the duration of the speech analyzed, the better the accuracy of syllable detection and hence of the computed speaking rate. Further, the 5 second delay was not perceived as a delay by the agents.

The implementation was put to the test in a local call center that catered to a financial company. Its effect was assessed in two ways. First, the actual users were asked for feedback on the usefulness of the speaking rate monitoring system in real floor use. Surprisingly, almost all the agents were happy to be given an indication when they deviated from the required speaking rate. Several of them mentioned that they took pride when the speaking rate monitor did not give any pop-up (speak slow or speak fast) during their conversation with the customer. They also mentioned that they invariably tend to speak faster, especially when reading information to the customer, and that the pop-up from the SRM was often useful for checking their speaking rate. The second measure was in terms of CSI: a comparison of CSI before and after use of the speaking rate monitoring system showed a significant increase for the agents who used the system. While the increase in CSI cannot be attributed entirely to the SRM, it can be concluded that the SRM had an overall effect on the improved CSI.

¹ We use / / to indicate the spoken word. For example, /W1/ represents the spoken equivalent of the written word W1.

V. CONCLUSION

We proposed and developed a speaking rate monitoring system.
The speaking rate monitoring system was used on the agent's desktop on the floor of a call center and gave real time feedback to the agent. The feedback allowed the agent to change the speaking rate on the fly while the conversation with the customer was going on. This feedback, in the form of a color-changing icon on the taskbar and pop-up messages, was discreet and in general helped the agent speak at the desired rate; this was observed as improved CSI for the agents who used the SRM system on their desktop. The real-time implementation and the actual use of a speaking rate monitor in a call center environment are the main contributions of this paper.

REFERENCES

[1] CSI. [Online]. Available: http://www.symphonytech.com/articles/satisfaction.htm/
[2] L. ShinYi, B. Ryan, H. L. Qu, and L. Martin, "A study of the relationship between hotel informative service setting items and customer satisfaction," Journal of Quality Assurance in Hospitality and Tourism, vol. 11, no. 2, pp. 111–131, 2010.
[3] Using an adaptive voice user interface to improve customer service and reduce operational costs. [Online]. Available: http://psshelp.s3.amazonaws.com/AdaptiveAudio WhitePaper.pdf
[4] F. Pellegrino, J. Farinas, and J.-L. Rouas, "Automatic estimation of speaking rate in multilingual spontaneous speech," in Proceedings of Speech Prosody, Nara, Japan, 2004.
[5] H. Pfitzinger, "Local speech rate as a combination of syllable and phone rate," in Proceedings of ICSLP, 1998.
[6] F. Ramus, "Acoustic correlates of linguistic rhythm: Perspectives," in Proceedings of the International Conference on Speech Prosody, 2002.
[7] N. H. de Jong and T. Wempe, "Praat script to detect syllable nuclei and measure speech rate automatically," Behavior Research Methods, vol. 41, pp. 385–390, 2009.
[8] J. S. Yaruss, "Converting between word and syllable counts in children's conversational speech samples," Journal of Fluency Disorders, vol. 25, no. 4, pp. 305–316, 2000. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0094730X00000887
[9] wordcalc. [Online]. Available: http://www.wordcalc.com/
