21st International Conference on Pattern Recognition (ICPR 2012) November 11-15, 2012. Tsukuba, Japan
Probabilistic Keyboard Adaptable to User and Operating Style Based on Syllable HMMs
Toshiyuki Hagiya
KDDI R&D Laboratories, Inc.
Tsuneo Kato
KDDI R&D Laboratories, Inc.
Abstract
We propose a probabilistic keyboard based on syllable HMMs, together with adaptation to users and operating styles, to achieve high accuracy with software keyboards on mobile devices. The syllable HMMs balance two goals: high accuracy, obtained by introducing syllabic constraints, and word flexibility, obtained by not depending on a dictionary. Experimental results showed that a user-dependent probabilistic model reduced the error rate by 24.2% compared with the conventional deterministic method. Moreover, we propose adapting the model to various operating styles using maximum-likelihood linear regression (MLLR). In the experiment, the adaptation was effective with only tens of words typed in the target style.
978-4-9906441-1-6 ©2012 IAPR
1. Introduction
With the recent spread of smartphones and tablets, keyboard input on touch screen displays has become increasingly popular. A touch screen display provides a larger display area and a flexible software keyboard that can change orientation, switch languages, and modify the key layout. Despite these benefits, many users still prefer physical keyboards because they make more errors with a software keyboard. One reason for these errors is the lack of tactile feedback and the small size of the keys [1]: the keys of a software keyboard are sometimes too small for fingers to hit correctly. Another reason is parallax, which depends on the user's operating style and has a significant effect on accuracy. A straightforward solution is to adjust the target areas of the keys according to the actual touch distributions [2]. We introduce related research in the next section.
Previous studies modeled touch distributions for each letter of the Roman alphabet, in other words, for each key. However, the possible key sequences can be effectively constrained by syllable units. Moreover, in many languages, such as Japanese and Chinese, a character corresponds to a syllable, which is typed as a sequence of keys. We propose a probabilistic QWERTY keyboard based on syllable HMMs. The syllable HMMs can improve typing accuracy by introducing syllabic constraints into the keyboard while maintaining word flexibility, because they do not depend on a dictionary. Because touch distributions depend on users and operating styles, we adapt the HMMs to them with little data using maximum-likelihood linear regression (MLLR).
This paper is organized as follows. Section 2 describes related research, and section 3 details the proposed technique. Section 4 evaluates the basic performance of the syllable HMMs and their adaptation to different users, and section 5 evaluates the adaptation to operating styles. Finally, section 6 concludes the paper.
2. Related Research
Previous studies on reducing typing errors fall into two approaches: language-model-based and touch-model-based. In the language-model-based approach, the next characters or words are predicted and displayed [3, 4], or the next keys with high probabilities are displayed with larger key sizes [5]. However, the performance of this approach depends heavily on the dictionary: unregistered words, such as new words, abbreviations, or colloquial expressions, are not handled. In the touch-model-based approach, one study proposed transforming the keyboard layout according to the actual touch distributions [6]. In that study, the center of each key moved to follow the centroid of the actual touch distribution, and the boundaries were set midway between keys. However, the method did not consider key context, namely the previous key. Additionally, the study reported that some users were confused by large shifts in the key layout. To address these problems, some studies proposed transforming the detection areas by combining both approaches. The probability of each next character was given as the product of a probability from the language model and a probability from the touch model [7, 8]. The former study used character n-grams as the language model and a Gaussian mixture model (GMM) of stylus input positions as the touch model [7].
Table 1. Japanese syllabic alphabet represented by Roman alphabet letters.
       A     I          U     E     O
       a     i          u     e     o
K      ka    ki         ku    ke    ko
S      sa    si(shi)    su    se    so
...    ...   ...        ...   ...   ...
W      wa    wi         -     we    wo
Figure 1. The basic operating style of the device.
The latter study proposed a similar approach using typing history in the language model, although the central region of each key remained fixed to minimize the layout transformation [8]. In addition, another study proposed collecting touch data through a typing game, because a large volume of data is required to train a probabilistic model [9].
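The product rule used in [7, 8] can be illustrated with a small Python sketch. It is a generic illustration of combining a language model with a touch model, not a reproduction of either study. The key coordinates, the isotropic Gaussian touch model, its standard deviation, the bigram table, and the function names are all hypothetical.

```python
import math

# Hypothetical key centers (x, y) in pixels; not taken from the paper.
KEY_CENTERS = {"a": (20.0, 100.0), "s": (60.0, 100.0), "d": (100.0, 100.0)}
SIGMA = 15.0  # assumed isotropic standard deviation of touches, in pixels

# Hypothetical character bigram probabilities P(next | previous).
BIGRAM = {"k": {"a": 0.6, "s": 0.1, "d": 0.3}}


def touch_likelihood(x, y, key):
    """Isotropic 2-D Gaussian density of a touch at (x, y) given the intended key."""
    cx, cy = KEY_CENTERS[key]
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2 * SIGMA ** 2)) / (2 * math.pi * SIGMA ** 2)


def decode_key(x, y, prev):
    """Pick the key maximizing P_LM(key | prev) * P_touch((x, y) | key)."""
    scores = {k: BIGRAM[prev][k] * touch_likelihood(x, y, k) for k in KEY_CENTERS}
    return max(scores, key=scores.get)


# The touch at x = 42 is nearer to "s" (x = 60) than to "a" (x = 20),
# but the bigram strongly favors "a" after "k".
print(decode_key(42.0, 100.0, "k"))  # -> a
```

In this example the language model overrides the nearest-key decision. The touch model alone would choose "s".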
3. Keyboard based on Syllable HMMs
The proposed method consists of two parts: HMMs in syllabic units, and their adaptation to different users and operating styles. The proposed keyboard is based on syllable HMMs, in which each key entry corresponds to an HMM state. Each HMM state is defined by the probability distribution of touch positions on the two-dimensional plane and by the state transition probability, which is fixed to 1. In response to the user's touch positions, the character with the highest probability is displayed. Japanese syllables are typed as combinations of one or more Roman alphabet letters, as shown in Table 1, so a syllable is not identified until its final key has been entered. The syllables are then converted into standard Japanese text by a kana-kanji converter.
The proposed method has two advantages. First, it offers word flexibility, because the syllable unit is independent of the dictionary, unlike the word-based input methods of previous studies [7, 8]. On mobile devices in particular, users type words that are not in the dictionary, such as new words, abbreviations, or colloquial expressions. Second, typing accuracy is expected to be higher than with character-based input, because in many languages syllable units restrict the possible key sequences.
The models are left-to-right HMMs without self-transitions, and every touch input corresponds to one frame of state transition. The probability of the syllable $S_i$ is computed by the following equations:
$$ P(S_i) = \prod_{j=1}^{L_i} P\big((x_j, y_j) \mid S_i\big) \qquad (1) $$
$$ P\big((x_j, y_j) \mid S_i\big) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\big((x_j, y_j) \mid \mu_k, \Sigma_k\big) \qquad (2) $$
where $(x_j, y_j)$ is the $j$-th touch position, $\mathcal{N}(\cdot \mid \mu_k, \Sigma_k)$ denotes a normal distribution with mean vector $\mu_k$ and covariance matrix $\Sigma_k$, $\pi_k$ is the weight of the $k$-th of $K$ mixture components, and $L_i$ is the length of $S_i$. The accumulated probability does not take the transition probability
into consideration because the transition probability is always 1. The syllable models are connected ergodically, so any syllable can follow any other. As Equation (2) shows, each HMM state has a mixture of normal distributions with diagonal covariance matrices, which is the same as a GMM. For comparison with the syllable model, we use a key model, which is a single-state GMM trained for each key.
Figure 2. The input screen.
In addition, we propose MLLR adaptation of the syllable HMMs to contexts such as different users or operating styles. MLLR is a model adaptation technique that estimates a set of linear transformations of the mean and variance parameters from a small amount of data [10]. In MLLR, both the mean and covariance parameters are transformed and re-estimated as
$$ \hat{\mu}_k = A \mu_k + b \qquad (3) $$
$$ \hat{\Sigma}_k^{-1} = H^{\top} \Sigma_k^{-1} H \qquad (4) $$
where $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix in Equation (2), and $\hat{\mu}_k$ and $\hat{\Sigma}_k$ are the adapted mean vector and covariance matrix, respectively. The transformation parameters $(A, b, H)$ are estimated by maximum likelihood (ML). The basic performance of the syllable HMMs and their adaptation to users are described in section 4, and the adaptation to operating styles is described in section 5.
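The following Python sketch illustrates the scoring in Equations (1) and (2) and the transforms in Equations (3) and (4). It is a minimal illustration under stated assumptions, not the authors' implementation:

- The key coordinates, variances, the two-component GMM shape, and all function names are hypothetical.
- Only a single syllable is scored, so the ergodic connection between syllables is not modeled.
- $H$ is restricted to a diagonal matrix so that the covariances stay diagonal.
- Full ML estimation over Gaussian posteriors is replaced by a per-axis least-squares fit of a diagonal $A$ and $b$. That fit equals the ML estimate only if each touch is modeled as a single fixed-variance Gaussian around its intended key center.

```python
import math

# Hypothetical key centers (x, y) in pixels: top row y = 20, home row y = 60.
KEY_CENTERS = {"a": (40.0, 60.0), "s": (80.0, 60.0), "k": (320.0, 60.0),
               "i": (300.0, 20.0), "u": (260.0, 20.0)}
VAR = 100.0  # assumed per-axis variance of every component (sigma = 10 px)


def key_state(key):
    """One HMM state: a 2-component diagonal GMM around the key (assumed shape)."""
    cx, cy = KEY_CENTERS[key]
    return [(0.5, (cx - 3.0, cy), (VAR, VAR)), (0.5, (cx + 3.0, cy), (VAR, VAR))]


def build_models(syllables):
    """Left-to-right chain with one state per key of each romanized syllable."""
    return {syl: [key_state(ch) for ch in syl] for syl in syllables}


def log_normal_diag(p, mean, var):
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(p, mean, var))


def log_gmm(p, state):
    """Log of Eq. (2): sum_k pi_k N(p | mu_k, Sigma_k), via log-sum-exp."""
    terms = [math.log(w) + log_normal_diag(p, m, v) for w, m, v in state]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_p_syllable(touches, chain):
    """Log of Eq. (1): one touch per state; transition probabilities are 1."""
    if len(touches) != len(chain):
        return -math.inf  # no self-transitions, so the lengths must match
    return sum(log_gmm(p, state) for p, state in zip(touches, chain))


def decode(touches, models):
    return max(models, key=lambda syl: log_p_syllable(touches, models[syl]))


def adapt(models, A, b, h):
    """Eq. (3): mu' = A mu + b; Eq. (4) with H = diag(h): var' = var / h^2."""
    def transform(w, m, v):
        mu = (A[0][0] * m[0] + A[0][1] * m[1] + b[0],
              A[1][0] * m[0] + A[1][1] * m[1] + b[1])
        return (w, mu, (v[0] / h[0] ** 2, v[1] / h[1] ** 2))
    return {syl: [[transform(*c) for c in state] for state in chain]
            for syl, chain in models.items()}


def estimate_axis(centers, touches):
    """Least-squares fit of touches ~ a * centers + b along one axis."""
    n = len(centers)
    mc, mt = sum(centers) / n, sum(touches) / n
    sxx = sum((c - mc) ** 2 for c in centers)
    sxy = sum((c - mc) * (t - mt) for c, t in zip(centers, touches))
    a = sxy / sxx
    return a, mt - a * mc


models = build_models(["a", "i", "u", "ka", "ki", "ku", "sa", "si", "su"])
typed = [(345.0, 60.0), (285.0, 20.0)]  # user aims at "ku" but hits 25 px to the right
print(decode(typed, models))  # -> ki

# Hypothetical adaptation data: intended keys and the touches actually observed.
keys = ["k", "a", "s", "i", "u"]
obs = [(KEY_CENTERS[k][0] + 25.0, KEY_CENTERS[k][1]) for k in keys]
ax, bx = estimate_axis([KEY_CENTERS[k][0] for k in keys], [o[0] for o in obs])
ay, by = estimate_axis([KEY_CENTERS[k][1] for k in keys], [o[1] for o in obs])
adapted = adapt(models, [[ax, 0.0], [0.0, ay]], (bx, by), (1.0, 1.0))
print(decode(typed, adapted))  # -> ku
```

In this usage $H$ is left at the identity, so only the means move. The learned offset of about 25 px in $x$ is enough to turn the misrecognized "ki" back into "ku".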
4. Evaluation of the Syllable HMMs and User Adaptation
4.1 Data Collection
As shown in Figure 1, participants sat in a chair, held the mobile device in one hand, and operated it with the other hand. They typed Japanese words using a square-shaped QWERTY keyboard on the screen, as shown in Figure 2. The device recorded each touch position and the character corresponding to that position. Each participant typed 10 sets of 100 words, with a break after each set. The words were randomly selected from the nouns in the JUMAN dictionary [11] for each participant. Six participants (five male and one female) aged 25 to 57 took part. All had experience with software keyboards, although none were professional typists. All participants were right-handed and typed with the right hand. The device was a Galaxy Tab with a 7.0-inch screen.
4.2 Training and Evaluation Method
We trained two probabilistic models, the key model and the syllable model, with the collected data. We also compared three kinds of user models: user-dependent (UD), user-independent (UI), and user-adapted (UA). The UD model was trained with 900 words typed by the participant, using the Baum-Welch re-estimation algorithm. The UI model was trained with 5000 words typed by the other participants. The UA model was obtained by MLLR adaptation of the UI model using 50 words typed by the participant. For evaluation, we used 10-fold cross-validation (CV) within each user for the UD model, and 5-fold CV for the UI and UA models. We evaluated performance by the error reduction rate (ERR), given by the following equation:
$$ \mathrm{ERR} = \frac{E_0 - E_{pm}}{E_0} \times 100 \qquad (5) $$
Here, $E_0$ is the baseline input error rate of the conventional deterministic method, and $E_{pm}$ is the recognition error rate of the probabilistic model.
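Equation (5) is a relative reduction of the error rate, expressed in percent. A minimal Python version follows; the function name is hypothetical. Applied to the averaged error rates in Table 2, it gives about 24.4% and 6.7%, slightly different from the reported 24.2% and 6.53%. This may mean that the reported ERRs are averages of per-participant values rather than values computed from the averaged error rates, although the text does not say so.

```python
def error_reduction_rate(e0, epm):
    """Eq. (5): relative reduction (%) of the error rate epm against the baseline e0."""
    return (e0 - epm) / e0 * 100.0


# Averaged error rates (percent) from Table 2.
print(round(error_reduction_rate(3.56, 2.69), 1))  # -> 24.4 (Data A)
print(round(error_reduction_rate(3.56, 3.32), 1))  # -> 6.7 (Data B)
```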
4.3 Experimental Results
4.3.1 Distribution of users' touch positions
Figure 3 shows the centroids of all participants' touch distributions over the key area. Different biases were observed in four different areas. Specifically, the centroids deviated little from the key centers in Area 2, whereas in the other areas they leaned toward Area 2. This is because the users kept their right hand over Area 2 as a home position and moved it as short a distance as possible from there.
4.3.2 Baseline evaluation for data selection
To investigate whether including typing errors in the training data affects the performance of the probabilistic model, we compared the $E_{pm}$ of the model trained only on correctly typed data (Data A) with that of the model trained on data containing typing errors (Data B). We used the UD syllable model for evaluation and set the number of mixture components to 2. As shown in Table 2, the average baseline error rate $E_0$ over all participants was 3.56%, while $E_{pm}$ was 2.69% with Data A and 3.32% with Data B. In other words, the ERRs were 24.2% and 6.53%, respectively. These results show that modeling the participants' own touch distributions improves performance. However, the ERR with Data B was much lower than that with Data A, which indicates that training only on correctly typed data is preferable.
Table 2. Difference in performance by training data.
Training data    Recognition error rate E_pm [%]    Error reduction rate ERR [%]
A                2.69                               24.2
B                3.32                               6.53
E0 (baseline)    3.56                               -
4.3.3 Evaluation of the user-adapted model
Next, we focused on the performance of two models trained with Data A: the key model and the syllable model. Based on the four areas in Figure 3, we set the number of leaf nodes of the MLLR regression tree to 4. Table 3 shows the ERRs of the two models for each user model. All ERRs were positive, which means that the probabilistic models were effective. This is considered to be because the touch distributions were broadly similar regardless of the participant. In this experiment, all participants typed with the right hand, so the common features of the input style had a greater influence than the differences between participants. Moreover, the syllable model achieved a higher ERR than the key model for every user model. A Welch's t-test (p