Evaluating effort reduction through different word prediction systems *
Christian Bérard, LAP-GRAI, Bordeaux 1 University, France, [email protected]
David Niemeijer, AssistiveWare, Van Speijkstraat 73-D, 1057 GN Amsterdam, The Netherlands, [email protected]
* 0-7803-8566-7/04/$20.00 2004 IEEE.
Abstract - Access to computers for all is very important for disabled users. Screen keyboards enhanced by a word prediction system are precious tools for people with severe arm mobility problems. This paper first compares the efficiency of text input through several screen keyboards. More detailed measurements then explain why the PolyPredix™ prediction system used by KeyStrokes provides the best results for disabled users in terms of effort reduction.
Keywords: Access for all, Design for all, word prediction, screen keyboard.
1 Introduction
For some disabled users, efficient computer access is of tremendous importance because the computer is increasingly the control centre of their whole environment as well as a crucial means of communication with the outside world. For able-bodied users, the keyboard is still by far the most common method of data input. Unfortunately, it is also one of the most awkward and unnatural input devices that could have been devised for users with severe arm mobility problems. So, since the early 1980s, virtual screen keyboards and other software have been developed to replace the standard physical keyboard. And, from 2000 onward, mobile applications such as SMS and PDAs have made writing assistance systems useful "for all". But physically challenged persons differ widely, and their needs typically vary from person to person, so screen keyboards have to meet multiple and different requirements.
We have experimented with several commercial products such as Wivik 3.0 [5] and free on-screen keyboards including those distributed by Microsoft with Windows 2000 and XP. We also investigated research prototypes like Dasher [6] and VITIPI [2]. Finally, we found that for many physically impaired users KeyStrokes for Mac OS X [1] is one of the most efficient screen keyboards available, because it combines powerful features with detailed control over many of the parameters that govern features such as dwelling and word prediction.
In this article, we first present and discuss a global comparison of the above text input systems. Then, we present the prediction system designed and programmed by David Niemeijer. To measure efficiency in terms of effort reduction, a strategy is proposed to analyse the impact of the three kinds of text prediction available in KeyStrokes 3 and later.
2 Some screen keyboards
From the user's point of view, we propose to classify text input software through three key concepts:
• Compatibility: the ability to use the on-screen keyboard software with a variety of standard software packages.
• Adaptability: all the features of the assistive technology software that can be configured to the needs of each individual.
• Efficiency: the overall productivity gain the software brings to the disabled user when producing text.
Then, from the designer's point of view, prediction systems are based on two approaches, possibly combined: (a) a syntactical approach and (b) a statistical approach, as stated in [3]. A first test was performed to compare roughly the efficiency of current screen keyboards for text input:
• The standard MS-Windows 2000 virtual keyboard, without word prediction.
• Wivik 3.0, frequently used by disabled people and provided with WordQ, a word prediction system [5]. It runs only on MS-Windows.
• Dasher, a new text-entry interface driven by natural continuous pointing gestures. It is based on a statistical approach and is available for multiple platforms.
• KeyStrokes, a screen keyboard for Mac OS X with advanced multilingual word prediction.
For that, we use the following small text made of 44 words, 214 characters (without spaces), 258 (including spaces).
Hello guys, Sorry for my poor English but I need a short text of about two hundred characters to measure and compare the efficiency of text input through various screen keyboards or softwares. Some of these do not support French language my mother's tongue.
Figure 1. Text 0
This test was completed five times by the same disabled person. Table 1 and figure 2 show the minimum and maximum values of the time in seconds required to enter this small text. The VITIPI prototype has not been tested for technical reasons.
Table 1. Time required for text input
         Windows 2000   Wivik   Dasher   KeyStrokes
min      315            263     420      130
max      350            345     510      315
strain   arms           no      eyes     no
3 The prediction system
This paper examines the effort reduction that different kinds of word prediction can provide for people with disabilities who use an on-screen keyboard to "click-type". Measurements were carried out using version 3.2 of KeyStrokes, the on-screen keyboard developed by AssistiveWare for Mac OS 9 and Mac OS X [4]. This on-screen keyboard offers an advanced word prediction engine called PolyPredix™. In this section we discuss the basic principles of this prediction system.
3.1 Three levels of prediction
Three different levels of word prediction are provided: word completion, next-word prediction and multi-word prediction. Word completion, the simplest method, provides suggestions of how a word might be completed based on the letters that have already been typed, so if the letter “d” has been typed the software might suggest “development”, “do”, “devoted”, “did”, and so forth. Next-word prediction ensures that if the user types “do” followed by a space, the software will suggest likely next words such as “as”, “it”, “so”, “you”. Multi-word prediction looks further ahead and suggests likely phrase elements that start with the letters that were already typed or are likely sequels to the words that were just typed. For “do” these might be: “do you”, “do you know”, “do you mean”, “down to the”. If the user has already typed "do yo", then "do you go" will get a higher likelihood ranking than, for example, "yoghurt", because the system also looks at previously typed words and if a suggestion matches not only the currently typed letters but also previous words it is more likely to reflect what the user wants to type.
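The three levels can be illustrated with a minimal sketch. The frequency tables, counts and function names below are hypothetical and chosen only to reproduce the examples in the text; the sketch does not claim to reflect how PolyPredix™ is implemented, and it leaves out the re-ranking by previously typed words.

```python
# Toy frequency tables. All words and counts are hypothetical.
WORDS = {"do": 50, "did": 40, "development": 10, "down": 8, "devoted": 5}
NEXT_WORDS = {"do": {"you": 30, "it": 20, "so": 10, "as": 5}}
PHRASES = {"do you": 30, "do you know": 12, "do you mean": 9, "down to the": 4}


def top(counts, limit):
    """Return the `limit` most frequent keys, ties broken alphabetically."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [text for text, _ in ordered[:limit]]


def complete_word(prefix, limit=4):
    """Level 1, word completion: words that start with the typed letters."""
    return top({w: n for w, n in WORDS.items() if w.startswith(prefix)}, limit)


def predict_next_word(previous, limit=4):
    """Level 2, next-word prediction: likely successors of the last word."""
    return top(NEXT_WORDS.get(previous, {}), limit)


def predict_phrase(typed, limit=4):
    """Level 3, multi-word prediction: phrases that extend the typed text."""
    return top({p: n for p, n in PHRASES.items() if p.startswith(typed)}, limit)
```

With these toy tables, `complete_word("d")` offers single words, `predict_next_word("do")` offers successors of "do", and `predict_phrase("do yo")` narrows the phrase list to the entries beginning with "do you".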
Figure 2. KeyStrokes prediction leads by providing a significant time reduction for this person. (Bar chart of the min and max times from Table 1 for Windows 2000, Wivik, Dasher and KeyStrokes; vertical axis from 0 to 600.)
The results show that KeyStrokes prediction leads in terms of the time gained for this person. Our observations also do not agree with the conclusions presented in [7] concerning Dasher’s performance. Using Dasher causes eyestrain, and we did not manage to exceed a speed of 7 words per minute ([7] stated that 25 words per minute can be reached after an hour of practice). Of course, this may vary to some degree from person to person, but more research is clearly warranted.
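The words-per-minute figures can be related to the times in Table 1 with simple arithmetic. The sketch below assumes that speed is computed as the 44 words of Text 0 divided by the entry time; the helper names are illustrative only.

```python
WORDS_IN_TEXT0 = 44  # length of the test text (Figure 1)

# (min, max) entry times in seconds, as reported in Table 1.
TIMES = {
    "Windows 2000": (315, 350),
    "Wivik": (263, 345),
    "Dasher": (420, 510),
    "KeyStrokes": (130, 315),
}


def words_per_minute(words, seconds):
    """Convert a word count and a duration in seconds to words per minute."""
    return words * 60.0 / seconds


def speed_range(system):
    """Return (slowest, fastest) speed in words per minute for one system."""
    fastest_time, slowest_time = TIMES[system]
    return (words_per_minute(WORDS_IN_TEXT0, slowest_time),
            words_per_minute(WORDS_IN_TEXT0, fastest_time))
```

Under this assumption the best Dasher run (420 s) corresponds to roughly 6.3 words per minute, which agrees with the observation that 7 words per minute was not exceeded.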
The higher prediction levels have the benefit of letting the user produce a lot of text with minimal typing, but for some users (especially younger children) they might lead to information overload. For that reason, the software can be configured to use only the lower prediction levels, depending on the needs of the user. This gave us an opportunity to compare the effort reduction provided by the different levels of prediction. This is not just of academic interest: several word prediction systems on the market offer, for example, only word completion, and it is useful to know how much users could gain by moving to a more advanced word prediction system.
3.2 PolyPredix™, a statistical approach
The PolyPredix™ prediction system is based on statistical analysis of word use patterns and not on grammatical rules as is done in some other word prediction packages [2]. Because of this statistical foundation, the prediction system is language independent and can be used for any language that uses word and sentence concepts. It also means that, using the same statistical analysis that was used to build the standard prediction dictionaries for languages such as English, French, and German, the software can learn not only new words but also word combinations as the user types. The system is thus self-learning in that it transparently learns about new words and word combinations.
It is also self-learning in another respect. The software automatically adjusts the frequencies of words and word combinations stored in the standard supplied dictionaries to the frequencies of these words and word combinations in the user's vocabulary. So if the word "dog" in the standard dictionary for English has a relative frequency F compared to a frequency of 0.5F for "donut", and the user of the software uses the word "donut" more often than the word "dog", the frequencies will get adjusted. As a consequence, if this user types "do" the software will be likely to suggest the word "donut" as a possible completion, whereas for another user of the same software the suggestion would likely be "dog".
The prediction system stores word combinations of up to five words in a sequence. In this, no distinction is made between a standard dictionary supplied with the software and a user dictionary. This is in contrast to certain other software that offers multi-word prediction, but not for user-learned text, because only single words are learned during typing. During prediction, information is combined from all open dictionaries. Typically this will be one standard dictionary for the user's language and a user dictionary containing words and word combinations learned during typing. Additional topical dictionaries or dictionaries for other languages could also be open at the same time, but this was not done in the tests discussed here.
To provide suggestions to the user, the prediction system combines information about the frequency with which words and word combinations occur with information on when the user last used a particular word or word combination. This ensures that when the user starts writing about a new subject, words and word combinations frequently used during this writing activity will be suggested even if their overall frequency is rather low. For example, the name Marc is far less common in the US than the name Mark (rank 201 versus rank 14 in the 1990 US census, http://www.census.gov/genealogy/www/freqnames.html), but if the user is writing a letter about someone called Marc, the prediction system should ideally suggest Marc each time the user types "ma" and not suggest Mark, even though Mark has a higher frequency in the underlying word frequency database. By taking into account how recently a certain word was used, Marc, once used at the beginning of the letter, will continue to be suggested on later occasions because it was recently used. The user can adjust the degree of emphasis given to recently used words, ranging from no emphasis to a strong emphasis.
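The interplay between frequency and recency can be sketched as follows. The scoring formula, the `emphasis` parameter and the example counts are assumptions made for illustration; the paper does not specify how PolyPredix™ actually weights the two sources of information.

```python
def score(frequency, age, emphasis):
    """Combine a frequency with a recency bonus (hypothetical formula).

    age: number of words typed since the item was last used, or None if
         the user has not used it yet.
    emphasis: 0.0 ignores recency; larger values favour recent items.
    """
    recency = 0.0 if age is None else 1.0 / (1.0 + age)
    return frequency * (1.0 + emphasis * recency)


def rank(candidates, prefix, emphasis):
    """Rank the candidates matching `prefix`, best score first."""
    matches = [
        (word, score(frequency, age, emphasis))
        for word, (frequency, age) in candidates.items()
        if word.lower().startswith(prefix.lower())
    ]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches]


# Hypothetical entries: word -> (frequency, words typed since last use).
CANDIDATES = {"Mark": (100, None), "Marc": (10, 3)}
```

With no emphasis on recency, `rank(CANDIDATES, "ma", 0.0)` lists Mark first because of its higher frequency; with a strong emphasis such as `50.0`, the recently used Marc moves to the top.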
3.3 Interface and advanced settings
The prediction system can present the suggested words and word combinations as a horizontal list inside the on-screen keyboard window (see figure 3) or as a vertical list in a separate window. The user can choose to get as many suggestions at the same time as will fit in the window (i.e., the number of suggestions presented varies with the length of the suggested words) or set a maximum number of suggestions. Typically, the higher the number of suggestions, the more time the user will spend looking to see whether the wanted word or word combination is already suggested. For a user wanting to exert a minimal amount of effort in terms of moving the cursor around and clicking, this is no problem, but the typical user will want to find a balance between the amount of effort saved and the time spent looking to see whether the right suggestion is already there. For most of the tests in this paper we have therefore chosen to use the software’s default setting, which presents an average of 9 suggestions at a time (the actual number varies with the length of the suggested words and word combinations). Larger physical effort reductions can be achieved by allowing a higher number of suggestions, but for most users this would slow down typing due to the time required to look for the right suggestion.
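The way the number of visible suggestions depends on their length can be sketched as below. The pixel widths (`char_width`, `gap`) and the function name are hypothetical; they only illustrate the idea of filling a fixed-width list until the next suggestion no longer fits or an optional maximum is reached.

```python
def visible_suggestions(ranked, window_width, char_width=8, gap=16, max_count=None):
    """Return the leading suggestions that fit in the list.

    ranked: suggestions ordered from most to least likely.
    window_width, char_width, gap: assumed pixel sizes, for illustration.
    max_count: optional user-defined maximum number of suggestions.
    """
    shown = []
    used = 0
    for text in ranked:
        if max_count is not None and len(shown) >= max_count:
            break
        needed = len(text) * char_width + gap
        if used + needed > window_width:
            break  # the next suggestion would overflow the window
        shown.append(text)
        used += needed
    return shown
```

For a given window width, short words leave room for more suggestions than long word combinations, which is why the text speaks of an average number of suggestions rather than a fixed one.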
Figure 3. KeyStrokes with the horizontal “inline” prediction turned on.
In addition to word prediction, the KeyStrokes software also includes auto-type facilities that automatically add and remove spaces and apply capitals in those places where this is needed for correct orthography and grammar. A setting allows the user to follow the rules for their language. Automatic placement of spaces after words and punctuation reduces typing effort. This feature was turned on during all tests. The KeyStrokes software automatically collects a number of efficiency statistics that were used for the analysis in this paper.
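A rough idea of what such auto-type rules do can be given with a short sketch. It assumes English conventions only (no space before punctuation, one space after it, a capital at the start of each sentence) and a hypothetical token-based input; the actual KeyStrokes rules are language dependent and are not described in detail here.

```python
SENTENCE_END = {".", "!", "?"}
PUNCTUATION = SENTENCE_END | {",", ";", ":"}


def auto_type(tokens):
    """Join clicked words and punctuation using simple English rules."""
    text = ""
    capitalize = True  # the first word of the text starts a sentence
    for token in tokens:
        if token in PUNCTUATION:
            # Remove the space added after the previous word, then add one
            # after the punctuation mark.
            text = text.rstrip(" ") + token + " "
            if token in SENTENCE_END:
                capitalize = True
        else:
            word = token[:1].upper() + token[1:] if capitalize else token
            text += word + " "
            capitalize = False
    return text.rstrip(" ")
```

In this sketch the user never clicks the space bar or the shift key: `auto_type(["hello", "guys", ",", "sorry", ".", "some", "text"])` yields a correctly spaced and capitalized string from seven clicks.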
3.4 Measurement protocol
Based on the characteristics of the PolyPredix™ prediction system used in the KeyStrokes on-screen keyboard software, as discussed in the previous section, a measurement protocol was defined. The benefits of prediction software can be looked at in two ways: typing speed gain and effort reduction. Typing speed gain is not easy to measure objectively, as the speed and precision with which someone can move the cursor around, the speed with which someone sees the right word, etc. vary greatly from person to person and also depend on factors such as whether someone is tired or not. Effort reduction is much easier to measure objectively, as it can be expressed in terms of the number of letters that do not need to be individually click-typed because a suggested word or word combination can be clicked to gain a number of characters at once. In the long run, effort reduction is also much more important for many people with disabilities, because it allows them to continue typing for a longer period with less effort and thus effectively produce more output. So our measurements focus on effort reduction rather than typing speed gain.
To facilitate objective measurements, a special version of KeyStrokes 3.2 was created that performs a typing simulation. In simulation mode, KeyStrokes automatically types a pre-selected text at a speed of two clicks per second, automatically switching between clicking on keys and clicking on suggestions as needed to type the text. This leads to a completely objective measurement of the maximum benefits of the various tested settings (where maximum refers to the situation in which the user never overlooks a suggestion or accidentally types the wrong letter). Real-world performance will of course be somewhat lower, as even fast click-typers will typically type a little slower than 2 clicks per second and will occasionally not see that the correct suggestion is already given and therefore click on a letter when a suggestion could already have been clicked. The simulated typing speed was not set higher than 2 clicks per second so as not to interfere too much with the prioritization of recently typed items, and it was not set lower so that the tests would not take too long (single test runs took between 15 and 30 minutes, depending on the prediction level).
These are the basic steps of the measurement protocol:
1) A clean install of the standard prediction dictionaries was done and the dictionaries were blocked for write access, so that simply quitting and relaunching KeyStrokes was enough to start each test with a clean slate.
2) The active learning dictionary was cleared so as not to benefit from words and word combinations learned during previous tests (except for those tests that looked at the benefits of previous learning; for those tests the learning dictionary was not cleared, but the last-usage information was reset so that tests run shortly after one another would not benefit from it).
3) The software was reset to the default factory settings to ensure all tests started with the same clean slate.
4) Any necessary changes to keyboard size, number of items suggested and prediction level were made depending on the specific test.
5) Two UK English short stories (around 700 words) were used as test material: an adult story called “Fishing For Jasmine” (Text 1, http://www.eastoftheweb.com/shortstories/UBooks/FishJasm.shtml) and a children's story called “High and Lifted Up” (Text 2, http://www.eastoftheweb.com/shortstories/UBooks/HighLift.shtml).
6) The KeyStrokes prediction statistics were reset to zero before each test.
7) A simulation was run.
8) Effort reduction was measured for each test using the built-in statistics and calculated as the number of characters "typed" versus the number of clicks needed, as well as in terms of the number of pixels the cursor had to move to click the right key or suggestion.
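The principle of the simulation can be sketched with a much simplified model. The code below is not the KeyStrokes simulator: it assumes lowercase text, word completion only, a hypothetical frequency dictionary, and a simulated user who clicks the wanted word as soon as it appears among the suggestions, with auto-type supplying the trailing space.

```python
def suggestions(prefix, vocabulary, limit):
    """Most frequent known words starting with `prefix` (at most `limit`)."""
    matches = [word for word in vocabulary if word.startswith(prefix)]
    matches.sort(key=lambda word: (-vocabulary[word], word))
    return matches[:limit]


def clicks_for_word(word, vocabulary, limit):
    """Clicks needed to enter one word and the space that follows it."""
    for typed in range(1, len(word)):
        if word in suggestions(word[:typed], vocabulary, limit):
            return typed + 1  # letters typed so far plus one suggestion click
    return len(word) + 1      # every letter plus the space key


def simulate(text, vocabulary, limit):
    """Return (baseline clicks, clicks with prediction, efficiency gain)."""
    words = text.split()
    baseline = sum(len(word) + 1 for word in words)
    clicks = sum(clicks_for_word(word, vocabulary, limit) for word in words)
    return baseline, clicks, baseline / clicks


# Hypothetical dictionary: word -> frequency.
VOCABULARY = {"the": 100, "there": 30, "then": 20, "text": 10, "test": 5}
```

Because the simulated user never overlooks a suggestion, the result is an upper bound on the benefit, as in the protocol above. Running `simulate("the text there", VOCABULARY, limit)` with a larger `limit` requires fewer clicks than with a smaller one.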
4 Results
In a first test, the effect of changing the maximum number of suggestions was examined. A first simulation was done without prediction and auto-type in order to establish a baseline. This actually requires more clicks than there are characters in the text because, for example, capitals require the use of the shift key plus a letter key. Something similar applies to certain punctuation characters. Next, simulations were run with a maximum of 5, 10, and 15 suggestions. The results are shown in table 2. Two things may be noted. Firstly, even with just five suggestions (and auto-type) a considerable efficiency gain can be achieved. Secondly, the higher the number of suggestions, the greater the efficiency gains. Ten suggestions already reduce the number of clicks to less than half.
Table 2. Influence of the number of presented suggestions on typing efficiency gains (in terms of click reduction)
                            No pred., no auto-type   Max. 5 suggest.   Max. 10 suggest.   Max. 15 suggest.
Number of required clicks   3687                     1991              1813               1729
Efficiency gain                                      1.85              2.03               2.13
Note: Default settings (multi-word prediction, UK dictionary, learning on, auto-type on) plus fixed maximum number of suggestions. Text 1: “Fishing For Jasmine”, 3587 characters, 21 words and 899 word combinations not in standard prediction dictionary.
In a second test, a comparison was made between no prediction, word completion, next-word prediction and multi-word prediction. The baseline was again a simulation without prediction and auto-type. Subsequently, two runs were completed for each prediction level: a first run that started out with a completely clean slate, and a second run in which the words and word combinations that were absent from the standard dictionary had been learned during the first run. So the first run in each case shows out-of-the-box efficiency gains and the second run shows the efficiency gains that can be had if the complete vocabulary of a text is already known. In practice, a situation where all words and word combinations are known is not that likely to occur, so these figures may be somewhat inflated. However, in other respects efficiency gains can be expected to increase in practice once both the standard and the user dictionaries have become adjusted to the frequencies with which words and word combinations are used by a particular user.
Table 3a shows the results of the second test in terms of the click reduction achieved by the different prediction levels. Table 3b shows the results of the same tests, but in terms of the reduced cursor movements. As can be seen from table 3a, out-of-the-box efficiency gains are considerable for all prediction levels, with multi-word prediction reducing the number of clicks to less than half and offering a 17-19% advantage over word-completion and a 5% advantage over next-word prediction. However, where multi-word prediction really shines is when the complete vocabulary of a text is already known. Then it can reduce the amount of effort to less than one-third, whereas the benefits for word-completion and next-word prediction compared to the initial run are minor. This suggests that for users who write regularly about specific topics and frequently use the same sentence constructs and expressions, multi-word prediction offers considerable gains over less advanced types of word prediction.
Table 3a. Influence of prediction type on typing efficiency gains in terms of click reduction
                      Word-completion   Next-word prediction   Multi-word prediction
First run    Text 1   1.84              1.96                   2.01
             Text 2   1.90              2.04                   2.09
Second run   Text 1   1.93              2.25                   3.31
             Text 2   1.95              2.23                   3.09
Note: Default settings (UK dictionary, learning on, auto-type on) plus different prediction levels. Text 1: “Fishing For Jasmine”, 3587 characters, 21 words and 899 word combinations not in standard prediction dictionary; Text 2: “High and Lifted Up”, 3922 characters, 18 words and 1054 word combinations not in standard prediction dictionary.
In table 3b it can be seen that word prediction also offers benefits in terms of reducing cursor movements. Apparently, the additional movements needed to reach the horizontal list of suggested words (located just above the keys) are outweighed by the smaller number of movements due to prediction. Word completion offers out-of-the-box gains of some 14 to 18% and multi-word prediction gains of up to 30%. For the second run the same pattern emerges as for click reduction. The higher levels of word prediction show greater benefits, with multi-word prediction almost cutting the total cursor movement distance in half.
Table 3b. Influence of prediction type on typing efficiency gains in terms of reduced cursor movements
                      Word-completion   Next-word prediction   Multi-word prediction
First run    Text 1   1.14              1.24                   1.26
             Text 2   1.18              1.28                   1.30
Second run   Text 1   1.16              1.40                   1.97
             Text 2   1.19              1.38                   1.78
Note: Default settings (UK dictionary, learning on, auto-type on) plus different prediction levels. Text 1: “Fishing For Jasmine”, 3587 characters, 21 words and 899 word combinations not in standard prediction dictionary; Text 2: “High and Lifted Up”, 3922 characters, 18 words and 1054 word combinations not in standard prediction dictionary.
The above results clearly reveal how important word prediction is in terms of reducing the effort associated with clicks and cursor movements. As might be expected, the larger the number of presented suggestions, the greater the reduction in effort (see table 2). It was also shown how next-word prediction and especially multi-word prediction have a clear advantage over word completion (tables 3a and 3b). This difference is even more pronounced when most of the used vocabulary is already known through previous learning. Of course, part of the benefit of learning already accrues during the first run, for words and word combinations that are used multiple times in the same text.
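The text does not state the formula behind the efficiency gain explicitly, but the values in Table 2 are consistent with dividing the baseline number of clicks by the number of clicks needed with prediction. The short check below works under that assumption; the function names are illustrative.

```python
BASELINE_CLICKS = 3687  # no prediction, no auto-type (Table 2)

# Maximum number of suggestions -> required clicks (Table 2).
CLICKS = {5: 1991, 10: 1813, 15: 1729}


def efficiency_gain(baseline, clicks):
    """Assumed definition: baseline clicks divided by clicks with prediction."""
    return baseline / clicks


def click_share(baseline, clicks):
    """Fraction of the baseline clicks that is still required."""
    return clicks / baseline


GAINS = {limit: round(efficiency_gain(BASELINE_CLICKS, n), 2)
         for limit, n in CLICKS.items()}
```

Under this reading, a gain above 2 means that fewer than half of the baseline clicks remain, which is the case from ten suggestions upward (`click_share(3687, 1813)` is about 0.49).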
5 Conclusion
Among the text input systems available for motor-impaired users, screen keyboards remain the most efficient type of software as long as they are equipped with an efficient word prediction system. This study has shown that KeyStrokes can be faster than the Dasher prototype [7] and allows users to type with less muscular or ocular strain. A more detailed analysis of the three levels of prediction then shows that, even with an average of 10 suggestions, multi-word prediction can reduce typing effort by more than a factor of 2 and has clear advantages over simpler prediction systems. It was also shown that typing effort can be reduced by a factor of 3 when the vocabulary is already known to the system. This should encourage the development of facilities that make it easier for users to manage and use topical dictionaries for text input.
References
[1] Christian BERARD, Clavier-écran : concevoir avec les utilisateurs, IFRATH Handicap 2004, Paris, 17-18 Juin 2004.
[2] Philippe BOISSIERE, Daniel DOURS, Vers un modèle d’aide à l’évaluation de systèmes d’assistance à l’écriture : application à VITIPI, IFRATH Handicap 2002, Paris, 15-16 Juin 2002.
[3] Philippe BOISSIERE, An Overview of Existing Writing Assistance Systems, IFRATH First French-Spanish Workshop on Assistive Technologies, October 16-17, 2003, INJS, Paris.
[4] David Niemeijer, AssistiveWare, Amsterdam, 2004, http://www.assistiveware.com/keystrokes.php
[5] Wivik 3.0, English, German and Norwegian version only, http://www.wivik.com/
[6] D. J. Ward, Dasher version 1.6.4, available from www.inference.phy.cam.ac.uk/dasher/, 2001.
[7] David J. Ward and David J.C. MacKay, Fast Hands-free Writing by Gaze Direction. Submitted to Nature April 2002. Published August 22 2002.