Halfway Report for INF5261 GOMS Project
Trenton Schulz
March 21, 2007

When designing a system, regardless of where it will be used, one may wonder how to evaluate the system to see how well it meets a target group’s requirements. GOMS can provide both quantitative and qualitative data about a system that can be quite useful in evaluating its usability. Even though learning a particular dialect of GOMS is not challenging, developers and designers can benefit from better tools for the job. In this progress report, we do a quick review of how GOMS, and in particular the GOMS keystroke-level model (KLM), works. We then take a look at a tool that can generate KLMs for Qt and Qtopia applications. We examine what has been done in the project, how certain issues are dealt with in the architecture, and what issues still need to be dealt with before the project is complete. We conclude with a discussion of some research questions that will hopefully be answered when the project is finished.
1 Introduction

In the “wow” document presented earlier this semester, we explored some aspects of the problem area, some of the questions that we planned on exploring, and a rough idea of what needed to be done with the project. That was a little over a month ago and it is now time to turn in another report. This halfway report consists of several sections. It begins with a quick summary of the problem area and questions as presented in the “wow” document. That is followed by some more background on GOMS, with specific information about the keystroke-level model. Then we look at the current status of the project and what is planned for the rest of it. This is followed by some concluding remarks.
2 Problem Area

2.1 Description of Problem

Many human-computer interaction books document three parts that make up the design of any interactive product or system. They are user understanding, prototyping, and evaluation. Each of these is an important part of the process. This project focuses primarily on the area of evaluation of prototypes and methods.
There are many ways that one can evaluate a system. Preece et al. [Preece et al., 2002, Need page] divided evaluation into three groups:

1. Observing users: This includes such things as usability testing, where users are given instructions for tasks they are supposed to do while everything they do is recorded, or field studies, where an evaluator follows a user through their use of the product “in the wild.”
2. Asking users and experts: Users are asked their opinion about the system, or experts are brought in to look at a product. A heuristic evaluation falls squarely in this category.
3. Modeling users: Here an attempt is made to construct a model of the user and how the user reacts to situations. This is where more of the quantifiable material lies, such as Fitts’ Law, Hick’s Law, and GOMS.

This project is based on the work done in GOMS, which is explained below. One of the advantages of some versions of GOMS is that the methods can be learned relatively quickly. One of the disadvantages is that constructing a model for a system can be repetitive, tedious, and prone to mistakes. It was this issue that I hoped to tackle in my master’s thesis, where the idea is to create a tool that can automatically generate GOMS models for a user interface. Initially the focus was on applications for desktop computers, specifically applications using the Qt library. The Qt library is a cross-platform C++ framework designed by Trolltech ASA for writing applications. However, I realized that this tool could also be useful for mobile devices. What’s more, it would be easy enough to adapt my work to function with Qtopia, the embedded version of Qt with additional programs and functionality for mobile and embedded devices. There was also the Greenphone, a recently released device targeted at developers and running Qtopia Phone Edition. This seemed to be a good opportunity to see if the automated GOMS model would be usable for mobile devices.
2.2 GOMS

2.2.1 Introducing GOMS

GOMS stands for Goals, Operators, Methods, and Selectors and was originally presented by Card, Moran, and Newell in their book The Psychology of Human-Computer Interaction [1983, Chapter 5]. The idea is that an expert user’s error-free actions can be divided up into goals, such as “apply corrections to the halfway report for INF5261.” A goal can consist of many sub-goals, such as correcting the third word in the seventh paragraph, moving this section after section five, saving the document, etc. These goals can then be sub-divided into the operators and methods used to achieve them. Since some goals can be achieved with different operators or methods, there needs to be a way to choose which way is best. That is where selectors come into play, offering rules about which method or operator should be chosen.

Since its introduction, GOMS has been successful and several variants of GOMS have emerged. Each has its own strengths and weaknesses, but they are all based on the same central idea of an expert user using the computer in an error-free way. This project focuses on the keystroke-level model of GOMS (KLM-GOMS or KLM). The KLM is probably the simplest expression of GOMS, but it is also very elegant and can be used for a variety of situations, from text editors to a database of outer space operations [John and Kieras, 1996, pages 307, 308].

2.2.2 How KLM works

The KLM ignores the high-level goals and methods and instead focuses on the key presses and mouse movements of the expert user. These are divided into several operators that are presented in table 1, along with the time it takes to execute each of them.

Table 1: The KLM operators with times determined by Card et al. [Card et al., 1980, 1983]

Operator   Description                                                               Time in seconds
P          Pointing with a pointing device                                           1.10 (a)
K          Key or button press and release                                           0.20 (b)
H          Moving hand from mouse to keyboard or vice-versa                          0.40
R(t)       Time t spent waiting for the system to become responsive after actions    t
M          Mental preparation                                                        1.35

(a) This is the original value found by Card et al. However, Gong and Kieras [Gong and Kieras, 1994] have found that using Fitts’ Law gives a more accurate result.
(b) Based on an average typing speed of 55 words per minute [Card et al., 1980, 1983].

The P operator indicates pointing with some sort of pointing device, excluding a button press. While the original value of 1.10 seconds determined by Card et al. was valid for them at PARC, Gong and Kieras [Gong and Kieras, 1994] have shown that using Fitts’ Law gives times that are closer to what real users achieve.

A pointing device button press and release, or a keyboard key press and release, is represented with the K operator. The K operator represents only one key press, so a shortcut key combination such as command+O is represented by two separate K operators. This value is very dependent on how fast a user can type. In table 1, the value is based on an average skilled typist.

There are also points where the hand moves from the keyboard to the pointing device, and this is taken into account with the H operator. This value has been shown to be around 0.40 seconds for a keyboard and mouse combination, but it can vary depending on the devices being used and the distance between the device and the keyboard.

The user also spends time mentally preparing to execute an operation. This can include deciding how to call a command, how a command should be terminated, or which options to choose. The M operator represents this activity. Its time was originally shown by Card et al. to be around 1.35 seconds [Card et al., 1980, 1983], and this has been shown to be a valid upper bound by Olson and Nilsen [1987].
Finally, there are times when the computer is “busy” doing some sort of processing, even though the user could potentially be interacting with the system. This is indicated with the R(t) operator, where t indicates the time in seconds that the user has to wait. This value can depend on various factors such as memory, computer speed, network connection, etc. It should be noted that this is the time left over from other operators. For example, if a user is mentally preparing to execute a command while the system is busy, the value for t is the positive difference between the M and the time when the system is ready.

Most of the operators can be defined by following the physical movement of the person. The exception is the M operator, for which Card et al. developed a list of heuristics. You start by adding M’s in all the locations where there could be a potential for mental preparation and then remove them where they can be “optimized out”; for example, pointing and then clicking is usually done without any hesitation by an expert user.

Now that the operators are defined¹, we can use them to model how a user interacts with the system. The resulting stream of operators can provide both quantitative and qualitative information. Quantitatively, we can get the total time it takes to execute a specific sequence of actions by summing the operator times as follows:
$$T_{execute} = T_K + T_M + T_P + T_H + T_{R(t)} \tag{1}$$

Here $T_K$ is defined as $n_K t_K$, where $n_K$ is the number of times K occurs in the sequence and $t_K$ is the time to execute a K operator. The terms for the M, P, and H operators are defined similarly, while $T_{R(t)}$ is defined as the sum of the times of the various R(t) operators.

Qualitatively, even though the goals are not explicit in the model, an evaluator can look at the operators and, along with making notes, determine patterns that could be shortened or redone. Alternative designs can then be modeled and compared in the same way.
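As a rough illustration of equation (1), the sketch below sums the operator times for a sequence of operators, using the values from table 1. The function name `execution_time`, the one-letter encoding of the sequence, and the example task are assumptions made for this illustration only; they are not part of the tool described in this report.

```python
# Operator times in seconds, taken from table 1 (Card et al.).
OPERATOR_TIMES = {"P": 1.10, "K": 0.20, "H": 0.40, "M": 1.35}


def execution_time(operators, response_times=()):
    """Return T_execute for a sequence of KLM operators.

    operators      -- iterable of "P", "K", "H" or "M" (hypothetical encoding)
    response_times -- the t of every R(t) operator in the sequence, in seconds
    """
    operators = list(operators)
    total = 0.0
    for name, unit_time in OPERATOR_TIMES.items():
        count = operators.count(name)   # n_K, n_M, n_P, n_H
        total += count * unit_time      # T_K = n_K * t_K, and so on
    return total + sum(response_times)  # T_R(t) is a plain sum of waits


# Hypothetical task: think, point at a field, click it, move the hand
# to the keyboard, and type four characters.
if __name__ == "__main__":
    print(execution_time("MPKHKKKK"))
```

For the hypothetical task above, the sum is one M, one P, one H, and five K operators, which comes to about 3.85 seconds with the table 1 values. Replacing the fixed P time with a Fitts’ Law estimate, as suggested by Gong and Kieras, would only change how the P term is computed.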
2.3 Comparison with other evaluation methods

One thing to keep in mind is that GOMS is simply one way of evaluating a system, not the only way, and there are some things that GOMS cannot provide. While you can get information about how fast and efficient your interface is for expert users, you do not get any idea of how well the interface will work for novice users. GOMS can address the usability of the system, but not its functionality, nor does it reveal whether the interface will be acceptable to the users or even whether they will like it. A list of shortcomings in GOMS circa 1990 was recorded by Olson and Olson [Olson and Olson, 1990]. Several of these issues have been addressed by later versions of GOMS, but the fact remains that GOMS cannot solve all the issues in evaluation, and it is a good idea to combine the findings from GOMS with other methods in an attempt to triangulate results. However, GOMS can certainly generate very good candidate interfaces.
¹ A D operator to indicate drawing a straight line between two points was also part of the original KLM definition, but it was only used for a couple of special-purpose applications, so it is not used in this project.

3 The project

3.1 Idea

Since the operators of the keystroke-level model match many of the system events that happen during the execution of many GUI applications, and many other operators can be inferred from looking at a sequence of the system events, it should be possible to construct a useful tool for KLM models with this information. What follows are the basics of the design and how it works in practice.
The inspiration behind the main program is taken from the Shark profiling tool that is part of Apple’s Computer Hardware Understanding Development (CHUD) tools. As detailed on Apple’s developer website [Apple Developer Connection, a], Shark is a simple program that lists all the programs currently running on the machine and allows someone to sample any one of them and get a listing of all the functions that are called and where the program spends the majority of its time. Given that the program is so simple to use and can give useful answers in many situations, it lowers the barrier to entry for profiling applications. It is hoped that making an evaluation tool easy to use will likewise lower the barrier and encourage developers and designers to evaluate applications earlier in the design process.
3.2 Architecture

At the heart of the design are two parts: the Listener, which inspects the system events and converts them into KLM operators (or parts of an operator), and the Model, which takes the operators from the Listener and performs various transformations on them, such as coalescing operators together, combining matched pairs of others, adding in operators it can determine from the sequence, etc.

The Listener object is currently part of the application under evaluation. A developer creates an instance of a Listener object when starting up the application, with the application’s event dispatcher as the main argument. The Listener then installs an event filter on the event dispatcher to monitor all system events that are sent to the application. By default, the Listener does nothing and the events just pass through the filter without any inspection. When told to record, the Listener inspects every event, creates a corresponding KLM operator for it, and stores the new operator in a list for later retrieval. This list grows until the Listener is told to stop recording. The values are then kept until the next time the Listener is told to start recording. Since the Listener listens for events at the system level instead of the Qt level, a backend has to be written for each operating system. This consists of implementing an operating-system-specific function in the Listener that is called by the event filter.

The operators all derive from a base KLMOperator class. This base class holds the type of operator, the time the operator occurred, and the duration of the operator. The operator for P includes the start location of the point and the stop location. The K operator is a bit more complicated. It is divided up first into mouse and keyboard, and then further divided into press and release. The main reason for this is that this is how the underlying system delivers the events, and we want the inspection to be fast and not wait for a corresponding release, which can easily be determined after recording has stopped. It also allows for richer raw data. The mouse events include which button was pressed, while the key events hold the key code and the text of the key if it is a letter or a number. The class hierarchy is shown in figure 1.

Figure 1: The KLMOperator class chart. KLMOperator (type, operatorTime, duration) is the base class; PointOperator (location, stopLocation), ButtonClickOperator (buttons), and KeyStrokeOperator (key, text) derive from it.

As mentioned above, the Listener simply inspects every event and stores the information for later. It fills in the information that is available about the event at that moment and leaves information such as the duration of the event or the stop location of the point operator to be determined later. The Model takes the information provided by the Listener and applies several transformations to the list to come up with something more human-friendly. The current transformations applied are:
• Coalescing consecutive PointOperators into one PointOperator with start and stop location and the time taken to travel that distance.
• Combining KeyStrokeOperator press and release events into one KeyStrokeOperator and calculating the duration.
• Combining ButtonClickOperator press and release events into one ButtonClickOperator and calculating the duration.
• Looking for changes that indicate a change in the hand position from the keyboard to the mouse or vice-versa and inserting an H operator.

Aside from changing the size of the list of operators, the transformations can be applied one after the other and are not dependent on the previous transformations. Therefore, it should be possible to add additional transformations in the future with little effort. The Model is actually a subclass of the QAbstractTableModel class that is part of Qt’s model-view classes, which means that the Model can easily be hooked up to any of the view classes to feed the information to a GUI.

Of course, one needs to make these two parts communicate with each other. This is solved by using a client-server approach with a TCP socket. The application under evaluation functions as a server, and a KLM client connects to the server, controls whether or not the Listener should be recording, and asks for the operators the Listener creates. Finding out which applications have servers running and what port they are running on is handled via Bonjour, Apple’s open source implementation of zero-configuration networking [Apple Developer Connection, b]. This setup is illustrated in figure 2. One advantage of this solution is that the client application does not have to be on the same device as the application being evaluated. Using Bonjour means that one can quickly find the applications to evaluate, as they are simply added to the list of available applications inside the client.
Figure 2: Client-server KLM Listener and Model. The application to evaluate contains the KLM server, which broadcasts that it has a KLM service; the KLM client, with the KLM table model and view, listens for KLM services and talks to the server over a network connection.
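The transformations listed in section 3.2 can be sketched as a small pipeline of independent steps. The sketch below is written in Python rather than in the C++ of the actual tool, and every name in it (`Op`, `coalesce_points`, `pair_press_release`, `insert_homing`, `transform`, and the field names) is hypothetical; it only illustrates the idea of applying one transformation after the other to the raw list, not how the Model is implemented.

```python
from dataclasses import dataclass, replace


@dataclass
class Op:
    kind: str              # "point", "button", "key" or "home" (hypothetical names)
    time: float            # when the raw event occurred, in seconds
    duration: float = 0.0  # filled in by the transformations
    phase: str = ""        # "press" or "release" for raw button and key events
    detail: str = ""       # button name or key text


def coalesce_points(ops):
    """Merge each run of consecutive point events into one operator."""
    out = []
    for op in ops:
        if op.kind == "point" and out and out[-1].kind == "point":
            out[-1] = replace(out[-1], duration=op.time - out[-1].time)
        else:
            out.append(op)
    return out


def pair_press_release(ops):
    """Combine each press with its matching release and compute the duration."""
    out, pending = [], {}
    for op in ops:
        key = (op.kind, op.detail)
        if op.phase == "press":
            pending[key] = len(out)  # remember where the press was stored
            out.append(op)
        elif op.phase == "release" and key in pending:
            i = pending.pop(key)
            out[i] = replace(out[i], phase="", duration=op.time - out[i].time)
        else:
            out.append(op)
    return out


def insert_homing(ops):
    """Insert an H operator where the input switches between mouse and keyboard."""
    device = {"point": "mouse", "button": "mouse", "key": "keyboard"}
    out = []
    for op in ops:
        prev = out[-1] if out else None
        if prev and device.get(prev.kind) != device.get(op.kind):
            out.append(Op("home", time=prev.time + prev.duration))
        out.append(op)
    return out


def transform(raw_events):
    """Apply the transformations one after the other."""
    ops = list(raw_events)
    for step in (coalesce_points, pair_press_release, insert_homing):
        ops = step(ops)
    return ops
```

Because each step takes a list of operators and returns a new list, a further transformation could be added by appending one more function to the tuple in `transform`, which mirrors the observation above that the transformations do not depend on each other.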
4 Current status and future plans

The current version of the client and server matches what was described in section 3.2. However, there are several things that need to be done before the project can be marked as completed. One of them is to deal with the two operators that are not currently handled in the client, M and R.

It is debatable whether the R operator is useful for many applications, since in many cases the parts being evaluated will hopefully be responsive. The other issue with the R operator is that the speed of the underlying system may vary a lot, which makes it hard to get a “real” number. On the other hand, as pointed out by Luo and John [2005], the R operator is much more important for handheld applications, but it has to be added explicitly.

The M operator definitely should be in the sequences, but there are two ways that it can be added. One is to follow the heuristic rules given by Card et al. [Card et al., 1980, 1983]. These rules work well and are a natural fit for pipelining with the other transformations in the Model. It is a bit unclear whether some extra context is needed for some of the operators, as some of the rules depend on whether the operator is part of an argument or a command. On the other hand, since we have the actual time that it took to run the evaluation, it should be possible to look at the time between operators and determine whether an M operator has actually occurred. This was done by Olson and Nilsen [Olson and Nilsen, 1987] when trying to validate KLMs for spreadsheet applications. One thing to consider is that the person using the tool may not necessarily be an expert in using the application under evaluation. This means that there may be more M’s than there should be. Perhaps this can be an option, depending on what is being evaluated.

Most of the development has been done on Mac OS X, and some development has been done to make things work with Qtopia Core and the Qt Virtual Frame Buffer. This is all well and good, but now it is time to take what has been done and combine it with Qtopia Phone Edition. While this should not be a big problem, the author has seen enough problems when integrating systems not to assume that it will go without any bumps. After things are working well with the Phone Edition, it will be time to put the tool on a device and see how things work there. The Greenphone has been secured, and some time has been set aside later this week to get acquainted with it. At that point, hopefully, some models can be made for the various applications that are on the device.
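The second way of adding M operators, looking at the time between recorded operators, could be sketched roughly as follows. The function name `insert_mental_operators`, the tuple layout, and above all the threshold of 1.0 seconds are assumptions chosen for illustration; a real threshold would have to be calibrated, and this is not the procedure used by Olson and Nilsen.

```python
def insert_mental_operators(ops, threshold=1.0):
    """Insert an M wherever the idle gap between two operators is long enough.

    ops       -- list of (name, start, duration) tuples sorted by start time
    threshold -- minimum gap in seconds that counts as mental preparation
                 (placeholder value, not taken from the literature)
    """
    out = []
    previous_end = None
    for name, start, duration in ops:
        if previous_end is not None:
            gap = start - previous_end
            if gap >= threshold:
                # The whole idle gap is attributed to mental preparation.
                out.append(("M", previous_end, gap))
        out.append((name, start, duration))
        previous_end = start + duration
    return out


# Hypothetical recording: a point, a click right after it, then a long
# pause before the next key press.
if __name__ == "__main__":
    recorded = [("P", 0.0, 0.8), ("K", 0.9, 0.1), ("K", 3.0, 0.1)]
    for op in insert_mental_operators(recorded):
        print(op)
```

With a non-expert at the controls, such a rule would tend to report more M operators than an expert would need, which is why it may make sense to offer it only as an option next to the heuristic rules.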
5 Discussion

While a lot has happened during this whole process, it will be really interesting to see what happens once the tool is actually used in conjunction with the Greenphone. Then it will be possible to see how well KLM works with these devices. One thing, as mentioned previously, is the need to invent extra operators that are necessary to model things correctly on the phone. Luo and John [Luo and John, 2005] found that when they were devising KLMs for the Palm Pilot, they needed to develop an operator for drawing a Graffiti character. Another thing that may need to be taken into account is events that are not under the control of the user, such as getting a phone call or an SMS while interacting with the device. Is it necessary to model these, or has the user’s locus of attention changed so that they do not need to be taken into account?

Finally, there is a question about both the validity of the models and what is to be done with the models once they have been created. The validity question is simply whether the models will actually match expert users within acceptable tolerances. In general, the KLM has been validated many times in other areas, so this should not be a large issue, but it is still worth considering. As for what can be done with the models after the project, it is hoped that they can be turned over to the developers of the programs for Qtopia Phone Edition, who can use them to improve the interfaces. Certainly, though, other methods should also be used for a final evaluation of these programs.
6 Conclusion

We have now taken a look at the background behind GOMS and specifically the keystroke-level model. We have also seen the motivation for the project and what has been done thus far to realize it. While a lot of progress has been made, there are still things that need to be taken care of before the project can be fully completed. Some of these are related to programming and software engineering; others are theoretical and can only be solved by proceeding further in the project. Now that some of the main prerequisites for the course have been met, such as presenting the project both in class and in this halfway report, it is time to turn back to the project and see it through to the end. It should keep the rest of the semester interesting.
References

Stuart K. Card, Thomas P. Moran, and Allen Newell. The keystroke-level model for user performance time with interactive systems. Commun. ACM, 23(7):396–410, 1980. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/358886.358895.

Stuart K. Card, Allen Newell, and Thomas P. Moran. The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, USA, 1983. ISBN 0898592437.

Apple Developer Connection. Optimizing with Shark: Big payoff, small effort, a. URL http://developer.apple.com/tools/shark_optimize.html.

Apple Developer Connection. Networking—Bonjour, b. URL http://developer.apple.com/networking/bonjour/.

Richard Gong and David Kieras. A validation of the GOMS model methodology in the development of a specialized, commercial software application. In CHI ’94: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 351–357, New York, NY, USA, 1994. ACM Press. ISBN 0-89791-650-6. doi: http://doi.acm.org/10.1145/191666.191782.

Bonnie E. John and David E. Kieras. Using GOMS for user interface design and evaluation: which technique? ACM Trans. Comput.-Hum. Interact., 3(4):287–319, 1996. ISSN 1073-0516. doi: http://doi.acm.org/10.1145/235833.236050.

Lu Luo and Bonnie E. John. Predicting task execution time on handheld devices using the keystroke-level model. In CHI ’05: CHI ’05 extended abstracts on Human factors in computing systems, pages 1605–1608, New York, NY, USA, 2005. ACM Press. ISBN 1-59593-002-7. doi: http://doi.acm.org/10.1145/1056808.1056977.

Judith Reitman Olson and Erik Nilsen. Analysis of the cognition involved in spreadsheet software interaction. Human-Computer Interaction, 3(4):309–349, 1987. doi: 10.1207/s15327051hci0304_1. URL http://www.leaonline.com/doi/abs/10.1207/s15327051hci0304_1.

Judith Reitman Olson and Gary M. Olson. The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction, 5(2&3):221–265, 1990. doi: 10.1207/s15327051hci0502&3_4. URL http://www.leaonline.com/doi/abs/10.1207/s15327051hci0502?3_4=.

Jenny Preece, Yvonne Rogers, and Helen Sharp. Interaction Design. John Wiley & Sons, Inc., New York, NY, USA, 2002. ISBN 0471492787.