Cognitive Design and Evaluation of User Interfaces
William G. Cole, Mark S. Tuttle, David D. Sherertz
Lexical Technology, Inc., Alameda, CA
Conveying human concepts to a computer is a difficult but potentially rewarding challenge. If efficient and intuitive methods for concept designation are developed, it is more likely that guidelines and alerts will be triggered at the appropriate time, that outcomes analysis can be based on an accurate and sophisticated scheme of patient similarity, and that many tasks currently performed by human coders outside the daily flow of medical care can become an automatic byproduct of this daily flow. Consideration of human cognition can guide the design and evaluation of user interfaces for concept designation. Any interface asks the user to perform one or both of two actions: to express or to select a concept that the computer system can recognize and thus act on. Expressive interfaces (speech recognition, handwriting recognition, typing) present certain challenges to human information processing. Selective interfaces (choose from an authoritative list) present other challenges. Based on an analysis of these cognitive challenges, a hybrid express/select interface is proposed, called Metaphrase. Methods derived from cognitive psychology are proposed as the basis of an evaluation.

Introduction
A deep and persistent issue in medical informatics is the transformation of human mental concepts into accurate computer symbolizations. For example, when using an electronic medical record (EMR) it is useful, sometimes necessary, to specify indications for procedures in order that proper reimbursement can be guaranteed. Similarly, in order for an EMR to be able to suggest guidelines and offer alerts, the computer and the user must be able to coordinate concepts. How
should medical procedures, actions, and attributes be designated by a caregiver in order to trigger useful computer actions?

Vocabulary & Concepts
On the surface this is a naming problem. A great deal of what goes on in the field of medical informatics is addressed to developing tools for dealing with various naming problems. Coding systems, authoritative institutional vocabularies, the Unified Medical Language System (UMLS) [1], all are tools for dealing with naming problems. Naming problems are often problems of communication between two concept systems. How does knowledge system A (such as me) indicate by gesture or utterance a concept to knowledge system B (you, for example)? Exactly what dance must a male prairie chicken do to indicate 'let's mate' to a female prairie chicken? Exactly what utterance in Portuguese will get me a hotel room with a bath if I'm in Brazil? Exactly what term will my hospital information system recognize for the disorder I think of as 'mad cow disease'?

Before I know how to designate a concept I have to have some notion of what your conceptual space is. If I am working under a sink, should I ask you for a "ball peen hammer" or "that little hammer with one round end"? In my concept space, these are synonyms. But in order to specify the concept to you, I have to take a guess about your conceptual space before I name the thing. You may not know what a ball peen hammer is. It is not enough for me to correctly point to something in my own mental space. I have to correctly point to the concept in yours. Do you think
of rheumatoid arthritis as an immune disorder or as a skeletal/connective tissue disorder? This may affect how I name the disorder to you. How does my hospital information system (HIS) think of it? This may affect how I specify the disorder to the HIS.

In deciding how to specify a concept when dealing with a computer system, I need a method of designating things so that the computer system 'can understand', which is another way of saying 'can classify'. Historically, the way this has been achieved has been to have users accommodate to the computer. That is, if users wished to take advantage of computer processing, they have been required to learn the computer's conceptual space and vocabulary. The command line interface is one example of this approach. In a command line interface, every action the computer can take is associated with some (perhaps arbitrary) string of input characters. To trigger the action, the user must type the correct string of characters.

In recent years it has become clear that we cannot require this degree of accommodation from the average caregiver. In a traditional command line interface, the user (e.g. the physician at the point of care) must recall, either from memory or from documentation, the character strings that trigger actions or name concepts. Demanding recall from users not only lowers the probability of successful use, it lowers the probability of any use at all. If command line interfaces are not an appropriate solution, what does this tell us about what would be an appropriate one? What alternative approaches are available? How would we know whether one approach was a good one, or the best one? These are issues of interface design and evaluation.

Interface Design
There are three classes of interface for specifying concepts to computer systems: expressive, selective, and hybrid. Expressive and selective interfaces each limit concept specification in their own ways. A hybrid interface that is both expressive and selective can build on the strengths of both while minimizing the limitations of each.

Expressive Interfaces
We can express the name of an object or action by speaking or writing it. In the case of medical information systems, writing usually means typing, although pen-based free input interfaces continue to draw closer to significant use. There are two types of challenge in constructing expressive interfaces, one technological and one cognitive. Technological challenges are somewhat different for the three expressive interfaces (speech, pen, typing); one great cognitive challenge is shared by them all.

Speech recognition interfaces: the technological challenge. The technological problem in a speech-based interface is recognition of ill-defined stimuli. Speech is notorious for offering an ill-defined physical pattern. Looking at a speech waveform, it is not even clear where the boundaries fall between words (segmentation), much less which words are being spoken. Such utterances thus tend to be ambiguous. Computer speech recognition systems typically reduce the potential for ambiguity by restricting the allowable alternatives. If a system only agrees to recognize two concepts, "dog" and "protozoa", it can be fairly accurate based purely on duration of utterance (one syllable vs. four). As greater variety of expression is permitted, the identification problem becomes much harder, and this creates the fundamental tension in speech recognition research: you can have any two of the following three properties: accuracy, large recognition vocabulary, affordability. The typical design solution in medicine, given that low accuracy dooms acceptability to users and high cost dooms acceptability to the institution, is to sacrifice large recognition vocabulary. As Clancey has observed about such a system, "computerization may restrict what can be recorded and [thereby] rigidify health care interactions..." [2] Creating an affordable computer system that can recognize speech in a general medical setting (i.e. not just for a circumscribed domain such as naming cell types in pathology) remains a challenge.
Pen free input interfaces: the technological challenge. The issues here are very similar to those in speech interfaces. Both speech and pen free input (i.e. handwriting recognition) interfaces require an initial step of digitizing an analog input that is by its nature ill-defined and ambiguous. Hand-drawn letters and hand gestures are rich but vague. Was that tiny mark in the lower right of what looks to be the letter O intended to turn it into a Q? Is that the number 1 or a lowercase L? Distinguishing a Q from an O, or an L from a 1, is character-based recognition. When people read free text, however, the basic unit of recognition appears to be the word, not the character. Pen-based interfaces use word recognition as well as character recognition. In the typical pen interface, three things are going on simultaneously: a) characters are being analyzed for recognition, b) segmentation is being attempted, and c) segmented character strings are being compared to a stored vocabulary and a best match is being computed. As with speech interfaces, accuracy is improved by reducing the number of items in the recognition vocabulary, but as with speech interfaces, a price is paid when recognition sets are small. Small recognition vocabularies are acceptable for demos, which is why we see speech and pen demos that look enticingly plausible, but are probably not acceptable for an EMR.

Typing interfaces. Typed input is far less ambiguous than spoken or handwritten input, because the physical input to the recognition process is not an ill-defined speech waveform but a set of symbols: ASCII characters. The input to the system is digital rather than analog, a big first step. Although correct interpretation of typed input is not without its own problems (not least of which is the difficulty of interpreting misspelled words), at least it avoids the problem of limited recognition vocabularies.
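Spelling correction is one standard answer to the misspelled-word problem in typed input. The paper does not specify an algorithm; a minimal sketch, assuming an edit-distance criterion and an invented three-term vocabulary for illustration, might look like this:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct_spelling(entry: str, vocabulary: list[str]) -> str:
    """Return the vocabulary term closest to the (possibly misspelled) entry."""
    return min(vocabulary, key=lambda term: edit_distance(entry.lower(), term.lower()))

# Illustrative vocabulary only; a real system would hold thousands of terms.
vocab = ["encephalotrigeminal angiomatoses", "encephalitis", "angiography"]
print(correct_spelling("encephalotrigiminal angiomatoses", vocab))
# -> "encephalotrigeminal angiomatoses"
```

A slip of the finger (here, "i" for "e") changes the edit distance by only one, so the intended term still wins the ranking.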
Traditionally, the major objection to typing interfaces is the belief that line physicians at the point of care are highly unlikely to use typing as the principal means of information transmission simply because of the skill requirements. It is commonly believed that the typical physician has poor typing skills due to lack of experience, although this is doubtless changing as younger cohorts join the physician population. Even for someone facile with the keyboard it is very difficult to consistently type in long strings such as 'encephalotrigeminal angiomatoses' without slips of the finger or failure to recall correct spelling. For someone unused to keyboard input, having to specify such formal concepts by typing every day is tedious at best. It represents a very significant barrier to user acceptance of the system. It is therefore no surprise that word completion and spelling correction algorithms are the most commonly offered augmentations to keyboard entry systems. Anyone talking to the vendor community, those who actually build the medical information systems, will find companies who firmly believe that medical caregivers will not type, and who believe that until accurate, easy-to-use speech or pen systems are developed, expressive interfaces have no chance of success.

The cognitive challenge in all expressive interfaces. The cognitive problem with all expressive interfaces is human recall. If the human wishes to express a familiar proposition such as "order a lipid profile to be performed" and recalls the correct method of expressing this to the computer ("lab test: lipid profile"), then the probability of success is high. On the other hand, if the human wishes to express a concept that is less easily recalled, such as "that neurological disease they get in New Guinea from eating human brains", or fails to recall how to express a concept to the computer, mistakenly typing "run lipids on Mr. Johnson", the probability of success is very low. Here, the concept problem becomes acute. In order for the user to achieve successful computer classification, it is not only necessary to recall which concepts the computer is capable of understanding, but also necessary to recall which actions (utterances, typed strings, pen-written words) are computer-recognizable correlates of these concepts.

One way to address the recall problem is a thesaurus. It is exactly for this purpose that the UMLS Metathesaurus was created and that efforts have been launched to enrich its clinical terminology [e.g. 3].

Selection Interfaces
Another way to address the recall problem is an interface that presents to the user a set of symbol strings (e.g. names, phrases) the computer can recognize and asks only that the user recognize which one expresses the concept in mind. Such an interface is usually called a 'point and click' interface, but since touch screen technology provides roughly equivalent functionality without any mouse clicking, the term selection interface will be used here, with the fundamental act of designating a concept in such a system being termed selecting.

Probably the most important strength of such an interface is that selecting is based on recognition rather than recall. Human recognition is far more successful than recall. For example, roughly one college undergraduate in ten will be unable within 30 seconds to name a fruit that begins with the letter "S", and yet none fails to recognize that a strawberry is a fruit (similar examples hold for fruits beginning with "Q" or "K") [4]. Recall involves spontaneous generation of items from memory (or from written documentation), and such generation is time consuming, effortful, and prone to failure.

The weakness of a selection interface lies in the length of the list. Finding the term 'encephalotrigeminal angiomatoses' within a list of terms is feasible if there are only 100 terms and they are organized either alphabetically or by some semantic scheme (e.g. anatomic site). Finding that term by scrolling through all the items in the Metathesaurus, however, would be both time consuming and tedious. This is the same principal problem shared by speech and pen free input systems: short lists make for good performance but greatly limit what may be designated, while long lists allow designation of a richer set of concepts but lead to slow and tedious performance.

To review the main problems identified thus far:
(1) Speech and pen free input interfaces tend to rely on recognition vocabularies, which work best when they are short, but short lists limit human expression. (2) Typing interfaces require caregivers to type, which many feel is a high barrier to acceptance. (3) All expressive interfaces require the user to recall which kinds of things the computer system knows about and how to express these things in terms the computer can recognize. (4) Selective interfaces also rely on pre-defined vocabularies, and again shorter vocabularies work best, because finding an item in a long list can be time consuming.

A hybrid expressive/selective interface. A hybrid expressive + selective interface might be able to minimize at least some of these problems. Imagine an interface that allowed the user to express a concept in casual, relatively idiosyncratic terminology but quickly arrived at an authoritative, computer-recognized name for this concept. A hybrid system might allow such performance, using a strategy similar to that of the human visual system. A great deal of human vision, including reading, is based on saccadic eye movements. Every 250 milliseconds or so, the eye makes a ballistic motion, leaving one fixation point and arriving at another. In saccadic visual processing, the eye only processes information while at rest, during the fixation period between saccades. (There are two other forms of eye movement, nystagmus and smooth tracking; they have different characteristics.) This turns out to be a highly efficient use of the retina's limited number of receptors because such a system allows for two-stage processing. Before a saccadic eye movement begins, relatively crude processing can suggest a neighborhood in the visual field where the next saccade should arrive. Based on this crude "guess", the eye is yanked over to that neighborhood, and at that point very high quality processing can refine the information gathering.
It is as though 20% of the processing gets your eye to the neighborhood, leaving the other 80% for high quality processing once you are there. Similarly, a hybrid expressive/selective system might allow a user to perform an initial casual expressive step that need not be perfectly accurate. This step would instead serve to narrow the vast range of possible concepts to a neighborhood; a second, selective step would then permit exact specification of the concept in mind. The first step would be expressive, the second selective.
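The express-then-select strategy can be sketched in code. This is a hypothetical illustration, not the Metaphrase implementation: the vocabulary, the synonym lists, and the similarity cutoff are all invented, and Python's difflib stands in for whatever matching a real system would use against something like the UMLS Metathesaurus.

```python
import difflib

# Toy authoritative vocabulary: authoritative term -> casual synonyms (invented).
AUTHORITATIVE = {
    "bovine spongiform encephalopathy": ["mad cow disease", "BSE"],
    "creutzfeldt-jakob syndrome": ["human mad cow disease", "CJD"],
    "encephalitis": ["brain inflammation"],
}

def narrow(casual: str, cutoff: float = 0.5) -> list[str]:
    """Expressive step: a rough match maps the casual entry to a small
    neighborhood of authoritative candidates, best match first."""
    candidates = []
    for term, synonyms in AUTHORITATIVE.items():
        score = max(difflib.SequenceMatcher(None, casual.lower(), n.lower()).ratio()
                    for n in [term] + synonyms)
        if score >= cutoff:
            candidates.append((score, term))
    return [term for _, term in sorted(candidates, reverse=True)]

# Selective step: the user picks the intended concept from this short list.
print(narrow("mad cow desease"))
```

The casual, misspelled entry narrows thousands of potential concepts to a handful; exact designation is then a single act of recognition, not recall.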
Figure 1 shows one way such a system might look. There are three columns, corresponding to three steps in getting from caregiver's concept to EMR-coded authoritative concept. The left column permits either free text entry (expressive interface) or selection from a short list of frequently used terms. The middle column shows matching authoritative terms the computer system recognizes, from which the user selects the one intended. The right column shows the current state of authoritative terms associated with this patient, in this case a problem list.
Figure 1. A hybrid express/select interface for concept designation.
Assume the task is the following. A caregiver at the point of care wishes to have an authoritative term (i.e. the term used by his or her healthcare institution) for a concept added to this patient's problem list. The caregiver does not wish to have to memorize all the authoritative terms, nor to become an expert typist. Assume that in this case, patient Joe Camel, who already has lung cancer and lymphoma on his problem list, now has a new problem: the human disease
resulting from interacting with an animal with mad cow disease. If the user were to quickly type in an easily recalled version of the concept in mind, not worrying too much about spelling, the entry might look like Figure 2, with one misspelled word. Alternatively, the user might have typed in "BSE", which is a common acronym for "bovine spongiform encephalopathy", or "mad cow dis*" to trigger a word completion algorithm.
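A word completion algorithm of the "mad cow dis*" sort can be sketched as a prefix search over a sorted term list. The terms below are invented for illustration; binary search keeps the lookup fast even when the vocabulary is large.

```python
import bisect

# Illustrative term list (sorted); a real system would hold a full vocabulary.
TERMS = sorted([
    "mad cow disease",
    "mad cow disease, human",
    "madelung deformity",
    "malaria",
])

def complete(prefix: str) -> list[str]:
    """Word completion: return every term starting with the typed prefix."""
    lo = bisect.bisect_left(TERMS, prefix)
    hi = bisect.bisect_right(TERMS, prefix + "\uffff")
    return TERMS[lo:hi]

print(complete("mad cow dis"))
# -> ['mad cow disease', 'mad cow disease, human']
```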
Figure 2. Entry of a casual, easy-to-recall term, with loose spelling.

In the second column, results appear, as in Figure 3. These are authoritative terms recognized by the medical information system (obtained by consulting an augmented version of the UMLS Metathesaurus). The user's task is simply to select one or more of these terms, verifying that this is the concept intended, the one that should go onto the problem list for this patient.

Figure 3. Results: authoritative terms the computer system has within its classification scheme. The user need only select whichever corresponds most closely to the originally conceived concept.
Figure 4. The final result. This patient's problem list is one item longer than at the outset. The new item corresponds both to the user's original concept and to the computer's categorized set of concepts.

Furthermore, in the Metaphrase interface, there are provisions for navigating to knowledge [5]. Suppose the initial term that comes to mind is not really a synonym of the concept that should go on the problem list. Figure 5 shows the results of selecting a 'show neighborhood' option. The user in this case has chosen to view the local neighborhood of bovine spongiform encephalopathy within the MeSH hierarchy. With only a moment's consideration, it is easy to select, finally, the exact term desired for the patient's problem list: Creutzfeldt-Jakob Syndrome.
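The 'show neighborhood' operation can be sketched as a lookup of a term's parents and siblings in a concept hierarchy. The hierarchy fragment below is invented for illustration, only loosely modeled on MeSH; a real system would query the Metathesaurus.

```python
# Toy hierarchy: parent term -> child terms (illustrative fragment only).
HIERARCHY = {
    "central nervous system diseases": ["brain diseases", "prion diseases"],
    "prion diseases": ["bovine spongiform encephalopathy",
                       "creutzfeldt-jakob syndrome", "kuru"],
}

def neighborhood(term: str) -> dict:
    """'Show neighborhood': the parents and siblings of a term, so the user
    can navigate from an approximate concept to the exact one intended."""
    parents = [p for p, kids in HIERARCHY.items() if term in kids]
    siblings = sorted({s for p in parents for s in HIERARCHY[p] if s != term})
    return {"parents": parents, "siblings": siblings}

print(neighborhood("bovine spongiform encephalopathy"))
```

Starting from the approximate concept bovine spongiform encephalopathy, the sibling creutzfeldt-jakob syndrome is one selection away.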
Figure 5. Navigating to knowledge. Having input the casual initial term 'mad cow disease', the caregiver now sees a neighborhood of closely related terms and can pick the exact concept that should be encoded in the problem list.

Evaluation
How would we know which type of interface we should use? How can we evaluate any interface to determine if it is at all good, much less the best among our choices? Or how can we determine the circumstances under which one approach is likely to work well, while alternatives would work less well? In the study of human psychology, there are two broad classes of evaluation methodology: experimentation and field study. In the study of human-computer interaction there is a third methodology which blends these two, called the usability study [6]. Usability studies investigate preference and performance. That is, they look at how much users like a system but also at how effectively users can use the system. Although these two are correlated, they are logically distinct and do not always go together [7].

Two types of evaluation are being carried out on the hybrid interface presented here. The first is a usability laboratory evaluation being conducted at the Mayo Clinic under the supervision of Professor C. Chute, with guidance and collaboration from Professor V. Patel. (The Mayo Clinic evaluation involves a slight variation on the interface presented here; this Mayo variation is named "YATN".) The second is a set of clinical deployments, one of which will take place in the first quarter of 1997 at the Mayo Clinic. Another clinical deployment, in the second quarter of 1997, will take place at Beth Israel Hospital in Boston under the supervision of Professor C. Safran. These two types of evaluation will address complementary issues.
Usability lab testing addresses performance issues at a fine grain. How well does the typical user carry out a single task? What errors do users make while carrying a task out? If they talk aloud as they carry out the task, what sort of mental model do they seem to have of the system: how it works, what it is capable of, and how to operate it? The strength of usability testing, as in any laboratory research, is control. It is possible to give each user the same task or tasks. It is possible to record their behaviors, including the time required to do things. It is also possible, once a rich record of user behavior has been captured on videotape, to analyze these data carefully over many days. The weakness of usability testing is that of all laboratory research: it lacks ecological validity. For example, if all users are asked to perform the same task, then this task is not one that was personally generated by the user as part of daily work flow.
The strength of evaluating clinical deployment lies in ecological validity. Caregivers use the system within the actual task environment, performing tasks that are self-generated.

Conclusion
Classes of interface have characteristic consequences for human cognition. Expressive interfaces all challenge human recall. Additionally, expressive interfaces such as speech and pen free entry that attempt to digitize analog input rely on recognition vocabularies that can limit and distort the way caregivers conceptualize medical actions, events, and entities. Selective interfaces face a similar tension between the benefits of longer lists of selectable items (greater likelihood of finding the exact concept to be expressed) and the costs of such lists (long lists are hard to search). A hybrid approach may minimize both sets of problems, by allowing rapid expression of a term that approximates the concept in mind, followed by an exact selection from a small semantic neighborhood of authoritative terms. Evaluation of this interface is taking place at two levels: usability lab testing and clinical deployment.

References
1. Lindberg DAB, Humphreys BL, McCray AT. The Unified Medical Language System. Meth Inform Med 1993; 32: 281-91.
2. Clancey WJ. The Learning Process in the Epistemology of Medical Information. Meth Inform Med 1995; 34: 122-30.
3. Cimino JJ. Use of the Unified Medical Language System in Patient Care at the Columbia-Presbyterian Medical Center. Meth Inform Med 1995; 34: 158-64.
4. Cole WG. Structure and Processing in Semantic Memory. Unpublished doctoral dissertation. 1980.
5. Tuttle MS, Cole WG, Sherertz DD, Nelson SJ. Navigating to Knowledge. Meth Inform Med 1995; 34: 214-31.
6. Nielsen J. Usability Engineering. San Diego, Calif.: Academic Press, 1993.
7. Nielsen J, Levy J. Measuring Usability: Preference vs. Performance. Communications of the ACM 1994; 37: 66-75.