
Effectiveness of Augmented-Reality Visualization versus Cognitive Mediation for Learning Actions in Near Space

ROBERTA L. KLATZKY, BING WU, and DAMION SHELTON
Carnegie Mellon University
and
GEORGE STETTEN
Carnegie Mellon University/University of Pittsburgh

The present study examined the impact of augmented-reality visualization, in comparison to conventional ultrasound (CUS), on the learning of ultrasound-guided needle insertion. Whereas CUS requires cognitive processes for localizing targets, our augmented-reality device, called the "sonic flashlight" (SF), enables direct perceptual guidance. Participants guided a needle to an ultrasound-localized target within opaque fluid. In three experiments, the SF showed higher accuracy and lower variability in aiming and endpoint placements than did CUS. The SF, but not CUS, readily transferred to new targets and starting points for action. These effects were evident in visually guided action (needle and target continuously visible) and visually directed action (target alone visible). The results have application to learning to visualize surgical targets through ultrasound.

Categories and Subject Descriptors: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Evaluation/methodology

General Terms: Performance, Experimentation, Human Factors

Additional Key Words and Phrases: Perception, learning, augmented reality, motor control, spatial cognition

ACM Reference Format: Klatzky, R. L., Wu, B., Shelton, D., and Stetten, G. 2008. Effectiveness of augmented-reality visualization versus cognitive mediation for learning actions in near space. ACM Trans. Appl. Percept. 5, 1, Article 1 (January 2008), 23 pages. DOI = 10.1145/1279640.1279641 http://doi.acm.org/10.1145/1279640.1279641

1. INTRODUCTION

Medical technology increasingly supports surgical intervention without direct sight of the affected tissue.

This research was supported by National Institutes of Health Grant # R01EB00860-03 and National Science Foundation Grant # 0308096. Authors' addresses: Roberta L. Klatzky, Department of Psychology/Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; email: [email protected]; Bing Wu, Robotics Institute/Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; Damion Shelton, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; George Stetten, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213/Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261.


Fig. 1. Real-time tomographic reflection (RTTR) with ultrasound. Through the half-silvered mirror, the ultrasound image is projected as if it "shines out" from the transducer and illuminates the inner tissue. Binocular depth cues, including convergence and accommodation of the viewer's eyes and disparity between the two retinal projections of the ultrasound slice, are naturally available for localizing the target in 3D space. (Figure reproduced from Wu et al. [2005], with permission.)

In laparoscopic surgery, for example, practitioners perform procedures through a small incision while viewing a remote screen. Ultrasound-aided surgery is another important example of this trend, finding increasing use in biopsies and catheter placement. Remote views present problems, however, as they disconnect the perceptual basis of action from the action itself. To address this issue, a number of augmented-reality techniques have been developed to create a unified environment for both perception and action by superimposing the medical images on the patient [Bajura et al. 1992; Fuchs et al. 1996; Sauer et al. 2001; State et al. 1996; Stetten et al. 2000]. Importantly, for the present paper, this reintegration has implications for the time course and efficacy of learning.

Our work compares two kinds of ultrasound displays that provide visual support for action, but that differ in both fundamental and practical ways. One of the imaging systems is conventional ultrasound (CUS), a paradigm in which the user acquires data from a transducer held against the patient's body, while viewing it on a displaced screen. The second system is an augmented-reality device that produces a virtual counterpart of the ultrasound data at the location of the scanned anatomy, by using a technique called real-time tomographic reflection (RTTR) [Masamune et al. 2002; Stetten and Chib 2001; Stetten et al. 2000]. The RTTR device used in this research, developed by Stetten [2000] and schematized in Figure 1, is called the sonic flashlight (SF), because the ultrasound image emerges as a virtual slice, shining out from the tip of the ultrasound transducer. Through RTTR, light rays from the image are reflected back to the eyes of the user, providing binocular depth cues, including convergence and accommodation (deformation of the lens by the oculomotor muscles to maintain near focus) of the eyes and disparity between the left and right retinal images. These cues cause the virtual slice and the target within it to be perceived at the appropriate orientation and depth plane within the body.

From a psychological perspective, there are a number of important differences between CUS and the SF as a vehicle for ultrasound-guided action, as shown in Figure 2.
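The optics of RTTR can be stated compactly: the display, the half-silvered mirror, and the ultrasound scan plane are arranged so that the mirror image of each display pixel falls at the corresponding point of the scanned slice inside the body. The sketch below, in a toy coordinate frame chosen purely for illustration (it is not the device's actual geometry or calibration), shows the reflection computation that expresses this condition.

import numpy as np

def reflect_across_plane(p, q, n):
    # Reflect point p across the plane that passes through point q with normal n.
    n = n / np.linalg.norm(n)
    return p - 2.0 * np.dot(p - q, n) * n

# Toy frame (units: cm): the ultrasound scan plane is y = 0, the flat-panel display
# lies in the plane z = 0, and the half-silvered mirror lies in the plane y = z,
# which bisects the angle between them (the RTTR condition).
mirror_point = np.array([0.0, 0.0, 0.0])
mirror_normal = np.array([0.0, 1.0, -1.0])

# A display pixel depicting an echo 3 cm along the slice and 5 cm deep.
display_pixel = np.array([3.0, 5.0, 0.0])

virtual_image = reflect_across_plane(display_pixel, mirror_point, mirror_normal)
print(virtual_image)  # [3. 0. 5.]: the pixel's mirror image lies in the scan plane (y = 0)

Because every pixel satisfies this relation, the reflected slice appears at the location of the scanned anatomy, which is what supplies the convergence, accommodation, and disparity cues described above.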


Fig. 2. Schematic of the perceptual/cognitive issues in the two visualization devices. (a) In CUS, there is displaced hand–eye coordination, as the user has to look away from the site of operation to check information on the display. In addition, demanding cognitive processing is required to normalize the metric of the display, align multiple frames of reference, and form a representation of the target for planning and guiding the action. (b) The SF allows the user to aim the needle directly at the virtual target, circumventing the displaced sense of hand–eye coordination. (Figure reproduced from Wu et al. [2005], with permission.)

Most fundamentally, because the SF displays the imaged anatomy at its true location, it allows action to be directly guided by the perceptual system. Sight of the patient's body, the US slice and target, the needle, and the hands are all superimposed on each other and synchronized with the proprioceptive and motor-control systems. This provides a natural perceptual-motor coupling, and motor behavior can be assisted by immediate, continuous visual feedback of action results.

In contrast, CUS, by displacing the image, requires mediating processes to construct a mental representation of the spatial location to which actions are directed. The processes that build the representation from the CUS display are cognitive, spatial transformations. They include scaling the image on the remote screen to the action space. For this purpose, the screen typically displays a calibration meter (markings in centimeters), the scale of which changes on the screen if the user zooms in or out, necessitating cognitive rescaling. (The fixed calibration of the SF precludes zooming.) The CUS user must also coordinate the spatial frame of reference provided by the display with the frame that governs action. He or she must mentally translate and reorient the displayed region to align the two frames of reference. In addition, memory processing is demanded by CUS as the user looks back and forth from the patient to the screen. When it comes to the medical intervention itself, the CUS user must act on the patient while looking away at the displayed ultrasound slice, guided by the mediating cognitive representation.
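As a concrete illustration of these mediating transformations, the sketch below maps a target seen on a CUS screen into the frame in which the hand acts: rescale pixels to centimeters using the calibration markings, then rotate and translate into the action frame. All numbers, frame conventions, and function names here are hypothetical; they are not taken from the apparatus used in the experiments.

import numpy as np

def screen_to_action(pixel_xy, cm_per_pixel, R_probe_to_world, probe_origin_world):
    """Map a target seen on the CUS screen into the action (world) frame.

    pixel_xy           : target location in the on-screen image (pixels)
    cm_per_pixel       : scale implied by the calibration markings (changes with zoom)
    R_probe_to_world   : 3x3 rotation from the image/probe frame to the world frame
    probe_origin_world : world-frame position of the transducer face (cm)
    """
    # 1. Normalize the metric: pixels -> centimeters (lateral, depth) in the image plane.
    lateral_cm, depth_cm = np.asarray(pixel_xy, float) * cm_per_pixel
    target_probe = np.array([lateral_cm, 0.0, -depth_cm])  # probe frame: x lateral, z up

    # 2. Align frames of reference: rotate and translate into the frame that governs action.
    return R_probe_to_world @ target_probe + probe_origin_world

# Hypothetical example: target at 120 px lateral, 250 px deep with 0.02 cm/px;
# probe held upright at (10, 4, 0) cm in the workspace and rotated 30 deg about z.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
print(screen_to_action([120, 250], 0.02, R, np.array([10.0, 4.0, 0.0])))

The SF user never performs this mapping explicitly; the CUS user must carry it out mentally, and must redo the scaling step whenever the zoom changes.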


Recently, we showed that the direct perception afforded by the SF has two important consequences [Wu et al. 2005]. First, it leads to a more accurate internal representation of the location of the target than does CUS. This was demonstrated by having participants point to an ultrasound-imaged target from multiple locations, then combining the responses in 3D space to compute the perceived location. Second, because action is guided by the internal representation, the SF leads to greater accuracy in accessing targets. This was demonstrated by a task in which participants conducted a needle to the ultrasound-viewed target. Although they were not experienced, users of the SF tended to aim directly toward the target and direct the needle to it on a straight trajectory. The same participants, when using CUS, showed slower movements, greater variability, and systematic bias that was predictable from errors in their perceptual localization of the target. The speed advantage of the SF was confirmed by Chang et al. [2006], who found that novice users performed faster with the SF when guiding a needle to penetrate a simulated vessel within a gel phantom.

In the present paper, we turn to the differences between CUS and the SF with respect to the time course and specificity of learning. The acquisition of motor skills has been extensively studied. In a seminal paper, Fitts [1964] decomposed the learning time course into three phases, which he called cognitive, associative, and autonomous. Anderson [1982], in modeling cognitive skills, characterized the stages as transitioning from declarative knowledge to procedural knowledge. Regardless of terminology, the general idea is that the learner first capitalizes on pre-existing, generally applicable mental processes, particularly cognitive mediation, to perform the task. Over the course of learning there is a transition to mechanisms that are task-specific and that allow performance to become more automatic. The idea of this transition is supported by measurements of brain activation over the course of learning perceptual and motor skills. A general tendency has been found for activation to shift from frontal cortical areas, associated with cognitive, and therefore limited-capacity, mediation, to cortical or subcortical areas associated with more direct sensorimotor control [Floyer-Lea et al. 2004; Pollmann and Maertens 2005; Sakai et al. 1998; Tracy et al. 2003].

The specificity, or conversely, the generality, of motor learning has been a topic of considerable research. Since the early 20th century, it has been proposed that practice at one task will facilitate another to the extent that the two tasks have elements in common [Thorndike 1913; Thorndike and Woodworth 1901]. One of the often-noted problems of this dictum is the difficulty in identifying what elements define a task, especially as this can depend on how the performer represents it. The internal representations that control motor skills are often characterized as having multiple levels of elements arranged hierarchically [Rosenbaum 1991], where the lowest elements are sensorimotor mechanisms and the higher elements represent organization and control.
Given this structure, elements that might underlie generalization can be defined at several levels. For many tasks, the course of skill development relies primarily on learning at the higher levels in the hierarchy, as the lower levels are well-learned from the outset [MacKay 1982; Welford 1968]. If, for example, learning to write script letters produces high-level control elements for relative timing and scale-independent shape, a person trained to write on a tablet will quickly adapt to writing on a blackboard, although the joints and muscles involved are quite different. High-level control comes at an expense, however. It can be slow, particularly if the parameters of low-level actions must be input on the fly [Anderson 1982]. Moreover, generalized motor learning appears to be relatively fragile and subject to perturbation [Seidler 2004].


Accordingly, a body of evidence suggests that training action skills often defaults to introducing low levels of control, limiting generalization but promoting speed, accuracy, and stability. One finding to this effect pertains to the effects of repetitive versus variable practice. Trial-to-trial variation has often been found necessary to induce generally applicable motor skills, but in the absence of enforced variation, learners tend to develop stereotypic motions that generalize little [Schmidt and Bjork 1992; Seidler 2004]. An analogy is provided by cognitive tasks, such as letter arithmetic (does A + 4 = E?), where people who are given repeated problems quickly transition from using rule-based reasoning to retrieving previous answers from memory. As a result, they do not become more skilled at applying the rule, and learning of this sort does not generalize across problems [Logan 1988, 1990]. Furthermore, studies of brain activation suggest that the governance role of sensory input for repeated actions decreases as skill develops and movements become automatic [Jenkins et al. 1994; see Gazzaniga et al. 2002].

We now turn to the implications of these findings for learning ultrasound guidance, and, in particular, to potential differences between conventional ultrasound and the sonic flashlight. By virtue of augmented reality, the SF enables the user to execute actions under direct, perceptual guidance. To the extent that this sensory-guided motor control is well-trained in everyday life, it should require little learning to reach a high level of performance, limited only by constraints of motor precision or perceptual acuity relative to the imaged data. (In our context, ultrasound quality and display size afford clear visualization of the target.) Conventional ultrasound, however, requires cognitive mediation to build a spatial representation, to which action can be directed. Although the sensorimotor components of the task may be well learned from the outset, the mediation processes must be acquired. Accordingly, learning should be slower and generalization should be limited by the nature of what is learned.

Conventional ultrasound guidance comprises two components to be learned. One is forming a representation of the target location in space, given screen input. This requires cognitive processing. The other is guiding action toward this cognitively mediated representation, which is not equivalent to perceptually directed action. There are, then, three types of training outcomes that can be distinguished. First, at the most general level, people could learn both processes: to form a mediated spatial representation from any arbitrary screen input and guide action to it. Second, they could learn one of the processes and not the other. For example, they could learn to effect cognitively mediated action, but their ability to form a spatial representation could be specific to the particular screen input that has been used in training. The third type of learning is the extreme case in which neither the skill of building the representation nor its mediated guidance of movement is acquired; instead, people learn only to make movements along the specific trajectory in space that has been trained.

These types of learning have different consequences when the task changes, as follows. First, if training produces the ability to form an arbitrary representation and act on it, it should readily generalize to new targets broadly across space.
Second, people could learn a spatial representation specific to the trained target; that is, they could fail to acquire a general cognitive skill of forming spatial representations while still learning the skill of guiding action to a cognitively mediated representation. In that case, they should not generalize to new targets but should be able to reach the previously trained target from a new starting point. Third, if what is learned is even more narrowly confined to the trained action itself (whether or not a general representational skill has been learned), new targets and new approaches should both be problematic.

These possibilities were discriminated in the present research. Participants performed repeated trials in which they conducted a needle through an opaque fluid to a single target, using ultrasound guidance. The required action typically unfolds over a time course on the order of 10 to 20 s [Wu et al. 2005], rather than being a ballistic aiming response (< 0.2 s).


Because the shift in the target was accompanied by changes in both the target location (necessitating a new representation) and the required angle of approach, the lack of generalization could stem from either of two sources: CUS users could fail to learn a general skill of target representation or a general skill of needle guidance to a cognitively mediated representation. Either failure would thwart generalization. In Experiment 2 we held the target constant across the transfer point and changed only the action trajectory. If the guidance process has been learned as a general skill, but the representational process has not, users of CUS should be able to generalize after the shift.

3. EXPERIMENT 2: THROUGH-PLANE INSERTIONS, RESPONSE SHIFT

Experiment 2 again comprised a learning phase and a shift phase, but this time it was the radius of the entry location for the response that was shifted, rather than the target as in Experiment 1. Thus, only the ideal action trajectory was changed; localization of the target remained constant.

3.1 Method

3.1.1 Participants. Ten new participants, ranging in age from 20 to 33 years, were tested. All were right-handed and had normal or corrected-to-normal vision and normal stereo acuity.

3.1.2 Apparatus and Procedure. The experimental apparatus and procedure were identical to the previous experiment, except that only two phantoms were tested. Both had a target at a depth of 5.0 cm and two sets of entry points on the lid. One set of entries, positioned at a radius of 5.0 cm (relative to the lid location above the target) and imposing insertions with an elevation angle of 45°, was used in the learning sessions, while the other set (radius = 7.1 cm and elevation = 35°) was used in the postlearning session. That is, after three learning sessions, the participant shifted to entries from a new radius, but guided insertions into the same target in 3D space.
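Assuming the elevation angle is measured from the plane of the lid, these radii follow directly from the fixed 5.0-cm target depth:

\theta = \arctan\left(\frac{\text{depth}}{\text{radius}}\right): \qquad \arctan\left(\frac{5.0}{5.0}\right) = 45^{\circ}, \qquad \arctan\left(\frac{5.0}{7.1}\right) \approx 35^{\circ}.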

3.2 Results

3.2.1 Learning. The data are shown in Figure 5b and Figures 6–8 (right panels), for ease of comparison with the data from Experiment 1. Overall, the learning results of the two experiments were very similar. The learning analysis involved only the first three blocks, which used a constant target. As before, endpoint accuracy started out at a lower level for CUS than SF, which showed continually high accuracy across blocks. Thus, both the success rate and distance from the target showed effects of block, device, and an interaction. Also, as previously, the SD in both aiming elevation and azimuth showed greater consistency for the SF. Only the SD in aiming elevation decreased over blocks, and then only for the CUS, as reflected in a block by device interaction. Finally, the spatial dispersion in the endpoint location was again consistently greater for CUS and tended to decrease over blocks, this time for both devices (hence, no interaction was found).

3.2.2 Generalization. Comparison of the preshift and shift blocks (3 and 4), shown in the same figures as for the learning analysis, again showed failure to maintain performance for the CUS with respect to success rate and distance from the target, leading to a block by device interaction, as well as an overall advantage for the SF in these accuracy measures. There were, however, some changes from the pattern observed in Experiment 1. Specifically, no effect of block or device was found for the SD in either aiming elevation or azimuth; the variability was statistically constant across the pre- and postshift blocks. In contrast, the measure of endpoint dispersion showed a significant increase for both devices, confirming the trend in Experiment 1, along with an overall advantage for the SF.

3.2.3 Trajectory Variability. The right panel of Figure 9b shows the RMS deviation along the insertion trajectory.


The same points pertain as in Experiment 1: the trajectory variability is small relative to the spatial dispersion in the endpoint location, indicating that the latter measure is not simply due to motor error. Moreover, the two devices show similar trajectory variability, indicating that sight of the hand—available with the SF but not CUS—is not necessary to control motor noise.
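For readers who prefer the variability measures in computational form, the following sketch gives one plausible implementation of endpoint dispersion (mean distance of endpoints from their centroid) and of an RMS deviation of a sampled needle path from the straight line joining its first and last samples. The exact definitions used in the analyses are not spelled out in full in this excerpt, so these functions should be read as illustrative reconstructions rather than the analysis code.

import numpy as np

def endpoint_dispersion(endpoints):
    # Mean distance of endpoints (n x 2 or n x 3 array) from their centroid.
    endpoints = np.asarray(endpoints, float)
    centroid = endpoints.mean(axis=0)
    return np.linalg.norm(endpoints - centroid, axis=1).mean()

def rms_trajectory_deviation(path):
    # RMS distance of sampled needle-tip positions (n x 3) from the straight
    # line joining the first and last samples (an idealized insertion axis).
    path = np.asarray(path, float)
    start, end = path[0], path[-1]
    axis = (end - start) / np.linalg.norm(end - start)
    rel = path - start
    perp = rel - np.outer(rel @ axis, axis)  # component orthogonal to the axis
    return np.sqrt((np.linalg.norm(perp, axis=1) ** 2).mean())

# Hypothetical data: five endpoints (cm) in the target plane, and a slightly bowed trajectory.
endpoints = [[0.1, -0.2], [0.3, 0.1], [-0.2, 0.0], [0.0, 0.3], [0.2, -0.1]]
path = [[0, 0, 0], [1, 0.05, -1], [2, 0.08, -2], [3, 0.04, -3], [4, 0, -4]]
print(endpoint_dispersion(endpoints), rms_trajectory_deviation(path))

Comparing the two numbers for a given participant is the logic used in the text: if the endpoint dispersion greatly exceeds the within-trajectory deviation, the endpoint scatter cannot be attributed to motor noise alone.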

3.3 Discussion

The results of Experiment 2, in which there was a change only in the required action at the shift point, were very similar to Experiment 1, where the equivalent action change was accompanied by a perceptual shift. Again, every measure showed better performance with the SF than CUS. CUS users showed greater learning over the training blocks with respect to success rate, distance from target, and variability in aiming elevation. This reflects the poor initial level of performance, as the early success with the SF allowed little room for improvement. As before, at the point of shift, CUS users showed a decline in performance with respect to measures of accuracy, i.e., success rate and distance from target. The latter increased by 31.6% relative to block 3, to a level statistically equivalent to untrained performance in block 1 (t(11) = 0.110, p > 0.9), an effect very similar in magnitude to that found in Experiment 1. This experiment also showed a performance decrement for both devices after the shift with respect to endpoint dispersion. These results disconfirm the hypothesis that the process of guiding action to a cognitively mediated spatial representation was learned as a general skill by users of CUS; had it been, then, since the spatial representation was held constant (i.e., the target did not move), users should have been able to generalize after the shift.

The combined results of the two experiments show very similar decrements in performance after an action shift, whether or not the target location changed. This is consistent with the idea that what participants learn is to perform a specific action, perturbation of which brings performance back to near the starting level. This result occurred despite the use of two different entry points at the same radius (hence, the same ideal angle of approach) during learning, which precluded an invariant body posture. Although participants could not rely on a fixed pattern of muscle activation, learning was still not general enough to transfer to an approach differing by 10° in elevation angle.

4. EXPERIMENT 3: IN-PLANE INSERTIONS; TARGET SHIFT

Experiment 3 further tested the narrowness of what has been learned by reducing the need for cognitive mediation to represent the target location. It did so by using in-plane insertions. Ultrasound-guided actions can vary with respect to the geometrical relation between the needle and the image slice. With in-plane insertions, users pilot the needle toward the target along an axis that remains within the imaged slice. The needle and its effects on the surrounding medium can, thus, be imaged and tracked continuously. With through-plane insertions, the needle's pathway lies perpendicular or oblique to the image plane. It becomes visible only as it approaches and intersects the plane of the image (although the hand and exposed portion of the needle can continuously be sighted). Figure 3 illustrates the difference between the in- and through-plane cases; a geometric sketch of the distinction follows below.

Users of the SF have two potential strategies for the in-plane task. If they continue to guide action by perceived end-point location, there should be no difference between in- and through-plane insertions for the SF. However, if they adopt an alternative strategy of aligning the needle with the virtual slice plane (the edge of which can be seen through the augmented-reality projection), they can take advantage of closed-loop control. Given the high performance level achievable with the end-point guidance strategy, one would expect little gain in accuracy from the closed-loop strategy. To the contrary, it might produce some degree of slowdown, and initial aiming variability might increase because of the potential for later correction.
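The in-plane/through-plane distinction is purely geometric, and a small sketch makes it explicit: classify a needle pose by the angle between its axis and the slice plane and by the offset of its tip from that plane. The tolerance values below are arbitrary placeholders, not thresholds used in the experiments.

import numpy as np

def classify_insertion(needle_tip, needle_dir, plane_point, plane_normal,
                       angle_tol_deg=5.0, offset_tol_cm=0.2):
    # In-plane: the needle axis lies (nearly) within the slice, so the whole shaft
    # can be imaged. Through-plane: the axis crosses the slice and only the
    # intersection point is imaged. The slice is modeled as an infinite plane.
    n = plane_normal / np.linalg.norm(plane_normal)
    d = needle_dir / np.linalg.norm(needle_dir)
    angle_to_plane = np.degrees(np.arcsin(abs(np.dot(d, n))))  # 0 deg = parallel to slice
    tip_offset = abs(np.dot(needle_tip - plane_point, n))      # distance of tip from slice
    if angle_to_plane < angle_tol_deg and tip_offset < offset_tol_cm:
        return "in-plane"
    return "through-plane"

# Hypothetical slice: the plane y = 0. A needle advancing within that plane is
# in-plane; one angled 45 deg across it is through-plane.
slice_pt, slice_n = np.array([0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
print(classify_insertion(np.array([3.0, 0.05, -2.0]), np.array([1.0, 0.0, -1.0]), slice_pt, slice_n))
print(classify_insertion(np.array([3.0, 2.00, -2.0]), np.array([0.0, -1.0, -1.0]), slice_pt, slice_n))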


In contrast, CUS users should clearly improve from the through- to the in-plane situation. CUS allows no visual feedback during through-plane insertion until the needle reaches the scanned area, whereas in-plane insertion affords closed-loop control by allowing the user to track the needle within the image. One consequence should be an advantage for in- over through-plane insertions with CUS.

Of greater interest here is whether the closed-loop control made possible by in-plane insertion will increase the generality of learning with the CUS. Although users must still use cognitively mediated action to effect initial aiming and guidance to the edge of the slice, the process accommodates some error, as an adjustment can be made along the imaged trajectory. Once the in-plane configuration of the needle and transducer is reached (achieving an in-plane view), guidance becomes perceptually governed. If, under these circumstances, general skills are acquired, users of CUS should readily accommodate to changes in the target location. This would directly contrast with the results of through-plane insertion in Experiment 1. Accordingly, Experiment 3 replicated Experiment 1, but with in-plane insertions.

4.1 Method

4.1.1 Participants. Eight naive participants and two participants who previously took part in Experiment 2, ranging in age from 20 to 39 years, were tested. All were right-handed and had normal or corrected-to-normal vision and normal stereo acuity.

4.1.2 Apparatus and Procedure. The experimental apparatus and design were identical to Experiment 1, except that here the experimenter began by demonstrating an in-plane approach for the subject to follow. With both devices, this meant that the progress of the needle could be seen in the ultrasound image, because of its echoic properties. In addition, with the SF, the needle itself could be directed to stay within the ultrasound slice, which was visible in the virtual image. (In CUS, the needle can still be directed to stay within the slice, but it is harder to do so.) As in Experiment 1, the participant moved on to a block of trials with a new target (3.5 cm deep) after three learning blocks.

4.2 Results

Figure 10 shows the endpoint dispersion from Experiment 3. It is apparent that, although the distribution again shows no obvious spatial bias overall, somewhat more of the variability is in azimuth, along the x axis of the figure, reflecting users' efforts at staying within the width of the ultrasound slice. The elevation errors, on the y axis of the figure, are minimal, presumably because of the subjects' ability to see the target in the slice plane as they approach it. Again, we performed separate analyses of learning (blocks 1–3) and generalization (blocks 3 and 4).

4.2.1 Learning. Figure 11a shows the success rate from all three experiments for purposes of comparison. For Experiment 3 alone, Figure 11b shows distance from the target, Figure 12 shows the variability in aiming, and Figure 13 the spatial dispersion of endpoint locations. Clearly, performance with CUS was better with in-plane insertion (Experiment 3) than with through-plane insertion (Experiments 1 and 2), whereas SF performance was high across all experiments. Despite an overall improvement with CUS, the pattern of learning remained similar to that observed previously. There was a differential learning rate, reflecting initially lower performance with CUS, for both accuracy measures (success rate, in Figure 11a, and distance from target, in Figure 11b) and endpoint dispersion (Figure 13), leading to a block effect and a block by device interaction. Four out of five measures (excluding only SD in aiming elevation) showed a significant advantage for the SF.

4.2.2 Generalization. With the in-plane insertion of Experiment 3, fewer outcome measures showed impairment and device differences across the shift blocks than were previously observed with through-plane insertion.


Fig. 10. Endpoints of in-plane insertions in the target plane when using the SF or CUS. Different symbols represent insertions by individual subjects. The x axis is distance (in cm) from target along a sagittal direction through the slice plane; the y axis is vertical distance (in cm) from target on the slice plane.

Fig. 11. (a) Success rates of insertions in all experiments. Experiment 3 (in-plane) is shown as solid lines and Experiments 1 and 2 (through-plane) as dashed lines. (b) Mean distance from the needle endpoint to the center of the target in Experiment 3 only. Error bars represent ±1 standard error.

Only success rate showed a decline in performance at the shift, in the form of a block effect, and differential shift effects, in the form of a block by device interaction. Performance was also superior overall for the SF with respect to the success rate. The one remaining effect in the shift blocks was a device effect with respect to the SD in aiming azimuth, which, in this case, lies orthogonal to the slice plane. The SF provides a particular advantage in that the slice is visible in depth, which would facilitate keeping the needle within the slice as it is guided toward the target.


Fig. 12. Mean of the within-subject standard deviations in initial aiming of the needle for in-plane insertions with the two visualization devices. Data are shown separately for elevation and azimuth. Error bars represent ±1 between-subject standard error.

Fig. 13. Mean of the within-subject spatial dispersion (distance from centroid of responses of the endpoints in the target plane) for in-plane insertions with the two visualization devices. Error bars represent ±1 between-subject standard error.

4.2.3 Trajectory Variability. Figure 14 shows the RMS deviation along the insertion trajectory, this time divided into two components—within and orthogonal to the slice plane. In both cases, as in previous experiments, no device difference was found. The within-plane deviation is similar in magnitude to that found previously for through-plane insertions (see Figure 9b), although the pattern tends to be different. Specifically, the RMS deviation increases close to the point of needle entry, as was observed previously, but now saturates as the tip approaches the target, presumably reflecting the closed-loop control. The cross-plane deviations show this same trend, but they are far greater in magnitude (almost twofold) than those within plane, a significant effect (F(1,9) = 10.27, p < 0.05), indicating that large adjustments are made to keep the needle within the slice plane. Even with this increase in magnitude across the plane, the trajectory variability is small relative to the spatial dispersion in the endpoint location, shown in Figure 13. Again, this indicates that the endpoint dispersion cannot be attributed simply to motor noise.


Fig. 14. Mean over subjects of the RMS deviation along the insertion trajectory. Error bars represent ±1 between-subject standard error. (a) RMS deviation across the slice (azimuth); (b) RMS deviation within the slice plane (elevation).

4.2.4 Insertion Speed. To measure speed in Experiment 3, we used only the data after the needle appeared in the slice plane. As was mentioned above, the experiments generally failed to show systematic learning or generalization effects on insertion speed. However, the mean speeds for Experiment 3 (0.46 cm/s for CUS and 0.55 cm/s for SF) were less than one-half the speeds observed in Experiments 1 and 2 (respectively, 1.08 and 1.16 cm/s for CUS; 1.23 and 1.27 cm/s for SF). We performed ANOVAs on insertion speed within the learning and across the shift blocks with factors of block, device, and experiment, the latter variable manipulated between participants. The decrease in speed from through- to in-plane insertion was significant, F(1,20) = 13.781, p < 0.001, in the learning ANOVA and F(1,20) = 13.698, p < 0.001, in the generalization ANOVA. The only additional effect was a block by experiment interaction in the generalization blocks, F(1,20) = 8.277, p < 0.01, reflecting a slight decrease in speed from block 3 to 4 in Experiments 1 and 2, but not Experiment 3; as this did not compromise the general trend toward slower speed in Experiment 3, we will not consider it further.

4.3 Discussion

The most important finding of Experiment 3 is that with in-plane insertions, which reduced the need for cognitively mediated representation and guidance, greater generalization to new targets was observed than previously. By most measures, users of both devices accommodated to a target shift. There was some decrement in the success rate at the shift point with CUS, but the drop, 15% relative to the preshift block, was considerably smaller than in the previous experiments. We attribute this improved generalization not to developing skill in forming mediating representations that guide action, but rather to the greater role of perceptual guidance.

Indeed, several findings indicate that the in-plane insertions led to a strategy shift for both devices toward using the ultrasound image as a means of closed-loop control. As a result, performance with the two devices became more similar. First, there was a striking slowing of performance with both devices for in-plane insertion relative to the previous through-plane studies. Second, unlike the previous studies, no reliable difference between devices emerged in the variability of aiming elevation.


The observed similarity between the two devices was unexpected; we anticipated that users of the SF would perform in-plane insertions just as they had through-plane insertions. Despite the high level of performance with through-plane insertions, however, once SF users were able to track the needle within the ultrasound slice, they apparently chose to rely more on moment-to-moment visual feedback about its position, slowing performance. The SF did show lower variability than CUS in aiming azimuth. As the azimuth corresponds to the width of the slice plane, this indicates that SF users gain an advantage from their direct perception of that slice plane.

5. GENERAL DISCUSSION

The basic findings of these experiments confirm predictions derived from theories of action learning. Augmented-reality viewing, which allows users to perform visually directed action, required little learning and readily generalized to new targets and actions. Performance with the SF, relative to CUS, was superior with respect to measures of accuracy and noise. In this study (cf. Wu et al. [2005]), the two devices did not show differences in systematic bias, undoubtedly because CUS users made trial-to-trial corrections that were afforded by the constancy of the target and the feedback available.

Turning to the issues raised in the introduction, what can we conclude more generally about the levels at which learning and generalization occur? The sonic flashlight appears to exemplify the case where perceptually guided action is under the control of mechanisms that are well established pre-experimentally. Generalization to new targets, like continued practice with a single target, makes use of these deeply rooted skills; hence, users consistently perform with high accuracy and low variability. The case of conventional ultrasound is quite different. We propose that as learning develops, the spatial/cognitive activity of localizing the target and using that localization to guide action contributes less and less to performance. Users off-load performance from the cognitive system, instead coming to rely on the movement parameters that have been established. This pattern is then inapplicable to new actions, as demanded by shifts of the target or the response location. In our data, performance at the point of shift declined, by some measures, essentially to the novice level. The similar effects of location-plus-movement shift (Experiment 1) and movement-alone shift (Experiment 2) indicate that the new action pattern dominated task demands. Any savings due to maintaining the same target location were not apparent.

On the applied side, our research is intended to have a positive impact on ultrasound-aided surgical intervention. By demonstrating the efficacy and ease of use of the sonic flashlight, we aim to further progress toward an approved medical device in widespread use. For example, the finding that the SF produced lower variability in aiming azimuth for in-plane insertion suggests that it might show a significant advantage over CUS for applications where that procedure is used, such as the biopsy of suspected tumors in breast cancer. Moreover, understanding the psychophysics of the device may lead to improvements in its design. Our work is also potentially relevant to other augmented-reality systems, to the extent that they incorporate direct perceptual visualization. It is important to note that other approaches to AR provide only some of the cues of the SF; in particular, head-mounted displays fail to retain convergence of the eyes as a cue. At the same time, our work suggests avenues for ultrasound training that may reduce perceptual and motor sources of error, particularly among less skilled users. An avenue to pursue further is to consider whether variability in training will alleviate the narrowness of learning observed here. Consider, in this regard, that even repeated trials with the same target did not bring CUS users to the level found in the initial trials with the SF.
Although variable training is likely to improve generality of application (e.g., Lee and Magill [1983] and Shea and Morgan [1979]), it seems doubtful that it could bring performance with conventional ultrasound to the level that is achievable with an augmented-reality display.


APPENDIX

Table A.I. F Tests from ANOVAs on Learning and Transfer in Each Experiment, for Five Measures (a)

Effects in Learning ANOVA (blocks 1–3)

Experiment           Measure                Block                  Device                 Interaction
1 (through plane)    Success rate           F(2,22) = 15.582***    F(1,11) = 38.472***    F(2,22) = 11.658***
                     Distance from target   F(2,22) = 11.631***    F(1,11) = 19.233***    F(2,22) = 7.976**
                     SD aiming elevation    F(2,22) = 10.498***    F(1,11) = 14.898**     F(2,22) = 1.691
                     SD aiming azimuth      F(2,22) = 1.291        F(1,11) = 8.687*       F(2,22) = 0.815
                     Endpoint dispersion    F(2,22) = 12.113**     F(1,11) = 26.436***    F(2,22) = 6.201**
2 (through plane)    Success rate           F(2,18) = 9.557***     F(1,9) = 69.861***     F(2,18) = 4.131*
                     Distance from target   F(2,18) = 3.742*       F(1,9) = 23.810***     F(2,18) = 4.374*
                     SD aiming elevation    F(2,18) = 3.619*       F(1,9) = 63.618***     F(2,18) = 4.252*
                     SD aiming azimuth      F(2,18) = 1.518        F(1,9) = 10.190*       F(2,18) = 1.066
                     Endpoint dispersion    F(2,18) = 5.561*       F(1,9) = 21.488***     F(2,18) = 0.332
3 (in plane)         Success rate           F(2,18) = 13.009***    F(1,9) = 21.441***     F(2,18) = 3.611*
                     Distance from target   F(2,18) = 4.953*       F(1,9) = 9.685*        F(2,18) = 3.705*
                     SD aiming elevation    F(2,18) = 0.247        F(1,9) = 3.484         F(2,18) = 0.591
                     SD aiming azimuth      F(2,18) = 2.005        F(1,9) = 20.019**      F(2,18) = 0.166
                     Endpoint dispersion    F(2,18) = 4.944*       F(1,9) = 12.215*       F(2,18) = 6.425*

Effects in Transfer ANOVA (blocks 3 and 4)

Experiment           Measure                Block                  Device                 Interaction
1 (through plane)    Success rate           F(1,11) = 13.564**     F(1,11) = 40.209***    F(1,11) = 8.570*
                     Distance from target   F(1,11) = 2.197        F(1,11) = 20.317***    F(1,11) = 5.516*
                     SD aiming elevation    F(1,11) = 8.608*       F(1,11) = 7.983*       F(1,11) = 2.092
                     SD aiming azimuth      F(1,11) = 2.468        F(1,11) = 1.232        F(1,11) = 0.793
                     Endpoint dispersion    F(1,11) = 4.083        F(1,11) = 17.775***    F(1,11) = 1.649
2 (through plane)    Success rate           F(1,9) = 18.778**      F(1,9) = 30.533***     F(1,9) = 6.444*
                     Distance from target   F(1,9) = 16.252**      F(1,9) = 13.129***     F(1,9) = 9.193**
                     SD aiming elevation    F(1,9) = 0.850         F(1,9) = 3.420         F(1,9) = 0.030
                     SD aiming azimuth      F(1,9) = 0.810         F(1,9) = 4.446         F(1,9) = 1.231
                     Endpoint dispersion    F(1,9) = 34.573***     F(1,9) = 24.108***     F(1,9) = 0.934
3 (in plane)         Success rate           F(1,9) = 11.250**      F(1,9) = 20.167**      F(1,9) = 7.579*
                     Distance from target   F(1,9) = 0.374         F(1,9) = 3.547         F(1,9) = 0.003
                     SD aiming elevation    F(1,9) = 2.319         F(1,9) = 3.873         F(1,9) = 1.386
                     SD aiming azimuth      F(1,9) = 0.029         F(1,9) = 12.332**      F(1,9) = 0.564
                     Endpoint dispersion    F(1,9) = 0.001         F(1,9) = 2.329         F(1,9) = 0.464

(a) *p < .05; **p < .01; ***p < .001.

REFERENCES

ANDERSON, J. R. 1982. Acquisition of cognitive skill. Psychological Review 89, 369–403.
BAJURA, M., FUCHS, H., AND OHBUCHI, R. 1992. Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. In Proceedings of SIGGRAPH '92, Chicago, IL. 203–210.
CHANG, W., STETTEN, G., LOBES, L., SHELTON, D., AND TAMBURO, R. 2002. Guidance of retrobulbar injection with real time tomographic reflection. Journal of Ultrasound in Medicine 21, 1131–1135.
CHANG, W., AMESUR, N., KLATZKY, R. L., ZAJKO, A., AND STETTEN, G. 2006. The sonic flashlight is faster than conventional ultrasound guidance to learn and use for vascular access on phantoms. Radiology 241, 3, 771–779.


FITTS, P. M. 1964. Perceptual-motor skill learning. In Categories of Human Learning, A. W. Melton, Ed. Academic Press, New York. 243–285.
FLOYER-LEA, A., RADCLIFFE, J., AND MATTHEWS, P. M. 2004. Changing brain networks for visuomotor control with increased movement automaticity. Journal of Neurophysiology 92, 4, 2405–2412.
FUCHS, H., STATE, A., PISANO, E. D., GARRETT, W. F., HIROTA, G., LIVINGSTON, M., WHITTON, M. C., AND PIZER, S. M. 1996. Toward performing ultrasound-guided needle biopsies from within a head-mounted display. In Visualization in Biomedical Computing '96, Lecture Notes in Computer Science, vol. 1131. Hamburg, Germany. 591–600.
GAZZANIGA, M. S., IVRY, R. B., AND MANGUN, G. R. 2002. Cognitive Neuroscience: The Biology of the Mind, 2nd ed. W. W. Norton, New York.
JENKINS, I. H., BROOKS, D. J., NIXON, P. D., FRACKOWIAK, R. S. J., AND PASSINGHAM, R. E. 1994. Motor sequence learning: A study with positron emission tomography. Journal of Neuroscience 14, 3775–3790.
KLATZKY, R. L., LIPPA, Y., LOOMIS, J. M., AND GOLLEDGE, R. G. 2003. Encoding, learning and spatial updating of multiple object locations specified by 3-D sound, spatial language, and vision. Experimental Brain Research 149, 48–61.
LEE, T. D. AND MAGILL, R. A. 1983. Locus of contextual interference. Journal of Experimental Psychology: Learning, Memory, and Cognition 9, 730–746.
LOGAN, G. D. 1988. Toward an instance theory of automatization. Psychological Review 95, 492–527.
LOGAN, G. D. 1990. Repetition priming and automaticity: Common underlying mechanisms? Cognitive Psychology 22, 1–35.
LOOMIS, J. M., LIPPA, Y., KLATZKY, R. L., AND GOLLEDGE, R. G. 2002. Spatial updating of locations specified by 3-D sound and spatial language. Journal of Experimental Psychology: Human Learning, Memory, and Cognition 28, 335–345.
MACKAY, D. G. 1982. The problems of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychological Review 89, 483–506.
MASAMUNE, K., FICHTINGER, G., DEGUET, A., MATSUKA, D., AND TAYLOR, R. H. 2002. An image overlay system with enhanced reality for percutaneous therapy performed inside CT scanner. In Fifth International Conference on Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Computer Science 2488, Part 2. Springer Verlag, New York. 77–84.
POLLMANN, S. AND MAERTENS, M. 2005. Shift of activity from attention to motor-related brain areas during visual learning. Nature Neuroscience 8, 11, 1494–1496.
ROSENBAUM, D. A. 1991. Human Motor Control. Academic Press, San Diego, CA.
SAKAI, K., HIKOSAKA, O., MIYAUCHI, S., TAKINO, R., SASAKI, Y., AND PÜTZ, B. 1998. Transition of brain activation from frontal to parietal areas in visuomotor sequence learning. Journal of Neuroscience 18, 5, 1827–1840.
SAUER, F., KHAMENE, A., BASCLE, B., SCHIMMANG, L., WENZEL, F., AND VOGT, S. 2001. Augmented reality visualization of ultrasound images: System description, calibration, and features. In International Symposium on Augmented Reality. IEEE and ACM, Los Alamitos, CA and New York. 30–39.
SCHMIDT, R. A. AND BJORK, R. A. 1992. New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science 3, 4, 207–217.
SEIDLER, R. D. 2004. Multiple motor learning experiences enhance motor adaptability. Journal of Cognitive Neuroscience 16, 1, 65–73.
SHEA, J. B. AND MORGAN, R. L. 1979. Contextual interference effects on the acquisition, retention, and transfer of a motor skill. Journal of Experimental Psychology: Human Learning and Memory 5, 179–187.
STATE, A., LIVINGSTON, M., HIROTA, G., GARRETT, W., WHITTON, M., FUCHS, H., AND PISANO, E. 1996. Technologies for augmented-reality systems: Realizing ultrasound-guided biopsies. In Computer Graphics, Proceedings of SIGGRAPH. 439–446.
STETTEN, G. 2000. System and method for location-merging of real-time tomographic slice images with human vision. U.S. Patent no. 6,599,247, filed Oct. 11, 2000, issued July 29, 2003.
STETTEN, G. AND CHIB, V. 2001. Overlaying ultrasound images on direct vision. Journal of Ultrasound in Medicine 20, 235–240.
STETTEN, G., CHIB, V., AND TAMBURO, R. 2000. System for location-merging ultrasound images with human vision. In Proceedings of the Applied Imagery Pattern Recognition Workshop, Washington, D.C. IEEE Computer Society, Los Alamitos, CA. 200–205.
TRACY, J., FLANDERS, A., MADI, S., LASKAS, J., STODDARD, E., PYRROS, A., NATALE, P., AND DELVECCHIO, N. 2003. Regional brain activation associated with different performance patterns during learning of a complex motor skill. Cerebral Cortex 13, 9, 904–910.
THORNDIKE, E. L. 1913. Educational Psychology. Columbia University Press, New York.
THORNDIKE, E. L. AND WOODWORTH, R. S. 1901. The influence of improvement in one mental function upon the efficiency of other functions. Psychological Review 8, 247–261.
WELFORD, A. T. 1968. Fundamentals of Skill. Methuen, London.


WU, B., KLATZKY, R. L., SHELTON, D., AND STETTEN, G. 2005. Psychophysical evaluation of in-situ ultrasound visualization. IEEE Transactions on Visualization and Computer Graphics, Special Issue: Haptics, Virtual and Augmented Reality 11, 684–699.
ZELAZNIK, H. N., SHAPIRO, D. C., AND MCCLOSKY, D. 1981. Effects of a secondary task on the accuracy of single aiming movements. Journal of Experimental Psychology: Human Perception and Performance 7, 1007–1018.

Received August 2006; revised January 2007; accepted February 2007
