Augmentative and Alternative Communication, June 2007 VOL. 23 (2), pp. 140 – 153

The Effect of the Communication Output Method on Augmented Interaction

D. JEFFERY HIGGINBOTHAMa*, KYUNG-EUN KIMb and CHRISTINE SCALLYc


aUniversity at Buffalo, bUniversity of Central Florida, and cPrivate Practice, Knoxville, TN, USA

*Corresponding author. 122 Cary Hall, Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY 14214, USA. Tel: +1 716 829 2797, x 635. Fax: +1 716 829 3979. E-mail: [email protected]

ISSN 0743-4618 print/ISSN 1477-3848 online © 2007 International Society for Augmentative and Alternative Communication. DOI: 10.1080/07434610601045344

The experiment compared the ability of a Comprehension Model versus an Interaction Model to account for the communication performance of augmented communicators. Five dyads consisting of adults without disabilities, with one participant in each dyad randomly assigned to use an augmentative communication device, engaged in ten direction-giving tasks in which the augmented communicator used either a Word Method (i.e., spoken words only) or a Mixed Method (i.e., a mix of spoken words and letters) of speech output. Results indicated an overall completion time and communication rate advantage for the Mixed Method output in most communication tasks, supporting the Interaction Model of augmented communication. The role of communication co-construction in augmented interaction and the implications of the Interaction Model for future communication device design are discussed.

Keywords: Rate; Communication rate; Communication aid; Interaction; Communicative; Discourse strategies; Interface design

INTRODUCTION

Face-to-face communication between people who use AAC and people who use natural speech has been described as being unique with respect to interaction management and discourse production (Blau, 1987; Buzolich & Wiemann, 1988; Farrier, Yorkston, Marriner, & Beukelman, 1985; Higginbotham & Caves, 2002; Higginbotham, Mathy-Laikko, & Yoder, 1988; Light, 1988; Müller & Soto, 2002; Sweidel, 1991). Interaction characteristics include explicit display of turn-taking signals, collaborative message productions, a general difficulty initiating and maintaining speaking turns, slow communication production rates of between five and 15 words per minute (wpm), and frequent misunderstandings and message repairs. Preliminary evidence suggests that these interaction characteristics may be due to augmented communicators' physical limitations; the experiential, stylistic, and relational characteristics of the conversants; and the augmentative communication technology employed (Higginbotham & Caves, 2002; Higginbotham & Wilkins, 1999).

With respect to the possible effect of technology on interactive communication, two different models of communication may be used to account for the unique interaction characteristics underlying augmented communication: the Comprehension Model and the Interaction Model. The Comprehension Model is based on a substantial body of applied speech synthesis research investigating the intelligibility and comprehensibility of Speech Generating Devices (SGDs) (Drager & Reichle, 2001; Duffy & Pisoni, 1992; Higginbotham, Drazek, Kowarsky, Scally, & Segal, 1994; Kim, 2001; Kim, Higginbotham, & Gavin, 2005; McNaughton, Fallon, Tod, & Weiner, 1994; Mirenda & Beukelman, 1987, 1990). Research in this area focuses on the perceptual and higher order cognitive processes involved in understanding the synthesized speech signal. The Interaction Model is based on a body of research in conversation analysis and interaction dealing with the social aspects of language use (Buzolich & Wiemann, 1988; Clark, 1996; Clark & Brennan, 1991; Higginbotham, 1989; Higginbotham & Caves, 2002; Higginbotham et al., 1988; Light, 1988). The present study tested the relative adequacy of the Comprehension Model versus the Interaction Model in accounting for the effects of output method differences on various aspects of interaction and discourse.


Contributions from Speech Synthesis Research (Comprehension Approach)

The simplest and most straightforward account of augmentative device-mediated interaction is one that views communication performance as a direct result of the listener's ability to perceive and comprehend the AAC output. Because of the degraded nature of the synthesized speech signal, listeners take more time to process the signal and have more difficulty identifying the words, compared to listening to natural speech (Logan, Greene, & Pisoni, 1989). In Duffy and Pisoni's (1992) review of speech synthesis comprehension research, the authors concluded that the increased perceptual-level processing demands of synthetic speech limit the listener's ability to perform higher level comprehension processing, such as syntactic processing and inferencing. Thus, changes in the acoustic characteristics of synthesized speech can significantly affect a listener's ability to comprehend the associated utterance. Listeners have also been shown to alter their comprehension processing strategies to compensate for the degraded nature of the speech synthesis signal (Duffy & Pisoni, 1992; Mathy-Laikko, 1992; Raghavendra & Allen, 1993; Ralston, Pisoni, Lively, Greene, & Mullenix, 1991; Talbot, 1987).

With respect to discourse comprehension, Higginbotham and colleagues (Higginbotham & Baird, 1995; Higginbotham et al., 1994; Higginbotham, Scally, Lundy, & Kowarsky, 1995) analyzed the abilities of listeners without disabilities to summarize discourse passages produced under a variety of conditions simulating the output of current SGD technology. In one study (Higginbotham et al., 1994), participants listened to discourse passages differing in synthesized speech voice quality (DECtalk™ versus Echo™) and speech output method (words separated by 10-s silent intervals versus sentences). Each listener's written summaries of the discourse passages were rated as being either full, partial, changed, or fragmented renditions of the original discourse passage. Listener performance was found to be optimal for high quality synthetic speech issued as individual words separated by 10-s silent intervals (i.e., Word Method) versus connected speech (i.e., Sentence Method) (see Figure 1 for an example of speech output methods). Performance was also affected by changes in the length, complexity, and topic familiarity of the discourse passages. The researchers concluded that listener performance problems were related both to perceptual processing difficulties due to speech quality and the speed of utterance delivery, and to higher order information processing demands. In a similar follow-up study, Higginbotham et al. (1995) examined the impact of Word, Sentence, and Mixed Methods (i.e., whole words + spelled words) on the quality of discourse recall. Results were consistent with earlier findings, showing that listener summarization performance significantly declined across the Word, Sentence, and Mixed output methods, with the Mixed Method imposing the greatest constraints on comprehension. The authors concluded that the temporally bound memory and processing requirements to comprehend SGD utterances exceed the abilities of many listeners, thus limiting discourse comprehension in many typical SGD output method contexts.

Figure 1. Examples of three different speech output methods. The '+' characters signify the keystrokes made by the AC to produce the message. Individual letters or words indicate the point at which the text material is spoken.




Discourse comprehension accuracy of synthetic speech has also been shown to be affected by attention demands (Drager & Reichle, 2001), slow speech production rates (Kim et al., 2005), and noise (Talbot, 1987). Taken together, these results support Duffy and Pisoni's (1992) general resource model of comprehension processing. Listeners are at increased risk for making comprehension errors, particularly when the synthesized voice (a) is of lower quality, (b) is produced in a noisy environment, (c) is connected speech, and/or (d) forces the listener to remember and process individual, spelled words as part of the listening task. In these risky comprehension situations, listeners often 'filled in the gaps' using their background knowledge of the topic as well as their inferencing abilities (Higginbotham & Baird, 1995). Unfortunately, these comprehension strategies frequently contribute to further misunderstandings in difficult listening situations.

The comprehension explanation provides a simple, reasonable, and testable hypothesis to account for the communication problems faced by augmented communicators. If communication ease is a direct consequence of the listener's ability to comprehend the utterances produced by an SGD, then interactions should proceed more quickly and with less difficulty (e.g., communication troubles and repair) when an augmented communicator utilizes the Word Method as opposed to the Mixed Method of output. Here, the focus is on listener comprehension in a non-interactive communication context: fewer comprehension processing demands placed on the listener should result in better interaction performance.

Contributions from Social Interaction Research (Interaction Approach)

Research from the AAC field, social psychology, and human factors provides a different explanation of the communication adaptations made by participants when attempting to overcome the limitations imposed by their communication technologies. Chapanis and colleagues (Chapanis, Ochsman, Parrish, & Weeks, 1972; Chapanis, Parrish, Ochsman, & Weeks, 1977) examined the impact of different communication media (face-to-face, speech only, writing, typing, etc.) on various aspects of communication performance and found statistically significant changes in communication speed, utterance length, and complexity as a function of the media employed. Clark (1996; Clark & Brennan, 1991) contends that both the speaker and the listener are under significant temporal pressure to communicate within specified time frames. Conversational success is further affected by various characteristics of the communication medium used (e.g., the required co-presence or co-temporality of the communicators, display permanence and revisability of the message, turn sequence restrictions). Each medium exacts different costs from the communicators and affects the likelihood of successful communication (e.g., ease of starting, formulating, and producing an utterance; utterance reception and understanding; speed and speaker change). To abate the communication pressures and overcome media constraints and their resultant costs, participants utilize a variety of grounding tactics to ensure communication success. Thus, the different interaction styles observed are due, in part, to the particular communication media employed (face-to-face, telephone, email, AAC device, etc.) and to participant adaptations to the associated communication constraints (Higginbotham & Caves, 2002; Higginbotham & Wilkins, 1999).

Observational research has shown AAC face-to-face communication to be highly interactive and collaborative with respect to message production. Studies of AAC device-mediated interaction can be viewed in terms of communication media constraints. During turn taking, the augmented communicator usually communicates letter-by-letter, word-by-word, or sometimes utterance-by-utterance, depending on the output characteristics of their communication device (Blau, 1987; Buzolich & Wiemann, 1988; Higginbotham, 1989; Higginbotham et al., 1988; Sweidel, 1991). In response, the communication partner verifies the augmented communicator's contributions by repeating or acknowledging the previously uttered message part. Guessing, confirmation, and repair sequences also typify these communication exchanges. This active collaboration by both participants during utterance production reflects each participant's attempts to achieve mutual understanding under the constraints imposed by the particular communication medium.

Experimental research in AAC has shown that changes made to the output mode of devices (device versus no device, letter board versus SGD, word board versus electronic display) affect the structure of turn taking, utterance formulation, and communication repair. Several studies found interactions between communicators using AAC and people without impairments to be asymmetrical, with the non-impaired addressee assuming greater interpretative and regulatory control of the conversation (Buzolich & Wiemann, 1988; Farrier et al., 1985; Higginbotham, 1989). Farrier et al. (1985) also included a no-device condition (i.e., subjects were able-bodied) and demonstrated that the conversational asymmetries were device-related. Higginbotham (1989) found that the addition of a visual display to an AAC device did not consistently affect the length of turns or messages constructed by the dyads, but did reduce the frequency and length of repair-related interactions. The display allowed both participants to review the preceding utterances rather than attempting to memorize them. Recent research by File and Todman (2002) indicates that with SGDs that permit whole utterances to be issued at more rapid rates, participants adapt their discourse to be more like speech-mediated interaction and increase their use of regulatory talk in response to the particular characteristics of utterance-based communication devices.

The Interaction Model views AAC-mediated communication as a product of each participant's adaptation to the limitations of the augmentative device (e.g., lack of intelligibility, slow speed) and the temporal-sequential constraints of human social communication. Problematic utterance output may require participants to verify each linguistic unit as it is expressed, utilize more contingent queries, and/or engage in frequent repair. Less problematic output may result in a different set of interaction patterns. For example, Higginbotham (1989) notes that rapid turn taking may be a desirable feature for AAC interaction, because it provides the turn-taking opportunities required for collaborative utterance formulation and repair. Access to frequent turn taking may also furnish the socio-emotional context needed to achieve and maintain attitudinal alignment and sustained attention to utterance production. Augmentative technologies that do not output until a word or utterance is completed may, by unduly prolonging a speaker's turn, incur costs to joint attention, production, reception, and understanding, thereby making face-to-face communication more difficult. The Interaction Model does not argue against the Comprehension Model per se. Rather, it subordinates comprehension processing, treating it as one of the several factors noted above that shape interaction. In contrast to the Comprehension Model, the Interaction Model would favor Mixed Method output (i.e., single words and spoken letters), which would allow participants to engage in turn taking, guessing, and message repair after every keystroke issued by the augmented communicator, rather than only between constructed words.


STATEMENT OF THE PROBLEM

The present experiment tested the adequacy of the Comprehension Model and the Interaction Model by examining various aspects of interaction performance under two different output mode conditions: the Word Method versus the Mixed Method. The comprehension hypothesis predicts faster task completion times and communication rates using the Word Method because of its demonstrated advantage for listener comprehension compared to the Mixed Method. Independent production of words by the AC, particularly words with a high information load (e.g., the first mention of a referring expression), would support the Comprehension Model. The Interaction Model predicts better performance in the Mixed Method, since it affords more flexibility in turn taking, promotes collaborative message construction, and optimizes the use of abbreviation, guessing, and other strategies to speed up the communication process. The Interaction Model also predicts a higher percentage of SC-produced and collaboratively produced words and messages, particularly for the first mention of referring expressions.

METHOD

Participants

Five adult dyads were selected from individuals responding to advertisements placed around the University at Buffalo campus. Each dyad consisted of individuals of the same gender (three female, two male) who reported having been good friends for at least one year prior to the investigation. Each individual successfully completed a spoken and written communication skills screening (Keenan & Brassell, 1975); passed an audiometric screening (0.5, 1.0, and 2.0 kHz at 25 dB; American National Standards Institute [ANSI], 1969); had taken college-level course work; and reported no history of neurological problems, diagnosed learning disabilities, uncorrected visual impairment, or motor impairment. One participant was randomly chosen from each dyad to serve as the augmented communicator (AC) and the other as the speech communicator (SC). Able-bodied participants were employed as augmentative system users in order to study communication device effects without the potentially confounding performance variability due to disabling conditions (Farrier et al., 1985; Higginbotham & Bedrosian, 1995).

Setting

The experiment took place at the Communication and Assistive Device Laboratory at the University at Buffalo. During both the training and experimental interactions of the study, the AC was seated behind a small table on which the communication device was situated. The SC was seated across the table from the AC so that the individuals could easily see each other's gestures and facial expressions. The experimenter was secluded from the dyad's view behind a barrier to monitor the recording equipment.

Materials


Behavior-Recording Equipment

A Panasonic WV-3500™ video camera and a Panasonic AG-6300™ VHS videotape recorder were used to make all video recordings of the experiment. The camera was positioned 17 feet from the dyad, providing a profile view of the participants during their interactions. Audio events were recorded via a Shure SM11™ lavalier microphone suspended 1 foot above the participants' heads and connected to the videotape recorder. A FOR.A VTG-33™ time code generator was used to place a unique time code stamp on the upper portion of each video frame (0.1-s resolution).

Speech Generating Device

The SGD hardware used in this study consisted of a Macintosh IIcx™ computer, an Edmark TouchWindow™, and a DECtalk™ MultiVoice speech synthesizer. A direct selection communicator was programmed in HyperCard™, consisting of 116 word and letter squares arranged on a 6.5 × 9-in overlay (see Figure 2). Forty-one squares (0.5 × 0.5 in) contained alphanumeric characters, punctuation, and control keys (visual display, error correction, and spacing). Letters were arranged alphabetically to control for individual typing proficiency. The remaining 75 squares (1 × 0.5 in) contained vocabulary items selected from frequently occurring words (Beukelman, Yorkston, Poblete, & Naranjo, 1984) as well as those words necessary for the task. Vocabulary items were arranged alphabetically into columns. The content and organization of the SGD display remained constant across trials. Only the AC could see the display. Pressing the touch window over the desired item activated the communication device. Selections made on the communication device were simultaneously sent to a two-line, 134-character display area at the top of the application; through the DECtalk™ speech synthesizer to two Realistic™ 40-166 speakers mounted on the computer table facing the SC; and to a logfile on the computer's hard drive. Each data point sent to the logfile was time-stamped at a 0.1-s resolution.

Figure 2. On-screen communication device interface used in the experiment.
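To make the logging arrangement concrete, the sketch below shows the kind of time-stamped selection record implied by this description. It is an illustration only: the paper does not specify the actual logfile format used by the HyperCard application, so the field layout and the example values here are hypothetical.

```python
# Hypothetical illustration of the time-stamped selection log described above.
# The real 1990s HyperCard logfile format is not documented in the paper;
# only the 0.1-s timestamp resolution and the logged content are.

log = [
    (12.3, "t"),         # individual letter selections
    (14.6, "u"),
    (16.9, "r"),
    (19.2, "n"),
    (21.5, "<space>"),   # space key marks a word boundary
    (23.8, "left"),      # direct selection of a whole-word square
]

# Consecutive entries are at least ~2 s apart, reflecting the actuation delay
# described under Speech Output Method below.
for t, selection in log:
    print(f"{t:5.1f}  {selection}")
```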



Speech Output Method

The synthetic speech output method was controlled according to two output methods corresponding to the experimental conditions (see Figure 1). In the Word Method condition, synthetic speech output occurred when a whole word item, the space bar, or a punctuation character was selected. In the Mixed Method, synthetic speech output occurred after every user selection. In order to simulate the communication rates produced by most AAC users, a 2-s delay was programmed into the application interface, which had the effect of postponing keystroke actuation on the visual display and speech synthesis output (Higginbotham, 1989). This constraint produced a maximum speech production rate of between 3.8 wpm, when all words were spelled, and 16 wpm, when only directly selected words were used to compose a message.
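As a rough check on these reported ceilings, the short sketch below (not part of the study's software) derives the maximum output rate implied by a fixed per-selection delay. The assumed selections-per-word values, roughly two for a directly selected word plus a space and roughly eight for a fully spelled word, are illustrative.

```python
# Back-of-the-envelope sketch: upper bound on output rate given the 2-s
# actuation delay applied to every selection. The selections-per-word values
# are assumptions for illustration, not figures from the paper.

DELAY_S = 2.0  # delay imposed on each keystroke/selection

def max_wpm(selections_per_word: float) -> float:
    """Words per minute if every selection costs DELAY_S seconds."""
    return 60.0 / (DELAY_S * selections_per_word)

print(max_wpm(2))   # 15.0 -> close to the reported ~16 wpm ceiling
print(max_wpm(8))   # 3.75 -> close to the reported 3.8 wpm floor
```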


Maps

Four U.S. Geological Survey topographic maps of Las Vegas, Nevada (circa 1965; scale 1:24,000) were used to develop the 28 smaller maps (12 × 12 in) used in the experiment. Each of the experimental maps consisted of a route and a script. Routes involved ten turns drawn over existing roadways, leading from a starting location to a final destination, with two intervening stops. The complexity of the routes was equated based on the number of turns and stops and the salience of turn and stop locations. A script was developed for each route, consisting of (a) the initial location, (b) two intermediate location stops and accompanying activities (e.g., pick up a package), and (c) a final destination. The scripts were judged to be of similar difficulty based on the total number of words (55 – 63), the predictability of the locations and tasks, and the complexity of the task descriptions and location names. The maps were randomly divided into 15 experimental maps and 11 training maps. The remaining two maps were kept as alternates. During each interaction, the maps were placed behind a visual barrier so that the individuals could see only their own map.

Questionnaires

An eight-item, short-answer questionnaire was developed to elicit script information from the SC following each interaction. The questionnaire queried a combination of location and route names and accompanying activities. The primary purpose of the questionnaire was to increase task relevance (i.e., participants were told that they needed to pay attention because there would be a test at the end of the task).

Procedures

The experiment consisted of two phases: a training phase and an experimental phase. During both phases of the experiment, each dyad participated in a series of interactions that were controlled according to three output method conditions. In addition to the two device-mediated conditions (Word Method and Mixed Method), participants took part in a speech condition, in which the AC was allowed to speak using his or her natural voice. The speech condition was not analyzed for this study.

Task

The interactions during both the training and experimental phases of the study involved a direction-giving task based on the maps described previously. For each interaction, the AC was provided with a copy of the map on which the route was marked and a copy of the script. The AC was instructed to teach the route and script information to the SC. The SC was provided with a copy of the same map on which only the starting location had been circled, and was instructed to trace the map route as it was learned, marking the stops along the way, but not to make written notes. Each dyad was told that the purpose of the experiment was to see how well the AC could explain the map route and script to the SC, and that their performance would be evaluated based on the accuracy of the route the SC drew on the map and the descriptions the SC provided in the questionnaire. Prior to the start of each interaction session, the SC left the room while the AC reviewed a specific map with the experimenter. The SC then returned to the room, and the participants were informed of the output method condition. Participants were also cautioned not to engage in off-task conversations. For the Word and Mixed Methods, the AC was told to converse using the SGD, head nods, head shakes, and simple gestures, but not to mime, use symbolic gestures, or mouth words. For all conditions, the SC was encouraged to ask questions and make comments during the interaction. The dyad then engaged in an interaction in which the AC explained the map route and the tasks to be accomplished. No time limit was placed on the interaction. After each interaction the AC left the room, and the SC was asked to complete a questionnaire.

Training Phase

Each dyad participated in four 2-h training sessions prior to the experimental phase. This step was taken to ensure the AC's familiarity with device operation and vocabulary, and the dyad's familiarity with device-mediated interaction and the direction-giving task. Training for the dyad included orientation to the vocabulary on the device and to general device operation (20 min), orientation to the direction-giving task (20 min), and several practice interactions (5 h). The AC was also provided with device/vocabulary training (2 h). During device training, the AC was instructed to spell any word not included on the device completely and to space after each word in both conditions. Dyads completed five training interactions in the speech condition and three in each of the device-mediated conditions. Interactions in the speech condition were presented first so that the dyads could easily learn the map task before having to apply this knowledge in interactions using the device. Following each practice interaction, the experimenter provided the dyad with feedback regarding the appropriateness of their use of gestures, as well as the accuracy and completeness of the SC's route tracing and questionnaire answers.

Experimental Phase

The experimental phase of the study consisted of three to four 2-h sessions. Three to four interactions were conducted within each session (depending on the duration of each interaction), with the output condition and map presentation order completely randomized across the study. This design yielded a total of ten trials (five per condition) per dyad.

Transcription Coding and Analysis

Computerized Transcription and Coding System

The initial transcript source comprised the logfiles generated by the AC's output. These transcripts consisted of text output and associated time codes. The SC's verbal behavior, as well as the nonverbal behavior of both participants (e.g., head nods, head shakes, and indicatory gestures), was incorporated into the transcripts. Off-task comments regarding task difficulty occurred infrequently and were included as part of the analyses. Computerized transcripts of the communicative interaction were created and formatted according to the Codes for the Human Analysis of Transcripts described by MacWhinney (1990). The text-oriented coding editor (MacWhinney, 1990) was combined with a computerized video transcription interface (Higginbotham, DeRoo, & Johnson, 1992) to permit rapid transcription of the videotaped interactions. Each participant's contribution was segmented into communication turns based on the text material and the presence of turn-regulatory behavior observed in the interaction.

Transcripts were then segmented into utterances to identify the proposition-level messages constructed during the communicative exchange (Higginbotham, 1989). Utterance identification was based on four criteria: (a) inclusion of a clause-level proposition, (b) overt and mutual acceptance of message content by the participants, (c) display of turn-relinquishing behavior by one of the participants, and (d) a subsequent topic shift by the participants. Each utterance production was used as a basic analytic unit for the message summary analysis.1

Message Summaries

A message summary, comprising the non-redundant elements of an utterance's formulation, was generated from each utterance to represent the speaker's intended communication. Each message summary consisted of a sequence of words, which were (a) independently produced by the AC, (b) independently produced by the SC, (c) collaboratively produced by both participants, and/or (d) elliptical constituents required to form a grammatically complete utterance. Two types of analyses were conducted on the message summaries in order to investigate the contribution of each participant in the dyad at two levels of message construction. First, a word-level analysis of the message summaries was conducted to examine each participant's role in the communication of each word in the message summary, by coding each word according to whether it was produced by the AC, produced by the SC, collaboratively produced, or an elliptical constituent (treated as belonging to both participants). Second, the first mention of every referring expression was also analyzed, in order to determine each participant's role in the communication of new information in the message summary. It was reasoned that referring expressions carried a high information load and were, therefore, critical to successful communication. To perform this analysis, each noun or noun phrase produced in a message summary was inspected to determine whether it was the first mention of a referring expression within that trial. Pronouns, direction words, and parallel nouns were disregarded for this analysis. First mention items were then coded according to whether they were produced by the AC, produced by the SC, or collaboratively produced as words or phrases.
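The coding scheme can be summarized with a small sketch. The example below is illustrative only: the utterance, the codes assigned to it, and the tallying are invented for demonstration, using the contribution categories defined above (AC, SC, collaboratively produced, and omitted/elliptical).

```python
# Illustrative sketch of word-level contribution coding for one message
# summary. The example words and their codes are invented; only the coding
# categories (AC, SC, CO, OM) come from the text above.

from collections import Counter

# (word, contribution code) pairs for a hypothetical summary such as
# "turn left at the post office"
summary = [
    ("turn", "AC"), ("left", "AC"), ("at", "OM"), ("the", "OM"),
    ("post", "CO"), ("office", "SC"),
]

tally = Counter(code for _, code in summary)
print(tally)   # Counter({'AC': 2, 'OM': 2, 'CO': 1, 'SC': 1})

# Dividing such tallies by the trial duration yields the per-minute
# contribution rates analyzed under Experimental Design below.
```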



Checking Procedures for Transcripts and Message Summaries

The transcription verification procedures employed by Shriberg, Kwiatkowski, and Hoffmann (1984) and Higginbotham et al. (1994) were used to verify the accuracy of the original transcripts, message summaries, and first mention data. Four research assistants participated in the original transcription and coding process. Two research assistants independently reviewed each transcript (with its videotape) that had originally been transcribed and coded by another research assistant. Each transcript was checked for accuracy of contents, turn segmentation, utterance production boundaries, and coding. Transcription agreement was 0.953 (SD = 0.011). Potential errors in the transcripts and message summaries were noted by the assistants and resolved in the following manner: if the original transcriber agreed with the transcript correction, the transcript was amended accordingly; if the original transcriber disagreed with the corrected transcript, a consensus was achieved through a meeting involving all the research assistants and the principal investigator. All transcription and coding differences were resolved before proceeding to the analysis phase.

Experimental Design

Task completion time and frequency data for keystrokes, words, turns, and utterance productions were obtained for each of the five trials for each output method and dyad. In order to control for the inequality of trial durations, all frequency data were divided by the task duration to produce equitable rate measures for comparison purposes. A within-subjects design was employed to assess the effects of output method on task completion time; on keystroke, word, turn, and utterance production rates; and on the message summary data. All paired comparisons were made using Scheffé's multiple comparison procedure (Marascuilo & Serlin, 1988). The error terms used for the Scheffé tests were obtained from the residual error generated by the pertinent analysis of variance. The probability of committing a Type I error was set at P < 0.05 for each multiple comparison procedure.
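A minimal sketch of the rate normalization described above is shown below; the counts and duration are hypothetical numbers chosen only to illustrate the division by task duration.

```python
# Minimal sketch (not the authors' analysis code) of converting raw frequency
# counts from one trial into per-minute rate measures by dividing by the
# trial duration. The counts and duration below are hypothetical.

def to_rates(counts: dict, duration_min: float) -> dict:
    """Divide each frequency count by the trial duration in minutes."""
    return {measure: round(n / duration_min, 2) for measure, n in counts.items()}

trial_counts = {"keystrokes": 172, "words": 87, "turns": 225, "utterances": 21}
print(to_rates(trial_counts, duration_min=17.0))
# {'keystrokes': 10.12, 'words': 5.12, 'turns': 13.24, 'utterances': 1.24}
```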


RESULTS

The dependent measures used in this study were first analyzed across experimental trials to determine whether communication performance was affected by additional short-term experience. Using a within-subjects ANOVA, no significant difference was noted for the trial variable, nor did it interact with individuals, dyads, or output methods, F(9, 22), P > 0.1 (for all comparisons), indicating that short-term experience did not appreciably affect communication performance. Because of the lack of these significant effects, the trial variable was not considered in the overall analysis of variance. All SCs completed their questionnaires with 100% accuracy; therefore, the questionnaires were not used in further analyses.

Task Completion Time and Communication Rates

First, the data were analyzed to determine whether output method affected task completion time or the speed with which the AC produced keystrokes, words, and utterances. Significant improvements in task completion time and in word and utterance output rates were found for the Mixed Method versus the Word Method. As shown in Table 1, task completion times for the Mixed Method averaged 20% faster than for the Word Method, F(1, 40) = 13.05, P < 0.001. There was a 21% increase in the rate of words issued by the ACs in the Mixed Method compared to the Word Method, F(1, 40) = 45.69, P < 0.001. In addition, turn taking rates were 16% higher, F(1, 40) = 14.59, P < 0.001, and utterance rates were 22% higher, F(1, 40) = 11.07, P < 0.01. Keystroke rate, however, did not differ significantly between output methods, F(1, 40) = 1.23, suggesting that the source of the differences in completion time and in word, turn, and utterance rates resides in the manner in which words and messages were produced rather than in actual differences in typing speed.

Message Summaries

A detailed analysis of participant contributions to the construction of AC messages was undertaken to determine whether a relationship existed between the ways messages were constructed (i.e., independently versus collaboratively) and the output method employed. The word-level analysis assessed the four ways in which participants produced individual words in the message summaries (i.e., AC, SC, collaborated, omitted).

TABLE 1 Mean task completion times and production rates (in words per minute) across output method conditions.

                                              Rate
Output method     Task completion time     Word    Turn    Utterance    Keystroke
Mixed      M             17.03             5.11   13.21      1.21         10.31
           SD             4.44             0.51    3.26      0.36          1.10
Word       M             21.33             4.21   11.34      0.99         10.05
           SD             7.39             0.58    4.5       0.33          0.86
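The percentage differences reported above can be recovered directly from the Table 1 means; the sketch below simply recomputes them (an illustration, not the original analysis).

```python
# Recomputing the reported effect sizes from the Table 1 means (illustration
# only; the statistical tests themselves are not reproduced here).

mixed = {"completion": 17.03, "word": 5.11, "turn": 13.21, "utterance": 1.21}
word  = {"completion": 21.33, "word": 4.21, "turn": 11.34, "utterance": 0.99}

# Completion time: percent reduction relative to the Word Method.
print(round(100 * (word["completion"] - mixed["completion"]) / word["completion"]))  # 20

# Rates: percent increase for the Mixed Method relative to the Word Method.
for m in ("word", "turn", "utterance"):
    print(m, round(100 * (mixed[m] - word[m]) / word[m]))   # word 21, turn 16, utterance 22
```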



The first mention analysis evaluated participant roles in producing the first mention of a referring expression in message summaries (AC, SC, word collaboration, phrase collaboration).


Word-level Analysis

Table 2 and Figure 3 display the results of the word-level analysis of message summaries across output methods. Significant main effects were found for Output Method, both with omitted words, F(1, 160) = 13.73, P < 0.001, and without omitted words, F(1, 120) = 10.99, P < 0.01, and for Contribution Type, F(3, 160) = 209.09, P < 0.001. The interaction effect was non-significant, F(3, 160) = 0.76. The Mixed Method demonstrated a significant improvement in the production rate of words in message summaries as compared to the Word Method, regardless of whether omitted words were included in the analysis.

TABLE 2 Mean production rates of utterance summary words across contribution types and output method conditions.

                          Contribution type
Output method        AC      SC      CO      OM
Mixed      M        3.48    3.90    0.86    2.05
           SD       0.65    1.63    0.25    0.89
Word       M        3.03    3.69    0.29    1.80
           SD       0.68    1.97    0.34    0.98

AC, produced by the augmented communicator; SC, produced by the speech communicator; CO, collaboratively produced; OM, an omitted or elliptical constituent.

An analysis of contribution types revealed that the AC and the SC independently produced words at an equivalent rate, which was significantly greater than the rate of omitted words. In turn, these contribution types occurred more frequently than shared constructions (Scheffé [7, 160] = 0.756, P < 0.05). The greatest rate difference within contribution types occurred for words that were collaboratively produced (Mixed Method = 0.86; Word Method = 0.29), but this difference failed to achieve statistical significance in the post hoc analysis. Thus, the word-level analysis of the message summaries paralleled the previous rate results, but did not identify the specific interaction strategies responsible for the output method-related performance differences.

Message Summaries: First Mention Information Analysis

Significant main effects were found for Output Method, F(1, 160) = 6.16, P < 0.05, and for Contribution Type, F(3, 160) = 52.20, P < 0.001, as well as an interaction effect, F(3, 160) = 22.89, P < 0.001 (see Table 3 and Figure 4). Comparison of the contribution types showed that participants in the Mixed Method produced collaborative word constructions at a rate three times greater than in the Word Method (Scheffé [7, 160] = 0.201, P < 0.05). No other significant differences occurred for the contribution categories. Unlike the word-level analysis, the relationship among contribution types in the first mention analysis differed significantly by output method (Scheffé [7, 160] = 0.201, P < 0.05).

Figure 3. Mean scores of number of words per minute in message summaries across output methods. AC, independently produced by the augmented communicator; SC, independently produced by the speech communicator; CW, collaboratively produced word; OM, elliptical (omitted) constituents.



For the Mixed Method, the highest production rates of first mentioned words were noted for both word-level collaborative constructions and words produced by the SC; these, in turn, occurred at significantly greater rates than words issued by the AC or words collaboratively constructed at the phrase level. For the Word Method, the rate of first mention words independently issued by the SC was significantly greater than that of words issued by the AC, which in turn was significantly greater than that of either the word-level or the phrase-level collaborative constructions (see Figure 4).

DISCUSSION

Overall Results

The results obtained show an overall advantage for the Mixed Method over the Word Method.

TABLE 3 Mean production rates for the first mention of referring expressions across contribution types and output method conditions.

                          Contribution type
Output method        AC      SC      CP      CW
Mixed      M        0.40    0.74    0.26    0.66
           SD       0.13    0.28    0.14    0.23
Word       M        0.52    0.75    0.31    0.22
           SD       0.99    0.40    0.18    0.26
Total      M        0.46    0.74    0.29    0.44
           SD       0.18    0.34    0.16    0.33

AC, independently produced by the augmented communicator; SC, independently produced by the speech communicator; CP, collaboratively produced phrase; CW, collaboratively produced word.


Despite equivalent keystroke input rates, the Mixed Method was significantly faster for all main effect comparisons, including task completion, word output rate, turn taking, utterance production, message summaries, and first mentioned words. As noted in the Results, these rate differences were both statistically and practically significant, with the average effect size differences for task completion time and for word, turn, and message summary rates falling between 16 and 22%. The relative speed advantage of the Mixed Method may be due, in part, to the ability of participants to co-construct single words, as evidenced by the elevated rates of word-level collaborations in the message summary analyses. The high rate of word-level collaborations for first mention referring expressions suggests that collaborative strategies may be a common way to establish mutual understanding of critical information. Additional evidence can be obtained from a taxonomy of the word construction strategies used by the ACs in this study (Scally, 1994). A total of ten different word construction strategies were identified and were employed differentially in the two output method conditions (see Figure 5). Whereas the ACs employed a complete spelling strategy for the majority of their word constructions in the Word Method, the same speakers utilized three different strategies in the Mixed Method: partial spelling (i.e., spelling part of the word), initial letter strategies, and complete spelling. Both the partial spelling and initial letter strategies (available only in the Mixed Method) took fewer keystrokes, saving the participants valuable time by allowing the partners to make guesses during word construction.

Figure 4. Mean scores of number of words per minute of the first mention of referring expressions across output methods. AC, independently produced by the augmented communicator; SC, independently produced by the speech communicator; CP, collaboratively produced phrase; CW, collaboratively produced word.




Figure 5. Median percent of word construction strategy use by output method.

Likewise, speakers in the Word Method sometimes used an initial syllable strategy (i.e., typing a syllable and then hitting the space key to force output) with varying frequency to speed up their message productions. However, the initial syllable strategy was not as efficient as partial spelling, as it required an extra keystroke and was more costly to repair if the SC misunderstood it. During debriefing after the experiment, the participants in all five dyads mentioned the problematic nature of the slow communication output and indicated a preference for the Mixed Method of output. Reasons cited for preferring the Mixed Method included the increased overall speed of the strategy and the greater opportunity for guessing, which aided memory and facilitated SC involvement.
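To illustrate why these strategies differ in cost, the sketch below counts keystrokes and minimum elapsed time for each strategy applied to one word. The word ("package"), the assumed letter counts, and the partner's successful guess are hypothetical; only the 2-s per-selection delay comes from the Method section.

```python
# Back-of-the-envelope comparison (illustration only) of keystroke costs for
# the word construction strategies named above, applied to the hypothetical
# word "package" under the 2-s per-selection delay used in the experiment.

DELAY_S = 2.0

strategies = {
    "complete spelling": 7 + 1,  # p-a-c-k-a-g-e plus a trailing space
    "initial syllable":  4 + 1,  # "pack" plus the space key to force output (Word Method)
    "partial spelling":  3,      # "pac"; partner guesses the rest (Mixed Method only)
    "initial letter":    1,      # "p" alone; partner guesses the word (Mixed Method only)
}

for name, keystrokes in strategies.items():
    print(f"{name:17s} {keystrokes:2d} keystrokes, at least {keystrokes * DELAY_S:4.1f} s")
```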

Comparison of the Communication Models

The relative advantage of the Mixed Method over the Word Method in this experiment favors the Interaction Model of augmented communication. Coupled with the participants' reports, these results suggest that the Mixed Method offered participants increased opportunities for minimizing the communication costs associated with this communication medium (e.g., production, reception, understanding, delay) by allowing joint focusing on single letters and letter sequences, instead of only on words. Overt co-construction was evident in the elevated rates of shared contributions and first mention word co-constructions in the Mixed Method. In both experimental conditions, the SC actively participated in the construction of the AC's utterances by providing entire words and phrases. Interestingly, when denied the ability to use partial spelling as a communication strategy, the ACs in the Word Method were all observed to output initial syllables of words to be guessed by their partners. It was also evident that device use resulted in comprehension costs (e.g., delay, intelligibility, understanding) across both conditions. Rather than assuming a passive role, however, the SCs utilized the properties of the communication medium (e.g., letter output) as a basis for guessing and other collaborative activities. Comprehension difficulties may have prompted participants to find ways to overcome the limitations imposed by the communication device.



The results of the study stand in sharp contrast with the results of Higginbotham et al.'s (1995) discourse comprehension study, in which listener participants comprehended Word Method discourse better than Mixed Method discourse. Differences in the listener/SC roles can account for the discrepancy. In Higginbotham et al. (1995), listener participants were forced into a passive communication role: unlike their SC counterparts in the current study, they were unable to affect the communication tempo through co-participation in the message production process, nor were they able to repair the communication in any manner (e.g., through contingent queries or requests for repetition). Forced into this less natural, passive role, listener participants may have relied on their rehearsal and memorization skills to recall the messages. In contrast, the SC participants in the present study were able to take part in the communicative interactions involved in the direction-giving tasks. They were able to deal actively with the communication constraints imposed by the SGD and to address their comprehension needs through the variety of turn taking and repair strategies previously discussed. It is important to note here that the interaction behaviors of the SCs in this study are representative of any individual coping with communication constraints (Clark & Brennan, 1991; Higginbotham & Caves, 2002) and should not be viewed as aberrant. Rather, these interactions were representative of the collaborative communication strategies used by a wide variety of individuals – with and without disabilities – during their everyday conversational activities.

Limitations of the Investigation

It should be recognized that the particular communication task was designed to maximize AC communication output by placing the people using AAC in the position of giving map directions. This aided in collecting enough AC output for suitable analysis. Although common, other types of discourse, such as collaborative problem solving, free-form conversation, and storytelling, would be expected to result in a different distribution of speaker and listener turns. Differences in training and SGD features could also have affected the results. Participants in this study received extensive training in order to minimize performance changes due to learning and familiarization over the course of the experiment, and they were selected on the basis of a prior friendship history in order to ensure interaction familiarity. Although one may expect to see similar types of interaction strategies across discourse genres, task types, experience, and interpersonal relationship levels, the particular distributional characteristics of these interaction phenomena may be affected by these and other technological, communication, and social variables. The inclusion of a well-defined debriefing interview could have shed light on the reasons why individuals used particular interaction and word production strategies. Although Scally's (1994) informal interview data are summarized in the Discussion section, a more systematic interview technique could have provided additional information for better understanding the reasons participants chose their particular interaction styles.

Implications for Device Design

The results of this study offer additional evidence that significant communication performance differences can be produced by simple changes in a user interface, and they raise the question of what other modifications to user interfaces would result in demonstrable performance improvements. To date, most interfaces have implicitly embraced a sentence construction model, in which the user interface has been engineered to produce sentence-level messages. Although such an approach may be important for writing and certain speaking events (e.g., a lecture), these interfaces may not take advantage of the interactive capabilities of the participants and may introduce communication costs that impede and discourage social interaction (e.g., the listener is forced to wait until the message is complete; the speaker issues utterances without warning). It would appear valuable to examine a variety of device features (e.g., the impact of a visual display or a mutually viewable user interface) and audible utterance characteristics (e.g., the addition of aspiration before utterance onset to signal the listener that an utterance will be produced, or the use of attention maintenance strategies) to facilitate interaction and mutual understanding. Given the media characteristics of augmentative devices, this study suggests that communication performance measurements must take into account the contributions of all participants and should focus on the adaptive abilities of the speaker with respect to the media characteristics of the augmentative device. Structural measures such as utterance length and syntactic completeness, particularly those based on the AC's productions, could not be used to measure user performance in this situation (Higginbotham, 2003). In fact, without the speaking partner's contributions, many utterances would appear fragmentary and ungrammatical (Higginbotham & Caves, 2002).




The study of augmentative device use within the context of human – computer and human – human interaction provides a needed framework for addressing obvious and subtle communication problems and for determining goals for engineers and computer scientists in our field. If, as Nelson (1992) argues, 'Performance is the Prize', it is incumbent upon researchers and manufacturers to develop empirical research paradigms to test the efficacy of new device interfaces. As demonstrated in the current study, simple differences in interface design can result in significant differences in communication performance.

Acknowledgements

The authors would like to thank Terry Rafalowski Welch and Jennifer Cornish for their efforts in editing various drafts of this manuscript. This work has been funded in part by the National Institute on Deafness and Other Communication Disorders (NIDCD) under Grant #DC00034-05 and the National Institute on Disability and Rehabilitation Research (NIDRR) under Grant #H133E980026-01.

Note

1. A more detailed protocol for turn and utterance segmentation can be found in Higginbotham (1989).

References

American National Standards Institute. (1969). Specifications for audiometers (ANSI S3.6-1969). New York: ANSI.
Beukelman, D. R., Yorkston, K. M., Poblete, M., & Naranjo, C. (1984). Frequency of word occurrence in communication samples produced by adult communication aid users. Journal of Speech and Hearing Disorders, 49(4), 360 – 367.
Blau, A. F. (1987). Communication in the back-channel: Social structural analyses of nonspeech/speech conversations. Dissertation Abstracts International, 47(3237a), DA 8629674.
Buzolich, M. J., & Wiemann, J. W. (1988). Turn taking in atypical conversations: The case of the speaker/augmented communicator dyad. Journal of Speech and Hearing Research, 31, 3 – 18.
Chapanis, A., Ochsman, R. B., Parrish, R. N., & Weeks, G. D. (1972). Studies in interactive communication: I. The effects of four communication modes on the behavior of teams during cooperative problem-solving. Human Factors, 14(6), 487 – 509.
Chapanis, A., Parrish, R. N., Ochsman, R. B., & Weeks, G. D. (1977). Studies in interactive communication: II. The effects of four communication modes on the linguistic performance of teams during cooperative problem solving. Human Factors, 19(2), 101 – 126.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.

Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick & J. M. Levine (Eds.), Perspectives on socially shared cognition (pp. 127 – 149). Washington, DC: American Psychological Association.
Drager, K. D. R., & Reichle, J. E. (2001). Effects of discourse context on the intelligibility of synthesized speech for young adult and older adult listeners: Applications for AAC. Journal of Speech, Language, and Hearing Research, 44, 1052 – 1057.
Duffy, S. A., & Pisoni, D. B. (1992). Comprehension of synthetic speech produced by rule: A review and theoretical interpretation. Language and Speech, 35, 351 – 389.
Farrier, L. D., Yorkston, K. M., Marriner, N. A., & Beukelman, D. R. (1985). Conversational control in nonimpaired speakers using an augmentative communication system. Augmentative and Alternative Communication, 1(2), 65 – 73.
File, P., & Todman, J. (2002). An evaluation of the coherence of computer-aided conversations. Augmentative and Alternative Communication, 18, 228 – 241.
Higginbotham, D. J. (1989). The interplay of communication device output mode and interaction style between nonspeaking persons and their speaking partners. Journal of Speech and Hearing Disorders, 54(3), 320 – 333.
Higginbotham, D. J. (2003). Formulating research questions: Linking theory to the research process. In R. W. Schlosser (Ed.), The efficacy of augmentative and alternative communication: Toward evidence-based practices (pp. 43 – 55). St. Louis: Elsevier.
Higginbotham, D. J., & Baird, E. (1995). Analysis of listeners' summaries of synthesized speech passages. Augmentative and Alternative Communication, 11(2), 101 – 112.
Higginbotham, D. J., & Bedrosian, J. L. (1995). Subject selection in AAC research: Decision points. Augmentative and Alternative Communication, 11(1), 11 – 13.
Higginbotham, D. J., & Caves, K. (2002). AAC performance and usability issues: The effect of AAC technology on the communicative process. Assistive Technology, 14(1), 45 – 57.
Higginbotham, D. J., & Wilkins, D. P. (1999). Slipping through the timestream: Time and timing issues in augmentative communication. In J. Duchan, D. Kovarsky, & M. Maxwell (Eds.), The social construction of language incompetence. Hillsdale, NJ: Erlbaum.
Higginbotham, D. J., Mathy-Laikko, P., & Yoder, D. E. (1988). Studying conversations of augmented communicators. In L. Bernstein (Ed.), The vocally impaired: Clinical practice and research (pp. 265 – 294). New York: Grune and Stratton.
Higginbotham, D. J., DeRoo, W. E., & Johnson, K. (1992). Technical report: The social interaction transcription system (SITS) (Version 2.1). Buffalo, NY: State University of New York, Communication and Assistive Device Laboratory.
Higginbotham, D. J., Drazek, A. L., Kowarsky, K., Scally, C., & Segal, E. (1994). Discourse comprehension of synthetic speech delivered at normal and slow presentation rates. Augmentative and Alternative Communication, 10(3), 191 – 202.
Higginbotham, D. J., Scally, C. A., Lundy, D. C., & Kowarsky, K. (1995). Discourse comprehension of synthetic speech across three augmentative and alternative communication (AAC) output methods. Journal of Speech and Hearing Research, 38(4), 889 – 901.
Keenan, J. S., & Brassell, E. G. (1975). Aphasia language performance scales. Murfreesboro, TN: Pinnacle Press.


Kim, K. E. (2001). Effect of speech-rate on the comprehension and subjective judgments of synthesized narrative discourse. Unpublished doctoral dissertation, University at Buffalo, NY.
Kim, K. E., Higginbotham, D. J., & Gavin, W. (2005). Effect of speech rate on the comprehension and subjective judgments of synthesized narrative discourse.
Light, J. (1988). Interaction involving individuals using augmentative and alternative communication systems: State of the art and future directions. Augmentative and Alternative Communication, 4(2), 66 – 82.
Logan, J. S., Greene, B. G., & Pisoni, D. B. (1989). Segmental intelligibility of synthetic speech produced by rule. Journal of the Acoustical Society of America, 86, 566 – 581.
MacWhinney, B. (1990). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Erlbaum.
Marascuilo, L., & Serlin, R. (1988). Statistical methods for the social and behavioral sciences. New York: Freeman.
Mathy-Laikko, P. A. (1992). Comprehension of augmentative and alternative communication device output methods. Unpublished doctoral dissertation, University of Wisconsin-Madison, WI.
McNaughton, D., Fallon, K., Tod, J., & Weiner, F. (1994). Effect of repeated listening experiences on the intelligibility of synthesized speech. Augmentative and Alternative Communication, 10(3), 161 – 168.
Mirenda, P., & Beukelman, D. (1990). A comparison of intelligibility among natural speech and seven speech synthesizers with listeners from three age groups. Augmentative and Alternative Communication, 6, 61 – 68.
Mirenda, P., & Beukelman, D. R. (1987). A comparison of speech synthesis intelligibility with listeners from three age groups. Augmentative and Alternative Communication, 3(3), 120 – 128.


Müller, E., & Soto, G. (2002). Capturing the complexity of aided interactions: A conversation analysis perspective. In S. von Tetzchner & J. Clibbens (Eds.), Understanding the theoretical and methodological bases of augmentative and alternative communication: Proceedings of the Sixth Research Symposium of the International Society for Augmentative and Alternative Communication. Toronto, Canada: ISAAC.
Nelson, N. W. (1992). Performance is the prize: Language competence and performance among AAC users. Augmentative and Alternative Communication, 8(1), 3 – 18.
Raghavendra, P., & Allen, G. D. (1993). Comprehension of synthetic speech with three text-to-speech systems using a sentence verification paradigm. Augmentative and Alternative Communication, 9, 126 – 133.
Ralston, J. V., Pisoni, D. B., Lively, S. E., Greene, B. G., & Mullenix, J. W. (1991). Comprehension of synthetic speech produced by rule: Word monitoring and sentence-by-sentence listening times. Human Factors, 33, 471 – 491.
Scally, C. (1994). The effects of synthesized speech output method on interactions involving augmentative systems users. Unpublished master's thesis, University at Buffalo, NY.
Shriberg, L., Kwiatkowski, J., & Hoffmann, K. (1984). A procedure for phonetic transcription by consensus. Journal of Speech and Hearing Research, 27(3), 456 – 465.
Sweidel, G. (1991). Management strategies in the communication of speaking persons and persons with a speech disability. Research on Language and Social Interaction, 25, 195 – 214.
Talbot, M. (1987). Reaction time as a metric for the intelligibility of synthetic speech. In J. A. Waterworth (Ed.), Speech and language-based interaction with machines: Towards the conversational computer. Chichester: Ellis Horwood.