This article was downloaded by: [RWTH Aachen University] On: 11 November 2013, At: 08:39 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Aphasiology Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/paph20
The psychometric properties of the English language version of the Aachen Aphasia Test (EAAT)
N. Miller (a), K. Willmes (b) & R. De Bleser (c)
(a) Department of Speech, University of Newcastle-Tyne, UK
(b) Department of Neurology-Neuropsychology, RWTH Aachen, Germany
(c) Department of Patholinguistics, University of Potsdam, Germany
Published online: 31 Aug 2010.
To cite this article: N. Miller , K. Willmes & R. De Bleser (2000) The psychometric properties of the English language version of the Aachen Aphasia Test (EAAT), Aphasiology, 14:7, 683-722 To link to this article: http://dx.doi.org/10.1080/026870300410946
Aphasiology, 2000, vol. 14, no. 7, 683-722
(Received 19 January 1999; accepted 20 July 1999)
Abstract
This article reports the results of a standardization study of the English language version of the Aachen aphasia test (EAAT). The EAAT was administered to 135 speakers with aphasia and 93 without aphasia. The speakers with aphasia comprised four groups (n = 30 each) representing the EAAT standard syndrome groups (global, amnestic, Broca's and Wernicke's aphasia), plus 15 speakers who could not be classified into the standard groups. The speakers without aphasia comprised 24 non-hospitalized and 41 hospitalized speakers with no history of neurological illness or speech-language disorder, and 28 speakers with a history of neurological illness but no aphasia. Hierarchical cluster analysis (complete linkage) demonstrated the validity of the linguistically motivated construction of the EAAT. Non-metric multidimensional scaling (smallest space analysis) further confirmed this for the main subtests. The analyses also confirmed the property of increasing complexity across the subparts of subtests. Nonparametric discriminant analyses showed the high differential validity of the EAAT for distinguishing between speakers with and without aphasia, and acceptably high validity for separating out subgroups of speakers. Consistency coefficients (Cronbach's α) show high to very high internal consistency of the subtests. We argue that the original German reliability studies, which showed high retest, inter-rater and intra-rater reliability, apply to the EAAT. We conclude that the EAAT amply meets criterion levels for a psychometrically robust test.
Address correspondence to: N. Miller, Department of Speech, University of Newcastle-Tyne, NE1 7RU, UK. Email: nicholas.miller@ncl.ac.uk
© 2000 Psychology Press Ltd, http://www.tandf.co.uk/journals/pp/02687038.html

Introduction
The Aachen aphasia test is a test of language functioning after brain injury. As such it aims to do three things. First, it aims to identify the presence of aphasia reliably. Second, it aims to provide a profile of speakers' language functioning across language modalities (speaking, listening, reading, writing) and across levels of linguistic description (phonology, morphology, semantics and syntax). Third, it aims to give a measure of the severity of any breakdown. On this basis the test offers indications for more detailed testing or for therapeutic intervention. Through its syndrome classificatory system it can also assemble speakers into naturally occurring diagnostic groups for clinical or research purposes. The test was developed (and standardized) primarily for speakers with a CVA. For other aetiologies, e.g. head injury, degenerative conditions or developmental disorders, the test can still validly be used descriptively for individual diagnostic purposes, though the normative data do not apply.
The Aachen aphasia test (AAT) was originally developed and published in German (Huber et al. 1983, Huber et al. 1984). Versions have since appeared in Italian (Luzzatti et al. 1991), Dutch (Graetz et al. 1992) and Thai (Prachritpukdee et al. 1998). This article describes the development and standardization of the English language version of the test (henceforth EAAT). Before examining whether the EAAT meets the claims made above, we first consider the psychometric standards an (aphasia) test should meet to be clinically objective, valid and reliable. These standards serve as a yardstick against which the EAAT can be judged. We then describe the rationale behind the EAAT, its structure and its adaptation into English, before finally concentrating on the standardization study.
Properties of a psychometrically sound aphasia test
Numerous tests have appeared over the past decades that purport to measure aphasic language performance and make claims similar to those of the EAAT. On the surface they all seem to assess very similar aspects of language. However, these surface similarities mask considerable variability in the theoretical rationales underlying test construction. More importantly, they hide marked contrasts in the rigour with which psychometric design principles were applied during test construction, development and standardization. Aphasia tests in general have been criticized for failing to meet the criteria prerequisite for a psychometrically sound instrument (e.g. Linebaugh 1979, AERA-APA-NCME 1985, Skenes and McCauley 1985, Spreen and Risser 1998). They have also fared badly in comparison with the psychometric strength of other (neuro)psychological test types (Spreen and Risser 1998). Others have questioned aspects of specific tests (Martin 1977, Nicholas et al. 1986, Lincoln 1988, Crary et al. 1992, Davies 1993). Reviews have pointed to inadequacies in all areas of objectivity, validity and reliability. Aphasia tests have been criticized, for instance, for resting on questionable theoretical bases. Critics have claimed that validity is undermined by tasks too open to influence from non-linguistic variables. These include educational level, general intelligence, and other sequelae of brain damage such as apraxia, visual-perceptual dysfunction, and attention or memory difficulties (Davies 1993, Spreen and Risser 1998). Other tasks, although ostensibly linguistic, have been criticized as saying little about central language dysfunction, e.g. counting or the repetition of proverbs.
Some batteries contain subtests that fail to separate out elements of language function, so that no clear picture of ability in particular modalities or levels is possible. For instance, some demand expressive language in comprehension tasks, and in others comprehension of verbs is confounded by variations in syntactic structure. At times little attention has been paid to the homogeneity of items within a task set, or of sets within a subtest, leading to low internal consistency. Task design has also been criticized on two counts: for ignoring floor and ceiling effects, and for failing to provide the spread of item and subtest difficulty needed to detect the differences that exist within and between aphasic language performances.
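Internal consistency of a task set is usually summarized with Cronbach's α, the coefficient reported for the EAAT subtests in this study. Below is a minimal Python sketch of the standard formula, α = k/(k - 1) · (1 - Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ the variance of item i across speakers and σ²ₓ the variance of speakers' total scores. It assumes a complete score matrix with one row per speaker (for an EAAT subpart, ten items scored 0-3). The function name `cronbach_alpha` is illustrative only and is not taken from the test materials.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a speakers x items matrix of item scores.

    `scores` is a list of rows, one per speaker; every row holds that
    speaker's score on each item of the same subpart or subtest.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across speakers.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the speakers' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    # Population variances are used throughout; an n/(n-1) correction
    # would apply to numerator and denominator alike and cancel.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Two perfectly parallel items give alpha = 1.
print(cronbach_alpha([[0, 0], [1, 1], [2, 2], [3, 3]]))
```

The statistic rises as items co-vary. Homogeneous items, the property whose neglect is criticized above, push the summed item variances well below the variance of the totals.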
Some tests have sought to answer the construct validity question by using another aphasia test as an external criterion. This argument has often been circular, and it is flawed where the comparator test itself lacks demonstrable construct validity. Others have employed factor analytic or principal component analysis approaches. These aim to show that the test taps linguistic factors and to justify particular constellations of, or relationships amongst, subtests. Willmes (1993) has pointed out the shortcomings of this approach. It ignores long-standing discussions of the theoretical problems inherent in the common factor model, in particular that of factorial underdeterminacy (Steiger and Schönemann 1978). Some tests lack standardization data. Others are seen as inadequate because of weaknesses in the composition (size, spread) of the standardization sample, a lack of matched control groups, or the absence of a control group with brain damage but no aphasia (Spreen and Risser 1998). This last group is especially important for clarifying the grey area in cut-off scores. That area lies between the scores of healthy speakers, the borderline scores that neurologically impaired speakers without aphasia obtain through general effects of neurological dysfunction, and genuine instances of aphasia.
This article examines whether the EAAT possesses the desirable properties of a robust aphasia test. It considers the rationale behind the test's construction, its adaptation into English, the piloting of earlier versions, the groups and participants used in the standardization procedure, and elements of validity and reliability. Specifically, objectivity is considered with reference to the administration and scoring conventions of the test. Validity is gauged from an analysis of the construct validity of the test. It is also gauged from the test's predictive or differential validity, both for separating speakers with and without aphasia and for identifying subgroups of speakers within the overall sample.
Reliability is addressed through scrutiny of the internal consistency of item sets and subtests, and through a consideration of inter-scorer, intra-scorer and retest reliability.
Rationale underlying the AAT
The rationale underlying the (E)AAT is that aphasia represents a dysfunction of central language processes. Breakdown is seen as multimodal, though the degree of impairment need not be equal across modalities and may occasionally be confined to one modality. Dysfunction may affect all or any components (syntax, phonology …) of language differentially, independent of modality, though impairment can vary according to modality. The (E)AAT makes no assumptions about the models of language functioning underlying behaviour; it aims to furnish only a linguistic description of an individual's performance. Explanations for the language picture a speaker presents must be sought by applying a model of language processing to the data. Likewise, the (E)AAT can assign speakers to different syndromes, but the divisions rest on surface language performance. They make no assumptions about lesion sites or about the functional relationship between the symptoms of the syndromes identified (see section headed 'Syndrome assignment' on p. 692). The syndrome groupings derived from the AAT can supply the material for analysing why characteristic group profiles emerge, but they do not in themselves provide explanations.
Based on this conceptualization, the construction of test items and their assembly into subparts and subtests reflects patterns of linguistic structure. Three intersecting facets underlie construction. The (E)AAT is built on the principle of investigating: (1) the levels of language (phonology, semantics, syntax); (2) the units of language (phonemes, morphemes, syntactic structures); and (3) the regularities, or rules, that govern the combination and differentiation of these units in a given language. It also covers the different modalities in which these units are used. Hence subtests progress through the various units of analysis (sounds, syllables, morphemes, words, phrases), and within each level a range of regularities of English is sampled. The subtests vary according to modality (e.g. written, spoken) or to transcoding between modalities (e.g. grapheme-phoneme conversion and vice versa; auditory to spoken conversion).
Because of this emphasis on purely linguistic criteria, the (E)AAT excludes subtest types present in many other aphasia examinations. Three kinds of task are excluded. The first are tasks considered uninformative about central language functioning (e.g. so-called automatic sequences such as counting or reciting the months of the year; copying letters). The second are tasks that show wide variation and/or problems of objective scoring even for non-brain-damaged speakers (e.g. recall of paragraphs, defining words). The third are tasks susceptible to breakdown through other neuropsychological dysfunctions (e.g. of praxis, memory, attention, visual perception). Omitted on this basis are digit recall, oral non-verbal movement tests, and shape naming and copying.
Format and content of the EAAT
The EAAT comprises four subtests (table 1), looking at phonology, reading and writing, naming and comprehension. Each subtest is divided into between three and five parts, giving sixteen parts in total. These parts examine different units and modalities at the linguistic level concerned, and each subpart is ten items long. The four subtests are preceded by a semi-structured interview and a modified version of the Token Test.
The semi-structured interview (approximately 10 minutes long, where possible) serves to establish rapport with the speaker and to elicit a spontaneous language sample. The same four open-ended questions about the person's interests and circumstances are always asked. The exact wording and elucidation of the questions, however, can be varied to suit the individual speaker.
The Token Test (TT), an adaptation by Orgass (1986) of De Renzi and Vignolo (1962), consists of five subparts, each ten items long. A pre-test practice task establishes whether the person can point to squares of the different colours; if they do not succeed after three tries, the rest of the TT is not administered. The TT was designed to detect mild language comprehension disturbances. Its error score indicates whether aphasia is present and measures the overall severity of aphasic symptoms.
The subtest repetition has five parts, each involving repetition after the examiner of a different unit of language: (1) isolated sounds (e.g. /f/, /aː/); (2) monosyllabic, monomorphemic words with consonant clusters (e.g. axe, splodge); (3) monomorphemic words of increasing length but simple (CV) syllable structure (from lee to hippopotamus); (4) words with an increasing number of bound morphemes (from entry to unconventionality); and (5) sentences with an increasing number of constituents. In other language versions of the AAT, an added criterion for word selection in part (3) is that the words be loanwords. This criterion was relaxed for English, for two reasons. The strong admixture of Germanic and Latinate roots in English blurs the distinction between loan and native words. The adaptation also aimed to maintain the frequency and word class variables of the original. Part (3) therefore uses more familiar loanwords (salami, bidet) and words with non-English roots.
Table 1. Subtests and subparts of the EAAT
Subtests: Parts
(1) Spontaneous language sample: rated on 6-point scales for communicative behaviour; articulation and prosody; formulaic language; semantics; phonology; syntax
(2) Token Test: pretest and 5 test parts
(3) Repetition: 1 sounds; 2 single-syllable words, increasing complexity; 3 multisyllable words, increasing length; 4 morphologically complex words; 5 sentences
(4) Written language: 1 reading aloud words/phrases; 2 composing words/phrases from blocks; 3 writing words/phrases to dictation
(5) Naming: 1 pictured objects; 2 colours; 3 pictured compound nouns; 4 pictured sentences
(6) Comprehension: 1 auditory, words; 2 auditory, sentences; 3 reading, words; 4 reading, sentences
The three parts of the subtest written language investigate three skills. The first is reading aloud words (letter size 1 cm) and phrases. The second is composing a given word or phrase from plastic letter tiles (3.5 × 4 cm) and word tiles (3.5 × 7 cm). The third is writing to dictation. Before attempting the subpart 'composing words and phrases', speakers must show that they can match the letters, digraphs and words used to the corresponding fields on a printed sheet. They must also pass a demonstration/practice item. To enable a precise comparison across tasks, items in the three parts parallel each other in syllable/phrase structure and, as closely as possible, in word frequency. Table 2 illustrates the internal structure of these three subparts.
Naming is examined through four tasks: naming pictures (page size 20 × 29.5 cm) of common simple nouns (e.g. table, belt); naming colours; naming compound nouns (e.g. hairdryer, typewriter); and producing a sentence to describe each of ten pictured situations. As in repetition, the pictured situations are designed to elicit phrases with an increasing number of obligatory constituents. Words are chosen to cover a range of semantic categories, to have an unambiguous lexical label and to be visually-perceptually distinct. The compound nouns are made up of noun-noun and noun-verb combinations. They are chosen so that correct naming cannot be achieved by naming only a subfeature of the picture or only one part of the compound.
Table 2. Word and phrase structure of the subparts of the subtest written language (numbers in brackets give the number of letters in the word; C in brackets represents ⟨r⟩, which is not pronounced in non-rhotic English accents)
Item: Syllable/word structure of items
1: CVC (4)
2: CCVCC (5)
3: CCVC (6)
4: CVCV(C) (6)
5: CV(C)CVCV (7)
6: 1 word (1 free, 2 bound morphemes)
7: 1 compound word (N-N-N)
8: Pron. V poss. pron. N(obj)
9: Pron. V(be) adj.
10: Wh-Q aux/modal pron. V pron. pron.
The final subtest covers, in four separate parts, comprehension of spoken and written words and sentences. Speakers must point to the target in a set of four pictures (a 20 × 29.5 cm page divided into four). Besides the target, one picture is a close semantic, phonemic or syntactic distracter. A second is thematically related to the target but not close phonemically, semantically or syntactically. The third is unrelated in any way. Sentence items are constructed so that the target cannot be deduced by attending to a single keyword in the sentence. Again, to maximize comparability across subparts (and languages), the corresponding auditory and reading tasks parallel each other in the linguistic feature being probed. The categories covered include discrimination between semantically similar words (e.g. according to function, or the semantic ambiguity of homonymous words), and inference of sentence meaning from highly pronominalized phrases.
The subtests and subsections are administered in the order of table 1, except that writing to dictation comes between naming and comprehension. Subtests are ordered so that those generally difficult for speakers with aphasia are followed by those generally found easy. The modality of presentation and response is also alternated. Within subtests the sequence of parts is intended to follow a gradient of rising difficulty (cf. below).
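Every subpart has ten items, and each item in these four subtests is scored from 0 to 3 (as set out under Scoring). The structure in table 1 therefore fixes the range of each subtest's raw score. The small Python sketch below works through that arithmetic and checks a score sheet against the layout. The names `PARTS`, `max_raw_score` and `raw_score` are illustrative, not part of the test materials. The Token Test is left out because it is scored by counting errors.

```python
# Number of subparts per subtest, as listed in table 1.
# The Token Test is omitted because it is scored by counting errors.
PARTS = {"repetition": 5, "written language": 3, "naming": 4, "comprehension": 4}
ITEMS_PER_PART = 10
ITEM_SCORES = range(4)  # each item is scored 0, 1, 2 or 3


def max_raw_score(subtest: str) -> int:
    """Highest possible raw score: parts x items per part x top item score."""
    return PARTS[subtest] * ITEMS_PER_PART * max(ITEM_SCORES)


def raw_score(subtest: str, parts: list[list[int]]) -> int:
    """Sum the item scores after checking they match the subtest layout."""
    if len(parts) != PARTS[subtest]:
        raise ValueError(f"{subtest} has {PARTS[subtest]} parts, got {len(parts)}")
    for part in parts:
        if len(part) != ITEMS_PER_PART:
            raise ValueError("each part must contain exactly ten item scores")
        if any(s not in ITEM_SCORES for s in part):
            raise ValueError("item scores must be 0, 1, 2 or 3")
    return sum(sum(part) for part in parts)


print(sum(PARTS.values()))  # 16 parts in total
print({name: max_raw_score(name) for name in PARTS})
# {'repetition': 150, 'written language': 90, 'naming': 120, 'comprehension': 120}
```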
Scoring
The spontaneous language sample is evaluated according to six criteria (Table 1). The communicative behaviour scale (adopted from the Boston diagnostic aphasia examination, Goodglass and Kaplan 1972) gives an indication of overall communicative success. The other scales pertain to: (1) segmental and suprasegmental aspects of articulation, designed chiefly to highlight the presence and degree of dysarthria or speech apraxia; (2) the extent to which the speaker relies on formulaic language, or is restricted to recurrent utterances or echolalia; (3) semantic production; (4) phonological production; and (5) syntactic production.
Each criterion is rated on a six-point scale, where 0 represents non-scorable and 5 normal performance. The manual clearly defines the intermediate scores on the basis of qualitative symptoms and their frequency and severity. Each TT section is marked on a right-wrong basis, and the score is interpreted from the total number of errors, following the conventions set out by Orgass (1986). All items in the subtests repetition, written language, naming and comprehension are scored on a four-point scale. A score of 3 represents normal performance. A score of 0 represents no response, perseveration, automatism, or a response totally unrelated to the target. In general, 2 represents a correct response given after a long reaction time, with hesitancy or self-correction, or with minor deviation from the target. A score of 1 in general represents a response with weak similarity to the target. The manual provides examples and defining criteria for these distinctions. The scale adds scoring sensitivity that a simple pass-fail convention would not provide. It can detect mild disruptions that do not necessarily produce many incorrect responses, but where performance, with its slowness, hesitations and self-corrections, is clearly different from the person's premorbid behaviour. Similarly, it can pick up changes in performance characteristics that would not register on a scale of completely right versus any type of wrong.
Adaptation into English
All languages employ sounds, which combine into words, and these in turn into phrases. However, the precise range of sounds and words, and the rules that dictate their contrasts, combination and use, clearly differ across languages. For this reason a test cannot simply be translated from one language into another and still be expected to test the same language variables.
For example, an item investigating word-final morpheme contrasts for number or tense in one language may become, in another language, a test of function word usage. It may also involve complex adjective-noun-verb agreement across a whole phrase. Similarly, a high-frequency German word with a simple syllable structure may correspond to an English word with a complex multisyllable structure, or may force the choice of a low-frequency word. The EAAT was therefore not derived from the German original by simply translating items. Instead, the aim was to preserve the underlying rationale for dividing the subtests from one another, together with the characteristics of the subtasks within each subtest. The tasks themselves were adapted to the structure and regularities of English.
Some examples illustrate the point. For single-word items, German words were translated directly only if translation neither altered the variables tested in a particular subpart nor affected the frequency or picturability of the item. Where a new item had to be chosen, the new word was, if possible, from the same semantic field as the original. For instance, Kühlschrank, an item in the compound word naming section, is a compound in German, but its translation refrigerator is not. Close equivalents such as icebox or coolbox lack familiarity and unambiguous picturability. An alternative kitchen machine (dishwasher) therefore had to be chosen. Dosenöffner could be translated directly as tin opener/can opener. However, the picture had to be changed, because the standard German tin opener looks different from the standard British one. In other sections some German words could be translated, e.g. in auditory comprehension, Schüssel as bowl and Star as star. However, the foil pictures for these items had to be changed. For 'bowl', a minimal phoneme foil was needed. For 'star', a new semantic foil was needed to reflect the word's alternative meanings in English, which differ from its semantic associations in German.
The word Viereck, used in the TT, illustrates several issues in adapting a test. Viereck is the high-frequency word used in the German AAT for the right-angled shapes of the TT. Strictly speaking it translates into English as quadrilateral, since it specifies a four-sided figure but is neutral as regards the length of sides and angles. Such a low-frequency word would be unacceptable for the EAAT. Rectangle was chosen as an appropriate translation, but piloting showed that not all English speakers knew this low-frequency word. Hence the shape was changed to a square, a word that piloting showed everyone recognized.
In the repetition sections, syllable structure, morpheme count and word class are crucial variables, and the adaptation strove to match these elements across languages. Ast (branch), in the second part of the repetition subtest, could have been replaced with the syllabically and segmentally close 'oast'. However, this would have contravened the word frequency criterion, and 'oats' would have produced two morphemes instead of one. Stern (star) and Schokolade (chocolate) appear in the German repetition of words sections. Stern could have been rendered as stern in the EAAT (albeit with a different meaning), but this would have broken the criterion that all words belong to the same class. Chocolate would have preserved meaning and class but altered the syllable structure and stress pattern of the item. In other subparts the crucial variables centre on phrase- and sentence-level constituents, and the adaptation into English also preserved these variables.
Where an item in the German AAT tapped speakers' ability to comprehend, read or repeat, for example, a relative clause structure, the English item tapped the same structure. It did so even though the precise morpho-syntactic expression of relative clauses differs between English and German. In the subtests written language and comprehension, corresponding items across all subtasks parallel each other exactly. This design element was retained. The English words and structures did not always translate the German meaning, but they kept the word class, word structure and syntactic class (e.g. interrogative, relative clause). They also matched, as nearly as possible, the length, complexity and frequency of the original items.
Pilot versions
After initial adaptation, preliminary versions of the EAAT were piloted on groups of healthy speakers and speakers with aphasia. Each subtest was checked for a different property:
(a) repetition items were monitored to ensure that English speakers from a variety of social and geographic backgrounds pronounced the words uniformly in the expected fashion, without phoneme or syllable alterations;
(b) words in the written language subtest were checked for pronunciation and spelling agreement;
(c) the naming and comprehension subtests were surveyed for picture recognition and label agreement; and
(d) the instructions were assessed for clarity.
For several items a number of English possibilities had been chosen, so the early pilot versions had longer subparts than the final version. Items were retained if they showed full label and picture agreement and all participants could read, spell and pronounce them. Only items that fulfilled the constructional criteria for a given subtest were piloted, so the suitability of frequency, syllable/word/phrase structure and word class was assured in advance. Besides eliminating a range of candidate items, the pilot testing also led to the change in the TT shapes mentioned above. None of the speakers involved in piloting took part in later testing. The remainder of the article describes the standardization procedure based on the final version of the EAAT.
Methods and materials
Participants
One hundred and thirty-five speakers with aphasia and 93 without aphasia took part voluntarily in the standardization study after giving informed consent. Table 4 gives gender, age and education details, together with the time post onset at which speakers with neurological illness were tested. The criteria for inclusion in the group with aphasia were: age 18-75 years; monolingual English speaker; single, left hemisphere stroke; and right-handed before the CVA for all day-to-day activities, without having been forced to change hands as a child. At the time of testing, 79 of the 135 speakers with aphasia had a right hemiparesis or hemiplegia. Whether they used their left or their hemiplegic/hemiparetic arm when pointing, manipulating tiles and writing was not systematically recorded. The exclusion criteria were:
(a) hearing or vision that the testing clinician judged inadequate for the test, even after correction with a hearing aid or glasses;
(b) inability to take part in a full testing session (approx. 60-90 min) because of other consequences of the stroke (e.g. demotivation, inattention, disorientation) or concurrent illness;
(c) inability to read or write before the stroke;
(d) other neuropsychological sequelae of brain injury likely to interfere with test performance (e.g. severe memory or visual-perceptual disturbance);
(e) a previous history of neurological illness (including previous strokes, with or without apparent persisting impairment, and signs of dementia or psychiatric disorder);
(f) a previous history of speech-language disorder; and
(g) medication known to cause drowsiness.
Site of lesion and stroke diagnosis were based on computerized tomographic scan evidence where available (n = 65), and otherwise on detailed neurological examination and case history.
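The inclusion and exclusion criteria above combine as a single conjunction. A candidate enters the group with aphasia only if every inclusion criterion holds and no exclusion criterion applies. The Python sketch below models that screening logic. The `Candidate` record and all of its field names are hypothetical. Each boolean flag is assumed to record a judgement already made by the clinician or taken from the notes (e.g. whether hearing is adequate); the code only shows how those judgements combine.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    # Hypothetical screening record; each flag stands for a clinical judgement.
    age: int
    monolingual_english: bool
    single_left_hemisphere_stroke: bool
    right_handed_premorbidly: bool
    forced_hand_change_as_child: bool
    adequate_hearing_and_vision: bool  # after correction with aid or glasses
    can_complete_session: bool         # approx. 60-90 min
    literate_before_stroke: bool
    interfering_neuropsych_deficit: bool
    prior_neurological_illness: bool
    prior_speech_language_disorder: bool
    on_drowsiness_causing_medication: bool


def meets_inclusion(c: Candidate) -> bool:
    # All inclusion criteria must hold.
    return (
        18 <= c.age <= 75
        and c.monolingual_english
        and c.single_left_hemisphere_stroke
        and c.right_handed_premorbidly
        and not c.forced_hand_change_as_child
    )


def meets_exclusion(c: Candidate) -> bool:
    # Any single exclusion criterion is enough to exclude.
    return (
        not c.adequate_hearing_and_vision
        or not c.can_complete_session
        or not c.literate_before_stroke
        or c.interfering_neuropsych_deficit
        or c.prior_neurological_illness
        or c.prior_speech_language_disorder
        or c.on_drowsiness_causing_medication
    )


def eligible_for_aphasia_group(c: Candidate) -> bool:
    return meets_inclusion(c) and not meets_exclusion(c)


typical = Candidate(
    age=64, monolingual_english=True, single_left_hemisphere_stroke=True,
    right_handed_premorbidly=True, forced_hand_change_as_child=False,
    adequate_hearing_and_vision=True, can_complete_session=True,
    literate_before_stroke=True, interfering_neuropsych_deficit=False,
    prior_neurological_illness=False, prior_speech_language_disorder=False,
    on_drowsiness_causing_medication=False,
)
print(eligible_for_aphasia_group(typical))                   # True
print(eligible_for_aphasia_group(replace(typical, age=80)))  # False: outside 18-75
```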
The de®nition of monolingual included speakers who had learned a foreign language at school, but not if they had used another language at any time during their lives for regular communication or spent anything more than a holiday in a foreign language speaking environment. Handedness and literacy levels were based on self and relatives ’ reports. Previous medical history and information on other sequelae of the stroke were based on medical notes as well as self and relatives ’ reports. Details of medication were taken from current medical notes. There were three control groups. Group 1, designated healthy speakers, were 24 speakers who met the same criteria as the group with aphasia, except they had no history of central nervous system damage nor aphasia and were not currently hospitalized. Group 2, labelled hospitalized, consisted of 41 participants meeting these same criteria, but at the time of testing they were hospitalized for nonneurological complaints (orthopaedic and dermatology wards). This group was included to cover possible variation in test scores that may derive simply from the stress of being ill and a patient in hospital. People were excluded from this group if they had had any kind of anaesthetic in the previous 5 days. Group 3 consisted
of 28 speakers who had some kind of neurological illness (including stroke, spinal cord injury and multiple sclerosis) but were judged, on the basis of a clinical interview by a speech and language pathologist and other (non-EAAT) aphasia testing, to have no aphasia. The 135 speakers with aphasia were people who had been referred to 10 speech and language therapy departments in Scotland, the north of Ireland, the north Midlands and the south of England. The 24 speakers in the first control group were close relatives of the speakers with stroke being tested; this was to ensure matching of socio-economic background. The remaining control participants were selected from the same district hospitals and community health services as the participants with aphasia.
Testing
Conduct of the test followed strictly the directions set out in the test manual and administration book (identical to the German original, Huber et al. 1983). Testing was carried out by one of the authors (NM) and other qualified speech and language therapists experienced at working with people with aphasia. Therapists with no prior experience of the EAAT received three days' training in the background, administration and scoring of the test, and later carried out several test sessions to familiarize themselves with administration and scoring before including speakers in this study. They also received copies of the EAAT manual. Testing took place in hospital speech and language therapy departments or in the participant's home, according to local service delivery policies. Each tester contributed speakers to each control group as well as to the group with aphasia. Scoring was verified by one of the authors (NM) and two other clinicians with extensive experience in using the AAT.
Syndrome assignment
It is possible to assign speakers to different aphasic syndrome groups based on their (E)AAT score profile.
Although the (E)AAT retains the 'classical' labels of Broca's, Wernicke's, global and amnestic (anomic) aphasia, the definitions of the syndromes are not to be understood in 'classical' terms. Syndromes are defined according to the surface linguistic behaviours typical of each of them. In the other published versions of the AAT, speakers are assigned to syndrome groups on the basis of their six spontaneous language rating scale scores and their raw scores on the five subtests. A non-parametric discriminant analysis program (ALLOC, Habbema et al. 1974) is routinely used for classification of speakers with aphasia (Willmes 1985). If, based on this program, the probability of a speaker being classified into one of the four main syndromes according to their performance profile is > 70%, they are assigned to that syndrome. Where the probability is < 70%, the speaker remains non-classified vis-à-vis the main syndromes. For EAAT standardization purposes, clinicians' impressions of syndrome membership were checked against the syndrome membership derived from discriminant analyses of the formal test scores. The criteria by which clinicians judged syndrome membership were based on features derived from clinical and experimental investigations of the different
Table 3. External clinical criteria for assigning speakers to EAAT syndrome groups on the basis of performance on the spontaneous language sample
Criterion | Amnestic | Wernicke's | Broca's | Global
Language production | Usually fluent | Fluent | Significantly slowed | Sparse or absent or recurrent utterances
Articulation | Usually not affected | Usually not affected | Often dysarthric/apraxic | Usually dysarthric/apraxic
Prosody | Usually well retained | Usually well retained | Often monotone pitch and/or scanning | Often monotone pitch, but may be retained on automatic utterances
Sentence structure | Hardly affected | Paragrammatic (doubling, blending of parts) | Agrammatic (simple structures; absence of function words and inflections) | Single words; jargon; recurrent utterances
Word finding | Circumlocution or substitution strategies when a word cannot be found; some semantic paraphasias | Many semantic paraphasias, often far off target; semantic neologisms; in severest cases semantic jargon | Relatively restricted; hardly any semantic paraphasias | Extremely restricted vocabulary; semantic paraphasias way off target
Sound structure | Some phonemic paraphasias | Many phonemic paraphasias, sometimes leading to phonemic jargon and neologisms | Many phonemic paraphasias | Very many phonemic paraphasias and neologisms
Comprehension | Mildly impaired | Severely impaired | Mildly impaired | Severely impaired
aphasia types (Poeck et al. 1975). These criteria, summarized in table 3, were external to the test itself. No assumptions are made about links to lesion sites or to possible underlying disruption of language processing (Poeck 1983). Clinicians made their judgement of which syndrome a speaker belonged to after the completion of the semi-structured interview, but before any rating scales had been completed and before any further testing was carried out. The EAAT can also identify the 'non-standard' syndromes of conduction aphasia, transcortical aphasias, pure alexia, alexia with agraphia, pure agraphia and isolated motor speech disorder. Speakers with these latter syndromes were not included in this standardization study.
Statistics
Data were stored and analysed using SPSS; program SSA-I from the Guttman–Lingoes Nonmetric Program Series (Lingoes 1973) for the nonmetric multidimensional scaling analysis; program ALLOC80 (Hermans et al. 1982) for the
Table 4. Descriptive statistics for 228 participants in EAAT study. NA = not applicable
Group | N | Sex F | Sex M | Age (years) mean | Age range | Time since CVA (months) mean | Time since CVA range | Schooling (years) mean | Schooling range
Global | 30 | 8 | 22 | 62.7 | 41–74 | 10.5 | 0–81 | 9.3 | 8–12
Wernicke's | 30 | 9 | 21 | 64.4 | 48–72 | 11.3 | 0–41 | 8.8 | 7–12
Broca's | 30 | 12 | 18 | 57.0 | 30–72 | 29.3 | 0–151 | 9.1 | 8–12
Amnestic | 30 | 9 | 21 | 58.1 | 27–73 | 12.4 | 0–39 | 9.5 | 4–13
Nonclassifiable aphasia | 15 | 3 | 12 | 55.9 | 29–69 | 18.7 | 1–82 | 9.5 | 7–14
All speakers with aphasia | 135 | 41 | 94 | 60.0 | 29–74 | 15.9 | 0–151 | 9.2 | 4–14
CNS, no aphasia | 28 | 12 | 15 | 55.4 | 23–72 | 9.0 | 3–20 | 9.8 | 8–22
Hosp., no CNS, no aphasia | 41 | 23 | 18 | 54.8 | 21–70 | NA | NA | 9.1 | 8–12
Healthy controls | 24 | 16 | 8 | 51.8 | 24–70 | NA | NA | 11.1 | 8–17
All speakers, no aphasia | 93 | 51 | 42 | 54.2 | 21–72 | NA | NA | 9.8 | 8–22
nonmetric discriminant analysis; and software provided by one of the authors (KW) for carrying out the permutation test procedures (Willmes 1987; Edgington 1995).
Results
Details of the main speaker variables for the control groups and those with aphasia are summarized in table 4. Table 5 gives the ranges of scores of all groups on the spontaneous language rating scales and all subtests of the EAAT, together with means, medians and standard deviations. Assignment to syndrome groups was as detailed above (table 3). Pearson correlation coefficients were computed for all aphasic speakers for the relationship of their EAAT rating scale and subtest scores with age, time since onset of aphasia, years of education and sex (male vs female). There were no significant correlations between scores and duration of aphasia or years of schooling, nor were the correlations significantly different for male and female participants. The rating scales for semantics (r = −0.22) and phonology (r = −0.23) and the subtests written language (r = −0.24) and comprehension (r = −0.24) showed correlations with age significant at p < 0.01. The overall communication rating scale (r = −0.20) and the subtests repetition (r = −0.22) and naming (r = −0.21) displayed a significant correlation with age at p < 0.05. However, even the largest of these correlations (|r| = 0.24, r² ≈ 0.058) means that never more than 6% of the variability is explained by a linear relationship, giving no justification for an age correction to scores based on this sample.
Construct validity
To test whether the linguistically motivated construction of the EAAT was empirically valid, the intercorrelation matrix (using the monotonicity coefficient μ2, Shye 1985: 72, as a measure of a monotone, not necessarily linear, relationship amongst
Table 5. Median (range); mean (standard deviation) per subtest (TT0, REP0 etc.) and subpart (numbered as in table 1) for speakers with aphasia and control groups
Groups: All aphasics (n = 120); Global aphasia (n = 30); Wernicke's aphasia (n = 30); Broca's aphasia (n = 30); Amnestic aphasia (n = 30); CNS, no aphasia (n = 28); Hospital, no aphasia (n = 41); Home, no aphasia (n = 24)
Spontaneous language rating scales
Group | COM | ART | FOR | SEM | PHO | SYN
All aphasics | 2.0 (0–5); 2.15 (1.30) | 4 (0–5); 3.20 (1.63) | 4 (0–5); 3.16 (1.74) | 3 (0–5); 2.51 (1.61) | 3 (0–5); 2.74 (1.68) | 3 (0–5); 2.29 (1.59)
Global | 1.0 (0–1); 0.6 (0.50) | 1.0 (0–4); 1.2 (1.47) | 0.0 (0–4); 0.77 (1.13) | 0.0 (0–2); 0.23 (0.50) | 0.0 (0–2); 0.37 (0.71) | 0.0 (0–1); 0.17 (0.38)
Wernicke's | 2 (0–4); 2.07 (0.87) | 4 (2–5); 4.23 (0.73) | 4 (2–5); 3.73 (0.74) | 3 (0–4); 2.83 (0.99) | 4 (1–5); 3.30 (0.95) | 3 (0–4); 2.90 (0.88)
Broca's | 2.5 (1–4); 2.37 (0.81) | 3 (1–5); 3.13 (1.17) | 4 (0–5); 3.73 (1.41) | 4 (0–5); 3.27 (1.31) | 3 (0–5); 3.07 (1.14) | 2 (0–4); 2.27 (1.26)
Amnestic | 4 (2–5); 3.57 (0.82) | 4 (3–5); 4.23 (0.68) | 4 (3–5); 4.40 (0.62) | 4 (3–4); 3.70 (0.47) | 4 (3–5); 4.23 (0.57) | 4 (3–5); 3.83 (0.59)
CNS, no aphasia | 5 (4–5); 4.89 (0.31) | 5 (3–5); 4.82 (0.48) | 5 (4–5); 4.96 (0.19) | 5 (5–5); 5.0 (0.0) | 5 (5–5); 5.0 (0.0) | 5 (5–5); 5.0 (0.0)
Hospital, no aphasia | 5 (5–5); 5 (0.0) | 5 (4–5); 4.98 (0.16) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0)
Home, no aphasia | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0) | 5 (5–5); 5 (0.0)
Token Test
Group | TT0 | TT1 | TT2 | TT3 | TT4 | TT5
All aphasics | 32 (0–50); 29.57 (15.73) | 2 (0–10); 3.45 (3.50) | 5 (0–10); 4.77 (3.67) | 8 (0–10); 6.42 (3.54) | 10 (0–10); 7.37 (3.32) | 9 (0–10); 7.52 (3.17)
Global | 48 (31–50); 45.37 (5.46) | 8 (2–10); 7.33 (2.68) | 10 (4–10); 8.73 (1.74) | 10 (8–10); 9.60 (0.77) | 10 (8–10); 9.83 (0.53) | 10 (8–10); 9.87 (0.43)
Wernicke's | 38 (0–50); 32.63 (13.49) | 4 (0–10); 3.87 (2.92) | 6 (0–10); 5.33 (3.10) | 8 (0–10); 7.03 (3.07) | 10 (0–10); 7.97 (3.11) | 10 (0–10); 8.43 (2.70)
Broca's | 25 (1–46); 25.43 (12.93) | 1 (0–7); 1.80 (2.09) | 3 (0–9); 3.53 (2.87) | 6 (0–10); 6 (3.31) | 7 (0–10); 6.93 (3.36) | 8.5 (1–10); 7.17 (3.18)
Amnestic | 13.5 (0–50); 14.87 (11.26) | 0 (0–10); 0.97 (2.30) | 1 (0–10); 1.5 (2.21) | 4 (0–10); 3.03 (2.77) | 4 (0–10); 4.77 (3.17) | 4 (0–10); 4.60 (2.84)
CNS, no aphasia | 1 (0–11); 1.89 (2.51) | 0 (0–1); 0.04 (0.19) | 0 (0–3); 0.32 (0.72) | 0 (0–1); 0.04 (0.19) | 0 (0–4); 0.64 (1.16) | 0.5 (0–4); 0.86 (1.11)
Hospital, no aphasia | 1 (0–7); 1.85 (2.01) | 0 (0–0); 0 (0.0) | 0 (0–1); 0.07 (0.26) | 0 (0–2); 0.12 (0.40) | 0 (0–3); 0.54 (0.95) | 1 (0–4); 1.12 (1.31)
Home, no aphasia | 0 (0–3); 0.54 (0.88) | 0 (0–1); 0.08 (0.28) | 0 (0–0); 0 (0.0) | 0 (0–0); 0 (0.0) | 0 (0–2); 0.25 (0.61) | 0 (0–2); 0.21 (0.51)
Repetition
Group | REP0 | REP1 | REP2 | REP3 | REP4 | REP5
All aphasics | 57 (0–149); 89.77 (46.81) | 27 (0–30); 21.62 (9.87) | 23 (0–30); 20.08 (9.87) | 19 (0–30); 19.07 (10.28) | 17 (0–30); 15.72 (10.29) | 14 (0–29); 13.27 (10.03)
Global | 25 (0–115); 32.53 (32.70) | 7 (0–28); 9.87 (9.26) | 6 (0–30); 8.20 (8.46) | 3.5 (0–24); 7.17 (7.75) | 2.5 (0–24); 4.70 (6.28) | 0 (0–18); 2.60 (4.35)
Wernicke's | 98.5 (1–147); 92.03 (38.32) | 26 (0–30); 22.27 (8.41) | 22.5 (0–30); 21.43 (7.84) | 22.5 (0–30); 19.30 (9.10) | 15.5 (0–30); 16.13 (9.61) | 12.5 (0–28); 12.90 (9.49)
Broca's | 106.5 (47–144); 101.50 (26.71) | 28 (12–30); 26.03 (4.49) | 23.5 (13–30); 22.90 (5.80) | 23.5 (8–30); 21.63 (6.66) | 17 (2–30); 16.53 (7.20) | 15.5 (0–28); 14.40 (7.80)
Amnestic | 137.5 (65–149); 133.03 (16.40) | 29 (10–30); 28.33 (3.61) | 29 (13–30); 27.80 (3.86) | 29 (18–30); 28.17 (2.68) | 27 (10–30); 25.53 (4.95) | 25 (13–29); 23.20 (4.63)
CNS, no aphasia | 148 (140–150); 147.14 (3.06) | 30 (27–30); 29.61 (0.83) | 30 (23–30); 29.54 (1.37) | 30 (27–30); 29.64 (0.87) | 30 (26–30); 29.43 (1.20) | 30 (25–30); 28.93 (1.56)
Hospital, no aphasia | 148 (137–150); 147.39 (3.28) | 30 (26–30); 29.66 (0.79) | 30 (27–30); 29.63 (0.70) | 30 (28–30); 29.68 (0.61) | 30 (26–30); 29.56 (0.90) | 29 (23–30); 28.85 (1.49)
Home, no aphasia | 149 (146–150); 149.17 (1.01) | 30 (29–30); 29.87 (0.34) | 30 (30–30); 30 (0.0) | 30 (29–30); 29.92 (0.28) | 30 (29–30); 29.96 (0.20) | 30 (27–30); 29.42 (0.83)
Written language
Group | WRI0 | WRI1 | WRI2 | WRI3
All aphasics | 40 (0–89); 40.19 (30.17) | 19 (0–30); 16.27 (11.12) | 13 (0–30); 13.14 (11.00) | 7 (0–30); 10.77 (10.30)
Global | 4 (0–38); 8.13 (10.07) | 0 (0–22); 3.5 (5.96) | 0.5 (0–16); 2.80 (4.49) | 0 (0–17); 1.83 (3.49)
Wernicke's | 30 (0–81); 31.77 (22.07) | 16.5 (0–29); 15.47 (9.75) | 8 (0–28); 9.33 (8.56) | 5 (0–25); 6.97 (7.37)
Broca's | 51 (12–82); 49.47 (22.07) | 23 (1–29); 20.30 (7.98) | 18 (2–29); 16.57 (8.29) | 15 (0–27); 12.60 (8.81)
Amnestic | 76.5 (0–89); 71.40 (20.69) | 27 (0–30); 25.83 (5.71) | 26.5 (0–30); 23.87 (8.75) | 24.5 (0–30); 21.70 (8.16)
CNS, no aphasia | 88.5 (60–90); 86.75 (5.75) | 30 (27–30); 29.64 (0.78) | 29.5 (27–30); 29.36 (0.78) | 29 (0–30); 27.75 (5.65)
Hospital, no aphasia | 88 (61–90); 86.51 (5.07) | 30 (25–30); 29.54 (0.87) | 30 (22–30); 29.10 (5.58) | 29 (12–30); 27.88 (3.25)
Home, no aphasia | 89.5 (86–90); 89.04 (1.27) | 30 (29–30); 29.87 (0.34) | 30 (29–30); 29.71 (0.46) | 30 (26–30); 29.46 (1.06)
Naming
Group | NAM0 | NAM1 | NAM2 | NAM3 | NAM4
All aphasics | 63 (0–114); 58.34 (40.23) | 19 (0–30); 16.37 (11.24) | 21 (0–30); 17.46 (11.15) | 12 (0–30); 13.33 (10.88) | 12 (0–28); 11.40 (8.84)
Global | 2.5 (0–65); 9.83 (14.76) | 0 (0–21); 3.30 (5.40) | 0 (0–17); 3.17 (4.64) | 0 (0–23); 2.70 (5.67) | 0 (0–5); 0.67 (1.42)
Wernicke's | 49 (2–107); 51.90 (29.34) | 13 (0–29); 13.80 (9.53) | 20.5 (0–30); 18.00 (8.14) | 5.5 (0–28); 9.23 (8.66) | 11 (0–25); 10.87 (5.94)
Broca's | 89.5 (27–109); 76.60 (27.19) | 24.5 (8–30); 22.43 (6.76) | 26 (3–30); 22.03 (8.38) | 20.5 (1–30); 18.43 (7.94) | 15.5 (0–23); 13.70 (6.48)
Amnestic | 103 (7–114); 96.00 (22.54) | 28 (0–30); 25.97 (6.01) | 29 (5–30); 26.63 (5.66) | 25 (0–30); 23.03 (7.31) | 21 (2–28); 20.37 (5.74)
CNS, no aphasia | 116.5 (77–120); 114.57 (8.23) | 30 (20–30); 29.43 (1.91) | 30 (24–30); 29.25 (1.29) | 30 (13–30); 28.61 (3.34) | 29 (15–30); 27.29 (3.64)
Hospital, no aphasia | 116 (108–120); 115.46 (3.10) | 30 (27–30); 29.32 (0.96) | 30 (24–30); 29.05 (1.50) | 30 (27–30); 29.49 (0.87) | 28 (21–30); 27.61 (2.53)
Home, no aphasia | 119 (108–120); 117.29 (3.41) | 30 (25–30); 29.42 (1.28) | 30 (24–30); 29.67 (1.24) | 30 (27–30); 29.50 (0.88) | 29 (26–30); 28.71 (1.40)
Comprehension
Group | COMP0 | COMP1 | COMP2 | COMP3 | COMP4
All aphasics | 83 (29–120); 81.14 (24.88) | 24 (9–30); 23.57 (5.09) | 23 (0–30); 21.28 (6.72) | 21 (2–30); 19.46 (8.04) | 17 (0–30); 16.84 (8.45)
Global | 53.5 (29–104); 54.30 (16.09) | 20 (9–29); 18.93 (5.15) | 16 (0–29); 15.13 (7.07) | 11 (2–29); 11.30 (5.11) | 11.5 (0–17); 8.93 (6.36)
Wernicke's | 75 (42–104); 75.00 (17.20) | 22 (14–30); 22.17 (3.73) | 21 (9–28); 20.30 (4.96) | 21 (2–26); 18 (7.41) | 14 (0–25); 14.53 (6.28)
Broca's | 95 (56–114); 94.03 (15.53) | 26.5 (21–30); 26.23 (2.50) | 27 (14–30); 24.87 (4.52) | 23.5 (11–30); 23.33 (4.94) | 19.5 (6–30); 19.60 (6.37)
Amnestic | 107.5 (43–120); 101.23 (19.24) | 28 (10–30); 26.93 (4.06) | 26 (11–30); 24.80 (4.89) | 28 (8–30); 25.20 (6.24) | 26 (7–30); 24.30 (6.09)
CNS, no aphasia | 114 (76–120); 112.18 (8.38) | 30 (17–30); 28.64 (2.78) | 28 (19–30); 27.29 (2.98) | 29.5 (18–30); 28.64 (2.48) | 28 (19–30); 27.61 (2.45)
Hospital, no aphasia | 115 (90–120); 113.71 (6.20) | 28 (19–30); 28.12 (2.00) | 29 (22–30); 28.56 (1.99) | 29 (21–30); 28.61 (2.04) | 29 (22–30); 28.41 (2.24)
Home, no aphasia | 118 (110–120); 117.25 (3.0) | 30 (26–30); 29.12 (1.30) | 30 (25–30); 29.37 (1.21) | 30 (27–30); 29.54 (0.88) | 30 (27–30); 29.21 (1.02)
Figure 1. Grouping of item sets according to hierarchical cluster analysis (complete linkage; coefficient μ2), n = 120. Numbers at end of branches refer to subtest subparts.
Figure 2. Smallest space analysis for subtests of the EAAT (two-dimensional solution; alienation coefficient 0.16). Numbers refer to subtest subparts.
variables) of the results from the 120 speakers with clearly classifiable aphasia was subjected to a hierarchical cluster analysis (complete linkage). In this procedure a variable is added to the cluster whose least similar member is more similar to it than the least similar member of any other existing cluster; in other words, it joins a cluster only if it is close to all the elements of that cluster. If the construction is valid then the
Figure 3. Smallest space analysis for items of subparts of the subtest repetition (two-dimensional representation of 3-D solution; alienation coefficient 0.19). Numbers (here and in subsequent figures) refer to subpart item numbers.
intercorrelations between item sets deemed similar linguistically should be higher than those between item sets considered unrelated. The linkages that emerged for the 21 EAAT parts are summarized in figure 1. Numbers at the ends of branches represent the subparts of subtests listed in table 1. The major division lies between subtests that require some form of spoken expression and those requiring comprehension. Within these, major branching separates repetition from naming, and auditory comprehension (TT and word and sentence comprehension) from reading comprehension and written language (composing words/sentences with blocks and writing to dictation). Reading aloud is shown as close to naming, and repeating single sounds assumes a somewhat independent position. These latter branches are considered further in the discussion section. In general, these results support the construct validity of the test. Construct validity was also examined through the pattern of intercorrelations between the results of the 120 aphasic speakers on the main subparts of the EAAT using nonmetric multidimensional scaling (smallest space analysis). The outcome is illustrated in figure 2. The map of the two-dimensional solution clearly illustrates how item sets group together. For this and the other smallest space solutions in figures 3–6, the measures of fit, expressed by the alienation coefficients (Borg and Lingoes 1987; the closer to 0 the better), all represent
Figure 4A. Smallest space analysis for items of subparts of the subtest written language (two-dimensional solution; alienation coefficient 0.19).
Figure 4B. Smallest space analysis of the 'superitems' of the subtest written language (two-dimensional representation; alienation coefficient 0.16).
Figure 5. Smallest space analysis for items of the subparts of the subtest naming (first two dimensions of 3-D solution; alienation coefficient 0.23).
acceptably good levels. In figure 2 the TT sections occupy a fairly central position, with other subtests emanating from them in a wedge-like fashion. Reading aloud still falls within the space occupied by the naming tasks. There is a clear separation between reading and auditory comprehension. In this analysis, repetition of sounds falls within the space occupied by the other repetition tasks, although somewhat separated from the main cluster. This can be viewed in more detail in the two-dimensional projection of the three-dimensional analysis in figure 3. Repetition of sounds and of sentences occupy relatively independent spaces, while there is overlap of items from repetition of single and multisyllable words on the one hand, and of multisyllable words and words of increasing bound morphemes on the other. The smallest space analysis for written language (figure 4a) illustrates the separation of reading aloud from the other tasks. In each of the subparts there is a general trend towards a separation between producing words and producing phrases, most obvious in the composing words/phrases item set. The division of words from phrases is confirmed in the smallest space analysis of 'superitems' (figure 4b), which shows words and phrases occupying clearly separated spaces. The ten written language aggregate items, or 'superitems', were created by summing the scores per person of the equivalent items across each of the three written language subtests, i.e. 'superitem' 1 derives from adding the scores of item 1 of reading aloud, item 1 of composing words and item 1 of writing to dictation. Superitem 2
Figure 6. Smallest space analysis for 'superitems' from the subtest comprehension (two-dimensional solution; coefficient of alienation 0.14).
is derived from items 2, 2 and 2 of the three subparts, superitem 3 from items 3, 3 and 3, and so on. The only exception to the general trend of increasing complexity across the item sets involves compound nouns (e.g. toothache cure) appearing more complex than the simplest phrase (e.g. she wants my radio). The smallest space analysis for naming, portrayed in figure 5, indicates that sentences and colour words behave as clusters of their own, but that there is overlap of items from simple and compound nouns. Finally, for comprehension the smallest space analysis (figure 6) confirms the separation of reading versus auditory comprehension, and of the items deemed simple (1–5) versus those considered more complex (6–10).
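The construction of the written language 'superitems' described above is a per-speaker, item-by-item sum across the three subparts. A minimal sketch, assuming each subpart is stored as a list of ten item scores in item order, each scored 0–3 as in figure 14; the function name and the example scores are hypothetical.

```python
def written_language_superitems(reading_aloud, composing, dictation):
    """Sum equivalent items across the three written language subparts for one speaker.

    Superitem k = item k of reading aloud + item k of composing + item k of dictation,
    so with items scored 0-3 each superitem ranges from 0 to 9.
    """
    parts = (reading_aloud, composing, dictation)
    if len({len(p) for p in parts}) != 1:
        raise ValueError("all three subparts must have the same number of items")
    # zip(*parts) yields (item k of each subpart) for k = 1, 2, ...
    return [sum(items) for items in zip(*parts)]


# Hypothetical item scores for one speaker (10 items per subpart).
reading = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
composing = [3, 3, 3, 2, 2, 1, 1, 0, 0, 0]
dictation = [3, 3, 2, 2, 1, 1, 0, 0, 0, 0]
print(written_language_superitems(reading, composing, dictation))
# [9, 9, 8, 7, 6, 5, 4, 3, 3, 3]
```

Repeating this for every speaker gives a speakers-by-ten matrix of superitems, which is the kind of input a smallest space analysis such as the one in figure 4b would take.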
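The cluster analysis described above starts from a matrix of μ2 coefficients between item sets and then merges item sets by complete linkage. The sketch below illustrates both steps in plain Python under stated assumptions: μ2 is written in Guttman's weak-monotonicity form, summed over all pairs of speakers, and the clustering is a textbook agglomerative complete-linkage loop on similarities. It is not the software used in the study, and the function names and example similarity matrix are invented for illustration.

```python
from itertools import combinations


def mu2(x, y):
    """Guttman's weak monotonicity coefficient between two score vectors.

    mu2 = sum (x_h - x_i)(y_h - y_i) / sum |x_h - x_i||y_h - y_i| over all speaker pairs.
    It equals 1 whenever y never decreases as x increases, and -1 for the reverse.
    """
    num = den = 0.0
    for h, i in combinations(range(len(x)), 2):
        dx, dy = x[h] - x[i], y[h] - y[i]
        num += dx * dy
        den += abs(dx) * abs(dy)
    if den == 0:
        raise ValueError("mu2 is undefined when no pair of speakers varies on both variables")
    return num / den


def mu2_matrix(columns):
    """Pairwise mu2 between item sets; columns[k] holds all speakers' scores on item set k."""
    n = len(columns)
    return [[1.0 if a == b else mu2(columns[a], columns[b]) for b in range(n)] for a in range(n)]


def complete_linkage(names, sim):
    """Agglomerative clustering on a similarity matrix with complete linkage.

    The similarity between two clusters is the smallest similarity between their
    members; at each step the two clusters with the largest such value are merged.
    Returns (sorted member names, linkage similarity) for each merge, in order.
    """
    clusters = [frozenset([i]) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = min(sim[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link > best[0]:
                best = (link, a, b)
        link, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((sorted(names[i] for i in merged), link))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


names = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.2, 0.3],
    [0.9, 1.0, 0.25, 0.35],
    [0.2, 0.25, 1.0, 0.8],
    [0.3, 0.35, 0.8, 1.0],
]
for members, link in complete_linkage(names, sim):
    print(members, link)
# ['A', 'B'] 0.9
# ['C', 'D'] 0.8
# ['A', 'B', 'C', 'D'] 0.2
```

In the study's setting, `sim` would be `mu2_matrix(...)` computed over the 21 item sets for the 120 speakers with classifiable aphasia; because μ2 only looks at the direction of pairwise differences, it captures monotone rather than strictly linear association.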
Increasing linguistic complexity
The parts within each subtest of the (E)AAT are designed to be of increasing complexity and, by implication, difficulty. To establish whether this construction principle was valid, a series of analyses was conducted. Figures 7–11 graph the mean scores for each of the syndrome and control groups per subpart of the different subtests. Apart from colour naming being significantly easier (1% level) for speakers from the Wernicke's aphasia group (and non-significantly easier for speakers from the global and amnestic groups), the difficulty progression across item sets within subtests for the aphasic speakers is clear. To ascertain which differences between successive item sets in a subtest were significant for each of the main syndrome groups, the sequentially rejective multiple test procedure of Ryan-Einot-Gabriel-Welsch (see Westfall and Young 1993: 87) was employed with strong familywise type I error control (Westfall et al. 1993), using permutation tests for the k-dependent samples problem (Petrondas and Gabriel 1983, Willmes 1987). Table 6 illustrates the outcome. Item sets are ordered according to increasing mean difficulty (scores). Sets linked with an underline do not differ significantly (1% level) from each other. The gradation of difficulty for items within subparts of subtests, in terms of mean scores per item for the 120 speakers with aphasia combined, can be seen in figures 12–16.
Symbols for speaker groups for Figures 7–11.
Figure 7. Mean error totals of aphasic and non-aphasic speaker groups on the 5 sections of the token test.
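The significance testing described above combines two ingredients: REGW-adjusted nominal levels for ranges of ordered means, and permutation tests that respect the dependence between item sets answered by the same speakers. The sketch below shows each ingredient in isolation, as an illustration only. `regw_level` returns the Einot–Gabriel/Welsch levels, and `paired_permutation_p` is a sign-flip permutation test for just two item sets, a simplification of the k-dependent-samples permutation tests used in the study. The full step-down bookkeeping (not testing sub-ranges of a range already found non-significant) is omitted, and all names are hypothetical.

```python
import random
from itertools import product


def regw_level(p, k, alpha=0.01):
    """Nominal level for testing a range of p ordered means out of k (REGW).

    alpha_p = 1 - (1 - alpha) ** (p / k) for p < k - 1, and alpha for p >= k - 1.
    """
    if not 2 <= p <= k:
        raise ValueError("need 2 <= p <= k")
    if p >= k - 1:
        return alpha
    return 1 - (1 - alpha) ** (p / k)


def paired_permutation_p(x, y, n_resamples=20000, seed=0):
    """Two-sided sign-flip permutation test for two item sets scored by the same speakers.

    Under H0 the sign of each speaker's difference is exchangeable, so the p-value is the
    share of sign patterns whose |sum of differences| is at least the observed one.
    Exact enumeration for up to 16 speakers, Monte Carlo resampling otherwise.
    """
    d = [a - b for a, b in zip(x, y)]
    observed = abs(sum(d))
    if len(d) <= 16:
        hits = sum(
            abs(sum(s * v for s, v in zip(signs, d))) >= observed
            for signs in product((1, -1), repeat=len(d))
        )
        return hits / 2 ** len(d)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        total = sum(v if rng.random() < 0.5 else -v for v in d)
        if abs(total) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


print(round(regw_level(2, 5), 5))                               # 0.00401
print(regw_level(4, 5))                                         # 0.01
print(paired_permutation_p([3, 3, 3, 3, 3], [1, 1, 1, 1, 1]))  # 0.0625
```

With five item sets and a 1% familywise level, a pair of adjacent means is tested at about 0.4% while ranges of four or five means are tested at the full 1%; this graduated adjustment is what distinguishes the REGW levels from a uniform Bonferroni-style correction.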
Differential validity
To be differentially valid, the EAAT should separate speakers with aphasia from those without aphasia and, within the aphasic groupings, distinguish speakers with different syndrome pictures. Nonparametric discriminant analyses (program ALLOC, Hermans et al. 1982) were employed to examine whether the differential validity claims of the EAAT are justified.
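To make the idea of a nonparametric discriminant analysis concrete, the sketch below estimates each group's score density with Gaussian kernels and combines it with base rates by Bayes' rule to obtain posterior group probabilities. This is a generic kernel-density classifier offered under assumptions, not ALLOC itself: the Gaussian kernel, the single fixed bandwidth `h`, the function names and the training points are all hypothetical, and ALLOC's own smoothing choices and stepwise variable selection are not reproduced.

```python
import math


def log_kernel_density(x, sample, h):
    """Log of a Gaussian product-kernel density estimate at point x (common bandwidth h)."""
    d = len(x)
    logs = [-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * h * h) for s in sample]
    m = max(logs)  # log-sum-exp trick avoids underflow for distant points
    log_mean = m + math.log(sum(math.exp(v - m) for v in logs)) - math.log(len(sample))
    return log_mean - d * math.log(h * math.sqrt(2 * math.pi))


def posteriors(x, training, priors, h):
    """Bayes rule: P(group | x) is proportional to prior(group) * f_group(x)."""
    log_w = {g: math.log(priors[g]) + log_kernel_density(x, pts, h) for g, pts in training.items()}
    m = max(log_w.values())
    w = {g: math.exp(v - m) for g, v in log_w.items()}
    z = sum(w.values())
    return {g: v / z for g, v in w.items()}


# Hypothetical (TT errors, repetition total) pairs, not data from the study.
training = {
    "aphasia": [(45, 30), (30, 90), (25, 100)],
    "no aphasia": [(1, 148), (2, 147), (0, 149)],
}
priors = {"aphasia": 0.8, "no aphasia": 0.2}  # base rate 80 : 20, as in table 7
post = posteriors((28, 95), training, priors, h=10.0)
print(max(post, key=post.get))  # aphasia
```

Because the densities are estimated directly from the training scores rather than from a normal model, no distributional assumptions are placed on the skewed, ceiling-limited score distributions seen in table 5.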
Figure 8. Mean raw scores of aphasic and non-aphasic speaker groups on the 5 sections of the repetition subtest.
Figure 9. Mean raw scores of aphasic and non-aphasic speaker groups on the 3 sections of the written language subtest.
Aphasia–no aphasia
All speakers with aphasia (n = 135) were compared with all speakers without aphasia (n = 93). In a stepwise analysis the statistical procedure selected the TT and the repetition subtest as the strongest discriminants. Classification based solely on the scores from
Figure 10. Mean raw scores of aphasic and non-aphasic speaker groups on the 4 sections of the naming subtest.
Figure 11. Mean raw scores of aphasic and non-aphasic speaker groups on the 4 sections of the comprehension subtest.
these two subtests is shown in table 7. The table distinguishes between highly probable assignments, with a posterior classification probability ≥ 80%, and doubtful classifications (? in the table), with a probability of 50% to 80%. The overall classification rate in accordance with the clinical classification is 93.9%, including doubtful assignments. Entering all the subtests into the analysis gives the results in table 8. Spontaneous language ratings were not entered into the analysis because the uniform maximum scores of all groups without aphasia made the scales useless for discriminatory purposes. The distinction is again made between highly probable and doubtful classifications. Including doubtful assignments, the overall agreement between the ALLOC classification and the clinical group assignment is 93.9%.
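The 93.9% figure can be checked directly from the cell counts. A minimal sketch, assuming the counts as read from Table 8 (the dictionary layout and function name are illustrative): agreement is the share of all 228 speakers whose ALLOC group, whether highly probable or doubtful, matches the clinical group.

```python
# Counts as read from Table 8 (rows: clinical classification; columns: ALLOC classification).
# A trailing "?" marks a doubtful assignment (posterior probability 50%-80%).
TABLE_8 = {
    "aphasia": {"aphasia": 129, "aphasia?": 0, "no aphasia?": 2, "no aphasia": 4},
    "no aphasia": {"aphasia": 6, "aphasia?": 2, "no aphasia?": 2, "no aphasia": 83},
}


def agreement_rate(table, include_doubtful=True):
    """Share of speakers whose ALLOC group matches the clinical group.

    With include_doubtful=False, only highly probable matches count as agreement.
    """
    agree = total = 0
    for clinical, row in table.items():
        for assigned, n in row.items():
            total += n
            doubtful = assigned.endswith("?")
            if assigned.rstrip("?") == clinical and (include_doubtful or not doubtful):
                agree += n
    return agree / total


print(round(100 * agreement_rate(TABLE_8), 1))                          # 93.9
print(round(100 * agreement_rate(TABLE_8, include_doubtful=False), 1))  # 93.0
```

Here 129 + 2 + 83 = 214 of 228 speakers agree once doubtful assignments are included; the same total of 214 results from the Table 7 counts (126 + 7 + 81), which is why both analyses report 93.9%.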
Table 6. Differences in difficulty between item sets per subtest of the EAAT for each speaker group. Scores on item sets linked by an underline do not differ significantly. Numbers relate to subparts of subtests listed in table 1
Subtest | Global | Wernicke | Broca | Amnestic | CNS, no aphasia | Hospital, no CNS | Healthy speakers
Token Test | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 5 4 | 1 3 2 4 5 | 1 2 3 4 5 | 2 3 1 5 4
Repetition | 1 2 3 4 5 | 1 2 3 4 5 | 1 2 3 4 5 | 1 3 2 4 5 | 3 1 2 4 5 | 3 1 2 4 5 | 2 4 3 1 5
Written language | 1 2 3 | 1 2 3 | 1 2 3 | 1 2 3 | 1 2 3 | 1 2 3 | 1 2 3
Naming | 2 1 3 4 | 2 1 4 3 | 1 2 3 4 | 2 1 3 4 | 1 2 3 4 | 3 1 2 4 | 3 2 1 4
Comprehension | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 3 2 4 | 1 3 4 2 | 3 2 4 1 | 3 2 4 1
Figure 12. Mean error score (max 1) per item of the 120 speakers with classifiable aphasia on the 5 sections of the token test.
Aphasia syndrome classification
The same non-parametric discriminant analysis procedures were employed to compare the differential pattern derived from the EAAT test scores with the syndrome assignment pattern derived from the clinicians' classification of speakers on the basis of the spontaneous language sample from the semi-structured interview, which had been evaluated according to the external criteria listed in table 3.
Figure 13. Mean score per item (max 3) of the 120 speakers with classifiable aphasia on the 5 sections of the repetition subtest.
Figure 14. Mean score per item (max 3) of the 120 speakers with classifiable aphasia on the 3 sections of the written language subtest.
Using the spontaneous language rating scores selected by the discriminant analysis, viz. syntax, communicative behaviour and formulaic language, the assignment to groups of the 120 speakers with a classifiable aphasia and the 15 with unclassifiable aphasia gave the picture seen in table 9. The overall classification rate in agreement with the clinician assignments, including doubtful assignments, was 79.2%. A slightly lower classificatory agreement (75.0%) is attained if all spontaneous language ratings are entered into the equation, as shown in table 10. In tables 9 and 10 a distinction is drawn between highly probable syndrome
Figure 15. Mean score per item (max 3) of the 120 speakers with classifiable aphasia on the 4 sections of the naming subtest.
Figure 16. Mean score per item (max 3) of the 120 speakers with classifiable aphasia on the 4 sections of the comprehension subtest.
Table 7. EAAT selection properties aphasia/no aphasia based on TT and repetition subtests. Bold type = highly probable assignments (base rate aphasia : no aphasia = 80 : 20). ? = assignments with probability 50%–80%. ncl = nonclassifiable aphasia group; amn = amnestic aphasia group; cns = speakers with CNS damage without clinically diagnosed aphasia; hos = hospitalized speakers with no CNS involvement, nor aphasia
EAAT selection properties based on 2 selected subtests: 1. Token Test; 2. Repetition
Clinical classification | ALLOC classification: Aphasia | Aphasia? | No aphasia? | No aphasia
Aphasia (n = 135) | 126 | – | 5 (2 ncl, 3 amn) | 4 (1 ncl, 3 amn)
No aphasia (n = 93) | 3 (1 cns, 1 hos) | 2 (1 cns, 1 hos) | 7 (4 cns, 3 hos) | 81
Table 8. EAAT selection properties aphasia/no aphasia using all 5 subtests. A distinction is made between highly probable assignments (posterior classification probability ≥ 80%) and doubtful classifications (?) (probability of 50% up to 80%). Abbreviations, see table 7
EAAT selection properties based on all 5 subtests
Clinical classification | ALLOC classification: Aphasia | Aphasia? | No aphasia? | No aphasia
Aphasia (n = 135) | 129 | – | 2 (2 ncl) | 4 (2 ncl, 2 amn)
No aphasia (n = 93) | 6 (3 cns, 3 hos) | 2 | 2 (2 hos) | 83
assignments, with a posterior classification probability of ≥ 70%, and doubtful classifications (? in the table), with a probability of 25% to 70%. For the doubtful cases the second most probable syndrome is given as well. The classification matrix in table 11 shows the results using all the spontaneous speech-language scales and the 5 subtests. The distinction between highly probable and doubtful syndrome assignment is again set at ≥ 70% and 25%–70% respectively, and the second most probable syndrome is again shown for doubtful cases. The overall classification rate in accordance with the clinical classification is 79%, including doubtful assignments. A similar (re)classification rate (79.2%) is achieved using the three best discriminating parts selected by the statistical procedure, viz. the syntactic structure rating scale and the TT and naming subtests.
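The banding of posterior probabilities used in tables 7–12 can be sketched as a small helper. It assumes the classifier returns a dictionary of posterior probabilities per group; the function name is hypothetical, and the ≥ 70% default follows tables 9–12 (tables 7 and 8 use ≥ 80%). The doubtful bands start where they do for a simple reason: the largest of k posteriors that sum to 1 can never fall below 1/k, i.e. 25% for four syndromes and 50% for two groups.

```python
def categorize(posteriors, high=0.70):
    """Label an assignment as highly probable or doubtful.

    Returns (label, category). For doubtful cases the label is written
    'first/second', e.g. 'A/B', following the notation of tables 9-12.
    """
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if posteriors[best] >= high:
        return best, "highly probable"
    return f"{best}/{second}", "doubtful"


print(categorize({"G": 0.05, "W": 0.05, "B": 0.10, "A": 0.80}))        # ('A', 'highly probable')
print(categorize({"G": 0.10, "W": 0.20, "B": 0.30, "A": 0.40}))        # ('A/B', 'doubtful')
print(categorize({"aphasia": 0.85, "no aphasia": 0.15}, high=0.80))  # ('aphasia', 'highly probable')
```

Whether a posterior of exactly 70% counts as highly probable is taken here from the "≥ 70%" wording of the table captions.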
Table 9. EAAT syndrome assignment properties for the 4 standard aphasia syndromes, based on 3 selected spontaneous speech rating scales. Bold type = highly probable syndrome assignments (posterior classification probability ≥ 70%); ? = doubtful classifications (probability of 25% to 70%). For the doubtful cases the second most probable syndrome is given as well. A = amnestic, B = Broca's, G = global, W = Wernicke's aphasia group
EAAT syndrome classification based on 3 selected spontaneous speech ratings: 1. Syntactic structure; 2. Communication; 3. Formulaic language. ALLOC classification*
Global Global? Wernicke Wernicke?
Clinical classification Global (n = 30)
Wernicke (n = 30)
Broca (n = 30)
Amnestic (n = 30)
Non-class. (n = 15)
25 4 G/B 1
1
4
–
1
20 1 W/B 3 W/A – 2 B/W 1 B/A 1 1 A/W
1 2 W/B
2 1 W/B 1 W/A – 1 B/W 2 B/A 19 1 A/W 3 A/B
2 2 W/A
Broca Broca?
–
Amnestic Amnestic?
–
13 1 B/W 5 B/A 1 3 A/B
1 2 B/W 1 B/A 6
* Base rate: 25% for each of the 4 standard aphasia syndromes.
Table 10. EAAT syndrome assignment properties for the 4 standard aphasia syndromes based on a nonparametric discriminant analysis for the six spontaneous speech ratings. Bold type = highly probable syndrome assignments (probability ≥ 70%); ? = doubtful classifications (probability of 25%–70%). For the doubtful cases the second most probable syndrome is also given
Clinical classification EAAT syndrome classification: spontaneous speech ratings. ALLOC classification*
Global (n = 30)
Wernicke (n = 30)
Broca (n = 30)
Amnestic (n = 30)
Non-class. (n = 15)
Global Global? Wernicke Wernicke? Broca Broca?
25 2 G/B 1
– 1 G/W 17 3 W/B 1 1 B/W
2
–
1
3 1 W/B – 2 B/A
1 1 W/B 1 1 B/W
Amnestic Amnestic?
–
1 1 W/B 15 3 B/W 1 B/A 1 6 A/B
18 3 A/W 3 A/B
5 2 A/W 1 A/B
2
2 2 A/W 3 A/B
* Base rate: 25% for each of the 4 standard aphasia syndromes.
Table 12 gives the classification using the scores from the 5 subtests only, with the distinction between highly probable and doubtful syndrome assignment set at ≥ 70% and 25%–70%. In this instance the level of agreement with the clinical classification reaches only 59.2%.
Table 11. EAAT syndrome assignment properties for the 4 standard aphasia syndromes based on a nonparametric discriminant analysis for the six spontaneous speech rating scales and the 5 subtests of the EAAT. Bold type = highly probable syndrome assignment (posterior classification probability ≥ 70%); ? = doubtful classifications (probability of 25%–70%). For the doubtful cases the second most probable syndrome is given as well
Clinical classification EAAT syndrome classification: 5 subtests
ALLOC classification*
Global (n = 30)
Wernicke (n = 30)
Broca (n = 30)
Amnestic (n = 30)
Non-class. (n = 15)
Global Global? Wernicke Wernicke? Broca Broca?
28
– 1 G/W 19 4 W/B 2
–
– 4
Amnestic Amnestic?
–
2 1 W/B 19 1 B/W 1 B/A 4 A/B
– 1 G/A –
1 1
4 1 A/B
1 1 B/W 1 B/A 21 2 A/B
3 2 B/W 3 B/A 4 1 A/W 1 A/B
* Base rate: 25% for each of the 4 standard aphasia syndromes.
Table 12. EAAT syndrome assignment properties for the 4 standard aphasia syndromes based on nonparametric discriminant analysis for all 5 subtests. Bold type = highly probable syndrome assignments (classification probability ≥ 70%); ? = doubtful classifications (probability 25%–70%). For doubtful cases the second most probable syndrome is also given.
Clinical classification | EAAT syndrome classification, 5 subtests (ALLOC classificationᵃ): Global | Global? | Wernicke | Wernicke? | Broca | Broca? | Amnestic | Amnestic?
Global (n = 30)
Wernicke (n = 30)
Broca (n = 30)
Amnestic (n = 30)
Non-class. (n = 15)
19 2 G/W 3 G/A 3 2 W/G 1 W/B –
4 2 G/W
– 1 G/W
1
–
4 5 W/B 1 W/A 2 1 B/G 6 B/W 1 2 A/W 2 A/B
3 1 W/G 3 W/A 5 6 B/W 4 B/A 3 4 A/B
3
1 1 W/A
– 3 B/A
1 2 B/W 3 B/A 5 2 A/B
–
19 1 A/W 3 A/B
ᵃ Base-rate: 25% for each of the 4 standard aphasia syndromes.
Another way of looking at the discriminatory power of the EAAT is to examine the scores of the different speaker groups on the 6 Spontaneous Language ratings and the 5 subtests. To investigate which subtests significantly differentiated speaker groups, a permutation test analogous to a one-factorial ANOVA with subsequent multiple pairwise comparisons, guaranteeing strong family-wise type I error control, was conducted according to the Ryan-Einot-Gabriel-Welsch procedure (Westfall and Young 1993: 87) adapted to permutation tests (Petrondas and Gabriel 1983; Willmes 1987). Table 13 summarizes the main results for the aphasia syndrome groups and control groups based on the 5 subtests. Table 14 shows the results using the same statistical test based on the 6 spontaneous speech rating scales. As there was no variability in the scores of the non-aphasic speakers on these scales (they were almost uniformly rated '5', normal; see table 5), they were excluded from this computation. Two different type I error levels (1% and 5%) per subtest/scale are illustrated. The speaker groups are ordered by increasing mean ratings (cf. figures 7–11). Groups not differing significantly are joined by an underline, while all unconnected groups do differ significantly.

Table 13. Differences between standard aphasia syndrome groups and control groups on EAAT subtests (G = Global, W = Wernicke's, B = Broca's, A = Amnestic, L = CNS lesion, no aphasia, P = inpatient, no CNS lesion, no aphasia, N = healthy controls). Groups linked with an underline do not differ significantly from each other.
Subtest | Group differences, type-I error 1% | Group differences, type-I error 5%
Token test | G W B A L P N | G W B A L P N
Repetition | G W B A L P N | G W B A L P N
Written language | G W B A P L N | G W B A P L N
Naming | G W B A L P N | G W B A L P N
Comprehension | G W B A L P N | G W B A L P N

Table 14. Differences between standard aphasia syndrome groups (abbreviations as in table 13) in spontaneous language ratings. Groups joined by an underline do not differ significantly from each other.
Spontaneous language rating scale | Group differences, type-I error 1% | Group differences, type-I error 5%
Communication | G W B A | G W B A
Articulation and prosody | G B W A | G B W A
Formulaic speech | G W B A | G W B A
Semantic structure | G W B A | G W B A
Phonemic structure | G B W A | G B W A
Syntactic structure | G B W A | G B W A
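The core idea of a permutation test for a one-factorial design can be sketched as below. This is a simplified illustration under stated assumptions: it computes only the global permutation p-value for the ANOVA F statistic by reshuffling scores across groups; it does not implement the Ryan-Einot-Gabriel-Welsch step-down pairwise comparisons used in the study, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-factorial ANOVA F: between-group over within-group mean square."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-group spread at all (cf. the uniform ratings of the
        # non-aphasic speakers, which were excluded for this reason).
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups: Sequence[Sequence[float]],
                        n_perm: int = 5000, seed: int = 0) -> float:
    """Monte Carlo permutation p-value for the global F test.

    Scores are pooled and reshuffled into groups of the original sizes;
    the p-value is the share of permuted F values at least as large as
    the observed F, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled: List[float] = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if f_statistic(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling group labels needs no normality assumption about the scores, which is one reason permutation versions of such tests suit bounded rating scales and skewed subtest scores.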
Reliability
To ascertain whether each of the subtests and parts was measuring the same linguistic variable from speaker to speaker, consistency coefficients (Cronbach's α) were calculated for each subtest and its subparts. Table 15 displays the results of this analysis. With the exception of the 'comprehension' subparts when viewed separately, all subtests and parts show a high to very high consistency, demonstrating that the EAAT has high internal consistency reliability.
Table 15. Consistency coefficients, Cronbach's α, for subparts and subtests of EAAT
Token test
  Part 1: 0.904
  Part 2: 0.906
  Part 3: 0.908
  Part 4: 0.915
  Part 5: 0.911
  Token test overall: 0.976
Repetition
  Sounds: 0.957
  Single syllable words: 0.961
  Multisyllable words: 0.960
  Morphologically complex words: 0.961
  Sentences: 0.971
  Repetition overall: 0.989
Written language
  Reading aloud (words, phrases): 0.970
  Composing words/phrases: 0.969
  Writing words/phrases to dictation: 0.968
  Written language overall: 0.948
Naming
  Pictured objects: 0.950
  Colours: 0.955
  Pictured compound nouns: 0.951
  Pictured sentences: 0.963
  Naming overall: 0.986
Comprehension
  Auditory, words: 0.724
  Auditory, sentences: 0.788
  Reading, words: 0.862
  Reading, sentences: 0.871
  Auditory overall: 0.849
  Reading overall: 0.924
  Comprehension overall: 0.939
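The coefficient reported in table 15 follows the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). A minimal sketch, assuming scores are arranged as one row per speaker and one column per item of a subpart (the function name and data layout are illustrative, not taken from the study):

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a speakers x items score matrix.

    Each row holds one speaker's scores on the k items of a subpart.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    # Population variances are used throughout; the n/(n-1) factors
    # would cancel in the ratio anyway.
    return k / (k - 1) * (1 - item_var / total_var)
```

Items that rank speakers identically push α towards 1, while items that vary independently of each other push it towards 0; heterogeneous item sets, such as the comprehension subparts, therefore tend to yield lower coefficients.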
Inter- and intra-rater reliability
The only published version of the AAT for which intra- and inter-rater reliability has been intensively studied is the original German version (Huber et al. 1984). The rationale for not conducting a separate full-scale scorer reliability study here is taken up in the discussion below. The details of the German reliability study (Huber et al. 1984) are as follows. Two user groups participated in an inter-judge reliability study for the spontaneous language rating scales: 29 speech-language therapy students in their third year of study, who had been given an introduction to the scoring system of the AAT but had not previously carried out testing themselves, and 20 qualified speech-language therapists with approximately 6 months' experience of test use. Each student rater listened to around 4 minutes of audio recording of the initial interview for 12 speakers from each of the four main syndrome groups (i.e. each listener rated 48 tapes); qualified therapists listened to 6 from each group (each listener judged 24 tapes). Results were analysed according to Krippendorff's (1970) method for determining levels of agreement for each of the rating scales and the source of any observed variance in terms of systematic differences in scale use and unexplained residual/random error. Mean percentage agreement between raters and sources of error for the two judging groups are given in table 16.
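As a rough illustration of what "percentage agreement between raters" means, the sketch below computes mean exact agreement over all pairs of raters. It is an assumption-laden simplification: it does not reproduce Krippendorff's (1970) decomposition into systematic and residual error, and the data layout (one list of scale points per rater, one entry per tape) is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def mean_pairwise_agreement(ratings: Sequence[Sequence[int]]) -> float:
    """Mean percentage of exact agreement over all pairs of raters.

    ratings[r][t] is rater r's scale point for tape t.
    """
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    per_pair = []
    for r1, r2 in pairs:
        if len(r1) != len(r2) or not r1:
            raise ValueError("raters must score the same non-empty set of tapes")
        same = sum(a == b for a, b in zip(r1, r2))
        per_pair.append(100 * same / len(r1))
    return sum(per_pair) / len(per_pair)
```

Exact agreement is a strict criterion on a six-point scale; a model such as Krippendorff's additionally separates disagreement caused by raters using the scale systematically higher or lower from unexplained scatter.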
Table 16. AAT interrater agreement for scoring spontaneous language scales. Figures represent mean percentages (Krippendorff's model 1970)
Scale | 'Experienced' raters: % agreement | Systematic error | Unexplained residual error | Student raters: % agreement | Systematic error | Unexplained residual error
Communicative behaviour | 82.8 | 3.8 | 13.3 | 76.4 | 9.3 | 14.3
Articulation | 74.5 | 8.5 | 17.1 | 68.0 | 11.8 | 20.2
Formul. lang. | 74.8 | 9.8 | 15.5 | 66.2 | 14.5 | 19.3
Semantics | 80.2 | 5.5 | 14.4 | 72.2 | 10.1 | 17.7
Phonology | 76.8 | 7.0 | 16.1 | 74.8 | 8.3 | 16.8
Syntax | 85.9 | 2.8 | 11.3 | 79.2 | 5.9 | 14.8

Table 17. AAT interrater agreement for AAT subtest totals. Figures represent mean percentages (Krippendorff's model 1970)
Subtest | Inter-rater agreement | Systematic error | Unexplained residual error
Repetition | 98.7 | 0.3 | 1.0
Written lang. | 99.6 | 0.1 | 0.3
Naming | 99.1 | 0.2 | 0.7
Interrater agreement for the subtests was examined on the basis of 48 unscored tests (12 speakers per main syndrome), scored by 18 students for repetition and 16 for written language and naming. Agreement for the TT was 100%, as the rater only has to count the number of correct/incorrect pointings. In the comprehension subtest it was also 100%, since here too the scorer only has to judge whether the person pointed to the correct target, the distracter or another picture. The mean percentage agreement levels obtained for the subtests repetition, written language and naming, together with sources of error according to Krippendorff's procedure, are shown in table 17. Scores all represent very high levels of agreement.

Test-retest reliability
The original German AAT is again the only version that has been subjected to strict test-retest scrutiny. There, forty speakers with aphasia (20 deemed in the acute stage, i.e. < 4 months' duration; 20 deemed in the chronic stage, i.e. > 4 months' duration) were retested by the same examiner within two days. The tests were scored by a third examiner, 'blind' to speaker, duration and test or retest, to establish possible learning, training or retention effects. For the 240 ratings (40 speakers × 6 scales) on the spontaneous language rating scales, 24 ratings (10%) showed a one-point improvement and 18 (7.5%) a deterioration of one point. There were no significant differences between acute and chronic groups. Rank correlation coefficients for the test-retest scores of speakers on the AAT subtests are illustrated in table 18. The coefficients show acceptably high agreement. Two-sided Wilcoxon tests for differences in ranks showed differences beyond chance only for comprehension, where there was a significant but small improvement for the acute group and a small but significant deterioration in the chronic group.

Table 18. AAT test-retest reliability using Spearman rank correlations
Subtest | Aphasia duration < 4 months | Aphasia duration > 4 months
Token test | 0.97 | 0.96
Repetition | 0.99 | 0.92
Written lang. | 0.96 | 0.96
Naming | 0.96 | 0.97
Comprehension | 0.86* | 0.97*
* Difference significant at p < 0.05, Wilcoxon two-sided.

Discussion
The following considers the results in the light of criteria claimed to be necessary for a valid and reliable test of language functioning in aphasia (Spreen and Risser 1998), to gauge whether the EAAT fulfils the requirements of a psychometrically sound test. Discussion centres on elements of test construction, reliability and discriminatory power.

Construct validity
The linkages revealed between the subtests and subparts using the complete linkage hierarchical cluster analysis (figure 1) support the validity of the linguistically motivated construction of the EAAT. For this claim to be valid, intercorrelations between item sets deemed to be linguistically similar should be higher than those between sets considered unrelated. This was in general found. The main division concerned a split between essentially receptive and expressive tasks. Two exceptions were composing words and phrases and writing to dictation, which correlated more highly with the receptive tasks. Within the overall expressive branching, repeating sounds clusters with the naming tasks rather than with the other repetition item subsets. The clustering of the two written language subsets with the receptive tasks, and more specifically with reading comprehension, is readily interpretable and does not reflect a major deviance from expected linkages. It is interpreted as a reflection of the degree of comprehension required to accomplish/monitor performance in these tasks, arguably more likely in the English version of the AAT, where phoneme-grapheme correspondence is less transparent than in the Italian, German and Dutch counterparts.
Items become too long to achieve successfully through any direct auditory input to graphemic output route that may exist, without the facilitatory effect of deeper processing. The linkage of repetition of sounds to the naming tasks (albeit at a relatively high node), rather than to the repetition of words, is less clearly accounted for. The explanation probably lies in the relatively independent nature of this task (in the German, Dutch and Italian versions it also assumes a position separate from the other repetition subsets), combined with an artefact of the levels of correlation with the bordering subsets.

The illustrations of the two-dimensional solutions for the smallest space analyses also support the claim of high construct validity. Subtests deemed linguistically different occupy clearly separated regions in space. The one exception remains reading aloud, which falls within the space occupied by the naming tasks. The stronger influence of spoken lexical output factors on this written language task, in particular the inconsistent nature of English grapheme-phoneme correspondence, would appear to be the factor placing this subset apart from the written output tasks in the subtest to which reading aloud belongs. A similar finding exists for the Thai-AAT (Pratrichpukdee et al. submitted), where there is a more opaque relationship between written and spoken word-forms, in contrast to the more transparent spoken-written form relationships in Italian, German and Dutch. The smallest space analysis for naming indicates that sentences and colour words behave as clusters on their own, but that items from simple and compound nouns overlap. This may reflect a difference from the German original in the productivity of the compound noun class in English versus German, and/or suggest that compound nouns are processed differently by English speakers (Hittmair-Delazer et al. 1994).

Increasing complexity
A design feature of the (E)AAT is that successive subparts in subtests should be of increasing complexity/difficulty. Comparisons of mean scores for the separate sections in each subtest (figures 7–11; table 6) demonstrate that this design principle is realized.
Within each subtest the earlier subparts are easier than the subsequent ones, though differences do not always reach statistical significance for the separate groups with aphasia. There was a predictable ceiling effect for the control speakers without aphasia, so no differences in sensitivity to difficulty emerged for them. There were no floor or ceiling effects for the groups of speakers with aphasia, indicating that the EAAT is sensitive across the broad range of performance levels for such speakers. As expected, in the TT more errors are produced with succeeding subsections; in repetition, sounds are easier to repeat than single syllable words, which in turn are easier than multisyllable words and sentences. In the written language subtest, reading words and phrases aloud is easier than composing words, and this is easier than writing them to dictation. Within this subtest, the 'superitem' analysis (figure 4b) illustrated that words were easier than compound words and phrases. In the naming subtest, naming single nouns is easier than naming compound nouns, which in turn are easier than sentences. In the comprehension subtest, auditory comprehension is easier than reading comprehension, and single words in general pose fewer problems than sentences. The only clear exception to the general increasing complexity criterion comes from colour naming. It receives higher mean scores than naming objects for the Global, Wernicke's and Amnestic aphasia groups, though the difference only reaches significance for the Wernicke's aphasia group. This might be explained on the basis of colour naming involving a closed class of words with less scope for semantic deviation than for objects. Colour naming was incorporated into the AAT to pick out speakers with the syndrome of pure alexia and colour naming disturbance, believed to arise from lesions of the visual cortex in the language-dominant hemisphere when the commissural connections between the intact visual association and projection regions of the non-language-dominant hemisphere and the language regions of the dominant hemisphere are disturbed. In the repetition subtest, the difference between single complex syllable word and single multisyllable word repetition is not statistically significant. Either the items here are not sensitive enough to pick out differences between length and complexity effects, the scoring conventions are not tuned to pick out such nuances, and/or individual dissociations between the two are lost in the group data; or the hypothesis of a length-complexity contrast is not valid and these data confirm that.

Reliability
To be reliable, tests should show high internal consistency. Subparts within subtests of an assessment claimed to tap a given behaviour should all be shown to relate to that variable, as should items within the subparts. The results from the calculation of Cronbach's α indicate that all subtests and subparts of the EAAT (with the exception of the comprehension subparts) have a very high internal consistency. The language comprehension subtest overall shows an acceptably high consistency coefficient. The finding of lower levels in the individual subparts may stem from several sources. The subparts are each designed to screen a variety of quite different linguistic regularities, in contrast to the more homogeneous nature of subparts in other subtests. Several of the items require a degree of metalinguistic processing untypical of items elsewhere in the test. The lower coefficients may also relate to the diverse nature of comprehension processes and the more diffuse 'localization' of comprehension in the brain.
Inter-rater reliability and test-retest reliability
As remarked in the results section, no separate full-scale tester reliability study was conducted for the EAAT. In that respect this study represents an incomplete validation, and a confirmatory investigation is awaited to establish that claims for the German original are indeed applicable to the EAAT. However, we argue that test users may nevertheless proceed in the meantime in the confidence that tester reliability for the EAAT attains the level required of a standardized test. We maintain this on the following grounds. Firstly, the AAT and its adaptations into other languages are all constructionally highly similar. The few differences that do emerge are unlikely to influence speaker performance profiles to a significant degree. More pertinently, the scoring criteria themselves are identical across languages and operate independently of features specific to different languages. Scoring criteria relate instead to the quality of the response in terms of the speaker's performance (e.g. self-correction, hesitancy) and the nearness of the response to the target, again defined in language-independent terms, as opposed to factors dependent on constructional properties. Even if one were to contend that minor constructional divergences may alter performance, we put forward that this does not undermine the tester reliability aspects, since the factors that relate to scorer reliability, i.e. the scoring conventions, remain unaltered across AAT versions. Hence the performance of clinicians who participated in this validation should not be expected to differ significantly from that of their German, Italian, Dutch and Thai colleagues.

A further factor argued to support transferability of findings from the German AAT reliability studies to the EAAT concerns the rigour with which the AAT was tested. The clinicians used in the study were novices and people without extensive experience of using the AAT, not members of the investigation team. They based their judgements on audio recordings and written transcriptions of responses only, without any visual or situational cues to performance. The statistical power of the procedures used for analysis was sufficient to detect even slight variations in rater reliability. We maintain that this stringency, if anything, would produce an underestimation of interrater agreement rather than an overestimation, giving non-boosted coefficients of agreement that should be easily accomplished by average users.

The same arguments are claimed as valid for omitting a full test-retest study. Firstly, the constructional properties of the EAAT and AAT are so similar that one might expect the AAT to provide a reasonable estimate for other languages. Secondly, the test scoring conventions are identical and independent of language-specific variables. Thirdly, the strictness of the original study was sufficient to support the view that the results there did not represent an underestimation of performance. Even in the comprehension subtests, where some significant differences were found, test-retest agreement was high. The fact that both improvement and deterioration of scores were found suggests that the differences may be interpreted as arising from fluctuations of attention rather than other effects.
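The test-retest coefficients in table 18 are Spearman rank correlations. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks so that tied scores are handled; the function names are illustrative, and in practice `scipy.stats.spearmanr` (and `scipy.stats.wilcoxon` for the signed-rank comparison of test and retest) would normally be used instead:

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(test: Sequence[float], retest: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(test), average_ranks(retest)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all scores are tied")
    return cov / (sx * sy)
```

Note that a high ρ only shows that speakers keep their relative order from test to retest; a uniform shift of all scores would leave ρ unchanged, which is why the study also checks for differences with Wilcoxon tests.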
In arguing that the German findings can be generalized to the EAAT, we argue also that users can be confident that, even over the short period of two days, learning, training or retention effects are not found when using the test. A separate intra-tester reliability study was not carried out on this occasion, once more for reasons similar to those above. Additionally, we would contend that if rater reliability is going to be poor, this would manifest itself more strongly in inter-rater than in intra-rater divergences. We contend that the highly reliable inter-rater findings for the AAT are generalizable to the EAAT, and on this basis maintain that intra-scorer reliability should also be acceptably high.

Discriminatory power
Two dimensions of the EAAT's ability to separate out groups of speakers were examined: the division of speakers with and without aphasia, and the division into different aphasia syndrome groupings within the sample of speakers with aphasia. When comparing the 135 speakers with aphasia to the 93 without, the whole test achieved a 94% separation, the same as using the TT and repetition sections alone. Unlike many standardization samples, the present study included three groups likely to challenge test sensitivity within the borderline region between aphasia and no aphasia. The group with non-classifiable aphasia consisted largely of people who evidenced mild or isolated language disturbances, both from their EAAT scores and on clinical impression. The group with CNS damage but no aphasia would pick out those parts of the test that were susceptible to impairment from the generalized effects of brain/CNS damage, as opposed to the more localized effects believed to lead to specific language disruptions. Similarly, the control group of hospitalized speakers without CNS involvement would highlight any performance variables linked to the overall stresses of being ill, staying in hospital and being subjected to a test. This represents a strength of the (E)AAT compared with many other aphasia batteries and makes the distinction between aphasia and no aphasia both more realistic and clinically more meaningful, illustrating that there is a true grey area between the realms of definitely not aphasic and definitely aphasic.

Six of the 93 speakers without aphasia were misclassified on the basis of subtest performance with an 80% probability, and two more with a 50%–80% probability. Three of these speakers belonged to the neurologically impaired without aphasia group and five to the hospitalized without neurological involvement group. It may be the case here that for some people test performance was influenced by illness and hospitalization. Amongst the neurologically impaired but non-aphasic participants, although they were screened for the presence of attention, memory, perceptual and praxic difficulties, the screening procedure may not have detected some subtle effects that nevertheless influenced test performance. Alternatively, or as well, the three speakers' scores may have been influenced by the stress of illness. While at the 1% significance level all control groups scored significantly better than the aphasic groups (with the exception of amnestic speakers not being differentiated on comprehension from non-aphasic neurologically impaired speakers), table 13 shows that at the 5% significance level the healthy group scored significantly better than the hospitalized and neurologically impaired groups on the token, repetition and comprehension tests. This too illustrates the benefits of including these control groups and shows possible implications for the setting of cut-off scores.
For a speaker to be considered definitely aphasic on the EAAT, the cut-off score should be taken as the lowest performance of all the non-aphasic groups rather than the lowest score of the healthy non-hospitalized speakers. It is conceivable that other aphasia tests that have not been standardized with control groups spanning entirely healthy non-hospitalized speakers, neurologically impaired speakers and speakers with aphasia falsely identify as aphasic those speakers whose scores are depressed for these more general reasons.

Results indicated that there were no significant correlations between time since onset of aphasia and test results, nor did years of education or gender significantly affect scores. There were some significant correlations for the aphasic sample as a whole between age and some subtests. However, none of these correlations indicates a degree of relationship suggesting that age explains a substantial proportion of the variance in scores. Taken together, these observations suggest that EAAT scores are not biased by age, duration of aphasia, years of education or gender. Additionally, the fact that none of the non-hospitalized, neurologically healthy speakers, some of whom were towards the upper age limit for inclusion (75 years) and had the minimum years of education, was reclassified even as a borderline aphasic case also suggests that EAAT scores are not strongly biased by educational level or age.

An aim of the initial piloting was to find words and sentences for the test items that were agreed on across all the dialect/accent areas represented by the referring clinics, excluding items where there was marked regional variation or where they were simply unknown in a region. Where pronunciation is a criterion for the normal/non-normal performance judgement, the scoring conventions direct that evaluation is made in relation to the person's (expected) premorbid pronunciation, not the standard language. To this extent dialect variation should not be a factor influencing test scores. A systematic study of this extremely complex question, however, was beyond the scope of this investigation.
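The cut-off rule proposed above (take the lowest performance across all non-aphasic groups, not just the healthy one) can be sketched as follows. The group names, the example scores and the `higher_is_better` switch are hypothetical: for a score where higher means better the cut-off is the pooled minimum, whereas for an error count such as Token Test errors the direction reverses.

```python
from typing import Dict, Sequence


def aphasia_cutoff(control_groups: Dict[str, Sequence[float]],
                   higher_is_better: bool = True) -> float:
    """Worst score observed across all pooled non-aphasic control groups."""
    pooled = [s for scores in control_groups.values() for s in scores]
    return min(pooled) if higher_is_better else max(pooled)


def is_definitely_aphasic(score: float, cutoff: float,
                          higher_is_better: bool = True) -> bool:
    """True if the score is worse than every non-aphasic control score."""
    return score < cutoff if higher_is_better else score > cutoff


# Hypothetical control scores, higher = better performance.
controls = {
    "healthy": [98, 95, 99],
    "inpatient_no_cns_lesion": [90, 97],
    "cns_lesion_no_aphasia": [88, 96],
}
cutoff = aphasia_cutoff(controls)  # pooled minimum over all three groups
```

With only the healthy group, the cut-off in this made-up example would be 95, and the inpatient scoring 90 would be flagged as aphasic; pooling all non-aphasic groups avoids exactly that kind of false positive.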
Syndrome group assignment
Within the sample of people with aphasia there were 120 identified on the basis of clinical impression who fitted into one of the four main aphasia syndrome categories recognized (on language performance criteria) by the (E)AAT. Predictably, when only the quantitative scores from the five subtests were entered into the discriminant analysis, the assignment match with the clinician ratings was relatively low (60%). The explanation is doubtless that speakers may attain the same quantitative score for a variety of qualitatively different reasons, and these are not reflected in raw scores for individual item sets. By contrast, using the best discriminating spontaneous language scales (syntax; communicative behaviour; automatic speech), where qualitative aspects of performance enter decision making, classification was much higher (79%). Where misclassifications occurred, these could generally be understood aphasiologically: for example, severe Broca's aphasia has many affinities with global aphasia, and the dividing line between severe anomia and Wernicke's aphasia can be very blurred. Using just the three spontaneous language scales, less straightforwardly explainable disagreements arose between group assignment by clinicians on the basis of the external criteria and groupings based on the discriminant analysis. Five individuals assigned on the external clinical criteria to the Wernicke's group were classified by the discriminant analysis into the Broca's aphasia group, although on a group basis the two sets of speakers remained significantly divided at the 5% level. This misclassification figure fell to two when all spontaneous language scales were entered. Nevertheless, such a reclassification raises some questions. Several explanations for the apparent mismatch might be advanced.
Firstly, while the external criteria are based on summaries of numerous studies into the surface speech-language characteristics of the aphasia syndrome groups and hopefully strongly reflect clinician impressions, they do not represent a 'gold standard'. Indeed, it is difficult to envisage what might be taken as an external gold standard reference point for syndrome assignments. It is therefore unsurprising that disagreement would exist between clinicians' overall impressions and discriminant analyses of test scores. Turning the point on its head, one might remark that it is surprising that there should be such a high degree of relationship between clinician impressions from a spontaneous language sample and a discriminant analysis of formal test scores. This might be argued as further support for the constructional validity and soundness of the (E)AAT. Another explanation for the mismatch in classifications may be seen in the fact that using all the spontaneous language scales produced fewer misclassifications. This hints at the importance of considering articulation and prosody output in distinguishing Broca's from Wernicke's groupings. This was the only scale on which they were statistically significantly separated at the 1% level (though at the 5% level the groups as a whole were divided by the syntax scale). A further source of misclassification may lie in the nature of the clinical rating scales. Studies of rater reliability with the AAT point to high agreement. Nevertheless, the AAT method of syndrome assignment has not been without criticism (de Jonge et al. 1996). On occasion there can be poor agreement between clinician judgements based on the spontaneous language rating scales and the results of the ALLOC classification (Hermans et al. 1982). A crucial variable, according to de Jonge et al. (1996), appears to be the syntax rating in particular: a variation of one point may swing the ALLOC classification to another syndrome. While this may have consequences for studies utilizing (E)AAT-derived syndrome groups, the criticism was not seen as detracting from the strength of the (E)AAT in giving a valid and reliable picture of an individual's language performance. Graetz et al. (1991) have also, within the debate on single case versus group descriptions, raised the question of how important it is to be able to assign speakers to a specific syndrome category. For them the main aim of the AAT is to deliver insights into an individual's linguistic strengths and weaknesses for the purpose of guiding further assessment and instigating therapy. The syndrome assignment issue is not seen to compromise the (E)AAT's ability to do this. When all the subtest scores and the spontaneous language ratings were combined, a very acceptable level of reclassification (79%) was achieved. This suggests that the (E)AAT can be used as a valid and reliable way of separating speakers into linguistic syndrome groups for large-scale studies or reviews of clinical profiles, with the provisos mentioned regarding borderline cases between Broca's and Wernicke's type aphasia. Another way of viewing discriminatory power that was considered concerned whether, and in which ways, clinically established syndrome groups differed on the different rating scales and subtests.
The results illustrated in tables 13–14 show that the subtests and rating scales reveal significant differences amongst groups. In particular there is complete separation between the control and aphasic groups on all subtests except comprehension, where there is overlap between the amnestic group and the control group of neurologically impaired speakers without aphasia at the 1% level of significance, though not at the 5% level. This overlap presumably reflects the good comprehension level of (mild) amnestic speakers. Within the groups of speakers with aphasia there is a non-significant difference between the speakers with Broca's and Wernicke's aphasia on the TT and repetition subtests. The former may be influenced by the proportion of speakers with severe Broca's aphasia, who have more marked comprehension difficulties, and of milder Wernicke's aphasia cases, where comprehension is not so severely affected. While these two groups are not separated on quantitative scores of the repetition subtest for their spoken output, the qualitatively based articulation and prosody scale in the spontaneous speech-language ratings does separate them significantly, even at the 1% level. Finally, face validity was not a factor addressed specifically in the standardization study. However, in earlier pilot versions and during the normalization process, participants were invited after the test to pass any comments they felt relevant regarding the content and presentation of the EAAT. Apart from some comments on pilot versions that resulted in the elimination of some ambiguous pictures, there were no criticisms from participants questioning the suitability of the EAAT as a language test. Neither were there any comments concerning the inappropriateness or unacceptability of materials for adult speakers across the age range.
Conclusions
Results from this study lead us to conclude that the EAAT amply meets criterion levels for a psychometrically sound and sensitive tool for the detection and description of aphasia, and for measuring the severity of impairment across the range of modalities and linguistic levels covered by the test. Some weak links were pinpointed in the reclassification of speakers on the basis of their raw test scores compared with clinical judgement ratings, and these may need to be taken into account in large-scale studies where speakers are assigned to the diagnostic groups recognized by the EAAT. This in no way compromises the strength of the EAAT in delivering a robust description of individual speaker performance on a test with high internal consistency, strong construct validity and high discriminatory power, qualities that have been established for only a few of the many existing aphasia examinations. A full-scale rater reliability and retest study was not carried out for the EAAT. Rigorous studies were conducted with the German AAT, and reliability was found to be high. We argue that because the AAT and EAAT use identical scoring conventions and their constructional properties are extremely close, we are justified in assuming that reliability findings can be generalized across versions, and hence in our contention that the reliability of the EAAT is likewise acceptably high. Furthermore, we maintain that the EAAT provides the basis for a clinically and experimentally valuable test that should not only be able to detect differences and changes in performance across and between subtests, but should also be sensitive to changes within them.
We base this on the findings of high internal consistency of subparts and subtests, the absence of floor and ceiling effects in the aphasic sample, a demonstrable gradation of difficulty across subparts within the subtests, trends towards a gradation of difficulty of items within subparts, and the construction of the scoring system, which takes into account the quality of responses rather than relying on a strict right-wrong dichotomy. The EAAT also allows spontaneous language performance, rated along a range of dimensions, to be compared with formal test scores in related areas. The strength of the AAT as a clinical tool has already been demonstrated (e.g. Willmes 1985; Poeck et al. 1989; Huber et al. 1997). This study confirms that the EAAT should be able to serve equally well in single-case or case-series studies. Because the EAAT has been shown to be structurally and psychometrically comparable to the versions in other languages, it is ideally placed for use in cross-language and bilingual studies.
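The properties cited above (internal consistency, absence of floor and ceiling effects, and gradation of difficulty) can all be computed from a matrix of item scores. This section does not say which internal-consistency coefficient was used, so Cronbach's alpha is shown here as a common choice; that is an assumption. The 0 to 3 item scale in the example only illustrates graded, non-dichotomous scoring. The function names and the data are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Population variances are used for both the items and the totals; the
    ratio, and hence alpha, is the same with sample variances."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def floor_ceiling(totals, min_score, max_score):
    """Share of respondents at the minimum and at the maximum possible total."""
    n = len(totals)
    floor = sum(t == min_score for t in totals) / n
    ceiling = sum(t == max_score for t in totals) / n
    return floor, ceiling


def relative_difficulty(scores, max_item_score):
    """Mean score per item as a proportion of the maximum (lower = harder)."""
    return [mean(col) / max_item_score for col in zip(*scores)]


if __name__ == "__main__":
    # Hypothetical 0-3 scores: four speakers (rows) by three items (columns).
    scores = [
        [3, 3, 2],
        [2, 2, 1],
        [1, 1, 0],
        [3, 2, 2],
    ]
    totals = [sum(row) for row in scores]
    print(round(cronbach_alpha(scores), 3))  # 0.964
    print(floor_ceiling(totals, min_score=0, max_score=9))  # (0.0, 0.0)
    print([round(d, 2) for d in relative_difficulty(scores, 3)])  # [0.75, 0.67, 0.42]
```

Steadily decreasing values from `relative_difficulty` across items or subparts would correspond to the gradation of difficulty described above, and non-zero `floor_ceiling` proportions in an aphasic sample would signal the kind of floor or ceiling effect the study reports as absent.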
Acknowledgements
We gratefully acknowledge guidance and helpful suggestions from W. Huber, K. Poeck and D. Weniger, authors of the original German version. Thanks go to F. Stewart, who assisted with the initial pilot versions of the EAAT and later data collection. We are also thankful to the following speech and language therapists, who kindly supplied tests: A. Cameron, T. Catcherside, C. Davison, J. Douglas, C. Finlayson, M. Goodger, J. Goodson, C. Heffer, F. Kevan, E. Khairuddin, R. March, M. Metcalf, C. O’Neill, J. Roberts, M. Robinson, L. Rodriguez, F. Wendon and N. Woodyatt. The research for this project was supported in part by UCB Pharma, Belgium.
References
AERA-APA-NCME 1985, Standards for Educational and Psychological Testing, American Psychological Association, Washington, DC.
Borg, I. and Lingoes, J. 1987, Multidimensional Similarity Structure Analysis (New York: Springer).
Crary, M., Wertz, R. and Deal, J. 1992, Classifying aphasias: cluster analysis of Western Aphasia Battery and Boston Diagnostic Aphasia Examination. Aphasiology, 6, 29-36.
Davis, A. 1993, A Survey of Adult Aphasia, 2nd ed. (Englewood Cliffs: Prentice Hall).
de Jonge, I., van der Sandt-Koenderman, M. and van Harskamp, F. 1996, Afasiediagnostiek met de Akense Afasietest. Stem-, Spraak- en Taalpathologie, 5, 89-104.
De Renzi, E. and Vignolo, L. 1962, The Token Test. Brain, 85, 665-678.
Edgington, E. 1995, Randomization Tests, 3rd ed. (New York: Marcel Dekker).
Goodglass, H. and Kaplan, E. 1972, Boston Diagnostic Aphasia Examination (Philadelphia: Lea & Febiger).
Graetz, P., De Bleser, R., Willmes, K. and Heeschen, C. 1991, De Akense Afasie Test. Logopedie en Foniatrie, 63, 58-68.
Graetz, P., De Bleser, R. and Willmes, K. 1992, Akense Afasie Test. Nederlandstalige Versie (Lisse: Swets and Zeitlinger).
Habbema, J., Hermans, J. and van den Broek, K. 1974, A stepwise discriminant analysis program using density estimation. In G. Bruckmann (Ed.) Compstat 1974, Proceedings in Computational Statistics (Wien: Physica Verlag).
Hermans, J., Habbema, J., Kasanmoentalib, T. and Raatgever, J. 1982, Manual for the ALLOC80 discriminant analysis program, Department of Medical Statistics, University of Leiden, Netherlands.
Hittmair-Delazer, M., Andree, B., Semenza, C., De Bleser, R. and Benke, T. 1994, Naming by German compounds. Journal of Neurolinguistics, 8, 27-41.
Huber, W., Poeck, K., Weniger, D. and Willmes, K. 1983, Aachener Aphasie Test (Goettingen: Hogrefe).
Huber, W., Poeck, K. and Willmes, K. 1984, The Aachen aphasia test. Advances in Neurology, 42, 291-303.
Huber, W., Willmes, K., Poeck, K., Van Vlymen, B. and Deberdt, W. 1997, Piracetam as an adjuvant to language therapy for aphasia: a randomized double-blind placebo-controlled pilot study. Archives of Physical Medicine and Rehabilitation, 78, 245-249.
Krippendorff, K. 1970, Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement, 30, 61-70.
Lincoln, N. 1988, Using the PICA in clinical practice: are we flogging a dead horse? Aphasiology, 2, 501-506.
Linebaugh, C. 1979, Assessing the assessments: the adequacy of standardized tests of aphasia. In R. Brookshire (Ed.) Clinical Aphasiology (Minneapolis: BRK).
Lingoes, J. 1973, The Guttman-Lingoes Nonmetric Program Series (Ann Arbor: Mathesis Press).
Luzzatti, C., De Bleser, R. and Willmes, K. 1991, L’Aachener Aphasie Test (Firenze: Organizzazioni Speciali).
Martin, A. 1977, Aphasia testing: a second look at the Porch Index of Communicative Ability. Journal of Speech and Hearing Disorders, 42, 547-561.
Nicholas, L., MacLennan, D. and Brookshire, R. 1986, Validity of multiple sentence reading of comprehension tests for aphasic adults. Journal of Speech and Hearing Disorders, 51, 82-87.
Orgass, B. 1986, Der Token Test (Weinheim: Beltz).
Petrondas, D. and Gabriel, K. 1983, Multiple comparisons by randomization tests. Journal of the American Statistical Association, 78, 949-957.
Poeck, K. 1983, What do we mean by aphasic syndromes? Brain and Language, 20, 79-89.
Poeck, K., Huber, W. and Willmes, K. 1989, Outcome of intensive language treatment in aphasia. Journal of Speech and Hearing Disorders, 54, 471-479.
Poeck, K., Kerschensteiner, M., Stachowiak, F. and Huber, W. 1975, Die Aphasien. Aktuelle Neurologie, 2, 159-169.
Prachritpukdee, N., Phanthumchinda, K., Huber, W. and Willmes, K. 1998, The Thai version of the German Aachen aphasia test: description of test and performance in normal subjects. Journal of the Medical Association of Thailand, 81, 402-412.
Prachritpukdee, N., Phanthumchinda, K., Huber, W. and Willmes, K. submitted, The Thai version of the Aachen aphasia test (Thai-AAT). Journal of the Medical Association of Thailand.
Shye, S. 1985, Multiple Scaling (Amsterdam: North Holland).
Skenes, L. and McCaulay, R. 1985, Psychometric review of nine aphasia tests. Journal of Communication Disorders, 18, 461-474.
Spreen, O. and Risser, A. 1998, Assessment of aphasia. In M. Taylor Sarno (Ed.) Acquired Aphasia, 3rd ed. (San Diego: Academic Press).
Steiger, J. and Schönemann, P. 1978, A history of factor indeterminacy. In S. Shye (Ed.) Theory Construction and Data Analysis in the Behavioural Sciences (San Francisco: Jossey-Bass).
Weniger, D., Willmes, K., Huber, W. and Poeck, K. 1981, Der Aachener Aphasie Test: Reliabilität und Auswertungsobjektivität. Nervenarzt, 52, 209-277.
Westfall, P. and Young, S. 1993, Resampling-Based Multiple Testing (New York: Wiley).
Willmes, K. 1985, An approach to analysing a single subject’s scores obtained on a standardized test with application to the Aachen aphasia test. Journal of Clinical and Experimental Neuropsychology, 7, 331-352.
Willmes, K. 1987, Beiträge zur Theorie und Anwendung von der uni- und multivarianten Datenanalyse, PhD thesis, University of Trier, Germany.
Willmes, K. 1993, Diagnostic methods in aphasiology. In G. Blanken et al. (Eds) Linguistic Disorders and Pathologies (Berlin: De Gruyter).