David Canter and Joanne Chester. Department of Psychology, University of Liverpool. ABSTRACT Although there has been detailed criticism of the Cusum ...
From Canter, D. & Chester, J. (1997). Investigation into the claim of weighted cusum in Authorship Attribution Studies. Journal of Forensic Linguistics, 4 (2) 252-261.
Investigation into the claim of weighted Cusum in authorship attribution studies

David Canter and Joanne Chester
Department of Psychology, University of Liverpool

ABSTRACT
Although there has been detailed criticism of the Cusum technique for authorship attribution, the claims of its proponents are so attractive to the legal profession that any suggestion that the technique can be `improved' requires careful consideration. One such suggestion is that the arbitrary nature of the judgements made of Cusum charts can be removed by `weighting' the calculations of the Cusum values. To test the claims of weighted Cusums, three texts from seven different authors were subjected to the weighted Cusum analysis; each of the three groups consisted of twenty-one samples. The results obtained in this experiment showed that only three out of sixty-three texts were identified as being written by more than one author. However, these three texts were derived from single-authored material. Further, none of the twenty-one multiple-authored texts produced significant results that would have led to the identification of more than one author. Therefore, the weighted Cusum technique did not reliably discriminate between texts of single and multiple authors.

KEYWORDS psycholinguistics, Cusum, stylistics, authorship, contested documents
INTRODUCTION
The philosopher's stone of forensic linguistics is to be able to demonstrate whether two small samples of text are incontrovertibly the work of one author, or to establish the complementary point that a small section of text is not authentically the product of the person who generated the rest of the text. From time to time there have been claims that a new alchemy has been discovered that will perform these discriminations. By far the most spectacular claims have been made for a procedure known variously as `Cusum' or `Qusum'.

The Cusum technique was proposed by Morton and Michaelson in `The Qusum Plot' (1990) for the purpose of discriminating between single and multiple-authored texts. They proposed that each individual displays a unique `habit' which can be identified by non-lexical language components within a sentence and/or a combination of these components. Examples of such components include two and three letter words and words beginning with a vowel. The Cusum technique compares, for each sentence, the cumulative summation of the differences from the average of the habit in question with the cumulative summation of the differences from the average of the total number of words in each sentence. The Cusum analysis utilizes a cumulative sum chart as a means of graphically presenting the author's habit throughout a piece of text. It shows how a series of observed values change with respect to their average.

Cusum charts have been presented to several courts of law in cases involving allegedly falsified confessions. Although there is no independent endorsement of the Cusum technique for authorship attribution, the possibility offered by its proponents in their one journal publication (Morton & Farringdon 1992) is so inviting that if there were any merit at all in their claims it would be of enormous significance to forensic linguistics.
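As an illustration of the cumulative sums described above, the following is a minimal Python sketch. It is not taken from Morton and Michaelson; the function names, the punctuation stripping and the three example sentences are assumptions made only for illustration. It computes one cumulative sum for sentence length and one for the two and three letter word `habit', each as a running total of deviations from the series average.

```python
from itertools import accumulate

def cusum(values):
    """Running total of the deviations of each value from the series mean."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

def count_habit(sentence):
    """Count two and three letter words (the 'habit'); hypothetical helper."""
    words = [w.strip(".,;:!?'\"()") for w in sentence.split()]
    return sum(1 for w in words if 2 <= len(w) <= 3)

# Invented example sentences, not drawn from the study material.
sentences = [
    "I went to the shop",
    "He was not there when I arrived at the door",
    "We left",
]
lengths = [len(s.split()) for s in sentences]  # words per sentence
habits = [count_habit(s) for s in sentences]   # habit occurrences per sentence

length_cusum = cusum(lengths)
habit_cusum = cusum(habits)
```

In a Cusum chart the two series would be plotted on superimposed axes and inspected for points at which they diverge. Because each series sums deviations from its own mean, both return to zero at the final sentence.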
Much as a single parapsychology experiment that clearly demonstrated the scientific validity of telepathy would challenge much of conventional science, so one unchallengeable demonstration that Cusum was valid would raise fundamental questions about language production. It is therefore of value to explore any claims that the Cusum technique could be made reliable by modification of the statistical analysis on which it is based.

One of the most contentious issues is the interpretation of the Cusum chart (Canter 1992). The inspector of a Cusum chart, at an observed point of possible change in the chart, has to decide in a rather arbitrary manner whether the divergence was due to a real shift in authorship or to underlying variability in the data. There is no obvious method of testing the significance of a discrepancy in a Cusum chart and the interpretation is based completely on the subjective decision of the inspector. Hilton and Holmes voice this opinion: `The subjectivity of the technique in evaluating differences between parts of superimposed cumulative sum charts does give cause for concern' (1993: 75). This method of interpretation has caused controversy among many other experts in the field, who have demanded that a rigorous statistical procedure be incorporated into the process to help to eliminate such subjectivity. For example, Canter introduced the Rho value (Canter 1992) to establish a measure of objectivity in the interpretation of the Cusum chart.

In response to this central challenge to the Cusum technique, Cusum proponents have presented an alternative based on what they call `weighted' cumulative sums. Weighted Cusum uses sentence length as an explanatory variable and gives unusually large or small samples the correct weighting, thus making the analysis robust to variations in element size. Weighted Cusum is said to embody a cumulation of element sizes, taking into account the greater information available over the whole of the sample (Bissell 1990), thus removing the influence of sentence length from the Cusum analysis.
On this account, weighted Cusum removes the heavily criticized subjectivity of the chart interpretation, as it uses rigorous, well established statistical testing procedures. The method is said to reduce the possibility that a divergence in a Cusum chart arises from the natural variation within the language data and to increase the likelihood that any shifts are in fact due to actual differences in authorship.

Hilton and Holmes (1993), in an experiment conducted to test the reliability of the weighted Cusum technique, found that it did not yield reliable results. They suggested the reason for this is that authors do not follow habits as rigidly as is required for the Cusum technique to be able to determine authorship correctly. However, helpful as their study was, Hilton and Holmes (1993) focused on novels and similar literary material and concatenated a sample of text from one novel with a sample from another. Thus they did not have any direct test of whether material made up of the same texts in different ways would be open to discrimination by the weighted Cusum technique. Nor did they have any indication of whether less formal or prepared writing would be sensitive to weighted Cusum analysis. Thus, from a forensic point of view, it is important to be absolutely clear as to whether inserted text, such as might be expected to occur within a falsified police statement, can be identified using weighted Cusum analysis. The present study, therefore, set out to examine directly whether the weighted Cusum technique can reliably identify `alien' text inserted into single and multiple-authored texts.

MATERIAL
Seven different authors each contributed three pieces of text to this investigation. Texts were taken from novels, statements to the police, academic books and police transcripts. This variety was deliberately chosen in order to establish whether the technique could be sensitive for some types of text if not others.
There was a total of twenty-one texts, which were divided into three groups. The first group, the single-authored text group, consisted of the texts in their original form. The second group, the single author/inserted text group, consisted of texts where eight sentences from one text were inserted into another text by the same author. The third group, the mixed author group, consisted of texts in which the inserted material was taken, at random, from a different author. All insertions were made from the eighth to the fifteenth sentence of the original text. The sixteenth sentence of the text was thus the original author again. These three sets of text therefore allowed the natural flow of the original texts to be compared with two kinds of insertion: insertions hypothesized to be non-discriminable, because they were written by the same author and so would be expected to be in the same `style', and insertions hypothesized to be discriminable, because they were not written by the same author.

The material used was not commissioned or written specifically for this study; it was drawn from extant material. It was believed that this would reduce any chance of attempted falsification. Nor were the authors in contact with each other when the material was written.

PROCEDURE
All of the prepared texts were subjected to the weighted Cusum analysis, with sentences 8-15 (the point at which all insertions were made) in every text being analysed further. According to the Cusum proponents, the statistical analysis will yield a non-significant result for texts written by one author, whereas texts which contain sentences written by a different author should yield a significant result. Statistically this is the hypothesis that the divergence in the Cusum chart is due to differences between authors and not due to underlying variations within a single author's writings.

To investigate differences between two texts the average variation within each text must be calculated. This is achieved by calculating the expected average of language components for each sentence in the whole text; then the actual occurrence of the language component in each sentence is counted. The deviations of the actual occurrence from the expected average are then summed, giving a weighted cumulative output for each text. Having calculated the variation within each text, `t'-tests are calculated to evaluate the statistical degree to which the variations in the two texts differ from each other. (The statistical details are given in the Appendix, pp. 259-61.) The `habit' used was two or three letter words.
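The procedure described above can be sketched in Python under stated assumptions. The Appendix formulae are not available in this copy, so the code below is not the authors' calculation. It assumes that the expected count for a sentence is the overall habit rate multiplied by that sentence's length (a common reading of weighted cumulative sums), and it swaps in a pooled two-proportion Z test to compare sentences 8-15 with the rest of the text. The function names and the example data are hypothetical.

```python
import math
from itertools import accumulate

def weighted_cusum(habit_counts, sentence_lengths):
    """Running total of (observed - expected) habit counts per sentence.

    Assumption: expected count = overall habit rate * sentence length.
    """
    rate = sum(habit_counts) / sum(sentence_lengths)
    deviations = [h - rate * n for h, n in zip(habit_counts, sentence_lengths)]
    return list(accumulate(deviations))

def segment_z_test(habit_counts, sentence_lengths, start, end):
    """Compare the habit rate in sentences start..end (1-indexed, inclusive)
    with the rate in the remaining sentences.

    Assumption: pooled two-proportion Z test; returns (z, two-tailed p).
    Requires both segments to contain words and a pooled rate strictly
    between 0 and 1.
    """
    inside = range(start - 1, end)
    x1 = sum(habit_counts[i] for i in inside)
    n1 = sum(sentence_lengths[i] for i in inside)
    x2 = sum(habit_counts) - x1
    n2 = sum(sentence_lengths) - n1
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return z, p

# Invented data: twenty sentences of ten words each.
lengths = [10] * 20
uniform = [4] * 20                            # same habit rate throughout
shifted = [4] * 7 + [7] * 8 + [4] * 5         # higher rate in sentences 8-15

z_same, p_same = segment_z_test(uniform, lengths, 8, 15)
z_diff, p_diff = segment_z_test(shifted, lengths, 8, 15)
```

With the uniform data the statistic is zero and the result non-significant; with the artificially shifted data the statistic is large. The study asks whether real inserted text produces shifts of this kind, which is an empirical question the sketch cannot answer.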
The two or three letter word `habit' is the one most commonly used by Morton and his colleagues. However, they give no criteria for deciding which `habit' to use with a particular text, so it may be assumed that although some `habits' may be proposed as more sensitive than others, the one most commonly used by Cusum proponents should reveal some differences in the present study. Another `habit' proposed by Cusum proponents is words beginning with a vowel, so calculations were also made using this curious suggestion.

The hypothesis being proposed, if the underlying assumptions of the Cusum test are reliable, is that if the probability resulting from the t-test is greater than 0.05, then the insert would have come from the same author, whereas if the probability is equal to or less than 0.05 the insert and the rest of the text originate from different authors. The analysis consists of calculating t-tests on every one of the three sets of twenty-one samples of text, sixty-three tests in all for each of the two habits, based on either the two or three letter word component or the initial vowel word component. Taking the 5 per cent level of probability as a generous criterion, it would be expected that chance variation alone would produce about three significant results in each set of sixty-three tests.

RESULTS
Tables 1a-c show the number of t-tests that met the 5 per cent level for each of the three groups of text. For words starting with a vowel only three of the tests were significant, as would be expected if the results were produced entirely by chance. For the two and three letter word `habit' there were no significant differences; chance-based calculations would have produced slightly more significant results. So the statistical tests are such that weighted Cusum produces chance variations. Against that framework it is possible to examine how the results of the main hypotheses could be interpreted, and the consequent basis for confusion in studies that do not have controls as in the present study.

The single-author text group results (Table 1a) are all non-significant for both the two and three letter word component and the vowel word component. These results could mistakenly be interpreted to mean that these texts were correctly identified as being the work of just one author. However, by comparison with the other results the error of this assumption is clear.

Table 1a  Weighted Cusum analysis for single author text group

           Vowel words              2/3 letter words
File name  Z value  Prob.  Signif.  Z value  Prob.  Signif.
BABB1        -1.05  0.3    NS          0.25  0.81   NS
BABB2         0.45  0.65   NS         -0.59  0.56   NS
BABB3         0.74  0.46   NS          0.11  0.91   NS
BANKS1         0.3  0.76   NS         -1.86  0.06   NS
BANKS2       -0.96  0.34   NS         -0.71  0.48   NS
BANKS3        0.70  0.48   NS          0.64  0.52   NS
BURNS1        0.06  0.95   NS         -0.71  0.48   NS
BURNS2        0.75  0.46   NS          0.74  0.46   NS
BURNS3       -1.69  0.09   NS          0.32  0.75   NS
FIRE1         1.49  0.14   NS          1.00  0.32   NS
FIRE2        -0.45  0.65   NS          0.07  0.95   NS
FIRE3          1.4  0.16   NS         -0.34  0.74   NS
MURD1        -1.07  0.29   NS         -0.02  0.98   NS
MURD2         0.41  0.68   NS         -1.24  0.21   NS
MURD3        -1.18  0.24   NS         -0.52  0.6    NS
PITCH1        0.63  0.53   NS          0.04  0.97   NS
PITCH2       -0.49  0.62   NS          1.34  0.18   NS
PITCH3        0.79  0.43   NS         -0.78  0.43   NS
SENHIS2      -0.39  0.7    NS         -1.25  0.21   NS
SENHIS3      -1.18  0.24   NS         -1.14  0.25   NS
SENHIS4      -1.15  0.25   NS          0.98  0.33   NS

Notes:
BABB is a police transcript of a confession
BURNS and PITCH are statements made to the police
FIRE3 is from an academic book
MURD3 is from a novel
SENHIS is a fictional monologue confession

The single author/inserted text group results (Table 1b) reveal a slightly different pattern. Although the two and three letter word component results were all non-significant, as would be expected if the claim of Cusum is correct, only eighteen out of the twenty-one vowel word results were non-significant. That means that one in seven of the texts was incorrectly identified by the Cusum analysis as having been written by more than one author when in fact it was written by the same author.

Table 1b
Same author/inserted text group

           Vowel words              2/3 letter words
File name  Z value  Prob.  Signif.  Z value  Signif.
BABB12       -0.07  0.94   NS         -0.03  NS
BABB23       -0.17  0.87   NS          0.11  NS
BABB31       -2.14  0.03   S            [?]  NS
BANKS12      -0.96  0.34   NS         -0.71  NS
BANKS23       0.95  0.34   NS          0.50  NS
BANKS31       0.41  0.68   NS         -1.81  NS
BURNS12       1.56  0.12   NS          0.71  NS
BURNS23      -0.84  0.4    NS          0.48  NS
BURNS31       0.06  0.95   NS         -0.71  NS
FIRE12       -0.45  0.65   NS          0.07  NS
FIRE23        1.97  0.05   S          -0.49  NS
FIRE31        1.33  0.18   NS          1.00  NS
MURD12       -0.75  0.46   NS         -1.16  NS
MURD23       -2.07  0.04   S          -0.69  NS
MURD31        0.94  0.35   NS          0.02  NS
PITCH12      -0.49  0.62   NS          1.34  NS
PITCH24       1.33  0.18   NS         -0.26  NS
PITCH31       1.44  0.15   NS          0.13  NS
SENHIS23     -1.47  0.14   NS         -1.15  NS
SENHIS34     -0.88  0.38   NS         -0.98  NS
SENHIS42     -0.38  0.71   NS         -1.25  NS

Notes:
[?] marks a value that is not legible in this copy. The Prob. values for 2/3 letter words are likewise not legible and are omitted.
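The relation between the Z value and Prob. columns in the tables can be checked with a short Python sketch. It assumes that Prob. is the two-tailed normal probability of Z; the Appendix is not available in this copy, so this reading is an assumption, although it reproduces the three significant vowel word entries of Table 1b. The function name is hypothetical.

```python
import math

def two_tailed_p(z):
    """Two-tailed probability of a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))

# The three vowel word results marked S in Table 1b.
significant_rows = [("BABB31", -2.14), ("FIRE23", 1.97), ("MURD23", -2.07)]

for name, z in significant_rows:
    p = two_tailed_p(z)
    label = "S" if p <= 0.05 else "NS"
    print(f"{name}: Z = {z:+.2f}, Prob. = {p:.2f}, {label}")
```

Under this assumption, |Z| must be at least about 1.96 for a result to be marked S, which is why FIRE23 (Z = 1.97) only just meets the 5 per cent level.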
Finally, the results in Table 1c demonstrate a clear pattern pertaining to the mixed author group. For every text and for both components the results are all non-significant (exactly the same outcome as for the single-authored group), supporting the mistaken assumption that each individual text was the work of just one author, when in fact all of the texts were the work of multiple authors. These results do not confirm the hypothesis; all of these texts were wrongly identified by the Cusum analysis. Further, none of the twenty-one t-tests for either component was at all close to significance at 0.05. Indeed, if there was any trend towards statistical significance it was to show that the single-authored texts had differences within them that could be mistakenly interpreted to indicate mixed authorship.

Table 1c  Mixed author text group
           Vowel words              2/3 letter words
File name  Z value  Prob.  Signif.  Z value  Prob.  Signif.
BABBANK1      -0.8  0.42   NS          0.78  0.44   NS
BABBANK2     -0.96  0.34   NS          -0.6  0.55   NS
BABBANK3      1.11  0.28   NS          0.52  0.60   NS
BABBURN1      1.69  0.09   NS          0.26  0.8    NS
BABBURN2      0.84  0.40   NS         -0.13  0.89   NS
BABBURN3     -1.74  0.08   NS          0.48  0.63   NS
MURBURN1     -0.12  0.91   NS         -0.52  0.60   NS
MURBURN2      0.65  0.51   NS          0.77  0.44   NS
MURBURN3     -0.19  0.85   NS          0.44  0.66   NS
PITSEN2      -0.38  0.71   NS         -1.25  0.21   NS
PITSEN3      -0.53  0.6    NS         -1.31  0.19   NS
PIT1SEN4     -0.04  0.97   NS         -0.98  0.33   NS
FIRMUR1      -0.66  0.51   NS          0.13  0.9    NS
FIRMUR2       0.41  0.68   NS         -1.24  0.21   NS
FIRMUR3      -0.69  0.49   NS          0.58  0.56   NS
SEN2FIR1      1.49  0.14   NS           1.0  0.32   NS
SEN3FIR2     -0.45  0.65   NS          0.07  0.95   NS
SEN4FIR3       1.4  0.16   NS         -0.34  0.74   NS
BANPIT1       0.63  0.53   NS          0.04  0.97   NS
BANPIT2      -0.21  0.84   NS          1.34  0.18   NS
BANPIT3       1.49  0.14   NS           1.0  0.32   NS
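The chance baseline used in the Results (about three significant results in sixty-three tests at the 5 per cent level) can be reproduced with a short Python sketch. It treats the sixty-three tests as independent, which is a simplifying assumption made here and not a claim of the paper, since the texts share material. The names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tests = 63   # three groups of twenty-one texts, one habit
alpha = 0.05   # significance criterion used in the study

# Number of significant results expected by chance alone.
expected_by_chance = n_tests * alpha

# Probability that chance alone yields three or more significant results.
p_three_or_more = 1 - sum(binomial_pmf(k, n_tests, alpha) for k in range(3))

# Probability that chance alone yields no significant result at all.
p_none = binomial_pmf(0, n_tests, alpha)
```

The expected count is 3.15, so the three significant vowel word results are what chance alone would typically produce. On the same assumption, a complete absence of significant results, as found for two and three letter words, is the less likely outcome.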
DISCUSSION
Results obtained in this investigation indicate that the weighted Cusum technique does not reliably discriminate between single and multiple-authored texts. Non-significant results were obtained for every analysis of both the original single-authored texts and the multiple-authored texts; thus there is not even a trend toward discrimination. These results complement those found by Hilton and Holmes (1993), that the weighted Cusum method is not reliable in discriminating between single- and multiple-authored texts. Even if the results were to be used in the capacity that Bissell (1990) suggests, `for general guidance', they are more likely to guide the court in the wrong direction.

However, it is important to emphasize that the results for the single-authored texts on their own could mistakenly be taken to indicate that the lack of significant difference shows an appropriate lack of discrimination for single-authored texts. This is a chance set of results no different from those found for mixed texts. Yet it may be this confusion in the interpretation of non-discriminating results that has misled Cusum proponents into believing that their technique had some value.

Consider the application of these results if the Cusum technique were being used in court to determine the authorship of a statement or a confession. A statement that had been modified by another person, and thus had multiple authorship, is just as likely to be identified as having been written by one person as a statement that is the work of one person only.

This investigation was based on very different types of material: novels, police statements, academic books and police transcripts. Careful consideration of the results shows no differences in the statistical significance levels for the different types of material. The only discernible benefit provided by the weighted Cusum procedure is that it allows a decision to be made using a statistical procedure, compared with the rather arbitrary decision by the observer of a Cusum chart in determining authorship attribution.
However, although this procedure is more objective, as it is based on established statistical methods, the whole concept of Cusum is still founded on the premise that every individual has a unique habit in their speech/writing determined by their use of language components in a sentence. There is no generally accepted support for such an assumption. Thus, although the weighted Cusum method may seem more rigorous, it is no more valid than the original method for determining authorship attribution.

APPENDIX
(The formulae of the original appendix are not reproduced in this copy.)
REFERENCES
Bissell, A. F. (1990) `Weighted Cusums-method and applications', Total Quality Management, 1(3): 391-402.
Canter, D. C. (1992) `An evaluation of the "Cusum" stylistic analysis of confessions', Expert Evidence, 1(2): 93-9.
Hilton, M. L. and Holmes, D. I. (1993) `An assessment of cumulative sum charts for authorship attribution', Literary and Linguistic Computing, 8(2): 73-80.
Morton, A. Q. and Michaelson, S. (1990) `The Qusum Plot', Internal Report CSR-3-90, Department of Computer Science, University of Edinburgh.
Morton, A. Q. and Farringdon, M. G. (1992) `Identifying utterance', Expert Evidence, 1(3): 84-92.