User Modelling for Error Recovery: A Spelling Checker for Dyslexic Users

Roger I. W. Spooner and Alistair D. N. Edwards
Department of Computer Science, University of York, York, Great Britain

Abstract. In the pursuit of a remedy for the poor spelling of dyslexic writers, a software system has been developed whose central feature is a user model of the writer’s spelling error patterns. Dyslexic writers are poorly catered for by most spelling programs because of their diverse errors. In this system, Babel, a user model directs the search towards likely corrections considering the writer’s common errors. Babel includes the novel idea of more complex rules which describe permutations of errors typically made by dyslexic writers. It also measures and improves the accuracy of its user model using as feedback the choices made from its list of suggestions. Samples of writing from dyslexic users have been used to show successful detection and correction of typical errors, consistent between samples by the same writer and distinct from other writers. The user model allows, for some users, more diverse errors to be corrected more effectively than in systems without individual user models.

1 Introduction

For centuries literacy has been gradually increasing in importance. One hundred years ago Morgan (1896) documented the case of a boy who seemed intelligent in conversation and reasoning but was extremely poor at reading and writing. Dyslexia is now often diagnosed in a person whose reading and writing age is at least two years behind what would be expected considering their age, education and non-linguistic intelligence. Work on dyslexia has gathered pace recently with myriad papers on many topics including the causes and diagnosis of the condition (Ellis, 1984; Coltheart et al., 1993; Boder, 1973; Brown and Ellis, 1994; Galaburda et al., 1994) as well as more practical works on available remediation. Research work on the association between computing and dyslexia perhaps began with neural networks modelling language dysfunctions (Harley, 1993; Hinton and Shallice, 1991). Nicolson and Fawcett (1993) proposed teaching methods to assist dyslexic children and Elkind and Shrager (1995) looked for the optimal input modality for computer usage. While spelling correction has been of interest for many years in computer science, there has been little work on applying those techniques to assist dyslexic writers. One of the reasons for this may be that the spelling patterns of individuals with dyslexia are peculiar. Thus a conventional spelling checker may not cover the kinds of errors made by such writers. In particular, many spelling correction programs cannot cope with more than one error per word. At the same time a dyslexic writer needs a spelling corrector which will suggest a small number (ideally one) of corrections, rather than a long list which will be hard for the person to select from. The individuality of errors made by dyslexic writers implies that there is a need for the spelling checker to embody a user model of the writer, of their particular spelling patterns.

The program described in this paper, Babel (named after the communication problems of people in the Tower of Babel), addresses these problems. It can cope with multiple errors in words and embodies a model of each user’s spelling patterns. This paper describes the system that has been built and some preliminary results of testing it on samples of dyslexic writing.

2 User Model Operation

Production rules are a familiar concept in computer science and artificial intelligence (see, for example, Young and O’Shea, 1981; Brown and Burton, 1978). A production rule pairs a condition with an action: in a situation in which the condition is true, the rule is said to fire. Some complex systems can be described by a set of such rules, known as a production system. At any time in some system, if its state indicates that the condition for a rule is met, then that rule may fire. That firing will change the state of the system, so that another rule may now be eligible, and so on. Young and O’Shea (1981) showed that arithmetic subtraction could be described by a production system. Furthermore, they demonstrated that common errors made by children performing subtraction sums could be modelled by adding mal-rules to the system. The rules of subtraction may be thought of as the method taught by the teacher, and the mal-rules are mutations of them which are the rules as understood by the child. The child applies the mal-rule in the belief that it will lead to a correct calculation. The rules for English spelling are more complex than those of arithmetic. Hence the approach taken here is to model the mal-rules only. That is to say that Babel does not attempt to derive correct spellings from scratch; rather, when a spelling error is considered, Babel attempts to identify rules which will generate the correct word from the mis-spelt one. The rules are derived from a cognitive model. In the case of Babel, the model is that of Patterson and Shewell (1987), although the same user model architecture could be used in a different application by being based on a different model. This model (shown in Figure 1) describes the flow of information between processing units in the brain.
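The mal-rule idea can be made concrete with a toy production system for single-column subtraction, in the spirit of Young and O’Shea (1981); this sketch is purely illustrative and is not drawn from Babel.

```python
# Toy production system: each rule is a (condition, action) pair, and a
# rule whose condition matches the current state may fire.  The "smaller
# from larger" mal-rule is a well-known children's subtraction error.

RULES = [
    # correct rule: the top digit is large enough, so subtract normally
    (lambda top, bottom: top >= bottom, lambda top, bottom: top - bottom),
    # mal-rule: when the top digit is too small, subtract the smaller
    # digit from the larger instead of borrowing
    (lambda top, bottom: top < bottom, lambda top, bottom: bottom - top),
]

def fire(top, bottom):
    """Fire the first rule whose condition is met."""
    for condition, action in RULES:
        if condition(top, bottom):
            return action(top, bottom)

print(fire(7, 3))  # correct rule: 4
print(fire(3, 7))  # mal-rule also gives 4, where borrowing was needed
```

The child applying the mal-rule gets a confident but wrong answer, which is exactly the behaviour a model of "rules as understood by the child" must reproduce.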
The model includes stages such as the Orthographic Output Lexicon (to convert word meanings to spellings), Phoneme to Orthography Conversion (to convert sounds about to be said into spellings) and Sub-word Level Orthographic to Phonological Conversion (to pronounce written words in parts). The suggestion is that some (if not all) of the errors made by dyslexic people can be attributed to deviations within this model. For instance, some errors can be explained by errors in the phoneme-to-orthography conversion stage. That is to say that the user correctly spells a word which they identify wrongly; they believe it to be one which sounds similar to the correct word. An example would be mis-spelling thought as fought. An example of a specific error is the frequent substitution of the letter b in place of its mirror image, d, due to an error in the orthographic-analysis or writing-output-buffer stages of the Patterson model. Another example would be the writing of a single consonant within a word in which that letter should be doubled (e.g. writing leter instead of letter). This can be traced to an error in the orthographic-output-lexicon or phoneme-to-orthography-conversion stages of the model. The rules must describe every possible error made by the user. Specific rules such as these are the most valuable since they give the most information about the individual, but to ensure

Figure 1. The universal dual route model of language production from Patterson and Shewell (1987). The diagram connects the Spoken Word and Written Word inputs, via Acoustic Analysis and Orthographic Analysis, the Auditory Input Lexicon and Orthographic Input Lexicon, Acoustic to Phonological Conversion, Sub-Word Level Orthographic to Phonological Conversion and the Cognitive System, to the Phonological Output Lexicon, Orthographic Output Lexicon and Phoneme to Orthography Conversion, and thence to the Speech and Writing output Buffers.

complete coverage, more general rules are also included. At the extreme, any word can be transformed into any other word by deleting all the letters of the first word and inserting all those of the second! As far as possible the rules should represent expected fundamental errors rather than string transformations which have been observed. The derivation of the rules is not important to the user model, however. If, once executed, a rule is found to be used many times, it may be possible to sub-divide it to allow the user model to gain a more detailed image of the user. This evolutionary procedure is not haphazard but systematically experimental (Mitchell and Welty, 1988, offer a good discussion of the unfortunate lack of experimentation in Computer Science). Each rule is implemented in a routine that takes a spelling permutation (a data structure described below). The rules perform a number of preliminary tests and produce zero or more new permutations in addition to the one presented as input. A routine will be called by the user model if required. Each permutation is a data structure which contains the current letter sequence of the word it describes as well as a record of the rules used. Rules often (but not invariably) exist in pairs, a positive and a negative form. This is because, when a person is composing the spelling of a word, he or she will apply spelling rules which may or may not be appropriate. For example, some people append the letter e to the ends of words where it is not required, while some people fail to write trailing e’s where they are required. According to the earlier description of production systems, at any time, any rule whose condition is met may fire. In any complex system it may well be that at any time there is more than one rule whose condition is met. There are a variety of ways of deciding which rule should fire in such a situation. In Babel the rules are weighted.
That is to say, each rule is tagged with a number which represents how likely that rule is to be appropriate. If more than one rule

is eligible, then the one with the lightest weight will be selected. The weights on the rules evolve with the use of the system, with lighter weights on more common rules. In this way they come to represent the probability of the user making the associated error. Hence the set of rule weights in time comes to characterise the individual and so embodies the model of that user. Rules can be inhibited by previous rule applications according to routes in the cognitive model. That is, applying rules to permutations moves them along the lines in Figure 1. From each point in the model, only some rules can meaningfully be applied, so the others are inhibited. In the composition of the spelling within the writer’s mind, the cognitive model describes certain stages which are passed in sequence. If an error transformation rule from one stage is applied to a certain permutation in the software, the location of the permutation in the cognitive model is defined, limiting the variety of other transformation rules which can then be applied. However, because the same error pattern can be caused by several different parts of the cognitive model, and because only the error patterns can be observed by the computer, it is not always clear where in the model the processing of a particular permutation stands. The system has been designed with a liberal attitude for these cases, so it is more likely that a rule will be tried inappropriately than that one will be excluded when it should be considered. The inhibition is enforced by a set of boolean flags indicating which rules may be applied to a permutation. As a change is made, the flags are updated in the new permutation.

2.1 Generality Score

As noted above, some rules such as wildcard operations are more general than others, and it is the more specific (less general) ones which are most informative. This is captured by a generality score, which ensures that a less general rule will be chosen where possible in preference to a more general one.
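One way the lightest-weight selection and the preference for less general rules might be combined is sketched below; the rule names, weights, generality scores and inhibition flags are invented for illustration, not taken from Babel.

```python
# Each rule carries a (weight, generality) pair: a lower weight marks a
# rule the user applies more often; a higher generality marks a vaguer,
# less informative rule.  Boolean inhibition flags record which rules may
# still be applied to a given permutation.

RULES = {
    "b_for_d":        (1.0, 0),  # mirror-image letter substitution
    "drop_double":    (1.5, 0),  # "leter" for "letter"
    "append_e":       (2.0, 1),  # spurious trailing e
    "any_substitute": (5.0, 2),  # wildcard fallback, most general
}

def next_rule(flags):
    """Among the rules not inhibited for this permutation, pick the
    lightest weight, breaking ties toward the less general rule."""
    eligible = [name for name, allowed in flags.items() if allowed]
    return min(eligible, key=lambda name: RULES[name])

flags = {name: True for name in RULES}
print(next_rule(flags))   # the lightest rule, "b_for_d"
flags["b_for_d"] = False  # inhibited after being applied at this stage
print(next_rule(flags))   # falls back to "drop_double"
```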
The idea that some features captured by a user model are general cases of others is not new, although most user models based on hand-designed cognitive architectures do not allow overlap between cases. Webb and Kuzmycz (1996) note which features are general cases of others and have prepared rules preventing their combination at run-time. Finlay (1990) uses automatic classification processes which do not lead to general rules. Babel, unlike existing systems, allows the summation and comparison of a sequence of transformations on a string; it will select the transformation path with the lowest total number of general operations rather than operating on individual rules, which might lead to a heuristically less desirable result. The system produces as output a list of transformation paths which use the transformation rules to convert written to correct word spellings. In some cases several paths will exist because several different combinations of rules can achieve the same final spelling. In those cases which employ more specific rules, other permutations will exist which employ more general rules (for example, where a b has been changed to a d in one permutation, another will exist which uses a generic letter substitution). Each rule has been given a generality score by the designer. Those rules which are general cases of others (such as wildcard rules) have higher scores. Those rules which are unrelated to each other (the majority) have no predetermined relationship in their generality scores. It is important to take care when assigning generality values, a process done at the time of creating the rule set. Only three levels of generality have been used. In normal system use, when

a word’s spelling has been corrected and a number of different paths all lead to the same spelling, the generality values for each rule used in each permutation are added. Only the permutations with the smallest summed value are considered further. This excludes the unnecessary wildcards in corrections which will almost always appear.

2.2 Weight Adjustment

The key to the user model is the weight of each rule. When a rule is applied to a spelling permutation, its weight is added to those already used; the final permutation therefore has a weight equal to the sum of the rules used in constructing it. The weights are adjusted (a lower weight is used for a more common rule) for each person to represent the errors they have been found to make. A permutation with a lighter summed weighting is more likely than a heavier one; this fact can be used in pruning the search space and in ordering suggestions to the user. When the system is first offered to a new user it is important to model their errors as quickly as possible. This is proposed in two stages: a short period of training in which a proficient user (such as a teacher) helps the new person to learn how to use the system and ensures that the correct transformations are chosen from the list of suggestions (i.e. that the user does not select the wrong word from the list, which would reinforce the wrong set of rules), followed by an ongoing refinement of the model using the ordinary search with feedback from the user as to choices made. During training, the system is simultaneously given a word as written by the author and a corrected spelling of the same word. The directed search is then applied to find a path between these two spellings. This is described further below. In normal use of the system, a word is given to the system which then produces a list of suggestions consisting of words from the dictionary found using the most likely transformation rules.
When one is selected, the rules used in that permutation are modified (the weights are reduced) and the weights for rules used in the remaining incorrect permutations are increased, so lessening their future usage. For each correction word pair (written/correct spellings) a number of paths may exist. However, only one sequence of rules is required to account for an error, so the most appropriate single path for each error must be found. A heuristic method described below will be used to find the smallest set of rules used in the largest number of correction pairs, operating over a substantial number of corrected error words. For those rules which have been used very rarely, the number of occurrences may not be representative of future errors. The number required to trust the rul e weightings should be a constant, and the value 3 has been suggested in Webb and Kuzmycz (1996). 2.3 Path Choice Heuristic After a number of words have been corrected, there will be a set of string pairs (written, correct) and one or more different transformation paths employing one or more rules to achieve each correction. The task then is to select the most appropriate single transformation path for each correction and to strengthen the rule weights used in that path. The other possible paths not selected are of no further interest.
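The feedback step of Section 2.2, in which rules on the chosen path become lighter and rules on the rejected paths become heavier, might look like the following sketch; the step sizes and the lower floor are invented parameters, not Babel's.

```python
# Lighten the rules used in the suggestion the user accepted, and burden
# the rules used only in the rejected suggestions.  A floor keeps weights
# positive so summed path weights remain meaningful.

def adjust(weights, chosen_path, rejected_paths,
           reward=0.1, penalty=0.05, floor=0.1):
    for rule in chosen_path:
        weights[rule] = max(floor, weights[rule] - reward)
    for path in rejected_paths:
        for rule in path:
            weights[rule] += penalty

weights = {"b_for_d": 1.0, "append_e": 2.0, "any_substitute": 5.0}
adjust(weights, chosen_path=["b_for_d"],
       rejected_paths=[["append_e"], ["any_substitute"]])
print(weights)  # "b_for_d" is now lighter; the rejected rules are heavier
```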

For each word pair, the number of rules used in each transformation path is counted. Any path with more than the minimum number of rule applications is discarded on the grounds that spelling errors are most likely to be simple (and it is the responsibility of the rule set designer to capture the errors with sufficient descriptive power). This will probably leave more than one transformation path for many word pairs; these must be reduced to exactly one in all cases. The entire set of remaining paths is then searched for rules which are used more frequently and which are less general than others used. Where there are several rule paths which can be used to correct the same written/correct word pair and at least one uses the most common rule, the other paths for that word pair are discarded. The process is repeated with the second most common rule and so on until only one correction permutation exists for each word pair.

2.4 Directed Search

In the event of the target (correct) word being known and no transformation path having been found using the ordinary search method, this evolutionary method will be used. In practice this means that it will be used during training after meeting a new user. It disregards the rule weights, which are known to have failed, and uses a simple local search to find a solution. Local search (Kernighan and Lin, 1970) is a predecessor to genetic algorithms; there is much literature about these algorithms, for example Goldberg (1989). Local search was designed for graph partitioning but contains the essence of an evolutionary approach. Starting with the written word, all rules are applied once to produce new permutations. The similarity between each permutation and the target word is found, and all but the most similar ones are deleted. The process then repeats with the remaining permutations. The similarity between each permutation and the target word is measured using trigrams: clusters of three letters.
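A minimal sketch of such a trigram similarity follows; padding the word ends so that boundary letters also form trigrams is an assumption of this sketch, not a detail taken from Babel.

```python
# Trigram overlap: two spellings are similar to the extent that they
# share three-letter sequences.

def trigrams(word):
    padded = "  " + word.lower() + "  "  # assumed padding for word edges
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Number of three-letter sequences the two spellings share."""
    return len(trigrams(a) & trigrams(b))

print(similarity("fought", "thought"))  # shares "oug", "ugh", "ght", ...
print(similarity("fought", "letter"))   # unrelated spellings share none
```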
The number of matching trigrams between two permutations can be counted and used as a fitness function for the search. Trigrams are used for the similarity measure because they are quite unlike the other methods used in the system and so should perform differently (and hopefully better) for difficult cases.

2.5 Normal Use

The ordinary case of real-time spelling correction, in which the written word is known and a correction list must be produced for presentation to the user, is similar to the preferred case during training. This is a novel heuristic search, unlike most existing algorithms, because it uses feedback from previous cases to modify the weights of the rules which will be used in future cases. It applies transformations to permutations and tests them with a simple boolean fitness function: whether the new permutation is in the dictionary of known words. The list of permutations produced so far is searched for the one with the smallest summed weights (of the rules applied to it so far). Initially the written word is the only permutation present, with no rules applied and a weight of 0. The chosen permutation then has all applicable rules applied to it, to produce new permutations with higher total weightings. The search is stopped after several words from the dictionary have been reached, after a certain large number of permutations have been considered, after the summed weights of all permutations exceed a limit, or after all allowable rules have been applied.
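The normal-use search just described, a best-first expansion of permutations ordered by summed rule weight with dictionary membership as the fitness test, can be sketched as follows; the two rules and their weights are toy examples rather than Babel's rule set.

```python
import heapq

def suggest(written, rules, dictionary, max_expansions=1000, max_results=5):
    """rules: (name, weight, generator) triples, where generator(word)
    yields candidate respellings.  Returns (word, rule path, summed weight)
    tuples for dictionary words reached, lightest paths first."""
    heap = [(0.0, written, [])]   # (summed weight, spelling, rules used)
    seen = set()
    results = []
    while heap and max_expansions > 0 and len(results) < max_results:
        max_expansions -= 1
        weight, word, path = heapq.heappop(heap)
        if word in dictionary and path:
            results.append((word, path, weight))
            continue  # accept; do not expand dictionary words further
        for name, rule_weight, generate in rules:
            for new in generate(word):
                if (new, name) not in seen:
                    seen.add((new, name))
                    heapq.heappush(heap,
                                   (weight + rule_weight, new, path + [name]))
    return results

rules = [
    ("b_for_d", 1.0,        # mirror-image substitution, one occurrence
     lambda w: [w.replace("b", "d", 1)] if "b" in w else []),
    ("restore_double", 1.5, # "leter" -> "letter": re-double each letter
     lambda w: [w[:i + 1] + w[i:] for i in range(len(w))]),
]
dictionary = {"dog", "letter"}
print(suggest("bog", rules, dictionary)[0])    # ('dog', ['b_for_d'], 1.0)
print(suggest("leter", rules, dictionary)[0])  # letter via restore_double
```

Because the heap is ordered by summed weight, suggestions built from the user's most common (lightest) rules surface first, which is exactly the ordering the suggestion list needs.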

If other information were available such as the grammatical correctness of each word or the appropriateness of a word in the context of discourse, such information could be used to exclude some words from the list of suggestions and then re-order it. The incorporation of such grammatical information might form the basis of future work.

3 Dyslexic Writing Samples

The dyslexic writing samples are from school children aged 13–16 in schools in York, Surrey and Edinburgh, and from university students in York and Hull being assessed for dyslexia. They have varying degrees of dyslexia, although those university students who were found not to be dyslexic have been excluded from the sample. The samples are mostly of handwriting. It is the most easily available form because many school-age dyslexic pupils do not write with computers (and, indeed, do not write much at all). Handwriting may differ from typed material in a number of ways, which will be tested for formally later in this research project, but it can initially be assumed to be similar in spellings to typed text. When writing by hand, some dyslexic writers attempt to shape letters ambiguously so as to have a better chance of appearing to have used the right letter. Considering the ambiguity in letter shapes, proof-reading a document becomes more difficult for a dyslexic writer (who has difficulty reading anything). Unreadable words have been marked as such and are not included in further study. To allow automated analysis of the writing samples, they have been transcribed into a machine-readable format including the intended words (as judged by a human reader) for each erroneous spelling. A total of 160 Kbytes of material by more than 30 people has been transcribed from nearly 80 documents, including more than 2700 simple mis-spelt words (not multi-word errors). Efforts to obtain further samples continue, although this remains something of a problem considering how little dyslexic writers produce. Although this may seem like a substantial sample, and is more than that collected for other projects reported in the literature (PalSpell, Wright and Newell (1991), was based on 221 sample words from 4 people), it is still not enough for a comprehensive analysis.
With the best samples in this project at present, there are approximately 400 mis-spelt words per person. Since the purpose of the project is to correct the errors of individuals, the quantity of text from each person is crucial and so data collection is an ongoing process. An on-line experiment (Spooner, 1996) has been prepared on the World Wide Web in the programming language Java, inviting users to type some sentences to dictation. This is bringing in more material with the advantage of being first-generation typed material without transcription errors. Also, the timing of keystrokes is available for inspection, and the same words have been written by each person. At the time of this writing (March 1997), only a small number of people have participated.

4 Testing and Results

The software system Babel, having been built as described above, has been tested using the corpus of sample texts. The corpus contains several documents from each of several students. On the first pass the model operates with its directed search and builds a user model for each

[Figure 2 near here: a matrix comparing documents BR1–BR3, BU1–BU4, CO1–CO2, EM1–EM4, HA1–HA14, MA1–MA9, MI1, PO1–PO5, WE1–WE3 and WO1–WO3, labelled by author initials; the shading legend runs to 460 words.]
document. The path choice heuristic is then applied to reduce the user model to a characteristic minimum of features, and then the system may be re-run in its normal search mode to attempt to correct actual errors. User models for different documents by the same author can be compared to establish consistency.

Figure 2. Correlations between user models built from documents by various authors. The ideal would be small boxes at the intersection of documents by the same author and large boxes elsewhere.

Some errors which were expected have not been made in the sample text, for example confusions between G and Q (which look similar). Other rules have proved surprisingly common, for example the consonant doubler (in which the writer misses one of a double consonant, as in realy for really). This error occurs not only in the mis-application of suffixes to root words, but at all positions in words. The misuse of final e’s occurs moderately often, as expected; writers both insert and remove e’s from the ends of words incorrectly, omitting them slightly more often.


If the system is to achieve a significant improvement over other spelling checkers then it must identify characteristics of each user which are notably different from those of other users. It is a hypothesis on which the rest of the work rests that a person’s spelling errors contain consistencies over long time periods. This is implicitly accepted by people who work with dyslexic texts but has never before been measured. Figure 2 shows the similarities between user models built by Babel, based only on the frequency of application of rules. A number of documents written by a number of people are compared. The size of each box indicates the strength of the correlation between the user models of the two samples; a small box indicates a good correlation. The shading of each box indicates the number of words considered in the smaller document and hence the reliability of the result; a dark box indicates many words and thus a more reliable model. Several things can be seen from this comparison. Except in a few cases, the number of words compared is small (typically around 70). Some documents have a poor match with almost all other documents, although often a better match with pieces by the same author. The correlation is measured as a distance between two points in a multidimensional space. If each error pattern rule represents a dimension then any particular user model is a point in that space. The vector length between two points is the difference between the user models. The dimensions are first scaled so that each has the same maximum. Thus instead of one unit being one application of the transformation rule, it is proportional to the total number of applications of that rule. By this scaling, rarely used rules can still form an important part of the characterisation. The number of documents written by each author varies simply because only the material available for research could be used. Those with more material were welcomed and those with less were not excluded.
Documents with fewer than 25 mis-spelt words have been excluded. Choosing a statistical analysis for the results has proved somewhat troublesome, and at the time of this writing a conclusion cannot be drawn with confidence. Research on this project is continuing, and will presently include a proper analysis of the results. A preliminary analysis suggests that two authors out of the population of nine have user models for their individual documents which are more similar to each other than to those of other authors. These are shown in Figure 2 as users MA and HA. This is weak but positive support for the suggestion that a dynamic user model can help in spelling correction for some writers’ error patterns. If the user models are not significantly different from each other then there is no point having a dynamic model and one might as well use an ordinary spelling checker. After running the software in its normal mode (instead of the directed search), it was able to correct about 80% of the error words from the dyslexic authors. Research is continuing into the nature of the errors corrected and missed, and unfortunately more detailed information is not available at this time.
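The scaled multidimensional distance described above might be computed as in the following sketch; the rule names and application counts are invented.

```python
import math

def model_distance(model_a, model_b, all_models):
    """Euclidean distance between two user models (rule -> count maps),
    with each dimension scaled to the same maximum across all models so
    that rarely used rules still contribute to the characterisation."""
    rules = set().union(*all_models)
    total = 0.0
    for rule in rules:
        peak = max(m.get(rule, 0) for m in all_models) or 1
        total += ((model_a.get(rule, 0) - model_b.get(rule, 0)) / peak) ** 2
    return math.sqrt(total)

a = {"b_for_d": 8, "drop_double": 2}   # two documents by one author
b = {"b_for_d": 7, "drop_double": 3}
c = {"append_e": 9}                    # a document by a different author
models = [a, b, c]
# documents by the same author should be closer than those by different authors
print(model_distance(a, b, models), "<", model_distance(a, c, models))
```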

5 Discussion

The user modelling system has so far not yielded convincing correlations of the type that were expected. This may be because the spelling errors are not consistent in the ways examined by this user modelling system, although they may well be consistent in other respects. However, it is impossible to be definite about the cause of the low correlation at this point since the samples are rather small. The collection of more data, and especially of large samples

from individuals, thus remains a priority. Research will also continue in investigating correlations of other sorts, such as position in sentences, word types and so on. It remains likely that certain errors are typical of certain people; for example, in this study one author frequently used left-right mirroring within letters, in letters within words and in words within a phrase. Other writers typically confuse b for either d or p. A number of methods could be employed to improve accuracy, such as weighting suggestions by frequency, recency, or discourse domain. These will be added presently to enhance the research work, but at this stage they would obscure the underlying performance of the rules. The system is in the process of refinement and current results are somewhat preliminary. Although the results are encouraging, it is clear that for many of the sampled writers the rules in Babel are too general. Nevertheless, results so far suggest that the individualised correction Babel can provide, based on its user model, will make it a more valuable spelling aid for at least some dyslexic users than general conventional spelling checkers. The creation of new transformation rules would help refine and improve the remaining errors, although the necessary coarse phonetic transformations are not simple. The position of rule use within each word, and the choice of letter where a wildcard is used, will be incorporated into the system soon. A number of possibilities exist for the continuation and generalisation of this work. As was mentioned earlier, the system’s accuracy might be enhanced by the incorporation of natural language processing, which may improve the identification of intended spellings. There are also a number of user interface questions which could be addressed once a working system has been established.
For instance, how does one present information about spelling correction to a person who has difficulty reading? This work also has wider implications in that Babel’s user modelling techniques might also be applied to other “strings”. Instead of letters these might for instance represent sequences of actions, where a “mis-spelling” might represent an incorrect action sequence. For instance, the sequence of actions undertaken by an aircraft pilot performing a standard task such as taking off might be encoded as a string. Any deviation from the normal sequence should be investigated further; it might represent a dangerous error.

References

Boder, E. (1973). Developmental dyslexia: a diagnostic approach based on three atypical reading-spelling patterns. Developmental Medicine and Child Neurology 15:663–687.
Brown, J., and Burton, R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science 2:155–192.
Brown, G. D., and Ellis, N. C. (1994). Handbook of Spelling: Theory, Process and Intervention. New York: John Wiley.
Coltheart, M., Curtis, B., Atkins, P., and Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review 100(4):589–608.
Elkind, J., and Shrager, J. (1995). Modeling and analysis of dyslexic writing using speech and other modalities. In Edwards, A. D., ed., Extra-Ordinary Human-Computer Interaction: Interfaces for Users with Disabilities. New York: Cambridge University Press.
Ellis, A. W. (1984). Reading, Writing and Dyslexia. London: Lawrence Erlbaum.

Finlay, J. E. (1990). Modelling Users by Classification: An Example-Based Approach. Ph.D. Dissertation, University of York, UK.
Galaburda, A. M., Menard, M. T., and Rosen, G. D. (1994). Evidence for aberrant auditory anatomy in developmental dyslexia. Proceedings of the National Academy of Sciences 91:8010–8013.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, Massachusetts: Addison-Wesley.
Harley, T. A. (1993). Connectionist approaches to language disorders. Aphasiology 7(3):221–249.
Hinton, G. E., and Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review 98(1):74–95.
Kernighan, B. W., and Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal 49:291–307.
Mitchell, J., and Welty, C. (1988). Experimentation in computer science: An empirical view. International Journal of Man-Machine Studies 29:613–624.
Morgan, W. (1896). A case of congenital word blindness. British Medical Journal 2:1378.
Nicolson, R. I., and Fawcett, A. J. (1993). Computer based spelling remediation for dyslexic children using the selfspell environment. In Wright, S., and Groner, R., eds., Facets of Dyslexia and its Remediation. Elsevier Science Publishers B.V. 551–565.
Patterson, K., and Shewell, C. (1987). Speak and spell: Dissociations and word-class effects. In Coltheart, M., Sartori, G., and Job, R., eds., The Cognitive Neuropsychology of Language. London: Lawrence Erlbaum Associates. 273–294.
Spooner, R. I. (1996). On-line typing experiment. http://www.cs.york.ac.uk/dyslexia/.
Webb, G. I., and Kuzmycz, M. (1996). Feature based modelling: A methodology for producing coherent, consistent, dynamically changing models of agents’ competencies. User Modeling and User Adapted Interaction 5:117–150.
Wright, A., and Newell, A. (1991). Computer help for poor spellers. British Journal of Educational Technology 22(2):146–149.
Young, R. M., and O’Shea, T. (1981). Errors in children’s subtraction. Cognitive Science 5(2):153–177.