Computer Detection of Errors in Natural Language Texts: Some Research on Pattern-Matching
Author(s): Glynda Hull, Carolyn Ball, James L. Fox, Lori Levin, Deborah McCutchen
Source: Computers and the Humanities, Vol. 21, No. 2 (Apr. - Jun., 1987), pp. 103-118
Published by: Springer
Stable URL: http://www.jstor.org/stable/30200077
Computers and the Humanities 21 (1987) Paradigm Press, Inc.
Each year, students at colleges and universities across the country begin their classes with a disadvantage, for they don't possess the literacy skills requisite for participation and success in an academic environment. These students, most of whom have done little reading and writing, must quickly acquire those language skills that will allow them to carry out academic work. They must, for example, learn to write papers that are coherent and convincing, but they must also learn to write papers that are correct, and this last task is ironically quite crucial, for teachers generally have little patience for mistakes and often interpret them as signs of students' general ignorance.

At the University of Pittsburgh, by means of grants from the Ford Foundation and Digital Equipment Corporation, we are engaged in research that we believe will be a help to such students as they begin the task of learning to write. Specifically, we are developing computer programs that will teach low-performing college students (students required to enroll in a remedial writing course called Basic Writing) to edit or correct their essays for errors in syntax, punctuation, grammar, usage and spelling. We have chosen to focus on editing, the process of error location and correction, both because it is a universal part of writing instruction, particularly for under-prepared students, and because it is a part that teachers often do not enjoy and almost never do adequately. Happily, it is also a task which computers have the potential to perform well.

We are an interdisciplinary group, consisting of cognitive psychologists, composition teachers, computer specialists, and linguists drawn from the Department of English, the Learning Research and Development Center, and the Department of Linguistics at the University of Pittsburgh. Our modus operandi has been to combine what some of us know about how students can best be taught to edit, with what others of us know about how computers can best be programmed to do this teaching. In this report we describe some of the work we have done thus far, in particular the research we are conducting on error detection via pattern-matching. First, however, we give a brief account of the instructional theory that drives the programs we are developing.

Glynda Hull teaches in the Graduate School of Education, University of California at Berkeley. Carolyn Ball, a Ph.D. candidate in the Department of English; James L. Fox, a systems analyst at the Learning Research and Development Center; and Lori Levin, a computational linguist in the Department of Linguistics, are all at the University of Pittsburgh. Deborah McCutchen is an assistant professor at the University of Washington. The research reported in this article was sponsored by grants from the Ford Foundation and Digital Equipment Corporation.
A Pedagogy for Editing

Many errors made by beginning writers are systematic, resulting either from an idiosyncratic grammar and/or orthography, or from dialect and oral language interference (Shaughnessy, 1977). To learn to correct those errors, students must imagine other, more conventional ways of constructing sentences so that they can modify or replace their own erroneous rules and procedures.
One way to develop this facility is to give students feedback on the presence of an error ("There's an error in this sentence.") and on their attempts to correct it ("No, that's not quite it; try again.").

Figure 1 contains a typed copy of an essay handwritten by a student at the beginning of a semester in a basic (remedial) writing class. Labeled in the essay are sentence-level problems that basic writing students have: errors in syntax, grammar, mechanics, usage, and spelling. The first thing that is noticeable about these errors is their abundance. There are some forty errors in this essay, which is itself only about two hundred words long. A second thing to notice is that many of the errors are systematic. That is, the same kinds of errors appear over and over, and their regularity suggests that their source lies in a writer's intention rather than his carelessness.

A good example of a systematic error is the first sentence of the first paragraph and the first sentence of the last paragraph, instances of what we call a "topic/comment" error. In these sentences, the writer announces a topic in a phrase (Finishing the story about Ruth James and My response from this story) and then juxtaposes this phrase to a complete sentence (she was the type of person who believed in herself and I felt that she was pressured by her own peirs). It appears that this writer is not yet able to manage the subordination called for by this rhetorical move, and he simply resorts to juxtaposition, saying what the topic is in a phrase and then going forward to elaborate it. The result is a queer kind of syntax that the writer will need to edit, but a syntax that has its own logic in the context of this student's grammar.

The errors in students' writing can, in fact, often be shown to be systematic derivatives from an idiosyncratic grammar and/or orthography. A student might, to give other examples, put a comma before every and in a sentence, having over-generalized a rule for punctuating compound sentences. Or, being uncertain about how to manage the subordination of ideas in a complex sentence, he might attach any dependent clause to a main clause with an all-purpose connector like in which. If errors are the result of systematic rules, a pedagogy to teach writers to correct them must provide an occasion for writers to recognize an error and to modify the rule that produced it.
One way to provide this occasion (see Bartholomae, 1979, for a more detailed explanation of this pedagogy) is to give students the task of finding and circling their own errors. By so doing, a teacher will learn about her students' skills as editors; she will, that is, learn which errors they can and cannot find and correct. In the case of those errors a student cannot locate, a teacher can assist him or her with the process of editing by highlighting the areas within an essay that contain errors, so that the student can begin to search again. In this manner, students can develop and use editing procedures for identifying structures that they perceive to be unconventional and proposing alternatives to them.

Editing, in other words, calls on skills in reading and problem-solving. Error highlighting allows students to search for problems on their own and to learn to see patterns of error, like the topic/comment problem illustrated in Figure 1. Initially, however, they need to have the search constrained, just as novice problem-solvers in any domain of knowledge and skill need to have beginning problems well-structured. The highlighting of error enables a student to acquire basic skills, and as these develop, more problem areas can be included and the student can handle broader bands of error. Instruction in editing can, then, be sequenced; it can move from one stage to another, where the stages are defined by the degrees of skill involved in finding and correcting certain types of errors.

Computerized Instruction in Editing

We are developing computer programs that will teach students to edit their papers for errors, using the pedagogical approach described above. We want our programs to:

* direct a student's search for error by setting up an editing session once all other work on a paper is complete;
* assist that student's search by highlighting in boldface or reverse video the area within which an error occurs;
* allow the student to propose changes and interact with the student when the change is not a correction;
* chart the errors the student cannot find and cannot correct;
Finishing the story about Ruth James she was the type of person who believed In herself and never Liked to be pressured by her peers. I admired her because she had a sense of Looling, right, through people and Knowing what there all about, for instance the priest, an.d the crewLeader they both wanted to take advantage of ruth I felt she was a strong mentally young women who wanted something more out of life than Just been a tmgrant worker
she knew when to say "no" to people, such as that crew
Leader. I felt he was a disgusting, arrcgant, and undersuper ego person I ever saw in my Life. I admired Ruth for her courage and strength against that crewleader. I felt the most significant part of the story was, when everylbodywanted rutb to tare her sister's children and marry the husband, It seem that everyond wanted her to get married because she had no husband. So what that's one of Ruth problems nobody pressed the migrant workers to the farmland. I'm glad Ruth stood on hev on two feet and said "no", Ruth choose a Life that she wanted to Live. My response from this story, I felt that she was pressured by her own peirs who never understood Ruth's viewpoint.
Sometimes frustration on work the boss only sees It from his own
viewpoint, not the reality of pressure on his employees on makinz money and how they feel.
Figure 1. An Essay by a Basic Writing Student Marked for Sentence Level Errors
* and lead the student through a more detailed search by clustering errors according to type.

The crux of this research is to build programs with error detection capabilities.
Related Error Detection Programs

Several programs are available which provide feedback on text characteristics, including the presence of errors, although, for the most part, such error detection is limited in scope and quite often focuses on matters of style and usage rather than grammatical correctness. The best known of these is Bell Laboratories' WRITER'S WORKBENCH (Cherry, 1980; Kiefer & Smith, 1983), which analyzes the style of documents according to a set of parameters for what constitutes good technical style. WORKBENCH can, for example, determine how accessible or readable a text is, calculate the percentage of passive verbs, and flag offending usages. It should be noted, however, that these text characteristics have to do with style, not correctness or error. A text that has a poor readability score, a large percentage of passive verbs, and many ize verbs could still be a correct text. In fact, WORKBENCH is predicated on an assumption that sentences will be grammatical. Thus, it treats any group of words ending with a period or question mark as a complete sentence. Although it does provide feedback on certain errors (unpaired quotation marks, the accidental repetition of a word, or errors in using a or an, for example), it is not designed specifically to deal with ungrammatical text. Another program comparable to WORKBENCH in the sense that it, too, assesses style is HOMER (Cohen & Lanham, 1984).

Several other programs provide the capability to flag offending usages and strings that might be errors. For example, they might flag every instance of there, since it is often confused with their. Then it is up to the writer to figure out if an error actually exists. The best known program of this type is GRAMMATIK (Thiesmeyer, 1984). The same principle, flagging every occurrence of a word or phrase that might be an error, is used in several packages, including HBJ WRITER, formerly WANDAH (Von Blum & Cohen, 1984). The trouble with such an approach is that it may not be helpful to an inexperienced writer, who most likely won't know, once every there and their is flagged, which form is correct and which isn't. More worrisome, from a pedagogical viewpoint, is the fact that calling attention to forms that are correct as possibly incorrect may confuse the writer unnecessarily. Also, there are many, many errors that can't be flagged by searching for literal word strings.

In contrast to programs which deal only in style analysis or string matching is IBM's CRITIQUE, formerly EPISTLE (Heidorn, Jensen, Miller, Byrd, and Chodorow, 1982). CRITIQUE purportedly detects several kinds of errors (for example, subject/verb agreement, non-parallel form, pronoun agreement) and provides feedback on how such errors might be corrected. The program is still under development, and it promises to be a significant achievement in terms of error detection in natural language texts. Our project thus shares some aims with CRITIQUE (error detection capability), but it differs in that what we design is meant to be used in educational rather than business settings. We're interested in programs that teach the skill of correcting errors, whereas CRITIQUE's thrust is toward making business writing more efficient and error-free.
Detecting Errors in Natural Language Texts

Some errors in writing are more easily detected by machine than are others. In particular, most spelling mistakes basically require algorithms for matching a word against a dictionary of correct and incorrect spellings. However, it is a much more complicated matter to detect most other errors, errors in syntax and agreement and grammar, for these errors have to do, not with single word forms, but with how several word forms are combined into larger units. To detect such errors, we are developing search algorithms that have a more sophisticated pattern-matching capability than do spelling checkers; we call this approach "augmented pattern-matching." Our eventual package will also include a parser, a program that analyzes the grammaticality of a sentence, for as we will shortly demonstrate, pattern-matching has its limits. We are interested in pursuing pattern-matching, however, because of its speed; running on a minicomputer, such programs will allow us to provide quick feedback (in a matter of seconds) to students on the errors of their writing, whereas parsing currently requires a great deal of processing time. To create both kinds of programs, pattern-matching and parsing, we've tried first to extend our knowledge of sentence-level errors. Some of this research is reviewed in the next section.
A Taxonomy of Error

In the past, researchers have been able to work only with very limited error samples. Over the past year, however, we have developed a computer database that now contains some 1,000 student essays. Thus, for the first time, it is possible to examine systematically a large corpus to determine what are frequent and persistent and characteristic errors for basic writers, what happens to these errors in the course of instruction, and what linguistic cues there are for given error types. Thus, an empirical database will drive our instructional programs, not a style guide or a manual on proper usage.

Our first step in this research has been to develop a taxonomy of error types. Our procedure has been to analyze essays from our database, describing as specifically as possible any violations in syntax, grammar, usage, or punctuation. We collect multiple examples of each error type and continually refine our category scheme, creating new categories or subdividing old ones as necessary. Currently, our taxonomy consists of approximately 75 categories and subcategories, supported by approximately 375 examples. Here are two illustrations of our approach.

Inflected Infinitives:

The following error type is rarely mentioned in grammar handbooks, but it is present in students' writing: an ed added to an infinitive form:

* I was just going to handed it to the person who was going to do the hiring.
* The next day about 9:00 o'clock another agent drove Jim and myself out to Oakland to looked for multi-united buildings.
* You only get a chance to learned them the hard way when you enter the business world.
* We had to designed everything we made.

This error often co-occurs with errors resulting from oral language interference involving missing endings. Students who are trying to remember to put ed's on verbs in past tense sometimes over-generalize this rule and erroneously inflect infinitives.

Syntactic Errors with Which:

The next error, also not mentioned in grammar handbooks, is a variety of syntax error that consists of a relative clause beginning with which, incorrectly subordinated to the main clause, often by means of a superfluous in:

* Each person in every book has a certain way of changing from Maya Angelou and her fight for dignity, to Holden Caulfield and all the phones in which he ran into.
* Everyone's life is effected by the occupation in which they are involved in.
* Base stealing and stell spikes were allowed, in which they were not in my previous leagues.
* I intend to go to college to try to find another field in which to take.

Beginning writers often have trouble managing subordination in writing; thus, their sentences get derailed at the point at which they must join a dependent clause to the rest of a sentence. The in which construction is a good example of this problem. This structure is not one that is common in speech. When students first begin to use it in writing, in an effort to take on some of the conventions of written language, to sound literate in their writing, they bobble it, adding too many in's or using in which where that would be preferred.

Such a taxonomy of error differs from most traditional taxonomies insofar as errors are described according to specific linguistic features or patterns. While this taxonomy shares in the disadvantages of all taxonomies (the overlapping of some categories and the reliance on interpretation for category assignment, for example; see Hull, 1987, for a discussion of problems with error taxonomies), the advantages of this method for pattern-matching are twofold. First, error is no longer viewed as solely a matter of textual fault, but is instead seen as a systematic or rule-based production, a shift in emphasis which has more to offer in terms of pedagogy. And second, the specificity of description is what makes pattern-matching for error possible.
Pattern-Matching Programs

One of our approaches to error detection is a more sophisticated kind of pattern-matching: a program which will search, not only for literal strings of characters, such as individual words, but also for part-of-speech patterns. The crucial issue here is to determine which and how errors can be specified as part-of-speech patterns. To use the first example above, the erroneously inflected infinitive form, we can trap most instances of this error by directing our program to search for the pattern to + verb in past tense.

Such a program must operate off a large dictionary where words are labelled according to their attributes: whether, for example, they can be nouns or verbs or both and so on. Our current feature list or set of word attributes for our dictionary appears in Figure 2. We arrived at this list through experimentation with a comparable list created for Bell Labs' WRITER'S WORKBENCH. (See Cherry, 1978.) In many cases, we have made finer distinctions than were present in WORKBENCH, distinguishing singular and plural nouns, for example. We also have chosen not to have double categories, for example noun/adjective, for words that can be more than one part of speech like leisure. Rather, leisure would appear in our lexicon twice, as a noun and as an adjective. Since we are building our own dictionary, we are not bound to a particular set of attributes. It is likely, then, that our feature list will change as we continue our research on pattern-matching. We might, for example, discover a need to distinguish between count and mass nouns or comparative and superlative forms of adjectives in order to trap particular errors.1

Nouns: Singular, Plural
Pronouns: Number, Person, Case (Subject/Object/Possessive)
Relative Pronouns (i.e., which, that, who, whom)
Adjectives
Adverbs (including particles)
Conjunctive Adverbs (i.e., however)
Subordinate Conjunctions (i.e., although)
Coordinating Conjunctions (i.e., and, for, or)
Prepositions
Quantifiers (i.e., many, all, some)
Determiners: Singular, Plural (i.e., a, the, these)
Numbers (i.e., first, three-quarters)
End Stop Punctuation (i.e., . ? !)
Interjections (i.e., wow, oh)
Verbs: Be Verbs, Non-Be Auxiliaries (i.e., have, has, must, do), There Verbs (i.e., seem, exist), Regular/Irregular, Forms (infinitive, 3rd person singular, simple past, past participle, present participle)

Figure 2. Current Feature List for Pattern-Matching

In our current research, we are determining the kinds of errors that such pattern-matching programs can detect and the accuracy with which they can locate them. Our procedure is first to review our taxonomy for potential pattern-matching candidates, then to specify an error pattern, and finally to test the taxonomy on our computer database, which consists of approximately 1,000 essays written by basic writing students. We check the resulting output of the search for accuracy; we want to know, that is, what percentage of actual error occurrences the pattern flags, and what percentage of the instances it flags are actual errors rather than false positives. With this feedback, we can refine the patterns.

Thus far, we have identified some 40 error patterns from the 75 categories in our taxonomy that we can detect by means of augmented pattern-matching, and we have begun to test these patterns on our database. The error types we have examined include what grammarians call a comma splice, as in The wall mural is a work of art, it stands for me and all my energy; omitted apostrophes in possessive forms, as in her mothers boyfriend; superfluous commas within constituents, as in What she does fear, is that she might have to move; hyper-inflections (illustrated above); fragments which are signalled by participle phrases, as in Picking out items that defined her personality; and homophone errors involving the substitution of there for their and your for you're.

For each error type, we have tried to specify a pattern that our augmented pattern-matcher can use to isolate each instance of error. These patterns vary in the accuracy with which they reflect the underlying errors; some, such as the pattern for hyper-inflected infinitives, match virtually all of the errors of a particular type and result in very few false positives. Others, like the comma splice pattern, match a high percentage of the occurrences of particular error types, but also flag a high percentage of correct structures. A discussion of the success we have had with representative patterns follows.

Two disclaimers are in order, however, before we begin. First, the following results are based on a program which currently matches only triplets, three literal word strings or part-of-speech strings. This program does not have the more sophisticated recursive capabilities needed to detect many kinds of error; for example, there is no "if/then" variable which would enable more precise pattern specification. Thus, what we're providing is a rather rudimentary and stringent test of a technique that could be improved with some simple heuristics. We will return later to a discussion of what those heuristics might include. Second, the results to follow are based on searches employing a lexicon whose attributes aren't always the same as those we list in Figure 2, which is our most recently revised version. We note these differences where they occur. But again, it should be remembered that these conditions emphasize the limits of pattern-matching more than they highlight its capabilities.
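As an illustration only, the kind of triplet matching described in this section might be sketched as follows. This is a minimal sketch in Python, not the authors' program: the tiny lexicon, the attribute names (VERB_PAST, VERB_BASE, NOUN_SG), and the function names are all hypothetical, chosen to show how a pattern of up to three literal words or part-of-speech attributes could be run against a text, and how the percentage of flagged structures that are actual errors could be computed.

```python
import re

# Hypothetical mini-lexicon: each word maps to the set of attributes it can carry.
# The attribute names are illustrative, not the authors' actual feature codes.
LEXICON = {
    "designed": {"VERB_PAST"},
    "handed": {"VERB_PAST"},
    "learned": {"VERB_PAST"},
    "cut": {"VERB_PAST", "VERB_BASE"},   # past form identical to base form
    "design": {"VERB_BASE", "NOUN_SG"},
}


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,;:?!]", text.lower())


def element_matches(element, token):
    """An upper-case element names an attribute; anything else is a literal."""
    if element.isupper():
        return element in LEXICON.get(token, set())
    return element == token


def find_matches(pattern, text):
    """Return (start index, matched tokens) for every place the pattern fits."""
    if not 1 <= len(pattern) <= 3:
        raise ValueError("this sketch handles patterns of one to three elements")
    tokens = tokenize(text)
    hits = []
    for i in range(len(tokens) - len(pattern) + 1):
        window = tokens[i:i + len(pattern)]
        if all(element_matches(e, t) for e, t in zip(pattern, window)):
            hits.append((i, window))
    return hits


def success_rate(judgments):
    """Percentage of flagged structures that a human reader judged to be errors.

    `judgments` holds one boolean per flagged structure (True = actual error).
    """
    if not judgments:
        raise ValueError("no flagged structures to evaluate")
    return 100.0 * sum(judgments) / len(judgments)


if __name__ == "__main__":
    pattern = ("to", "VERB_PAST")
    print(find_matches(pattern, "We had to designed everything we made."))
    print(find_matches(pattern, "This was the first time I ever had to cut material down."))
```

In this sketch the second sentence is flagged even though it is correct, because the hypothetical lexicon labels cut as a past tense form; a lexicon that records more than one attribute per word needs some way to keep such forms from triggering the pattern.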
Pattern-Matching Results

In each of the following sections, we (1) describe an error pattern and give whatever information we have on the frequency of that pattern among our population of writers, (2) illustrate the results of a computer search for that pattern, (3) note the limitations of the pattern and the probable causes of those limitations, and (4) speculate about how to improve the pattern's accuracy. Throughout, success rate refers to the percentage of structures flagged that are actually errors.

Infinitive Errors

As the examples in our previous discussion suggest, one common inflectional error among very inexperienced writers is an inflected infinitive, as in I wanted to learned. The error apparently is a byproduct of students' attempts to remember to put ed's on certain verbs, particularly those for which the ed isn't sounded in speech, like asked. In their attempts to get the rule right, they over-generalize and put ed's where they don't belong, such as on infinitives. To trap this infinitive error, we have searched our database for to plus past tense verb. Our success rate is 84%. That is, 84% of the instances flagged are errors, and 16% are false positives. Here are examples:

Errors Flagged:

* We had TO DESIGNED everything we made, but before we could make anything the design had to...
* The way that I have decided TO PLAYED the hand has worked out well, because I have managed to maintain good...
* The teacher would met with you every two weeks TO DISCUSSED the problems that you are having.
* It is our choice TO CHOSE if we want to destroy our planet or make it a better place to...

False Alarms:

* If Arnold did not have the proper body chemistry, he would not have been able TO BECOME Mr. Olympia seven times.
* By this he meant that fate acts as an impartial judge TO SET parameters within the framework of decisions.
* ...and it is also good to have people like parents, teachers, and etc. there TO PUT restriction on some choices because, in my opinion, putting a restriction on...
* This was the first time I ever had TO CUT material down.

The reason for the false positives is that in some cases the irregular present perfect form of a verb (such as become) is the same as the third person singular or base form, which is used with the infinitive. If, as in our current lexicon, these forms are designated under the general category of "past tense," correct sentences such as those above will be flagged as errors. We could, however, improve the accuracy of the to + ed pattern by separating in our lexicon those verbs which form the past tense by adding ed from irregular verbs whose perfect form is the same as their base form. (Notice that among the errors that were trapped, only one was an error involving an irregular verb, chose, and most likely, this was a spelling error rather than an over-generalization of the rule for forming verb endings.) We can't predict 100 percent accuracy, however, because the pattern would still flag correct usages like I went to learned men for the answer and People who dared to risked their lives. The ambiguity of the second example sentence could, of course, be resolved with the placement of a comma between to and risked (People who dared to, risked their lives) but, even though the pattern could be made sensitive to such a comma, basic writers cannot always be counted on to make the wisest choice in terms of punctuation.

Another problem with the to + ed pattern is the failure to detect instances of the error which include a modifier between the to and the verb, as in I want to really looked. For that error, a separate pattern would need to be written in order to search for a to plus an adverb plus a verb ending in ed.

Homophone Confusions

A. THEIR/THERE
Several homophone mistakes are frequent among inexperienced writers: confusions between words that sound alike but are spelled differently, such as their, they're, there; you're and your; and in some dialects then and than. Our database indicates that the most common misuse of their for there occurs in the existential there construction, which accounts for 83% of these mistakes. Other mistakes involving the use of their and there as a signal of location or the use of their for they're are much less frequent. To trap the existential their error, we can search for their plus a form of to be, and when we do, we trap close to 100 percent of the instances of this particular error with no false positives:

* THEIR ARE two ways you can endue a force on him.
* THEIR WERE five main stages of construction: the seat, the legs...
* Now THEIR IS a movement back toward higher standards and better quality.
* To solve any problem THEIR ARE four basic steps.
* THEIR WERE four of us that like to ski together, and we were all haveing nice runs down the side slopes.

If we search for "there verbs," or other verbs besides to be that can occur after there, and auxiliaries, we will also trap errors like the following:

* The audiance was superb THEIR APPEARED to be 15 thoudsand people from all over the country.
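The existential their search described above can be illustrated with a short sketch. The Python below is an assumption-laden example, not the authors' program: the word lists BE_FORMS and THERE_VERBS and the function name flag_existential_their are hypothetical, the be forms are limited to four finite forms, and auxiliaries are left out for brevity.

```python
import re

# Hypothetical word lists; a fuller lexicon would supply these attributes.
BE_FORMS = {"is", "are", "was", "were"}
THERE_VERBS = {
    "seem", "seems", "seemed",
    "appear", "appears", "appeared",
    "exist", "exists", "existed",
}


def flag_existential_their(text, include_there_verbs=False):
    """Return each 'their + verb' pair suggesting existential 'there' was meant."""
    followers = (BE_FORMS | THERE_VERBS) if include_there_verbs else BE_FORMS
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first == "their" and second in followers
    ]


if __name__ == "__main__":
    print(flag_existential_their("To solve any problem their are four basic steps."))
    print(flag_existential_their(
        "The audiance was superb their appeared to be 15 thoudsand people.",
        include_there_verbs=True,
    ))
```

Keeping the "there verbs" behind a switch mirrors the two searches in the text: the narrow their + be pattern first, then the wider pattern that also catches cases such as their appeared.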
CommaErrors A. COMMA-BE ERROR
The most commonmisuse of a comma among our populationof writersis a superfluouscomma betweenthe subjectof a sentenceandsome form of the verb to be: Invention,is a resultof mental creativity (This errormight resultfromthe ruleof-thumb,"Puta commawhen you pause.")One
COMPUTER DETECTION OF ERRORS IN NATURAL LANGUAGE TEXTS
can searchfor this errorsimply by looking for comma + a form of to be. The results of such a search indicate a 52 percent success rate;48 percentof the time, the sentences flagged were correctin terms of "comma-be"usage. Errors Flagged: The sentences containing"comma-be"errors which were correctly flagged fall into five categories: 1) Long Subject:The most frequentuse of a commabeforebe occurs when the subjectof the sentence is fairly long, especially when it includes a prepositionalphrase: * To the best of my knowledgeperformingone's work as best as possible, is performingto one's limit. * Rebecca'sWest'sexpressionon her life, is that she wants... * In general,any way of connectingthe dots with no morethan4 straightlines anddo not separate the pencil from the paper,is correct. 2) Subject-Verb: For whatever reasons (perhapsbecause they'repausing), studentswill place a comma between the subjector introductory phraseand the verb even if the subject or phraseis one or two words: * A discovery,is the act... * In second, was a girl... * GertrudeStein, is saying in her passage... 3) What Introductions:Often when students begin their sentenceswith a "what"clause, they add a superfluouscomma: * What I mean by this, is that usually... * Butwhatshe does fear,is that,herupholstery... * What I think responsibilityis, is something. (This is an example of a tricky case, since the commais reallyhelpfulhere.A judgmentcould be madeabouthowto handlecases of "be, be.") This error,however,is less frequentthanthe previous patterns. In this pattern,whichalso is 4) Overrestriction: less frequentthanpatterns1 and 2, studentsplace commasaroundphrasesor clauseswhichshouldn't be separatedoff fromthe noun they modify:
Ill
* I foundthatthreeof the solutions,thatI hadfigured out, were listed as possible solutions... * Bushes and flat stones, for a walkway, were added... * The experience, that I feel related to Eddington's position, was the time that... erroris trapped 5) Questions:Anotherinfrequent by "comma-be"searching,but the errortype is not actually of the "comma-be"variety, but rather,is a comma splice: * If you eat out, where, and also how will that food effect your body, is it good for you? * Whytaketimeto organizeby writingforjustthe first draft, isn't the first draft suppose to tell what should come when? False Alarms: 1) By far, the most frequentlyflagged correct sentences contain some kind of interrupting word, phrase, or clause which shouldbe set off by commas: * The use of languagein solvingproblems,I feel, is important. * The lastof the presentedsolutions,althoughit's not very systematicor probable,is also acceptable with the pencil... * Edison'sinvention, for example, was creating an imitation... * Daedalus,Icarus'father,was alwayscautious. * The back section, which includesthe long thin spindles that rise up to the headrest, was definitely... * This, however, wasn'tvery extensive... One way to preventsuch false positiveswould be to write a "second pass" strategy into the pattern;that is, afterflagging sentencescontaining a commafollowed by be, a "secondpass"is made throughthe set of sentencesflagged in the "first pass." The purposeof the second pass is to look for a designatedvariable, in this case correctlypunctuatedinterrupters,or at least, for certain interrupterswhich should be set off by commas: for example, however, on the other hand, first of all, or a pronounfollowed by a verb. These could then be discardedas correct. The remainingcategories, relative clauses and appositives, would be harderto recognize. At
present, as noted earlier, this if/then strategy is not available.

2) Questions in Dialogue: Rarely, but on occasion, the "comma-be" search will call up correctly punctuated dialogue:
* First I thought, Is he serious;
* Then he said, Are you happy?
* And she replied, "Is that your dog?"
One way to weed out these false alarms would be to look for quotation marks in a second pass, although, as the above examples illustrate, student writers can't be depended upon to use quotation marks. The other possibility is to find a pattern that works: proper noun or personal pronoun followed by certain verbs like said or replied.

B. THAT-COMMA ERROR
Some pronouns pose particular problems for beginning writers who often have formulated quirky rules for punctuating that and which clauses: "a comma after all uses of that." When searching for the pattern that + comma, we flagged errors 65 percent of the time.

Errors Flagged:

1) Quotations: Inexperienced writers often place a comma after a "that" which precedes a direct quotation:
* After thinking about the solution writer b states that, "the solution had to be something much more deep, something unexpected."
* From the assignment the passage claims that, "two avaitors Deadalus and Icarus procured to themselves wings."

2) Noun Clauses: When using noun clauses beginning with "that," inexperienced writers often place a comma between the "that" and the noun or pronoun following it:
* I should have known that, that baby would want to come when I just fell asleep!
* I decided that, that year I would make Christmas gifts and save some money.
* What I think is kind of neat, though, is that, I watched life cease, and on the other hand...
* I imagine that, my classmates' essays might be able to tell me what it really means...
3) Miscellaneous:
* But passage d can only be related to the other passages, in that, it involved some thought and a decision to be made.
* This is the most dangerous part and it proved to be just that, while we were working on the roof my father and I decided to move...

False Alarms:

1) Interrupting Phrase or Clause:
* This coercion had such a deterring effect that, after I had passed the final examination, I found the...
* They feel that, because they speak English as a native language, they can write it.
* To conclude, I can say that, based on the papers I have read, satisfaction is a key in the...
* I will say that, considering the class papers, I have made some interesting discoveries.
* I found it interesting that, in these papers anyway, decisions were made and the making of decisions...
Once again, a second-pass procedure would help to eliminate these false positives. The easiest way to cull these would be to search for paired commas and then eliminate those sentences containing them as correct. The success of this strategy depends, however, on correct punctuation of interrupting phrases or clauses, something one can't always count on with inexperienced writers. A less risky strategy would be to search for and eliminate sentences containing these patterns: 1) that + comma + subordinating conjunction, 2) that + comma + present/past participle, and 3) that + comma + preposition.

2) Demonstrative Pronoun After Participle:
* If you enjoyed doing that, then write how you felt doing it.
* Seeing that, I thought to myself that I was coordinated enough to do that.
To eliminate these sentences in a second pass, one might search for the pattern present participle + that + comma. A similar strategy could be used to eliminate sentences like With that, I am more developed than past generations or Because
of that, Brutus was shot. These strategies would deal accurately with all the sentences flagged by the original string that + comma except for this one: This is the most dangerous part and it proved to be just that, while we were working on the roof my father and I...

Possessive Errors

The most frequent error among our population of students involving possessive form is leaving off an apostrophe, as in my teams starting catcher. An erroneously used apostrophe, as in the upholsterer's proved that, is much less frequent. To trap a missing apostrophe error, we can search for the pattern plural noun plus singular or plural noun or plural noun plus adjective plus singular or plural noun. Our test run showed that 40 percent of the constructions flagged were errors.2

Errors Flagged:
* For example, when I first got my drivers license, I recall asking to borrow the car to go out with some...
* Since I have no control over my parents income I had to find a part-time job.
* I believe that there are very small limitations on a persons freedom of choice.
* I believe that by doing a wide variety of writing, it can help open the students mind and let them write better.
* People talk about their neighbors unkept house, the faded, torn jeans he wears, and the...

False Alarms:
* A few of his relatives show it as do the friends of his grandmother and his own friends...
* You seem to have the power to alter it if you "play your cards right."
* My problems deal with grammar.
The program's problem here is that show, right, and deal could be nouns, and the pattern matcher wouldn't be able to tell that they're not used as nouns in the above sentences. Of course, if the phrase My problems deal was read as a noun phrase rather than a noun phrase with a verb, then the sentence My problems deal with
grammar would be a sentence fragment. A possible second pass in such instances would be to search the set of flagged sentences for those without a main verb. This strategy, however, requires parsing capability.

Comma Splice Errors

A sentence boundary error frequently marked by teachers is the comma splice, two independent clauses joined by a comma. One particularly common comma splice among the writers in our population occurs when the second independent clause begins with then followed by a pronoun. Searching for comma + then + pronoun/noun traps an error 60 percent of the time.

Errors Flagged:
* A secretary remains a secretary if he choose to take her along, then she will receive her promotion.
* It must be real, then there must be ways of remeding it.
* The idea would be to believe you can write an interesting paper, then it would be more likely to happen.
* I moved to the middle dot, and then moved to the center dot and connected the dots above and below, then I proceeded to the last row and connected the dots in a similar fashion.

False Alarms: All of the sentences incorrectly flagged began with a subordinate clause, usually following the "if/then" pattern.
* If the creating is interesting to the composer of it, then it is going to affect the creative processes of thinking when you are writing.
* After I recognized what kind of problem it was, then I broke it up to make it simpler.
* If a person has a certain occupation which he knows morally that he doesn't want, then it is probably not only best for him, but for the future progress...
* If the directions don't say anything about something you have in mind, then it is legal.
To eliminate the false positives culled by this pattern, one might use a second-pass procedure in order to examine whatever precedes the comma + then, eliminating those sentences which contain a subordinating conjunction in that portion of the sentence. In most cases, the program could look at the first word in the sentence; however, this strategy would miss sentences like: I think it's true, because if you don't have any regards for yourself or things that you experience, then it is going to affect the creative processes of thinking when you are writing. Such sentences may be very rare, in which case it might be worthwhile to use the first-word strategy.
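The two-pass procedure just described can be illustrated with a minimal Python sketch. It is not the authors' program: the word lists are small, assumed stand-ins for a lexicon, the function names are hypothetical, and only the pronoun half of the comma + then + pronoun/noun pattern is covered, since detecting nouns would require a lexicon the article does not reproduce.

```python
import re

# Assumed, partial word lists; the article does not publish its lexicon.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "there"}
SUBORDINATORS = {"if", "after", "when", "because", "although",
                 "since", "while", "once", "unless"}

# comma, whitespace, "then", whitespace, and the word that follows
COMMA_THEN = re.compile(r",\s+then\s+(\w+)", re.IGNORECASE)


def first_pass(sentence):
    """Flag a sentence containing comma + then + pronoun."""
    return any(m.group(1).lower() in PRONOUNS
               for m in COMMA_THEN.finditer(sentence))


def second_pass(sentence):
    """First-word strategy: keep the flag only if the sentence
    does not open with a subordinating conjunction."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[0].lower() not in SUBORDINATORS


def flag_comma_splices(sentences):
    """Return the sentences that survive both passes."""
    return [s for s in sentences if first_pass(s) and second_pass(s)]
```

Under these assumptions, "It must be real, then there must be ways of remeding it." stays flagged, and "If the directions don't say anything about something you have in mind, then it is legal." is discarded. A sentence whose subordinating conjunction is not the first word, such as the "I think it's true, because if..." example, is still flagged, which is the limitation the first-word strategy accepts.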
Fragment Errors

A. PARTICIPLE FRAGMENTS
Inexperienced writers, perhaps fearing overlong sentences, often punctuate participial phrases separately from the sentences they belong to: The upholsterer openly spoke her feelings of things she liked and disliked. Not spending too much time on either the good or the bad. An obvious way to trap these fragments is to specify end stop punctuation + -ing verb form. The success rate for this pattern is 16 percent; 84 percent of the sentences flagged were false positives.

Errors Flagged:

Fragments: While some strings call forth a variety of errors, this particular string, when flagging actual errors, calls forth only the particular kind of fragment error we are looking for.
* Man can change his physical chemistry, he can around him. Meaning that, man can influence other people to live in harmony with each other.
* ...hands and deciding what is right or wrong enjoy doing the responsibility as much as possible. Helping other and helping yourself be a carring involved and well-rounded person.
* Unfortunately the teacher and I were never at good terms with each other. Having my attitude.
* My problem is obtaining a subject and then supporting it with facts or feelings. Supporting an argument for or against with unreputable facts or very strong feelings.

False Alarms:

1) Gerunds: By far most of the sentences flagged by the specified string contained -ing verbs acting as the subject of the sentence:
* Finding topics for my speeches were easy and I liked doing the research.
* Writing has always come in steps.
* Looking at what Cousins says about human nature leads me to agree with him.
* Coming from a poor family didn't limit my choices of colleges to go to.
These sentences might be eliminated on a second pass by a search for a verb following the participle.

2) Introductory Phrases: Some of the correct sentences contained participial phrases before the main clause of the sentence:
* Looking at an emotional handicap person you might wonder how they can overcome what is...
* Starting in twelth grade, I was beginning to learn hows to write in my...
* Looking at themes, I learned the basic format of intro, body, conclusion.
Eliminating these sentences as correct is a more difficult matter. Perhaps the easiest way would be to search for a comma on the second pass (easiest but riskiest). As the first example demonstrates, commas aren't always placed where they are required, and thus, this sentence wouldn't be eliminated in the second pass. Instead, we might search on a second pass again for a verb following the participle. This strategy should eliminate all correct sentences, but in the process, some incorrect sentences would also, unfortunately, be eliminated: Meaning that, I know that being cooperative and competitive one can achieve anything in society. Or, Feeling that he should do what his parents say.

Note: One sentence called up by this string did not begin with a verb at all, but with a pronoun ending in -ing: Nothing is to hard to accomplish if you have the right tools in your mind. This sentence would, however, be eliminated in a second pass for a verb.

B. SUBORDINATE CLAUSE FRAGMENTS
Another common fragment made by inexperienced writers occurs when subordinate clauses, not attached to main clauses, are punctuated as sentences: From my understanding of Rebecca's good experience with the young girl, I would agree with her. Because I am the younger generation. Thus far, we have searched our database for this kind of error by specifying the pattern end stop punctuation + subordinate conjunction. This pattern traps the error along with a very high percentage of false positives.3

Errors Flagged:

From the data that are available, it appears that inexperienced writers are particularly prone to consider clauses beginning with although as complete sentences:
* ...year we were required to write an essay demonstrating what we had learned in those two years. Although my paper was well researched. I received a borderline grade because of grammatical...
* ...we also went to the same schools until we both graduate. Although he graduated three years before I did. I feel we were dealt...
* ...as a matter of fact, it is hated. But all writers love to have written. Although, I'm sure this course will only enhance my skills.
* ...I regret that I had not taken a writing course, for they were offered each term. Although I have little writing experience. I do feel I am an average writer.
Less frequently, clauses beginning with other connectors are punctuated as complete sentences:
* ...joke, or physical abuse. As a college student, I made decisions concerning my education. If I decide not to go to class. I as well as others feel...
* I assure you, we are only preparing for our future. How I want it to be.
* The men who were in management would go play golf. Because of what she wanted to do was not very important in the eyes of the men.
* ...an example that was acceptable by her standard as a "senior" member of society. Since West felt, upon seeing the upholsterer's facial expressions, that the upholsterer's taste in art, coincided with her's.
* Where they weren't able to move away from home because they weren't making enough money.

False Alarms:
* ...only there in the beginning because by waiting to long one of the two will become dominant. Therefore, the limitations on this freedom is based on time.
* In the long run people have the freedom to make decisions. However they can be good or bad.
* Cousins' view of dualism makes the point that there are some limitations of free will. How many limitations man encounters can not be determined.
* Once I was achieving great results, I became more and more motivated.
* ...had no difficulty passing the classes and the only thing I regretted was my lack of effort. When I transferred to Pitt, I was shocked to find out that my English...
Some of the false positives can be eliminated by separating conjunctive adverbs (moreover, therefore, thus) from subordinate conjunctions (while, if, although). The difference between these two groups is, of course, essential: a clause beginning with if, punctuated by itself, is a fragment, while a similar clause beginning with thus is correct. Other false alarms are more problematic. Any pattern looking for end of sentence + subordinate conjunction will trap correct sentences as well as sentence fragments: sentences like Once I was achieving great results, I became more and more motivated as well as fragments like Although my paper was well researched.

A solution here would seem to be a second-pass procedure which searches for two separate verbs. This strategy should catch all sentences containing both a subordinate clause and a main clause. It will also catch and eliminate as correct sentences containing two subordinate clauses and no main clause: While I went to town because I needed to shop. The more complex the sentence, the more difficult it is to determine its correctness by looking for main verbs.
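The first filtering step described above, separating conjunctive adverbs from subordinate conjunctions, might be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' search program: the function names are hypothetical, the word lists extend the article's examples (moreover, therefore, thus; while, if, although) with a few assumed additions such as however and once, and the two-verb second pass is not attempted because it needs part-of-speech information.

```python
import re

# The article's examples plus a few assumed additions.
CONJUNCTIVE_ADVERBS = {"moreover", "therefore", "thus", "however"}
SUBORDINATORS = {"while", "if", "although", "because", "since", "once", "when"}
CONNECTORS = CONJUNCTIVE_ADVERBS | SUBORDINATORS

# end stop punctuation, whitespace, then the first word of the next sentence
AFTER_END_STOP = re.compile(r"[.!?]\s+([A-Za-z]+)")
END_STOP = re.compile(r"[.!?]")


def first_pass(text):
    """Return (connector, sentence) pairs for every sentence that
    follows an end stop and opens with any connector."""
    hits = []
    for m in AFTER_END_STOP.finditer(text):
        word = m.group(1).lower()
        if word in CONNECTORS:
            start = m.start(1)
            stop = END_STOP.search(text, start)
            end = stop.end() if stop else len(text)
            hits.append((word, text[start:end]))
    return hits


def second_pass(hits):
    """Discard sentences opened by a conjunctive adverb; these are correct."""
    return [sentence for word, sentence in hits
            if word not in CONJUNCTIVE_ADVERBS]
```

With these lists, "However they can be good or bad." is dropped in the second pass and "Although my paper was well researched." stays flagged. The correct sentence "Once I was achieving great results, I became more and more motivated." also stays flagged, which is the residual false alarm that the two-verb search is meant to address.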
Discussion

There are several clear limitations to pattern-matching as an approach to detecting errors in
natural language texts. As is apparent from the examples above, it is often difficult to specify a pattern based on the surface features of a text that won't call up many correct structures along with incorrect ones. This problem is particularly troublesome in the case of our target population, remedial college writers. While more practiced writers could, on their own, distinguish between true errors and false positives, it would be risky indeed, from a pedagogical standpoint, to expect remedial writers to do so.

A second limitation concerns the range of errors for which one can realistically expect to write patterns. For example, even errors that seem to be very basic, such as the subject/verb agreement error in a sentence like My brothers and my mother plans to give my dad a party, cannot be specified in enough detail to be detected through pattern-matching. Although a pattern-matcher that searched for the sequence possessive pronoun + plural noun + conjunction + possessive pronoun + singular noun + 3rd person singular verb would detect this error, a different pattern would be required to detect the error in the boy next door and the girl down the street plans to go swimming, because the linear sequence of parts of speech is quite different. It would, of course, be unrealistic to list all such patterns involving subject-verb agreement errors.

The advantages of a pattern-matching approach are that it is fast, it can be implemented now on a minicomputer and soon on microcomputers, it works accurately within a limited range of errors, and it maps well onto pedagogy. We're interested in a program that runs quickly for the usual reasons: we're assuming that immediate feedback is preferable to delayed feedback and that an interactive capability is preferable to a non-interactive or batch mode. Students can best be taught to edit their writing for errors if they can receive instruction on the spot.
We imagine a program which will scan a student's essay and locate errors within a matter of seconds; the remainder of the student's time would then be spent in tutorial, with the program highlighting the errors of one type and asking students to correct them, offering feedback about the student's success, and, if necessary, narrowing the
error regions and providing explanations of the error and how it might be corrected.

The pattern-matching we have done so far works best when errors (1) consist of at least one constant (the their of their is/are/seem, etc. and the to of hope to learned); (2) when their component parts are contiguous; and (3) when they span less than a sentence (a phrase as opposed to a clause). Within these parameters, pattern-matching can detect close to 100 percent of the occurrences of a given error with a low rate of false positives. However, when an error can't be described by means of at least one constant, and when the components of an error are interrupted by other words, and when the error crosses sentence or clause boundaries, then a greater percentage of false positives will result.

Pattern-matching has a built-in pedagogical advantage in that it presents feedback on the presence of one error type at a time. Students learn to detect errors by becoming sensitized to error patterns. In terms of teaching techniques, this means that instructors need to find a way to make a particular kind of error salient to a student. One way to do this is to highlight all of the occurrences of a particular type of error in a paper, while ignoring occurrences of other types of errors. In this manner, a student can focus upon one problem at a time. This highlighting of one type of error at a time is, of course, exactly what a pattern-matcher can do. This kind of presentation takes on added significance in light of the fact that most remedial writers make a few types of errors many times rather than many kinds of errors a few times. For example, a student may have trouble with homophones but never write fragments. Another may leave off inflectional endings of verbs but be troubled only rarely by subject/verb agreement errors. Given this tendency for error "specialization," a couple of pattern-matching programs that address certain errors may be all the instruction a particular student needs.
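As an illustration of highlighting one error type at a time, the following Python sketch marks every occurrence of a single pattern built around a constant, here the their of their is/are/seem mentioned above. The verb list, the bracket markers, and the function name are assumptions made for the example; the article does not specify how its program displays highlighted passages.

```python
import re

# The constant "their" followed directly by a verb; the verb list is assumed.
THEIR_VERB = re.compile(r"\btheir\s+(?:is|are|was|were|seems?)\b", re.IGNORECASE)


def highlight(text, pattern=THEIR_VERB, left="[[", right="]]"):
    """Wrap every match of one error pattern in markers.

    Returns the marked text and the number of passages highlighted,
    leaving every other kind of error untouched."""
    return pattern.subn(lambda m: f"{left}{m.group(0)}{right}", text)
```

Called on "I think their is a problem. Their are two dogs, and their dog is old.", this marks their is and Their are and leaves their dog is alone, because the pattern requires the verb to be contiguous with the constant.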
When we imagine ways that we might improve pattern-matching capabilities, we think mainly in terms of certain search heuristics. Primarily, we see a need for a "second pass" strategy, where a set of constructions that have been flagged as
possible errors is searched again for the presence or absence of a word or string. Or we could accomplish the same thing with a "not variable," that is, a command to flag sentences containing certain designated variables and not one or more others. We also plan to incorporate a wild card capability for part of a word or an indeterminate number of words such that irrelevant but nevertheless present units can be skipped over in the search for specified variables; such a capability is needed to account for the expandability and versatility of our language.

When we imagine ways to increase the pedagogical effectiveness of pattern-matching, we ought not to overlook the role that varied kinds of feedback can play. Thus far, we have assumed that the feedback a student receives will consist only of information on the location of an error: "there are three passages highlighted in your essay; each contains the same kind of error." This kind of absolute judgment on the presence of an error is ideal, but it will be possible, of course, only a part of the time, and we don't yet know what part. An alternative to relying on 100 percent error detection accuracy or 0 percent false alarms is a sort of feedback which puts part of the responsibility for determining the presence of an error on the student.

Take, for example, the common fragment error that results because a student separates a clause beginning with a subordinate conjunction from the rest of a sentence: I don't think she will be able to pass. Because she hasn't studied enough. The current version of our pattern-matcher would flag that error along with these correct sentences: We were hoping to go along. Until she left we thought she'd come, too.
In such cases, where 100 percent accuracy can't be obtained, we might consider feedback that contrasts the error with the correct sentence and then prompts the student to check the sentences flagged in his own essay to see if any fit the error pattern. Thus, the feedback might read: "There are three passages highlighted in your essay. These passages might contain an error like Example A. Or they might be correctly written like Example B. Check the passages to see if you've written a fragment, the error in Example A." Although this kind of feedback puts some of the burden of error detection
on the student, it doesn't ask him or her to make the judgment entirely alone. And it may even facilitate the recognition of error patterns by increasing a student's awareness of the difference between a sentence that contains a particular kind of error and its correct counterpart.

By learning to be analytical about language, which involves learning to make finer linguistic distinctions and to better conceptualize the relationships between linguistic units, students become better able to modify and revise their own idiosyncratic rules and move toward a truer and firmer approximation of the rules which produce correct text. It is possible, then, that in addition to learning how to correct errors, a necessary but by no means sufficient skill, students will further develop more generalized analytical skills, including the ability to be conscious of and reason about their choices, as well as a more refined understanding of hypothesis-generation and revision. Computerized editing instruction might not only aid students in error location and correction, it might also provide teachers with more time to discuss in class other aspects of writing which, to those of us who have internalized the rules and conventions, seem far more crucial to the enterprise of writing, and of inquiry in general, than correctness.
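Returning to the search heuristics proposed earlier in this discussion, the "not variable" idea (flag sentences containing certain designated variables and not one or more others) might be sketched in Python as below. Everything here is an assumption made for illustration: the function name, the short list of be forms, and the use of the interrupters named in the comma-be section (for example, however, on the other hand, first of all) as the excluded variables.

```python
import re

# Required variable: comma followed by a form of be (short assumed list).
COMMA_BE = re.compile(r",\s+(?:is|are|was|were)(?:n't)?\b", re.IGNORECASE)

# Excluded variable: a correctly punctuated interrupter from the article's list.
INTERRUPTER = re.compile(
    r",\s*(?:for example|however|on the other hand|first of all)\s*,",
    re.IGNORECASE,
)


def flag(sentences, required, excluded=()):
    """Flag sentences that match `required` and none of `excluded`."""
    return [s for s in sentences
            if required.search(s) and not any(p.search(s) for p in excluded)]
```

For instance, flag(sentences, COMMA_BE, [INTERRUPTER]) keeps "A discovery, is the act of finding." and discards "Edison's invention, for example, was creating an imitation." An appositive such as "Daedalus, Icarus' father, was always cautious." is still flagged, in line with the article's remark that relative clauses and appositives are harder to recognize.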
Future Directions

Part of our research in the future will be directed at more pattern-matching. We plan to write and test other error patterns as well as refine those described in this paper. Also, we will continue to develop our lexicon and to refine our feature list. And we're interested in developing heuristics which will lower our false positive rates.

Because of the limitations of pattern-matching, we are also developing other approaches for error detection. We are currently writing a natural language parser and a small set of grammar rules and a lexicon to drive it. Rather than searching for particular error patterns, this program will look for well-formed sentences, culling as errors those that violate its grammar rules. Our plan now is to have separate pattern-matching and parsing, but we're beginning to speculate about how the two approaches might be combined for greater accuracy and efficiency.
NOTES

1. We should note that we are developing a separate, more elaborate lexicon for a parser.
2. We should note, however, that the original test run was a more general pattern than we specify above and thus probably turned up more false positives than the current pattern would. We estimate that the correctly flagged percentage should be about 60 percent.
3. We don't have an actual figure for the percentages because of a windowing limitation in our search program. That is, often it is impossible to tell, from the amount of text that is called up, whether the sentence beginning with the subordinate conjunction is a fragment: Cousins says man does have the capacity to make decisions in life. Whether a man is born with money, no money, a handicap or healthy... We need to see the rest of the sentence to determine its correctness.

REFERENCES

Bartholomae, D. (1979). Teaching basic writing: An alternative to basic skills. Journal of Basic Writing, 31, 253-269.
Cherry, L.L. (1980). PARTS: A system for assigning word classes to English text. (Computing Science Technical Report No. 81.) Bell Laboratories, Murray Hill, NJ.
Cohen, M.E., & Lanham, R.A. (1984). HOMER: Teaching style with a microcomputer. In W. Wresch (Ed.), The computer in composition instruction. Urbana: NCTE.
Heidorn, G.E., Jensen, K., Miller, L.A., & Chodorow, M.S. (1982). The EPISTLE text-critiquing system. IBM Systems Journal, 21, 305-326.
Hull, Glynda. (1987). Constructing taxonomies for error (or can stray dogs be mermaids?). In T. Enos (Ed.), A sourcebook for basic writing teachers. New York: Random House.
Keifer, K.E., & Smith, C.R. (1983). Textual analysis with computers: Tests of Bell Laboratories' computer software. Research in the Teaching of English, 17, 201-14.
Shaughnessy, M. (1977). Errors and expectations: A guide for the teacher of basic writing. New York: Oxford University Press.
Thiesmeyer, T. (1984). Teaching with the text checkers. In Thomas E. Martinez (Ed.), Collected Essays on The Written Word and the Word Processor. Villanova: Villanova University.
Von Blum, R., & Cohen, M.E. (1984). WANDAH: Writing-aid and author's helper. In W. Wresch (Ed.), The computer in composition instruction. Urbana: NCTE.