Positive Grammar Checking: A Finite State Approach - CiteSeerX

1 downloads 0 Views 114KB Size Report
The correct form of the verb vakna 'stop' should be in the past tense, i.e. ... coincides with the infinitive (and for some verbs also imperative) form of the verb. ..... ical in one context and ungrammatical in another are treated the same. The.
Positive Grammar Checking: A Finite State Approach Sylvana Sofkova Hashemi, Robin Cooper and Robert Andersson {sylvana,cooper,robert}@ling.gu.se Department of Linguistics G¨ oteborg University Box 200, SE-405 30 G¨ oteborg, Sweden

Abstract. This paper reports on the development of a finite state system for finding grammar errors without actually specifying the error. A corpus of Swedish text written by children served as the data. Errors are more frequent and the distribution of the error types is different than for adult writers. Our approach (following Karttunen et al. [9]) for finding errors involves developing automata that represent two “positive” grammars with varying degree of detail and then subtracting the detailed one from the general one. The difference between the automata corresponds to a grammar for errors. We use a robust parsing strategy which first identifies the lexical head of a phrase together with the lexical string which precedes it beginning at the left margin of the phrase. The technique shows good coverage results for agreement and verb selection phenomena. In future, we aim to include also the detection of agreement between subject and predicative complements, word order phenomena and missing sentence boundaries.

1

Introduction

Research and development of grammar checking techniques has been carried out since the 1980’s, mainly for English and also for other languages, e.g. French [6], Dutch [15], Czech [11], Spanish and Greek [4]. In the case of Swedish, the development of grammar checkers started not until the later half of the 1990’s with several independent projects, one of them resulting in the first product release in November 1998 - Grammatifix [2, 3], now part of the Swedish Microsoft Office 2000. Our approach differs from the other Swedish projects in that the grammatical errors are found without direct description of the erroneous patterns. The detection process involves light parsing and subtraction of transducers representing “positive” grammars at different accuracy levels. The difference between the automata corresponds to a grammar for the errors to be found. Karttunen et al. [9] use this technique to find instances of invalid dates and our work is an attempt to apply their approach to a larger domain. The rest of this paper includes a short description of the child data, followed by the system architecture with separate subsections on each module. Then follows evaluation of the system and comparison with the other Swedish systems.

2

The Child Data

The analyses of writing problems is based on a small corpus of 29 812 words (3 373 word types), composed of computer written and hand written essays written by children between 9 and 13 years of age. In general, the text structure of the compositions reveals clearly the influence of spoken language and performance difficulties in spelling, segmentation of words, the use of capitals and punctuation, with fairly wide variation both by individual and age. In total, 260 instances of grammatical errors were found in 134 narratives. The most recurrent grammar problem concerns the omission of finite verb inflection (38.8% of all errors), i.e. when the main finite verb in a clause lacks the appropriate present or past tense endings: ∗ (1) P˚ a natten vakna jag av att brandlarmet tj¨ ot. in the-night wake [untensed] I from that fire-alarm howled – In the night I woke up from that the fire-alarm went off.

The correct form of the verb vakna ‘stop’ should be in the past tense, i.e. vaknade ‘stopped’. This type of error arises from the fact that the writing is highly influenced by spoken language. In spoken Swedish regular weak verbs in the past tense often lack the appropriate ending and the spoken form then coincides with the infinitive (and for some verbs also imperative) form of the verb. Other frequent grammar problems are: extra or missing words (25.4%), here the preposition i ‘in’ is missing: akte skidor. (2) Gunnar var p˚ a semester ∗ norge och ˚ Norway and went skis Gunnar was on vacation – Gunnar was on vacation in Norway and skied.

word choice errors (10%), here the verb att vara lika ‘to be alike’ requires the particle till ‘to’ in combination with the noun phrase s¨ attet ‘the-manner’ and not p˚ a ‘on’ as the writer uses: a s¨ attet (3) vi var v¨ aldigt lika ∗ p˚ we were very like on the-manner – We were very alike in the manners.

errors in noun phrase agreement (5.8%), here the correct form of the noun phrase requires the noun to be definite as in den n¨ armsta handduken ‘the nearest towel’: ∗ (4) jag tar den n¨ armsta handduk och sl¨ anger den i I take the [def ] nearest [def ] towel [indef ] and throw it in vasken the sink – I take the nearest towel and throw it into the sink.

errors in verb chains (2.7%), here the auxiliary verb should be followed by an infinitive, ska bli ‘will become’, but in this case the present tense is used:

n˚ agon riktig brand. (5) Men kom ih˚ ag att det inte ska ∗ blir but remember that it not will becomes[pres] some real fire – But remember that there will not be a real fire.

Other grammar errors occurred less than ten times in the whole corpus, including reference errors, agreement between subject and predicative complement, specificity in nouns, pronoun form, errors in infinitive phrases, word order. Punctuation problems are also included in the analyses. In general, the use of punctuation varies from no usage at all (mostly among the youngest children) to rather sparse marking. In the following example the boundary between the first and second sentence is not marked.: (6) nasse blev arg han gick och la sig med dom andra syskonen. nasse became angry he went and lay himself with the other siblings – Nasse got angry. He went and lay down with the other siblings.

The omission of end of sentence marking is quite obvious problem, corresponding to 35% of all sentences that lack marking of a sentence boundary. But the most frequent punctuation problem concerns the omission of commas, left out in 81% of clauses. The finite verb problem, verb form in verb chains and infinitive phrases and agreement problems in noun phrase are the four types of errors detected by the current system.

3

System Architecture

The framework for detection of grammar errors is built as a network of finite state transducers compiled from regular expressions including operators defined in the Xerox Finite State Tool (XFST) [10]. Each automaton in the network composes with the result of previous application and in principle all the automata can be composed into a single transducer. There are in general two types of transducers in use: one that annotates text in order to select certain segments and one that redefines or refines earlier decisions. Annotations of any kind are handled by transducers defined as finite state markers that add reserved symbols into text and mark out syntactical segments, grammar errors, or other patterns aimed at selections. Finite state filters are used for refinement and/or revision of earlier decisions. The system runs under UNIX in a simple emacs environment used for testing and development of finite state grammars. The environment shows the results of an XFST-process run on the current emacs buffer in a separate buffer. An XFST-mode allows for menus to be used and recompile files in the system. The current system of sequenced finite state transducers is divided in four main modules: the dictionary lookup, the grammar, the parser and the error finder.

input text

Lexicon ~160,000 w

LOOKUP

Broad

PARSER

Filtering

GRAMMAR

Narrow

ERROR FINDER

output text

Fig. 1. The system architecture

3.1

The Lexicon Lookup

The lexicon of around 160, 000 word forms, is built as a finite state transducer, using the Xerox tool Finite State Lexicon Compiler [8]. The lexicon is composed from two resources, the SveLex project under the direction of Daniel Ridings, LexiLogik AB, and the Lexin project [13]. It takes a string and maps inflected surface form to a tag containing part-of-speech and feature information, e. g. applying the transducer to the string kvinna ‘woman’ will return [nn utr sin ind nom]. The morphosyntactic tags follow directly the relevant string or token. More than one tag can be attached to a string, since no contextual information is taken into account. The morphosyntactic information in the tags is further used in the grammars of the system. The set of tags follows the Stockholm Ume˚ a Corpusproject conventions [7], including 23 category classes and 29 feature classes, that was extended with 3 additional categories. Below is an example of lookup on the example sentence in (5): (7) Men[kn] kom[qmvb prt akt][vb prt akt] ih˚ ag[ab][pl] att[sn][ie]det[pn neu sin def sub/obj][dt neu sin def] inte[ab] ska[vb prs akt][mvb prs akt] blir[vb prs akt] n˚ agon[dt utr sin ind][pn utr sin ind sub/obj] riktig[jj pos utr sin ind nom]brand[nn utr sin ind nom]

3.2

The Grammar

The grammar module includes two grammar sets with (positive) rules reflecting the grammatical structure of Swedish, differing in the level of detail. The broad grammar is especially designed to handle text with ungrammaticalities and the linguistic descriptions are less accurate accepting both valid and invalid

patterns. The narrow grammar is fine and accurate and accepts only the grammatical segments. For example, the regular expression in (8) belongs to the broad grammar set and recognizes potential verb clusters (VC) (both grammatical and ungrammatical) as a pattern consisting of a sequence of two or three verbs in combination with (zero or more) adverbs (Adv∗). (8) define VC

[Verb Adv* Verb (Verb)];

This automaton accepts all the verb cluster examples in (9), including the ungrammatical instance (9c) (marked by an asterix ‘∗’), where a finite verb follows a (finite) auxiliary verb. (9) a. kan inte springa ‘can not run’ b. skulle ha sprungit ‘would have run [sup]’ c. ∗ ska blir ‘will be [pres]’

Corresponding rules in the narrow grammar set represented by the regular expressions in (10) take into account the internal structure of a verb cluster and define the grammar of modal auxiliary verbs (Mod) followed by (zero or more) adverb(s) (Adv∗), and either a verb in infinitive form (VerbInf) as in (10a), or a temporal verb in infinitive (PerfInf) and a verb in supine form (VerbSup), as in (10b). These rules thus accept only the grammatical segments in (9) and will not include example (9c). The actual grammar of grammatical verb clusters is a little bit more complex. (10) a. define VC1 b. define VC2

3.3

[Mod Adv* VerbInf]; [Mod Adv* PerfInf VerbSup];

Parsing and Ambiguity Resolution

The various kinds of constituents are marked out in a text using a lexical-prefixfirst method, i.e. parsing first from left margin of a phrase to the head and then extending the phrase by adding on complements. The actual parsing (based on the broad grammar definitions) is incremental in a similar fashion as the methods described in [1], where the output from one layer serves as input to the next, building on the segments. The system recognizes the head phrases in certain order in the first phase (verbal head, prepositional head, adjective phrase) and then applies the second phase in the reverse order and extends the phrases with complements (noun phrase, prepositional phrase, verb phrase). Reordering rules used in parsing allows us to resolve certain ambiguities. For example, marking verbal heads before noun phrases will prefer a verb phrase interpretation of a string over a noun phrase interpretation and avoid merging constituents of verbal heads into noun phrases and yielding noun phrases with too-wide range. For instance, marking first the sentence in (11) for noun phrases will interpret the pronoun De ‘they’ as a determiner and the verb s˚ ag ‘saw’, that is exactly

as in English homonymous with the noun ‘saw’, as a noun and merges these two constituents to a noun phrase as shown in (12). De s˚ ag will subsequently be marked as ungrammatical, since a number feature mismatch occurs between the plural De ‘they’ and singular s˚ ag ‘saw’. (11) De s˚ ag ledsna ut They looked sad out – They seemed sad. (12) De s˚ ag ledsna ut .

Composing the marking transducers by first marking the verbal head and then the noun phrase will instead yield the more correct parse. Although the alternative of the verb being parsed as verbal head or a noun remains, the pronoun is now marked correctly as a separate noun phrase and not merged together with the main verb into a noun phrase: (13) De s˚ ag ledsna ut .

At this stage, the output may be further refined and/or revised by application of filtering transducers. Earlier parsing decisions depending on lexical ambiguity are resolved (e.g. adjectives parsed as verbs) and phrases extended (e.g. with postnominal modifiers). Other structural ambiguities, such as verb coordinations or clausal modifiers on nouns, are also taken care of. 3.4

Error Detection

The error finder is a separate module in the system which means that the grammar and parser could potentially be used directly in a different application. The nets of this module correspond to the difference between the two grammars, broad and narrow. By subtracting the narrow grammar from the broad grammar we create machines that will find ungrammatical phrases in a text. For example, the regular expression in (14) identifies verb clusters that violate the narrow grammar of modal verb clusters (VC1 or VC2, defined in (10)) by subtracting these rules from the more general (overgenerating) rule in the broad grammar (VC, defined in (8)) within the boundaries of a verb cluster (‘’, ‘’), that have been previously marked out in the parsing stage. (14)

define VCerror

[ ""

[VC - [VC1 | VC2]]

"" ];

By application of a marking transducer in (15), the found error segment is annotated directly in the text as in example (16). (15) define markVCerror [VCerror -> "" ... ""]; (16) Men kom ih˚ ag att det inte ska blir n˚ agon riktig brand

4

The System Performance

The implemented error detector cannot at present be considerred as a fully developed grammar checking tool, but still even with its restricted lexicon and small grammar the results are promising. So far the technique was used to detect agreement errors in noun phrases, selection of finite and non-finite verb forms in main and subordinate clauses and infinitival complements. The implementation proceeded in two steps. In the first phase we devoted all effort to detection of the grammar errors, working mostly with the errors and not paying much attention to the text as a whole. The second phase involved blocking of the resultant false alarms found in the first stage. In Table 1 we show the final results of error detection in the training corpus of Child Data. There were altogether 14 agreement errors in noun phrase, 102 errors in the form of finite verb, 7 errors in the verbform after an auxiliary verb (Vaux ) and 4 errors in verbs after infinitive marker (IM). Table 1. Error detection in Child Data: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 14 14 15 0 100% 48% Finite verb form 102 93 94 9 91% 50% Verb form after Vaux 7 6 32 1 86% 16% Verb form after IM 4 4 4 0 100% 50% Total 127 117 145 10 92% 45%

The error detection of the system is quite high with a recall rate of 92% on text that the system was designed to handle. Only ten errors were not detected, nine of them in the finite verb form and one in the verb form after an auxiliary verb. The precision rate of 45% is not that satisfactory, but comparable with the other Swedish grammar checking tools that report precision rates on adult texts between 53% and 77%. The recall values are between 35% and 83%. There are in general two kinds of false alarms that occur: either due to the ambiguity of constituents or the “wideness” of the parse, i.e. too many constituents are included when applying the longest-match strategy. The following example shows an ambiguous parse: (17) Linda, brukade ofta vara i stallet. Linda used often to be in the-stable – Linda used to be often in the stable. (18) Linda , brukade ofta vara i stallet .

Here ‘Linda’ is a proper noun and parsed as a noun phrase, but it also is a (non-finite) verb in infinitive (‘to wrap’) and will then be marked as a verb

phrase. Then the error finder marks this phrase as a finite verb error, due to the violation of this constraint in the narrow grammar that does not allow any non-finite verbs without the presence of a preceding (finite) auxiliary verb or infinitival marker att ‘to’. Some of the effects of the longest-match strategy can be blocked in the parsing stage, as mentioned above, but some remain as in the following example: (19) till vilket man kunde ringa ... to which one could call – to which one could call ... (20) till vilket man kunde ringa

Here the pronoun vilket ‘which’ in neuter gender is merged together with the noun man ‘man’ in common gender to a noun phrase, causing a gender missmatch. There are also cases where other types of errors are recognized by the detector as side-effects of the error detection. Some split words and misspellings were recognized and diagnosed as agreement errors. For instance in the example (21) the noun o ¨gonblick ‘moment’ is split and the o ¨gon ‘eyes’ does not agree in number with the preceding singular determiner and adjective. (21) a. F¨ or ett kort o ¨gon blick trodde jag ... for a short eye blinking thought I ... – For a short moment I thought ... b. F¨ or ett kort o ¨gon blick trodde jag ...

Others as errors in verbal group. In the example (22) fot ‘foot’ in the split word fotsteg ‘footsteps’ is interpreted as a noun and is homonymous with the (finite) verb steg ‘stepped’ and causes an error marking in the verb form after an infinitival verb. (22) a. Jag h¨ or fot steg fr˚ an trappa I hear foot steps from stairs – I hear footsteps from the stairs. b. Jag h¨ or fot steg fr˚ an trappa

Further, sentence boundaries that are not marked out can be detected. The diagnosis is always connected to the verbal group, where verbs over sentence boundaries are merged together. Mostly it is a question of more than one finite verb in a row as in the example below: (23) a. I h˚ alet som pojken hade hittat fanns en mullvad. in the-hole that the boy had found was a mole – in the hole the boy had found was a mole. b. I h˚ alet som pojken hade hittat fanns en mullvad.

Here the verb cluster boundary is too wide, including both a verb cluster and a finite verb belonging to the next clause. Preliminary results on arbitrary text also show some promising results. In Table 2 we show the system evaluation on a small literal text of 1 070 words (after updating the lexicon with words necessary to the text). We found 16 noun phrase agreement errors, 4 errors in the form of finite verb and 1 error in the verbform after an auxiliary verb in the text. The error detector found all the verb form errors and most of the agreement errors, ending in a recall value of 81%. False alarms occurred also only in the agreement errors, resulting in an overall precision of 85%. Table 2. Error detection in Other Text: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 16 12 3 4 75% 80% Finite verb form 4 4 0 0 100% 100% Verb form after Vaux 1 1 0 0 100% 100% Total 21 17 3 4 81% 85%

5

Test with Other Tools

Three other Swedish grammar checkers: one commercial Grammatifix [2], and two prototypes Granska[5] and Scarrie[12], have been tested on the child data. Here we give the results of detection for the error types implemented in this system (Tables 3, 4 and 5). These tools are designed to detect errors in text different from the nature of the child data and thus not surprisingly the accuracy rates are in overall low. The total recall rate for these error types is between 10% and 19% and precision varies between 13% to 39%. Errors in noun phrases seem to be better covered than verb errors. In the case of Grammatifix (Table 3), errors in verbs are not covered at all. Half of the noun phrase errors were identified and only five errors in the finite verb form. Many false alarms result in a precision rate below 50%. Granska (Table 4) included all four error types and detected at most half of the errors for three of these types. Only seven instances of errors in finite verb form were identified. The number of false alarms varies among error types. Errors in verb form after infinitive marker were not detected by Scarrie (Table 5). Errors in noun phrase were the best detected type. The tool performed best of all three systems in the overall detection of errors with a recall rate of 19%. On the other hand, many false alarms occurred (154) and the precision rate of 13% was the lowest. Agreement errors in noun phrases is the error type best covered by these tools. All three managed to detect at least half of them. Errors in the finite verb form obtained the worse results. Granska performed best and detected at least half of the errors in three error types with an overall precision of 39%.

The detection performance of these three tools on child data is in general half that good in comparison to our detector and the fact that the error type with worst coverage (finite verbs) is the one most frequent among children indicates clearly the need for specialized grammar checking tools for children. The overall precision rate of all the tools including our system lies below 50%. Table 3. Grammatifix error detection in Child Data: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 14 7 20 7 50% 26% Finite verb form 102 5 6 97 5% 45% Verb form after Vaux 7 Verb form after IM 4 Total 127 12 26 104 10% 32%

Table 4. Granska’s error detection in Child Data: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M Agreement in NP 14 7 25 7 Finite verb form 102 7 1 95 Verb form after Vaux 7 4 5 3 Verb form after IM 4 2 0 2 Total 127 20 31 107

R P 50% 22% 7% 88% 57% 44% 50% 100% 16% 39%

Table 5. Scarrie’s error detection in Child Data: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M Agreement in NP 14 8 133 6 Finite verb form 102 15 13 87 Verb form after Vaux 7 1 7 6 Verb form after IM 4 0 1 4 Total 127 24 154 103

R P 57% 6% 15% 54% 14% 13% 0% 0% 19% 13%

These three tools were also tested on the small literal text, that reflects more the text type these tools are designed for. The results given in Tables 6 – 8 show that all three tools had difficulties to detect the verb form errors, whereas most of the errors in noun phrase agreement were found. This means that the overal coverage of the systems is slightly lower than for our detector (see Table 2) lying between 52% and 62%.

Table 6. Grammatifix error detection in Other Text: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 16 12 0 4 75% 100% Finite verb form 4 1 0 3 25% 100% Verb form after Vaux 1 0 0 1 0% 0% Total 21 13 0 8 62% 100% Table 7. Granska’s error detection in Other Text: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 16 12 0 4 75% 100% Finite verb form 4 0 1 4 0% 0% Verb form after Vaux 1 1 0 0 100% 100% Total 21 13 1 8 62% 93% Table 8. Scarrie’s error detection in Other Text: correct alarms (C ), false alarms (F ), missed errors (M ), recall (R), precision (P ) Error type No. Errors C F M R P Agreement in NP 16 10 4 6 63% 71% Finite verb form 4 1 0 3 25% 100% Verb form after Vaux 1 0 0 1 0% Total 21 11 4 10 52% 73%

6

Conclusion

The simple finite state technique of subtraction presented in this paper, has the advantage that the grammars one needs to write to find errors are always positive grammars rather than grammars written to find specific errors. Thus, covering the valid rules of language means that the rule sets remain quite small and practically no prediction of errors is necessary. The approach aimed further at minimal information loss in order to be able to handle text containing errors. The degree of ambiguity is maximal at the lexical level, where we choose to attach all lexical tags to strings. At higher levels, structural ambiguity is treated by parsing order, grammar extension and some other heuristics. There is an essential problem of ambiguity resolution on complement decisions that remains to be solved. Sequences of words grammatical in one context and ungrammatical in another are treated the same. The system overinterprets and gives rise to false alarms, mostly due the application of longest-match, but more seriously information indicating an error may be filtered out by erroneous segmentation and errors overlooked. A ‘higher’ mechanism is needed in order to solve these problems that takes into consideration the complement distribution and solves these structural dependencies.

The linguistic accuracy of the system is comparable to other Swedish grammar checking tools, that actually performed worse on the child data. The low performance of the Swedish tools on child data motivates clearly the need for adaptation of grammar checking techniques to children. The other tools obtained in general much lower recall values and although the error type of particular error was defined, the systems had difficulties to identify the errors, probably due problems to handle such a disrupted structure with many adjoined sentences and high error frequency. Further, the robustness and modularity of this system makes it possible to perform both error detection and diagnostics and that the grammars can be reused for other applications that do not necessarily have anything to do with error detection, e. g. for educational purposes, speech recognition, and for other users such as dyslectics, aphasics, deaf and foreign speakers.

References 1. Ait-Mohtar, Salah and Chanod, Jean-Pierre (1997) Incremental Finite-State Parsing, In ANLP’97, Washington, pp. 72-79. 2. Arppe, Antti (2000) Developing a Grammar Checker for Swedish. The 12th Nordic Conference of Computational Linguistics, 1999. 3. Birn, Juhani (2000) Detecting Grammar Errors with Lingsoft’s Swedish Grammar Checker. The 12th Nordic Conference of Computational Linguistics, 1999. 4. Bustamente, Flora Ram´irez and Le´ on, Fernando S´ anchez (1996) GramCheck: A Grammar and Style Checker. In the 16th International Conference on Computational Linguistics, Copenhagen pp. 175-181. 5. Carlberger, Johan and Domeij, Rickard and Kann, Viggo and Knutsson Ola (2000) A Swedish Grammar Checker. Association for Computational Linguistics. 6. Chanod, Jean-Pierre (1996) A Broad-Coverage French Grammar Checker: Some Underlying Principles. In the Sixth International Conference on Symbolic and Logical Computing, Dakota State University Madison, South Dakota. 7. Ejerhed, E. and K¨ allgren, G. and Wennstedt, O. and ˚ Astr¨ om, M. (1992) The Linguistic Annotation System of the Stockholm-Ume˚ a Corpus Project. Report 33. University of Ume˚ a, Department of Linguistics. 8. Karttunen, Lauri (1993) Finite State Lexicon Compiler. Technical Report ISTLNLTT-1993-04-02, Xerox Palo Alto Research Center, Palo Alto, California. 9. Karttunen, Lauri, Chanod, Jean-Pierre, Grefenstette, Gregory and Schiller, Anne (1996) Regular Expressions for Language Engineering, In Natural Language Engineering 2 (4) 305-328. 10. Karttunen, Lauri and Ga´ al, Tam´ as and Kempe, Andr´e (1997) Xerox Finite-State Tool. Xerox Research Centre Europe, Grenoble, Maylan, France. 11. Kirschner, Zdenek (1994) CZECKER - a Maquette Grammar-Checker for Czech. In The Prague Bulletin of Mathematical Linguistics 62, Praha: Universita Karlova. 12. S˚ agvall Hein, Anna (1998) A Chart-Based Framework for Grammar Checking: Initial Studies. The 11th Nordic Conference of Computational Linguistics, 1998. 13. Skolverket (1992) LEXIN: sprklexikon fr invandrare. Nordsteds Frlag. 14. Tapanainen, Pasi (1997) Applying a Finite State Intersection Grammar. In Roche, Emmanuel and Schabes, Yves (eds.) Finite State Language Processing, MIT Press. 15. Vosse, Theodorus Gregorius (1994) The Word Connection. Grammar-based Spelling Error Correction in Dutch. Enschede: Neslia Paniculata.

Suggest Documents