EngCG tagger, Version 2
Atro Voutilainen
Research Unit for Multilingual Language Technology
Department of General Linguistics
FIN-00014 University of Helsinki, Finland
This paper¹ examines some problems of earlier versions of the EngCG morphological disambiguator and describes solutions to them. An informal evaluation of the new version of the EngCG tagger is reported.
1 Introduction

The EngCG (English Constraint Grammar) morphological disambiguator is a reductionistic rule-based tagger based on the Constraint Grammar framework (Karlsson 1990; Karlsson et al. (eds.) 1995). It contains three main modules (the following figures concern the previously published, 'early' versions of EngCG):
- a tokeniser (identification of words, punctuation marks and some 8,000 multiword expressions: idioms and modifier-head expressions)
- a morphological analyser (introduction of morphological ambiguity)
    - a two-level lexicon and morphology (over 90,000 entries)
    - a rule-based heuristic analyser of unknown words ('guesser')
- a rule-based disambiguator: alternative analyses are removed on the basis of context conditions expressed in some 1,150 constraint rules.
¹ This paper was published in Tom Brondsted and Inger Lytje (eds.), Sprog og Multimedier. Aalborg Universitetsforlag, Aalborg. Note that in the book the paper was misnamed due to an editorial mistake.
The sentence "Check the cylinder bores for score marks and remove glaze and carbon deposits" looks like the following after tokenisation and morphological analysis:

"<*check>"
    "check" V SUBJUNCTIVE VFIN
    "check" V IMP VFIN
    "check" V INF
    "check" V PRES -SG3 VFIN
    "check" N NOM SG
"<the>"
    "the" DET CENTRAL ART SG/PL
"<cylinder_bores>"
    "cylinder_bore" N NOM PL
"<for>"
    "for" PREP
    "for" CS
"<score>"
    "score" N NOM SG/PL
    "score" V SUBJUNCTIVE VFIN
    "score" V IMP VFIN
    "score" V INF
    "score" V PRES -SG3 VFIN
"<marks>"
    "mark" V PRES SG3 VFIN
    "mark" N NOM PL
"<and>"
    "and" CC
"<remove>"
    "remove" N NOM SG
    "remove" V SUBJUNCTIVE VFIN
    "remove" V IMP VFIN
    "remove" V INF
    "remove" V PRES -SG3 VFIN
"<glaze>"
    "glaze" N NOM SG
    "glaze" V SUBJUNCTIVE VFIN
    "glaze" V IMP VFIN
    "glaze" V INF
    "glaze" V PRES -SG3 VFIN
"<and>"
    "and" CC
"<carbon>"
    "carbon" N NOM SG
"<deposits>"
    "deposit" V PRES SG3 VFIN
    "deposit" N NOM PL
"<$.>"
After morphological disambiguation, most ambiguities are resolved:

"<*check>"
    "check" V IMP VFIN
"<the>"
    "the" DET CENTRAL ART SG/PL
"<cylinder_bores>"
    "cylinder_bore" N NOM PL
"<for>"
    "for" PREP
"<score>"
    "score" N NOM SG/PL
"<marks>"
    "mark" N NOM PL
"<and>"
    "and" CC @CC
"<remove>"
    "remove" N NOM SG
    "remove" V IMP VFIN
"<glaze>"
    "glaze" N NOM SG
    "glaze" V PRES -SG3 VFIN
"<and>"
    "and" CC @CC
"<carbon>"
    "carbon" N NOM SG
"<deposits>"
    "deposit" V PRES SG3 VFIN
    "deposit" N NOM PL
"<$.>"
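The reductionistic step illustrated above can be sketched in a few lines of Python. This is only a minimal illustration, not the EngCG rule formalism: the `Cohort` type, the `remove_verb_after_determiner` rule and its context condition are hypothetical, chosen to show how a constraint discards readings on the basis of context while never removing the last remaining reading of a word.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cohort:
    """A word form together with its remaining alternative readings."""
    word: str
    readings: list[str] = field(default_factory=list)


def is_unambiguous(cohort: Cohort, tag: str) -> bool:
    """True if every remaining reading of the cohort begins with `tag`."""
    return bool(cohort.readings) and all(
        reading.split()[0] == tag for reading in cohort.readings
    )


def remove_verb_after_determiner(cohorts: list[Cohort]) -> None:
    """Hypothetical constraint: discard verb readings of a word that
    directly follows an unambiguous determiner."""
    for previous, current in zip(cohorts, cohorts[1:]):
        if not is_unambiguous(previous, "DET"):
            continue
        kept = [r for r in current.readings if r.split()[0] != "V"]
        # Reductionistic principle: never remove the last reading.
        if kept:
            current.readings = kept


# Toy input (invented for illustration, not taken from the paper's example).
sentence = [
    Cohort("the", ["DET CENTRAL ART SG/PL"]),
    Cohort("score", ["N NOM SG/PL", "V IMP VFIN", "V INF"]),
    Cohort("marks", ["V PRES SG3 VFIN", "N NOM PL"]),
]
remove_verb_after_determiner(sentence)
for cohort in sentence:
    print(cohort.word, cohort.readings)
```

In this toy run only "score" loses its verb readings; "marks" stays ambiguous because its left context is not a determiner, which mirrors how a word is left ambiguous when no constraint applies to it.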
Some of the most difficult ambiguities may remain unresolved: e.g. here several word forms, such as "deposits", retain both a noun and a verb reading.

Early versions of the EngCG tagger became generally known in the early 1990s for two main reasons. Firstly, the methodology was different from most other systems: unlike mainstream morphological disambiguators (or taggers), the EngCG tagger uses hand-coded rules rather than automatically generated corpus-based language models. Secondly, according to several empirical evaluations with previously unseen texts of a few thousand up to some thirty thousand words (Voutilainen et al. 1992; Voutilainen and Heikkila 1994; Tapanainen and Voutilainen 1994; Voutilainen 1995c), EngCG turned out to be successful in terms of its ambiguity/correctness tradeoff: when the average output word had some 1.06-1.1 alternative analyses (input: 1.7-2.2 analyses/word), about 99.7-99.8% of all output words contained the analysis marked as correct in the benchmark corpus, which was hand-tagged before the evaluation using the double-blind method (for details, see Voutilainen and Jarvinen 1995). As Voutilainen and Heikkila (1994) show, certain state-of-the-art probabilistic taggers perform their slightly different task (i.e. they use different tag sets) with a considerably poorer ambiguity/correctness tradeoff.

The success of the EngCG tagger seems to have renewed interest in the linguistic approach to tagging. The author is aware of recent or ongoing work on disambiguation grammars for several other languages, e.g. Finnish, Swedish, Danish, Norwegian, German, French (Chanod and Tapanainen 1995), Basque, Turkish (Oflazer and Kuruoz 1994) and Swahili (Hurskainen 1996).
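The ambiguity/correctness tradeoff discussed above rests on two figures: the average number of analyses per output word, and the share of output words that still contain the analysis marked as correct in the benchmark. The following Python sketch shows one way the two could be computed; the function name and the toy data are assumptions made for illustration, not the evaluation software used in the cited studies.

```python
def ambiguity_and_correctness(output, benchmark):
    """Return (mean analyses per word, share of words retaining the correct analysis).

    output    -- list of sets of analyses, one set per output word
    benchmark -- list of correct analyses, aligned word by word with `output`
    """
    if not output or len(output) != len(benchmark):
        raise ValueError("output and benchmark must be non-empty and aligned")
    total_analyses = sum(len(readings) for readings in output)
    retained = sum(
        1 for readings, correct in zip(output, benchmark) if correct in readings
    )
    return total_analyses / len(output), retained / len(output)


# Toy data (invented): four words, one of them left ambiguous by the tagger.
tagged = [
    {"V IMP VFIN"},
    {"DET CENTRAL ART SG/PL"},
    {"N NOM PL"},
    {"N NOM PL", "V PRES SG3 VFIN"},
]
gold = ["V IMP VFIN", "DET CENTRAL ART SG/PL", "N NOM PL", "N NOM PL"]

rate, correct = ambiguity_and_correctness(tagged, gold)
print(f"{rate:.2f} analyses/word, {correct:.1%} of words retain the correct analysis")
```

A tagger that leaves more words ambiguous raises the first figure and tends to raise the second as well, which is why the two are reported together.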
2 Problems in early versions of EngCG

It seems that EngCG is an advance in morphological (or part-of-speech) tagging. The system is currently used in many academic and industrial institutions, and large constraint grammars for several other languages have been developed. However, the early versions of the EngCG tagger also suffer from certain shortcomings.

Firstly, the system does not resolve all the ambiguities it introduces. Though 1.06-1.1 analyses per word does not seem much when compared to the initial ambiguity (1.7-2.2 analyses per word), it may seem problematic, especially to those accustomed to the typically unambiguous output of a probabilistic tagger.

Secondly, though the system's correctness rate is reasonably high, it is also obvious that some of the errors (cases where the output word does not contain a correct analysis) could be avoided. Most of the errors fall into two categories.
2.1 Lexical problems

One type of error is due to the lexical analyser: none of the analyses it proposes is acceptable in some particular context. Typical cases are listed below:
The word was not represented in the lexicon, and the guesser, whose predictions are based on the properties of the word (but not its context), fails to give the correct analysis. For instance, on the basis of its ending "th", the noun "mega-month" is analysed as an adjective:

"<a>"
    "a" DET CENTRAL ART SG
"<mega-month>"
    "mega-month" A ABS
"<at>"
    "at" PREP