Named Entity Recognition for the Indonesian Language: Combining Contextual, Morphological and Part-of-Speech Features into a Knowledge Engineering Approach

Indra Budi 1, Stéphane Bressan 2, Gatot Wahyudi 1, Zainal A. Hasibuan 1 and Bobby A.A. Nazief 1

1 Faculty of Computer Science, University of Indonesia, {indra, zhasibua, nazief}@cs.ui.ac.id, [email protected]
2 School of Computing, National University of Singapore, [email protected]

Abstract - We present a novel named entity recognition approach for the Indonesian language. We call the new method InNER for Indonesian Named Entity Recognition. InNER is based on a set of rules capturing the contextual, morphological, and part of speech knowledge necessary in the process of recognizing named entities in Indonesian texts. The InNER strategy is one of knowledge engineering: the domain and language specific rules are designed by expert knowledge engineers. After showing in our previous work that mined association rules can effectively recognize named entities and outperform maximum entropy methods, we needed to evaluate the potential for improvement to the rule based approach when expert crafted knowledge is used. The results are conclusive: the InNER method yields recall and precision of up to 63.43% and 71.84%, respectively. Thus, it significantly outperforms not only maximum entropy methods but also the association rule based method we had previously designed.

1. Introduction

Named entity recognition is the task of recognizing and classifying terms and phrases as named entities in free text [13]. It is a fundamental building block of many, if not all, textual information extraction strategies, and it is becoming a crucial component of most tools for the construction of a semantic Web layer on top of the existing wealth of textual information available on the World Wide Web. Information extraction can also be applied to scientific documents, from which interesting entities can be extracted: for example, we could extract the statements that indicate the problem, objective, method, and result from the abstract of a technical paper. Typical named entity classes are names of persons, locations, organizations, money amounts, percentages, and dates. Named entity recognition is the first step towards the extraction of structured information from unstructured texts. For example, in the following Indonesian text we may recognize `Habibie' and `Amien Rais' as names of persons and `Jakarta' as a location.

Presiden Habibie bertemu dengan Prof. Amien Rais di Jakarta kemarin.
(President Habibie met Prof. Amien Rais in Jakarta yesterday.)

The recognition task usually leverages features of the terms such as their morphology, part of speech, and their classification and associations in thesauri and dictionaries. It also leverages the context in which terms are found, such as neighboring terms and structural elements of the syntactical units, for instance propositions, sentences, and paragraphs. Clearly, the characteristic combinations of the above features differ significantly from one language to another. Techniques developed for the English language need to be adapted to non-English linguistic peculiarities, and it is also possible that entirely new, language-specific techniques need to be designed. This research is part of a larger project aiming at the design and development of tools and techniques for the Indonesian Web. We here present a novel named entity recognition approach for the Indonesian language. We call the new method InNER, for Indonesian Named Entity Recognition. InNER is based on a set of rules capturing the contextual, morphological, and part-of-speech knowledge necessary to recognize named entities in Indonesian texts. The InNER strategy is one of knowledge engineering: the domain and language specific rules are designed by an expert knowledge engineer [1]. After showing in our previous work [6] that mined association rules can effectively recognize named entities and outperform maximum entropy methods,

we now need to evaluate the potential for improvement to the rule-based approach when expert-crafted knowledge is added. The rest of this paper is organized as follows. We present and discuss background and related work on named entity recognition in the next section. In the third section, we present the InNER strategy and its implementation. In section 4, we present the results of an empirical evaluation of its performance, comparing InNER with a method that we had previously shown to outperform existing methods when applied to the Indonesian language. We present conclusions in section 5 and, finally, outline directions for future work in section 6.

2. Background and Related Works

There are two common families of approaches to named entity recognition: knowledge engineering approaches and machine learning approaches [1]. Knowledge engineering approaches are expert-crafted instances of generic models and techniques to recognize named entities in text. Such approaches are typically rule-based: the expert designs rules to be used by a generic inference engine. The rule syntax allows the expression of grammatical, morphological, and contextual patterns, and the rules can also include dictionary and thesaurus references. For example, the following rule contributes to the recognition of persons.

If a proper noun is preceded by a title then the proper noun is the name of a person.

In [2], the authors introduce the FASTUS system, whose rules use regular expressions. In [14], the authors built a knowledge representation consisting of rules that identify named entities based on geographical knowledge, common person names, and common organization names. In [16], the authors use a semantic network of some 100,000 nodes holding information such as concept hierarchies, lexical information, and prototypical events. All the above works are applied to the English language. In machine learning approaches, a generic computer program learns to recognize named entities, with or without training and feedback. Very general machine learning models exist that do not necessitate the mobilization of expensive linguistic expert knowledge and resources. For instance, the Nymble system [3] uses a hidden Markov model. In both [4] and [7], the authors present approaches that use the now popular maximum entropy models. These models can make use of different features: in [4], the authors use morphological features, paragraphs, and a dictionary; in [7], the authors combine local and global features. As mentioned in [10], the knowledge engineering and machine learning approaches are not incompatible. For instance, the system presented in [15] combines rules and maximum entropy to recognize named entities in English texts. Our first attempt to design a named entity recognition system for the Indonesian language was of the machine learning family [6]: we mined a set of association rules from a training corpus in which we considered sequences of terms annotated with their features and name classes. Numerous other authors have worked on named entity recognition for non-English languages, and some have made their results available. In [18], the authors propose a named entity recognition approach based on a decision tree for the Japanese language. In [12], the authors propose a rule-based approach for financial texts in the Greek language. In [19], the authors use a combination of lexical, contextual, and morphological features to recognize named entities in the Turkish language. In [9], the authors present an approach combining rules with machine learning for the Swedish language.

3. Named Entity Recognition

The approach we propose in this paper is based on rules generated by expert engineers for the recognition of named entities. The rules are designed and verified by educated native speakers after analysis of a training corpus. A rule combines contextual, morphological, and part-of-speech features to assign a class to terms and groups of terms in the text. The class of a named entity can be directly recognized from its context. For example, in a sentence comprising a title particle such as "Prof." followed by a proper name, the proper name is the name of a person. For example, in the sentence "Prof. Yusuf berkunjung ke Jakarta", the term "Yusuf" is recognized

as the name of a person because it is a proper name preceded by a term carrying contextual information ("Prof."). We can infer that the term 'Yusuf' is a proper name because it begins with an upper-case letter. The format and nature of the characters forming terms give basic indications: lower and upper case, signs, diacritics, and digits. In the above example, the term 'Yusuf' has the morphology of a proper name. We assume the availability of the necessary tools for the lexical analysis and part-of-speech tagging of the text. The knowledge engineering task for the expert is the design of rules identifying the chosen named entity classes based on contextual, morphological, and part-of-speech information, as explained above. As we have seen in the introduction, example rules read as follows.

If a proper noun is preceded by a title then the proper noun is the name of a person.
If a proper noun is preceded by 'di' then the proper noun is the name of a location.

The InNER system processes the input text sentence by sentence. For each input sentence, the corresponding output is a sentence in which the text corresponding to a recognized named entity is marked up with XML tags following the widely used framework introduced in [8]. For instance, the processing of the example sentence "Presiden Habibie bertemu dengan Prof. Amien Rais di Jakarta kemarin" outputs the following XML-tagged text:

Presiden <ENAMEX TYPE="PERSON">Habibie</ENAMEX> bertemu dengan Prof. <ENAMEX TYPE="PERSON">Amien Rais</ENAMEX> di <ENAMEX TYPE="LOCATION">Jakarta</ENAMEX> kemarin.

The InNER system has four main processes, as depicted in Fig. 1: tokenization, feature assignment, rule assignment, and name tagging.

Fig. 1. The InNER architecture (Input → Tokenization → Feature Assignment → Rule Assignment → Name Tagging → Output; feature assignment consults a features dictionary and rule assignment a rules dictionary)

The purpose of the tokenization process is to identify tokens (words, punctuation, and other units of text such as numbers) in the input sentence. Tokens are labeled with their kinds. For example, Table 4 illustrates the results of the tokenization of the sentence below, in which the tokenizer identifies words (WORD), punctuation (OPUNC, EPUNC), and numbers (NUM).

Ketua MPR, Amien Rais pergi ke Bandung kemarin (24/4).
(Chief of MPR, Amien Rais went to Bandung yesterday (24/4).)
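The tokenization step described above can be sketched as a small pattern-driven scanner. This is an illustration only, not the authors' implementation; the exact punctuation kinds (OPUNC, SPUNC, EPUNC) and their character sets are assumptions inferred from Table 4.

```python
import re

# A minimal tokenizer sketch. The mapping of punctuation characters to the
# paper's punctuation kinds (OPUNC, SPUNC, EPUNC) is an assumption.
TOKEN_PATTERNS = [
    ("NUM",   re.compile(r"\d+(?:[/.,:]\d+)*")),  # covers 2004, 24/4, 17.500,00
    ("WORD",  re.compile(r"[A-Za-z][A-Za-z.]*")), # allows abbreviations like Prof.
    ("OPUNC", re.compile(r"[,.]")),
    ("SPUNC", re.compile(r"[(\[]")),
    ("EPUNC", re.compile(r"[)\]]")),
]

def tokenize(sentence):
    """Return a list of (token string, token kind) pairs."""
    tokens, i = [], 0
    while i < len(sentence):
        if sentence[i].isspace():
            i += 1
            continue
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(sentence, i)
            if m:
                tokens.append((m.group(), kind))
                i = m.end()
                break
        else:
            tokens.append((sentence[i], "OTHER"))
            i += 1
    return tokens
```

Applied to the example sentence, this yields thirteen tokens, with "24/4" labeled NUM and the punctuation labeled by kind, matching the layout of Table 4.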

Table 1. List of contextual features

  Feature  Explanation                                      Example
  PPRE     Person prefix                                    Dr., Pak, K.H.
  PMID     Person middle                                    bin, van
  PSUF     Person suffix                                    SKom, SH
  PTIT     Person title                                     Menristek, Mendagri
  OPRE     Organization prefix                              PT., Universitas
  OSUF     Organization suffix                              Ltd., Company
  OPOS     Position in organization                         Ketua
  OCON     Other organization context                       Muktamar, Rakernas
  LPRE     Location prefix                                  Kota, Propinsi
  LSUF     Location suffix                                  Utara, City
  LLDR     Location leader                                  Gubernur, Walikota
  POLP     Preposition usually followed by a person name    oleh, untuk
  LOPP     Preposition usually followed by a location name  di, ke, dari
  DAY      Day                                              Senin, Sabtu
  MONTH    Month                                            April, Mei

The feature assignment component labels the terms with their features: the basic contextual features (for instance identifying prepositions, days, or titles), the morphological features, and the part-of-speech classes. The identification of contextual features uses the context dictionary. The analysis of the morphological features parses the token. The identification of the part-of-speech classes is a complex process; we use part-of-speech tagging technology developed by our team [5, 17]. Contextual features are illustrated in Table 1, while Tables 2 and 3 illustrate and provide examples of morphological and part-of-speech features, respectively. As an example, Table 4 illustrates the result of the feature assignment process on the above sentence.

Table 2. List of morphological features

  Feature     Explanation                                            Example
  TitleCase   Begins with an uppercase letter, all others lowercase  Soedirman
  UpperCase   All uppercase letters                                  KPU
  LowerCase   All lowercase letters                                  menuntut
  MixedCase   Mixed uppercase and lowercase letters                  LeIP
  CapStart    Begins with an uppercase letter                        LeIP, Muhammad
  CharDigit   Letters and digits                                     P3K
  Digit       All digits                                             2004
  DigitSlash  Digits with a slash                                    17/5
  Numeric     Digits with a dot or comma                             20,5; 17.500,00
  NumStr      Number in words                                        satu, tujuh, lima
  Roman       Roman numeral                                          VII, XI
  TimeForm    Number in time format                                  17:05, 19.30

Table 3. List of part-of-speech features

  Feature  Explanation        Example
  ART      Article            si, sang
  ADJ      Adjective          indah, baik
  ADV      Adverb             telah, kemarin
  AUX      Auxiliary verb     harus
  C        Conjunction        dan, atau, lalu
  DEF      Definition         merupakan
  NOUN     Noun               rumah, gedung
  NOUNP    Personal noun      ayah, ibu
  NUM      Number             satu, dua
  MODAL    Modal              akan
  OOV      Out of dictionary
  PAR      Particle           kah, pun
  PREP     Preposition        di, ke, dari
  PRO      Pronominal         saya, beliau
  VACT     Active verb        menuduh
  VPAS     Passive verb       dituduh
  VERB     Verb               pergi, tidur

The rule assignment component selects the candidate rules for each identified token in the text. The actual orchestration and triggering of the rules occur in the name tagging component.

Table 4. Result of the tokenization and feature assignment processes

  Token    Kind   Contextual  Morphological        Part of speech
  Ketua    WORD   OPOS        TitleCase, CapStart  NOUN
  MPR      WORD               UpperCase, CapStart  OOV
  ,        OPUNC
  Amien    WORD               TitleCase, CapStart  OOV
  Rais     WORD               TitleCase, CapStart  NOUN
  pergi    WORD               LowerCase            VERB
  ke       WORD               LowerCase            PREP
  Bandung  WORD               TitleCase, CapStart  NOUN
  kemarin  WORD               LowerCase            NOUN, ADV
  (        SPUNC
  24/4     NUM                DigitSlash
  )        EPUNC
  .        OPUNC

The rules in the InNER system capture the typical patterns of features characterizing the various named entity classes. The left-hand side of a rule is the pattern; the right-hand side is the identified named entity class. The following is an example of a rule.

IF   Token[i].Kind = "WORD" and Token[i].OPOS
     and Token[i+1].Kind = "WORD" and Token[i+1].UpperCase and Token[i+1].OOV
THEN Token[i+1].NE = "ORGANIZATION"

The rule above recognizes the token "MPR" as an organization. Table 5 shows the result of the rule assignment process for the example sentence. The empty string indicates that the particular term or phrase has not been classified. In the example, "MPR", the Indonesian parliament, is identified as an organization; "Amien Rais", an Indonesian politician, is identified as a person; and "Bandung", an Indonesian city, is identified as a location. The rules are not ordered: we arranged them arbitrarily, and for each token we apply the first rule whose pattern matches.

Table 5. Result of the rule assignment process

  Token    Named entity class
  Ketua    ""
  MPR      ORGANIZATION
  ,        ""
  Amien    PERSON
  Rais     PERSON
  pergi    ""
  ke       ""
  Bandung  LOCATION
  kemarin  ""
  (        ""
  24/4     ""
  )        ""
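The IF/THEN rule format shown above can be sketched as predicates over a window of feature-annotated tokens, with a first-match-wins loop reflecting the unordered rule application the paper describes. This is an illustrative sketch, not the authors' engine; the token representation is an assumption.

```python
# Sketch of the rule-matching step. A token is a dict with its string,
# kind, and a set of feature flags; a rule inspects a token window and,
# on a match, returns (index, named-entity class).

def org_rule(tokens, i):
    """IF Token[i] is a WORD with the OPOS contextual feature and
    Token[i+1] is an UpperCase, out-of-vocabulary WORD,
    THEN Token[i+1] is an ORGANIZATION."""
    if i + 1 >= len(tokens):
        return None
    t, nxt = tokens[i], tokens[i + 1]
    if (t["kind"] == "WORD" and "OPOS" in t["features"]
            and nxt["kind"] == "WORD"
            and {"UpperCase", "OOV"} <= nxt["features"]):
        return (i + 1, "ORGANIZATION")
    return None

RULES = [org_rule]  # the first matching rule wins, as in the paper

def assign_classes(tokens):
    """Return one named-entity class (or "") per token."""
    classes = [""] * len(tokens)
    for i in range(len(tokens)):
        for rule in RULES:
            hit = rule(tokens, i)
            if hit:
                j, ne_class = hit
                classes[j] = ne_class
                break
    return classes
```

Run on the first two tokens of the example sentence, the rule fires on "Ketua MPR" and assigns ORGANIZATION to "MPR", as in Table 5.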

The last process in the InNER system is the XML tagging of the original sentence. The name tagging step merges consecutive tokens identified with the same class into one single named entity. The syntax of the tags follows the mechanism in [8]. The example below is the output of the system for our running example: the terms "Amien" and "Rais" are consecutive and identified with the same class PERSON, so they are tagged together as one single named entity of class PERSON.

Ketua <ENAMEX TYPE="ORGANIZATION">MPR</ENAMEX>, <ENAMEX TYPE="PERSON">Amien Rais</ENAMEX> pergi ke <ENAMEX TYPE="LOCATION">Bandung</ENAMEX> kemarin (24/4).
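The merging of consecutive same-class tokens into one tagged entity can be sketched as follows. The ENAMEX tag syntax follows the MUC convention the paper cites in [8]; the whitespace-joining of tokens is a simplification for illustration.

```python
# Sketch of the name-tagging step: runs of tokens sharing a non-empty
# named-entity class are merged into a single MUC-style ENAMEX element.
# Joining tokens with single spaces is a simplification (it does not
# restore the original spacing around punctuation).

def tag_sentence(tokens, classes):
    out, i = [], 0
    while i < len(tokens):
        if classes[i]:
            j = i
            while j + 1 < len(tokens) and classes[j + 1] == classes[i]:
                j += 1  # extend the run of identically classified tokens
            span = " ".join(tokens[i:j + 1])
            out.append('<ENAMEX TYPE="%s">%s</ENAMEX>' % (classes[i], span))
            i = j + 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

For the tokens "Amien", "Rais" (both PERSON) and "Bandung" (LOCATION), this produces one PERSON element spanning "Amien Rais" and one LOCATION element for "Bandung".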

4. Performance Analysis

We now empirically analyze the performance of the newly proposed approach, in comparison with our previous association rule based approach. We recall that in [6] we showed that the latter outperforms an existing method (maximum entropy) for the named entity recognition task in the Indonesian language.

4.1. Experimental Setup and Metrics

For this evaluation we consider three named entity classes: names of persons, locations, and organizations. Our experts are graduate students who are native speakers of the Indonesian language. The observation corpus is composed of a set of articles from the online versions of two Indonesian newspapers, Kompas (www.kompas.com) and Republika (www.republika.co.id). The observation corpus consists of 802 sentences; it comprises 559 names of persons, 853 names of organizations, and 418 names of locations. Our testing corpus consists of 1,258 articles from the same sources; it includes 801 names of persons, 1,031 names of organizations, and 297 names of locations. Both the observation and testing corpora have been independently tagged by native speakers based on the guideline provided in [20]. To measure the effectiveness of the evaluated approaches, we use the definitions of the recall, precision, and F-Measure metrics proposed by MUC (Message Understanding Conference) in [11]. These definitions use the following measurements.

− Correct: number of correct recognitions performed by the system.
− Partial: number of partially correct recognitions performed by the system. For example, the phrase "Amien Rais" should be recognized as a person, but the system only recognized "Amien", or only "Rais", as a PERSON.
− Possible: number of named entities in the text as manually tagged for the training.
− Actual: number of tagged named entities generated by the system.
They may be correct, partially correct, or incorrect (we call incorrect those tagged terms which are neither correct nor partially correct). Based on the values above, the system performance can be calculated in terms of recall, precision, and F-Measure using the following formulas.

Recall = (Correct + 0.5 * Partial) / Possible    (1)

Precision = (Correct + 0.5 * Partial) / Actual    (2)

F-Measure = (Recall * Precision) / (0.5 * (Recall + Precision))    (3)
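Definitions (1) to (3) can be sketched directly in code; note that (3) is the usual harmonic mean of recall and precision, 2PR/(P + R), written in the MUC form.

```python
# Sketch of the MUC scoring formulas (1)-(3).
def muc_scores(correct, partial, possible, actual):
    """Return (recall, precision, f_measure) for MUC-style counts."""
    recall = (correct + 0.5 * partial) / possible
    precision = (correct + 0.5 * partial) / actual
    f_measure = (recall * precision) / (0.5 * (recall + precision))
    return recall, precision, f_measure
```

With the counts of the worked example below (Correct = 2, Partial = 1, Possible = 3, Actual = 4), these formulas yield 83.33% recall, 62.50% precision, and 71.43% F-Measure.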

Let us illustrate the above definitions with our example sentence, manually tagged as given in section 1:

Presiden <ENAMEX TYPE="PERSON">Habibie</ENAMEX> bertemu dengan Prof. <ENAMEX TYPE="PERSON">Amien Rais</ENAMEX> di <ENAMEX TYPE="LOCATION">Jakarta</ENAMEX> kemarin.

Namely, there are three named entities: Habibie, Amien Rais, and Jakarta, of respective classes person, person, and location. Let us now assume that the same sentence is tagged by the system as follows.

Presiden <ENAMEX TYPE="PERSON">Habibie</ENAMEX> bertemu dengan Prof. <ENAMEX TYPE="PERSON">Amien</ENAMEX> <ENAMEX TYPE="ORGANIZATION">Rais</ENAMEX> di <ENAMEX TYPE="LOCATION">Jakarta</ENAMEX> kemarin.

Namely, the system identifies four named entities: Habibie, Amien, Rais and Jakarta of respective class person, person, organization and location. The first and fourth terms are correctly tagged. The

second term 'Amien' is partially tagged, as it should be 'Amien Rais'. The third term 'Rais' is wrongly tagged. Therefore, recall, precision, and F-Measure for this sentence alone are computed as follows.

Recall = (2 + 0.5)/3 = 2.5/3 = 83.33%
Precision = (2 + 0.5)/4 = 2.5/4 = 62.50%
F-Measure = 71.43%

4.2. Results and Analysis

Our experts engineered a total of 100 rules by examining the observation corpus. We have classified the rules into four categories depending on the combination of features that they use. We call contextual rules those involving contextual features only; this is the base set of rules, and there are 18 such rules. We call CM rules those that combine contextual and morphological features; there are 33 such rules. We call CP rules those that combine contextual and part-of-speech features; there are 27 such rules. We call CMP rules those that combine all features; there are 22 such rules. Table 6 shows the performance of different combinations of the rule sets. Surprisingly, morphological features seem to yield better results than part-of-speech features. This is probably due to the named entity classes that we are considering, for which an upper-case first character is often a determinant indicator. However, regardless of the specificity of our choice of named entity classes, and as we expected, the best overall results for recall, precision, and F-Measure are obtained from the combination of all types of rules and all types of features. Using the combination of all rules is the strategy we propose, and we call it the InNER strategy.

Table 6. Results for different combinations of the rule sets

  Rules                 Recall   Precision  F-Measure
  Contextual (Base)     35.79%   33.87%     34.82%
  Base + CP             46.81%   49.80%     48.26%
  Base + CM             47.91%   70.30%     56.98%
  Base + CP + CM + CMP  63.43%   71.84%     67.37%

For further language-processing steps, partially correct recognitions are not useful, so we also give results without partial credit, counting only correct responses. Table 7 shows the results of the system without partially correct responses, and Table 8 gives the difference in F-Measure between the system with and without partial credit.

Table 7. Results of the experiment without partial correct

  Rules                 Recall   Precision  F-Measure
  Contextual (Base)     28.28%   26.74%     27.49%
  Base + CP             41.29%   43.93%     42.57%
  Base + CM             44.11%   64.71%     52.49%
  Base + CP + CM + CMP  60.22%   67.76%     63.77%

Table 8. Difference in F-Measure between with and without partial correct

  Rules                 With Partial Correct  Without Partial Correct  Difference
  Contextual (Base)     34.82%                27.49%                   7.33%
  Base + CP             48.26%                42.57%                   5.69%
  Base + CM             56.98%                52.49%                   4.49%
  Base + CP + CM + CMP  67.37%                63.77%                   3.60%

Based on Table 8, we see that the contextual features alone give the highest proportion of partially correct recognitions, and that the combination of all features gives the smallest difference when partial credit is used. This means that adding more features gives more accurate results. The results are still below the standard performance reported for NER in some other languages (e.g. 80%). We have no definitive answer yet as to why, since this is the first generation of NER for the Indonesian language. It may be related to the datasets used, to the way rules are selected, or to both. We developed the datasets manually and carefully, but we have no judgment from a domain expert about the correctness of the datasets. Moreover, looking at the datasets manually, we find many occurrences in which the conjunction "dan" (and) is used as part of an organization name, for example: "Fakultas Ilmu Sosial dan Politik" (Faculty of Social and Political Science), "Departemen Kehakiman dan HAM" (Department of Justice and Human Rights), and "Pusat Studi Demokrasi dan HAM" (Center for the Study of Democracy and Human Rights). Our system may detect such phrases as two entities instead of one. The decision of choosing the right rules could also contribute to the lower results; the results might be better if we could rank the rules. When comparing the performance of the InNER strategy, i.e., with all rules, to the performance of named entity recognition by means of mined association rules (which we had shown in [6] to outperform maximum entropy methods for the Indonesian language), we find that InNER yields a consistently and significantly better performance in both recall and precision and therefore, naturally, in F-Measure. Even compared with the combination of contextual and morphological rules alone, the InNER strategy still performs better. We used the observation corpus as the training set to discover the association rules. As in [6], the association rules have the form

<t1, f2> => nc2 (support, confidence)

constructed from a sequence of terms <t1, t2>, where f2 is the morphological feature of t2 and nc2 is the name class of t2. The left-hand side of an association rule is a sequence of terms and features, while the right-hand side is a name class; the support and confidence depend on their occurrences in the training set. See [6] for further detail on how these association rules are used in named entity recognition. This form is similar to the CM rules of InNER, which combine contextual and morphological features. Table 9 contains the figures of this comparison.

Table 9. Comparison with the association rules method

  Method             Recall   Precision  F-Measure
  InNER              63.43%   71.84%     67.37%
  Association rules  43.33%   52.50%     47.49%

A closer manual look at the results, going through the correct, partial, possible, and actual named entities, seems to indicate that association rules induce more partial recognitions. This also shows that the performance of the association rules is close to that of InNER using the combination of contextual and morphological features when partial credit is not counted. Partial recognition is largely avoided by the knowledge engineering approach, which is capable of a finer-grained tuning of the rules by leveraging the variety of features available.
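The association-rule form recalled above can be sketched as follows. This is a simplified illustration of how a mined rule of the form <t1, f2> => nc2 might be applied, not the procedure of [6]; the example rule and its support and confidence values are toy numbers.

```python
# Sketch of applying a mined association rule <t1, f2> => nc2: if the
# previous token's string matches t1 and the current token carries the
# morphological feature f2, the current token receives name class nc2.
# The rule below and its (support, confidence) values are hypothetical.
ASSOC_RULES = [
    # ((t1, f2), (nc2, support, confidence))
    (("Presiden", "TitleCase"), ("PERSON", 0.02, 0.95)),
]

def apply_assoc_rules(tokens, morph_features, min_confidence=0.5):
    """Assign a name class per token from matching association rules."""
    classes = [""] * len(tokens)
    for i in range(1, len(tokens)):
        for (t1, f2), (nc2, _support, confidence) in ASSOC_RULES:
            if (tokens[i - 1] == t1 and f2 in morph_features[i]
                    and confidence >= min_confidence):
                classes[i] = nc2
    return classes
```

On the pair "Presiden Habibie", where "Habibie" carries TitleCase, the toy rule assigns the class PERSON to "Habibie".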

5. Conclusions

We have proposed a knowledge engineering approach to the recognition of named entities in texts in the Indonesian language. The approach is based on rules that combine contextual, morphological, and part-of-speech features in the recognition process. The method yields its highest performance, 63.43% recall and 71.84% precision, with the combination of all three types of features. Based on our experiments, we also showed that morphological features yield better results than part-of-speech features: knowing the structure of the letters forming a term helps more than knowing its part of speech. We showed that this method outperforms the association rule based method we had previously developed, largely because it reduces the number of partially correct results. Since we had previously shown that, under a similar experimental setup, the association rule based method yielded a better performance than a state-of-the-art method (maximum entropy) [6], we can conclude that, in our experiments, the knowledge engineering method performs best.

6. Future Work

Clearly, the performance of our approach comes at the cost of expert knowledge and effort: our experts manually designed 100 rules. It is a tedious task, which we did not conduct, to compare these rules individually with the association rules that are automatically mined. It would however be interesting to compare and integrate the mined association rules and the engineered rules. Indeed, not only do we expect the controlled merging of the mined association rules with the engineered rules to result in an even more effective method, but we also expect an effective and elegant tool for visualizing and browsing the mined association rules to help the knowledge engineer in the design process itself. To reach standard performance levels, we think the datasets should be improved alongside the method: they should be evaluated and revised by a domain expert in order to reduce manual errors. The next step in our project is to devise a method to reconstruct structured elements from the elementary named entities identified. Our target language is XML. To illustrate our idea, let us consider the motivating example from which we wish to extract an XML document describing the meeting taking place: "Presiden Habibie bertemu dengan Prof. Amien Rais di Jakarta kemarin." Fig. 2 contains the manually constructed XML we hope to obtain, highlighting the components that require global, ancillary, or external knowledge. Indeed, although we expect that similar methods (rule-based, association rules) can be applied to learn how elementary entities combine into complex elements, we also expect that global, ancillary, and external knowledge will be necessary, such as gazetteers (Jakarta is in Indonesia) and the document's temporal and geographical context (Jakarta, 05/06/2003).

05/06/2003 Jakarta Indonesia Habibie Presiden Amien Rais Prof.

Fig. 2. Extracted structural form in XML
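A structured document of the kind Fig. 2 envisions can be sketched as follows. The element names (meeting, date, location, participant, and so on) are assumptions for illustration, not the paper's actual schema; only the values come from the running example.

```python
# Hypothetical sketch of the structured XML target of Fig. 2; all element
# names are assumptions, and only the text values come from the example.
import xml.etree.ElementTree as ET

meeting = ET.Element("meeting")
ET.SubElement(meeting, "date").text = "05/06/2003"

location = ET.SubElement(meeting, "location")
ET.SubElement(location, "city").text = "Jakarta"
ET.SubElement(location, "country").text = "Indonesia"  # from a gazetteer

# Each recognized person becomes a participant with an associated title.
for name, title in [("Habibie", "Presiden"), ("Amien Rais", "Prof.")]:
    participant = ET.SubElement(meeting, "participant")
    ET.SubElement(participant, "name").text = name
    ET.SubElement(participant, "title").text = title

print(ET.tostring(meeting, encoding="unicode"))
```

The country and the full date are exactly the components that would require external knowledge, as the text above notes.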

7. References

[1] Appelt, D., and Israel, D.J.: Introduction to Information Extraction Technology, Tutorial at IJCAI-99, Stockholm, Sweden (1999)
[2] Appelt, D., et al.: SRI International FASTUS System MUC-6 Test Results and Analysis, In Proceedings of the 6th Message Understanding Conference (MUC-6) (1995)
[3] Bikel, D., et al.: NYMBLE: A High Performance Learning Name-Finder, In Proceedings of the Fifth Conference on Applied Natural Language Processing, pp. 194-201 (1997)
[4] Borthwick, A., et al.: Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition, In Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, Canada (1998)
[5] Bressan, S., and Indradjaja, L.: Part-of-Speech Tagging without Training, In Proceedings of Intelligence in Communication Systems: IFIP International Conference (INTELLCOMM), LNCS Vol. 3283, Springer-Verlag, Heidelberg (2004)
[6] Budi, I., and Bressan, S.: Association Rules Mining for Name Entity Recognition, In Proceedings of the 4th Web Information Systems Engineering (WISE) Conference, Rome (2003)
[7] Chieu, H.L., and Ng, H.T.: Named Entity Recognition: A Maximum Entropy Approach Using Global Information, In Proceedings of the 19th International Conference on Computational Linguistics (2002)
[8] Chinchor, N., et al.: Named Entity Recognition Task Definition Version 1.4, The MITRE Corporation and SAIC (1999)
[9] Dalianis, H., and Åström, E.: SweNam, a Swedish Named Entity Recognizer: Its Construction, Training and Evaluation, Technical Report TRITA-NA-P0113, IPLab-189, NADA, KTH (2001)
[10] Dekang, L.: Using Collocation Statistics in Information Extraction, In Proceedings of the 7th Message Understanding Conference (MUC-7) (1998)
[11] Douthat, A.: The Message Understanding Conference Scoring Software User's Manual, In Proceedings of the 7th Message Understanding Conference (MUC-7) (1998)
[12] Farmakiotou, D., Karkaletsis, V., Koutsias, K., Sigletos, G., Spyropoulos, C.D., and Stamatopoulos, P.: Rule-based Named Entity Recognition for Greek Financial Texts, In Proceedings of the International Conference on Computational Lexicography and Multimedia Dictionaries (COMLEX 2000) (2000)
[13] Grishman, R.: Information Extraction: Techniques and Challenges, Lecture Notes in Computer Science Vol. 1299, Springer-Verlag (1997)
[14] Iwanska, L., et al.: Wayne State University: Description of the UNO Natural Language Processing System as Used for MUC-6, In Proceedings of the 6th Message Understanding Conference (MUC-6) (1995)
[15] Mikheev, A., Grover, C., and Moens, M.: Description of the LTG System Used for MUC-7, In Proceedings of the 7th Message Understanding Conference (MUC-7) (1998)
[16] Morgan, R., et al.: Description of the LOLITA System as Used for MUC-6, In Proceedings of the 6th Message Understanding Conference (MUC-6) (1995)
[17] Savitri, S.: Analisa Struktur Kalimat Bahasa Indonesia dengan Menggunakan Pengurai Kalimat berbasis Linguistic String Analysis (Analysis of Indonesian Sentence Structure Using a Parser Based on Linguistic String Analysis), final project report, Fasilkom UI, Depok (1999) (in Indonesian)
[18] Sekine, S., Grishman, R., and Shinnou, H.: A Decision Tree Method for Finding and Classifying Names in Japanese Texts, In Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, Canada (1998)
[19] Tur, G., Hakkani-Tur, D.Z., and Oflazer, K.: Name Tagging Using Lexical, Contextual, and Morphological Information, In Workshop on Information Extraction Meets Corpus Linguistics, LREC-2000, 2nd International Conference on Language Resources and Evaluation, Athens, Greece (2000)
[20] Wahyudi, G.: Pengenalan Entitas Bernama berdasarkan Informasi Kontekstual, Morfologi dan Kelas Kata (Named Entity Recognition Based on Contextual, Morphological and Word-Class Information), final project report, Fasilkom UI, Depok (2004) (in Indonesian)
