Name entity recognition using inductive logic programming
Le H.T., Nguyen T.H.
School of Information and Communication Technology, Hanoi University of Technology, 1 Dai Co Viet street, Hanoi, Viet Nam; Centre for Gifted Education, Hanoi University of Technology, 1 Dai Co Viet street, Hanoi, Viet Nam

Abstract: Named entity recognition (NER) is the process of locating atomic elements in text and classifying them into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, and percentages. NER is useful in other natural language tasks such as question answering, text summarization, and building the semantic web. This paper presents a system, called BKIE, that uses SRV, an inductive logic programming algorithm, to extract named entities from Vietnamese text. New predicates and features are added to SRV to handle characteristics of the Vietnamese language, and several strategies are proposed to improve the efficiency of the SRV algorithm. The data set used in the experiments consists of 80 Vietnamese-language homepages of scientists, tagged manually. The best F-score obtained in the experiments is 83%, for extracting the "name" entity, which shows that SRV is an efficient NER algorithm given its generality and flexibility. Future work includes (i) building a larger training set to improve the system's performance; (ii) implementing BKIE with parallel programming to increase its efficiency; and (iii) testing BKIE in other application domains to obtain a more accurate evaluation of the system. © 2010 ACM.
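The abstract describes SRV only at a high level. In Freitag's SRV (1998), a learned rule classifies a candidate text fragment as a field instance when it satisfies a conjunction of predicates, for example a constraint on fragment length, a feature shared by every token, or a feature of some token reached through a relational path such as "previous token". The sketch below shows only how such a rule might be applied to a tokenized document, not how SRV learns it. The feature names (`capitalized`, `numeric`, `word`), the path steps (`prev_token`, `next_token`), the helper functions, and the Vietnamese example rule are all hypothetical; they do not reproduce the predicates that BKIE adds for Vietnamese.

```python
import operator
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Doc = Sequence[str]                 # a tokenized document (Vietnamese syllables split on spaces)
Fragment = Tuple[int, int]          # half-open token span [start, end)
Predicate = Callable[[Doc, Fragment], bool]

# Hypothetical token features; BKIE's Vietnamese-specific features are not reproduced here.
FEATURES: Dict[str, Callable[[str], object]] = {
    "capitalized": lambda tok: tok[:1].isupper(),
    "numeric": lambda tok: tok.isdigit(),
    "word": lambda tok: tok.lower(),
}

# Hypothetical relational steps: map a token index to a neighbour, or None at the document edge.
STEPS: Dict[str, Callable[[Doc, int], Optional[int]]] = {
    "prev_token": lambda doc, i: i - 1 if i > 0 else None,
    "next_token": lambda doc, i: i + 1 if i + 1 < len(doc) else None,
}

def length(op: str, n: int) -> Predicate:
    """Compare the fragment length (in tokens) with n."""
    cmp = {"<": operator.lt, "=": operator.eq, ">": operator.gt}[op]
    return lambda doc, frag: cmp(frag[1] - frag[0], n)

def every(feature: str, value: object) -> Predicate:
    """True if every token in the fragment has feature == value."""
    fn = FEATURES[feature]
    return lambda doc, frag: all(fn(doc[i]) == value for i in range(*frag))

def some(path: List[str], feature: str, value: object) -> Predicate:
    """True if, for some token in the fragment, the token reached by following
    `path` (a list of relational steps) has feature == value."""
    fn = FEATURES[feature]

    def check(doc: Doc, frag: Fragment) -> bool:
        for i in range(*frag):
            j: Optional[int] = i
            for step in path:
                j = STEPS[step](doc, j)
                if j is None:       # walked off the edge of the document
                    break
            if j is not None and fn(doc[j]) == value:
                return True
        return False

    return check

def extract(doc: Doc, rule: List[Predicate], max_len: int = 4) -> List[Fragment]:
    """Enumerate candidate fragments of up to max_len tokens and keep those
    that satisfy every predicate of the rule (a conjunction)."""
    hits = []
    for start in range(len(doc)):
        for end in range(start + 1, min(start + max_len, len(doc)) + 1):
            if all(pred(doc, (start, end)) for pred in rule):
                hits.append((start, end))
    return hits

# Hypothetical rule: a three-token, all-capitalized fragment preceded by "sĩ" (as in "Tiến sĩ", Doctor).
doc = ["Tiến", "sĩ", "Nguyễn", "Văn", "An", "là", "giảng", "viên"]
rule = [length("=", 3), every("capitalized", True), some(["prev_token"], "word", "sĩ")]
for start, end in extract(doc, rule):
    print(" ".join(doc[start:end]))  # prints "Nguyễn Văn An"
```

SRV itself searches for such rules in a greedy, top-down, FOIL-like manner, adding at each step the predicate that best separates positive from negative fragments; that search is not shown here.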
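The abstract reports its best result as an F-score of 83% for the "name" entity but does not say how matches were counted. A minimal sketch, assuming exact span matching and the usual balanced F-score F1 = 2PR/(P + R), is below; the function name and the toy spans are made up for illustration and are not the paper's data.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end) of an extracted entity

def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if its span equals a gold span."""
    tp = len(predicted & gold)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(2, 5), (10, 13), (20, 22), (30, 33)}
predicted = {(2, 5), (10, 13), (20, 21)}  # (20, 21) only partially overlaps (20, 22), so it counts as wrong
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Under a lenient, overlap-based criterion, (20, 21) would instead count as a hit, which is why the matching rule matters when comparing reported F-scores across systems.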
Author Keywords: first order logic; information extraction; name entity recognition; SRV
Index Keywords: Data sets; F-score; First order logic; Home page; Inductive logic; Inductive Logic Programming; Information Extraction; Name entity recognition; Named entity recognition; Natural languages; Other applications; Question Answering; SRV; System efficiency; System's performance; Text summarization; Training data; Character recognition; Computer systems programming; Information analysis; Information technology; Natural language processing systems; Parallel programming; Semantic Web; Text processing; Inductive logic programming (ILP)
Year: 2010
Source title: ACM International Conference Proceeding Series
Pages: 71-77
Correspondence Address: Le, H. T.; School of Information and Communication Technology, Hanoi University of Technology, 1 Dai Co Viet street, Hanoi, Viet Nam
Conference name: Symposium on Information and Communication Technology, SoICT 2010
Conference date: 27 August 2010 through 28 August 2010
Conference location: Hanoi
Conference code: 82357
ISBN: 9.78145E+12

DOI: 10.1145/1852611.1852626
Language of Original Document: English
Abbreviated Source Title: ACM International Conference Proceeding Series
Document Type: Conference Paper
Source: Scopus
Authors with affiliations:
1. Le, H.T., School of Information and Communication Technology, Hanoi University of Technology, 1 Dai Co Viet street, Hanoi, Viet Nam
2. Nguyen, T.H., Centre for Gifted Education, Hanoi University of Technology, 1 Dai Co Viet street, Hanoi, Viet Nam

References:

1. Aitken, J.S., Learning Information Extraction Rules: An Inductive Logic Programming approach (2002) Proceedings of the 15th European Conference on Artificial Intelligence, ed. van Harmelen, F., IOS Press, Amsterdam
2. Badica, C., Badica, A., Experimenting with Rule Learning for Information Extraction from HTML (2004) Proceedings of SYNASC 2004
3. Califf, M.E., Mooney, R.J., Relational Learning of Pattern-Match Rules for Information Extraction (1998) Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, Stanford, CA
4. Cestnik, B., Estimating probabilities: A crucial task in machine learning (1990) Proceedings of the Ninth European Conference on Artificial Intelligence
5. Freitag, D., Information extraction from HTML: Application of a general machine learning approach (1998) Proceedings of AAAI'98, pp. 517-523
6. Freitag, D., Toward general-purpose learning for information extraction (1998) Proceedings of the 17th International Conference on Computational Linguistics, 1, pp. 404-408
7. Huffman, S., Learning Information Extraction Patterns from Examples (1996) Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, Springer-Verlag
8. Le, P.H., vnTagger (2010) http://www.loria.fr/~lehong/tools/vnTagger.php, Last visited May
9. Nguyen, C.T., Tran, T.O., Phan, X.H., Ha, Q.T., Named Entity Recognition in Vietnamese Free-Text and Web Documents Using Conditional Random Fields (2005) The 8th Conference on Some Selection Problems of Information Technology and Telecommunication, Hai Phong, Vietnam
10. Nguyen, H.T., Cao, H.T., Named Entity Disambiguation on an Ontology Enriched by Wikipedia (2008) Proc. of the 6th IEEE International Conference on Research, Innovation and Vision for the Future - in Computing and Communications Technologies (RIVF'2008, July 13-17, Ho Chi Minh City, Vietnam), pp. 247-254
11. Quinlan, J.R., Cameron-Jones, R.M., FOIL: A midterm report (1993) Proc. European Conference on Machine Learning
12. Soderland, S., Learning Information Extraction Rules for Semi-Structured and Free Text (1999) Machine Learning Journal, 34
13. Tran, Q.T., Pham, T.X.T., Ngo, Q.H., Dinh, D., Collier, N., Named Entity Recognition in Vietnamese documents (2007) Progress in Informatics, (4), pp. 1-9
14. Vietlex Semantic Tree (2010) http://www.vietlex.vn/resources/default.htm, Last visited