Myanmar Lexicon
Thin Zar Phyo, Wunna Ko Ko
May 08, 2008
JCSSE 08
Contents
● Introduction
● What is Lexicon?
● Myanmar Lexicon
● Lexique Pro
● Myanmar Lexicon with Lexique Pro
● Conclusion
Introduction
● Myanmar is a Southeast Asian country.
● Burmese is the official language.
● The Burmese script is derived from the Brahmi script.
● More than 100 languages are used in Myanmar.
● The term "Myanmar" covers Burmese, Karen, Mon, Shan, and the other ethnic languages used in the country.
Lexicon
● It is a useful language resource for both localization experts and computational linguists.
● It includes information about:
– the form and meaning of words and phrases
– lexical categorization
– the appropriate usage of words and phrases
– relationships between words and phrases, and categories of words and phrases
Myanmar Lexicon
● It includes over 35,000 headwords; this count does not include word groupings.
● Field markers are chosen based on the Multi-Dictionary Formatter (MDF).
● It consists of a set of word meanings and their semantic relationships.
● The participating organizations include:
– Ministry of Education
– Myanmar Unicode & NLP Research Center
– Myanmar Language Commission
● The Myanmar Language Commission supplies the necessary information and the collected word list.
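To make the idea of MDF field markers concrete, the sketch below parses a small backslash-coded sample into records. The markers \lx (lexeme), \ps (part of speech), \ge (English gloss) and \de (English definition) are standard MDF markers, but the two sample records and the `parse_mdf` helper are hypothetical; they are not taken from the Myanmar Lexicon, whose actual marker set is not listed here.

```python
from collections import defaultdict

# Hypothetical sample in backslash-marker (standard format) notation.
# The records are illustrative only, not entries from the Myanmar Lexicon.
SAMPLE = r"""
\lx ရေ
\ps n
\ge water
\de clear liquid used for drinking

\lx မီး
\ps n
\ge fire
"""

def parse_mdf(text):
    """Split backslash-coded text into records, starting a new one at each lx marker."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank lines and continuation text
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            current = defaultdict(list)
            records.append(current)
        if current is not None:
            # A marker may repeat within a record, so values are kept in a list.
            current[marker].append(value.strip())
    return [dict(record) for record in records]

entries = parse_mdf(SAMPLE)
for entry in entries:
    print(entry["lx"][0], entry.get("ps"), entry.get("ge"))
```

Keeping every field as a list is a design choice for the sketch: it allows repeated markers (for example, several glosses) without special cases.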
Myanmar Lexicon (in paper format)
Lexique Pro
● It is designed for building dictionaries and lexicons, especially for complex scripts.
● It is open source software and runs on Windows.
● It is a product of SIL, formerly known as the Summer Institute of Linguistics.
● Its primary functions are to:
– Create a dictionary,
– View and edit an existing Shoebox/Toolbox dictionary database,
– Share a database with other computer users,
– Export a dictionary to print as a text document, or to HTML format for web publication.
Lexique Pro (Contd.)
● It can export the lexicon to the following file formats:
– Rich Text Format (.rtf),
– Microsoft Word 2007, Office Open XML (.docx),
– OpenOffice.org (.odt or .sxw)
● Each export uses one of three layouts:
– Alphabetical dictionary
– Classified dictionary (by category)
– Index table based on a gloss language
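The three export layouts can be thought of as three views over the same entries. The sketch below illustrates that idea only; it is not how Lexique Pro implements its export. The entry tuples, the category names, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical entries (headword, category, English gloss); illustrative only.
ENTRIES = [
    ("ရေ", "nature", "water"),
    ("မီး", "nature", "fire"),
    ("ခွေး", "animal", "dog"),
]

def alphabetical(entries):
    """Alphabetical dictionary: entries ordered by headword."""
    # sorted() compares code points, a stand-in for real Myanmar collation.
    return sorted(entries, key=lambda entry: entry[0])

def classified(entries):
    """Classified dictionary: headwords grouped by category."""
    groups = defaultdict(list)
    for headword, category, _gloss in alphabetical(entries):
        groups[category].append(headword)
    return dict(groups)

def gloss_index(entries):
    """Index table: (gloss, headword) pairs ordered by the gloss language."""
    return sorted((gloss, headword) for headword, _category, gloss in entries)

print(alphabetical(ENTRIES))
print(classified(ENTRIES))
print(gloss_index(ENTRIES))
```

A production tool would replace the code-point sort with a proper collation for the script in use.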
Field markers of Myanmar Lexicon
Myanmar Lexicon with Lexique Pro
Categories tab in Myanmar Lexicon
Myanmar Lexicon in MS Word 2007 format
Myanmar Lexicon in html format
Conclusion
● The developed lexicon is available on CD and will be available online in the near future.
● It can help in the development of statistical machine translation, Text-To-Speech (TTS), and Automatic Speech Recognition (ASR) systems.
● It is a general lexicon, but with the full set of field markers and Part-of-Speech (POS) information.
● For particular applications, it may not be usable as is.
Thank you for your attention.
Please send your questions and comments to: Thin Zar Phyo ([email protected])