ISSN:2229-6093 S.M.Chaware,Srikantha Rao, Int. J. Comp. Tech. Appl., Vol 2 (2), 379-384
Ontology Approach for Cross-Language Information Retrieval
S.M.Chaware1, MPSTME, Mumbai
Srikantha Rao2, TIMSCDR, Mumbai
Abstract
Information retrieval is an important activity, especially in a cross-language environment. When knowledge is represented in a structured way, information becomes easier to retrieve. Ontology is a rich means of representing knowledge and may offer a better approach to information retrieval, especially for cross-language searching. There are various approaches to building an ontology, such as using the Protégé toolkit, deriving it from decision-tree data, or defining mapping rules between a database and ontology terms. These approaches are either time-consuming or complex. In this paper, we propose an approach to build an ontology from a relational database with some additional rules, which can be used for cross-language information retrieval. The ontology can be built dynamically as per the user’s need, giving the user an overview of the knowledge domain. The domain chosen for the ontology is a grocery shop. The proposed system shows good results.
1. Introduction
Cross-language information retrieval (CLIR) is a retrieval process in which the user presents queries in one language to retrieve information in another language. CLIR has gained popularity among Information Retrieval (IR) researchers in recent years. CLIR is especially needed when the user knows only his/her native language and the system cannot always process that language directly. Simple approaches to CLIR have been developed using multi-lingual dictionaries or WordNet. An ontology is a better choice for CLIR, as it covers the entire context and its relationships, which is helpful for both the user and the system provider.
Knowledge representation plays an important role in almost any domain. It not only supports exact conclusions but is also useful in decision making. There are many ways to represent knowledge, for example databases and taxonomies, but the most prominent is ontology. There are two sources from which to build an ontology. The first is a flat file, where the data can be stored in no particular order. The main problem with this source is that file systems do not provide scalability, sharability or any query facility. The second source is a relational database system, where the data for a particular domain is stored in a way that satisfies the ACID properties. A database management system provides maturity, performance, robustness, reliability and availability. The second source is more reliable, as it can provide the data as per ontology terms.
Without an ontology, acquiring knowledge even for a small activity requires accessing the entire database every time. This degrades the quality of the results and is time-consuming. To avoid this and to improve performance, an ontology may be the best solution. Whenever some knowledge is required, the data about that sub-domain can be considered and an ontology built from it, which will give accurate knowledge. Many methodologies have been proposed to build ontologies covering terms and their relationships, including tools, editors, and top-down and bottom-up approaches.
An ontology is a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base (KB). According to this definition, the same ontology can be used for building several KBs, which would share the same skeleton. These skeletons can be extended by adding low-level sub-concepts or high-level concepts that cover new areas. Such an ontology gives an easy and clear understanding of its structure, and inference mechanisms become easier [1].
In this paper, a system architecture is proposed to build an ontology from a relational database, which can be used to acquire knowledge about a domain. The results show that the approach is faster and useful to the users.
2. Ontology Building Survey
Ontology building is usually performed manually, but researchers have tried to build ontologies automatically or semi-automatically to save time and effort. In this section we survey the most important approaches that generate an ontology from data.
There exists a group of definitions based on the procedure used to build the ontology. These definitions highlight the relationships between ontologies and knowledge bases. For example, the definition given by Bernaras and colleagues in the framework of the KACTUS project is: an ontology provides the means for describing explicitly the conceptualization behind the knowledge represented in a knowledge base. This definition suggests extracting the ontology from a knowledge base (KB), which gives one approach to building an ontology. In this approach, the ontology is built following a bottom-up strategy, on the basis of an application KB, by means of an abstraction process.
Certain tools are available, such as Protégé, which provide an environment for building an ontology. Protégé offers an interface that uses classes, subclasses and their attributes as concepts, sub-concepts and their relationships. It does not follow any fixed approach, but modifying an existing ontology is easier [2].
Abd-Elraman et al. proposed a method for building an ontology from data. They generate the ontology by applying a data mining technique, such as a decision tree, to the data. The decision tree gives the knowledge from the data in a prescribed way. They map the data from the decision tree to ontology terms such as concepts, sub-concepts and their relationships. Finally, they represent the ontology in the form of XML and/or OWL [3].
Irina et al. proposed a methodology to store an ontology in SQL relational databases. They designed rules that map ontology terms to database elements [4].
Shufeng Zhou et al. proposed a methodology that uses transformation rules for ontology acquisition from a relational database, with rules that map between the two [5].
3. Pitfalls of Existing Methodologies
The existing methodologies have some pitfalls, listed below:
• The ontology building approach uses a bottom-up strategy, which does not give the exact components to merge to form the ontology.
• Although an ontology skeleton is meant to be reused, there is no approach for reusing an existing ontology.
• Certain tools have been used to build ontologies. Each one has its limitations or may not be suitable for building the entire ontology for a domain.
• To build the ontology, a decision tree has to be constructed each time, which is cumbersome and time-consuming.
• The rules are not sufficient to transform a relational database into an ontology.
4. Domain for Ontology
In this paper, the domain is taken to be a grocery shop, where there is a need for CLIR, especially in rural areas. The assumptions are as follows. First, the user or customer knows only his/her native language, Hindi or Marathi. Second, he/she wishes to use the system to order grocery items either online or at the shop. Third, the owner of the shop understands the native languages along with English. In order to build the ontology, some user scenarios have been considered as sub-domains. According to the sub-domain, the appropriate relations will be accessed. One such scenario is ‘Puri-Chhole’, where details about the items and knowledge of how to prepare the menu may be required.
5. Proposed System Architecture for Building Ontology
5.1 User’s Scenario
The user comes to the shop to use the system, or accesses the system online. He/she chooses the recipe ‘Puri-Chhole’. Let us assume that the user does not know much about the items needed for the recipe or the procedure to make it. The system asks some questions and gets the answers from the user, entered in his/her native language. Through this exchange, the user becomes acquainted with the items needed for the recipe, and some raw data can be provided to or acquired from the user. The answers from this module are the source for building the ontology, which can be used to represent the knowledge about the menu.
5.2 Knowledge Pool for the User
Once the user provides the answers in the local language, they are translated into English and searched in the database, since we maintain the data in English as the global language. The ontology is then built according to the proposed algorithm. Using this ontology, the user can be prompted to make certain inferences about the menu, drawing on the ontology as a knowledge pool. For each user scenario, a different ontology can be built dynamically.
5.3 Proposed Overall System Architecture
The overall system architecture of our CLIR system is shown in Figure 1. The entered local-language keyword is translated or transliterated into English by the translation/transliteration module. We use machine-readable bilingual Hindi->English and Marathi->English dictionaries for keyword translation. Hindi and Marathi are morphologically rich [5]. Therefore, their keywords are stemmed before being looked up in the bilingual dictionary. If the word is found in the dictionary, it goes to the database to extract the terms needed to build the ontology. These terms are extracted as per the implementation rules described in Section 6. In some cases, transliteration is needed to convert a local-language string into English and vice versa. For example, the Hindi string ‘धारा’ is transliterated into English as ‘Dhara’. The different modules are as follows:
5.3.1 User Interface Module: This module provides the user with the choice of his/her native language. It accepts the keyword for the domain and passes it to the next module. The user chooses the sub-domain for the ontology.
5.3.2 Parsing Module: This module interprets the entered keyword as simple or complex. If it is simple, the next step of ontology building follows, i.e. the question/answer module starts, to decide the concept and other terms for the ontology. If the entered keyword is complex, it is parsed by this module, and each resulting keyword is considered separately as input to the next module.
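The simple/complex decision made by the parsing module can be sketched as follows. This is only a minimal illustration, assuming that a complex keyword is one made of several words separated by whitespace or hyphens; the paper does not define the test, and the function names `parse_keyword` and `is_complex` are hypothetical.

```python
import re


def parse_keyword(keyword: str) -> list[str]:
    """Split a keyword on whitespace or hyphens and drop empty pieces."""
    return [part for part in re.split(r"[\s\-]+", keyword.strip()) if part]


def is_complex(keyword: str) -> bool:
    """Assumption: a keyword is 'complex' when it contains more than one word."""
    return len(parse_keyword(keyword)) > 1


# Each resulting keyword would be passed separately to the next module.
parts = parse_keyword("पूरी छोले")
```

With this assumption, a single word such as ‘पूरी’ is treated as simple and goes directly to the question/answer step.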
5.3.3 Question/Answer Module: This is an important module, as it leads to the actual building of the ontology. The system provides a set of questions according to the selected sub-domain. The probable answers to these questions are offered, and the user may also enter his/her own. This module handles local-language input for the question/answer set, which is forwarded to the translation module.
5.3.4 Stemmer Module: This module takes an answer from the question/answer module as input, removes all stop words and stems the keyword to find the root word. This root word is passed to the translation module.
5.3.5 Translator/Transliteration Module: Once the entered local-language keyword is interpreted as the answer to the prescribed question, this module translates it from one language to the other using the bilingual dictionary. It either translates the entered answer directly or transliterates it in order to get the equivalent English keyword. Transliteration is done on parsed strings, using a mapping applied to each character. Once the Hindi/Marathi keyword is translated into English, it is passed on to the database module for searching. We built a bilingual dictionary covering the entire domain for translation.
5.3.6 Query Module: The keyword from the translation/transliteration module is used to form an SQL query according to the question/answer module. The query is passed to the database for searching and the result is retrieved.
5.3.7 Database Module: This module searches the database for the corresponding English keyword as a table name, attribute name or attribute value. If the entry is found, the ontology can be built according to the proposed algorithm and implementation rules.

Algorithm: Proposed Ontology Building Approach
Input: Hindi/Marathi to English translated keyword/s
Output: Ontology
1. Enter the keyword to acquire knowledge in local language.
2. If the keyword is simple, go to step 4; else go to step 3.
3. Parsing module: parse the complex keyword to get proper keywords, then go to step 4.
4. Translate/transliterate module: translate or transliterate the keyword/s into English, then go to step 5.
5. Ontology building module: search the database for the translated string/s as a table name, attribute name or attribute value.
6. According to the translation rules, map the retrieved data to ontology terms.
7. According to the sequence of rules, form the hierarchy from the data to display the ontology.
8. Define the relationships as ‘is-a’ and ‘has’ within the concepts of the ontology.
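Step 5 of the algorithm, which looks up a translated keyword as a table name, attribute name or attribute value, might be sketched as below. This is an illustrative sketch only: it assumes an SQLite database, and the `grain` table, its columns, its sample row and the function name `classify_keyword` are invented for the example rather than taken from the paper's system.

```python
import sqlite3


def classify_keyword(conn, keyword):
    """Report whether the keyword is a table name, an attribute name or an
    attribute value in the database, or None if it is not found."""
    kw = keyword.lower()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {
        table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }
    for table in tables:
        if table.lower() == kw:
            return ("table", table)
    for table in tables:
        for column in columns[table]:
            if column.lower() == kw:
                return ("attribute", table, column)
    for table in tables:
        for column in columns[table]:
            hit = conn.execute(
                f'SELECT 1 FROM "{table}" WHERE lower("{column}") = ? LIMIT 1',
                (kw,)).fetchone()
            if hit:
                return ("value", table, column)
    return None


# Hypothetical toy database; the row contents are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grain (name TEXT, type TEXT, price REAL)")
conn.execute("INSERT INTO grain VALUES ('Wheat', 'Completegrain', 30.0)")
```

Here `classify_keyword(conn, "wheat")` reports an attribute value found in `grain.name`, which is the case that the later instance rules would turn into a leaf node.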
6. Implementation of Ontology Database
We implemented the approach according to the proposed algorithm for building an ontology from a database. Some rules were developed for constructing the ontology from the database; they are given below.
TABLE 1: SAMPLE REPRESENTATION OF SUBCONCEPT (PURI) FROM ONTOLOGY
Nodes | Ancestor | Successor | Relation (forward/backward)
Puri | NIL | Grain | needs/used for
Grain | Puri | Completegrain | has/is-a
Completegrain | Grain | Wheat | has/is-a
Wheat (leafnode) | Completegrain | Type, availability, price (attributes) | has
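A representation like Table 1 is enough to trace the ancestors or successors of any node. The sketch below shows one way this could be done, assuming the table is held in a plain Python dictionary; the names `TABLE_1`, `ancestors` and `descendants` are hypothetical, and the rows are copied from Table 1.

```python
# Rows of Table 1: node -> ancestor, direct successors and relation.
TABLE_1 = {
    "Puri": {"ancestor": None, "successors": ["Grain"],
             "relation": "needs/used for"},
    "Grain": {"ancestor": "Puri", "successors": ["Completegrain"],
              "relation": "has/is-a"},
    "Completegrain": {"ancestor": "Grain", "successors": ["Wheat"],
                      "relation": "has/is-a"},
    "Wheat": {"ancestor": "Completegrain",
              "successors": ["Type", "availability", "price"],
              "relation": "has"},
}


def ancestors(node):
    """Walk the ancestor column upwards until NIL (None) is reached."""
    chain = []
    parent = TABLE_1[node]["ancestor"]
    while parent is not None:
        chain.append(parent)
        parent = TABLE_1[parent]["ancestor"]
    return chain


def descendants(node):
    """Collect all successors below a node, depth first."""
    found = []
    for child in TABLE_1.get(node, {}).get("successors", []):
        found.append(child)
        found.extend(descendants(child))
    return found
```

For example, `ancestors("Wheat")` walks up through Completegrain and Grain to Puri, whose ancestor is NIL.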
5.3.8 Ontology Building Module: This module actually builds the ontology by acquiring keywords one by one from the database. According to the proposed algorithm and implementation rules, each keyword is placed so as to form the hierarchy that represents the ontology. Only the relationships among the concepts and sub-concepts have to be represented explicitly.
6.1 Translation/Transliteration Rules
Rule 1: The entered keyword is transliterated into English by the transliteration module after parsing the keyword into vowels, consonants and modifiers.
Rule 2: Check its meaning in the dictionary. If it does not have any meaning, its transliteration is used as it is.
Rule 3: Otherwise, it is translated into English using the bilingual dictionary.
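Rules 1 to 3 amount to a dictionary lookup with transliteration as the fallback. The following is a minimal sketch under stated assumptions: the dictionary entries and the character maps are tiny invented samples, the handling of the inherent vowel is deliberately simplified, and the names `HINDI_ENGLISH`, `transliterate` and `to_english` are hypothetical. The lookup is done first here, which gives the same outcome as transliterating first and then checking for a meaning.

```python
# Toy bilingual dictionary (assumed entries, for illustration only).
HINDI_ENGLISH = {"आटा": "flour", "गेहूँ": "wheat"}

# Very small sample of a character mapping; a real table would be complete.
CONSONANTS = {"ध": "dh", "र": "r", "प": "p", "ट": "t"}
MATRAS = {"ा": "a", "ी": "i", "ू": "u", "े": "e", "ो": "o"}  # vowel modifiers
VOWELS = {"अ": "a", "आ": "a", "इ": "i"}
VIRAMA = "्"


def transliterate(word):
    """Rule 1: map each vowel, consonant or modifier to Latin letters."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Simplification: add the inherent 'a' only when another letter
            # follows that is neither a vowel modifier nor a virama.
            if nxt and nxt not in MATRAS and nxt != VIRAMA:
                out.append("a")
        elif ch in MATRAS:
            out.append(MATRAS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        # Any other character (including the virama) adds nothing.
    return "".join(out)


def to_english(word):
    """Rules 2 and 3: translate if the dictionary has a meaning,
    otherwise keep the transliteration as it is."""
    if word in HINDI_ENGLISH:
        return HINDI_ENGLISH[word]
    return transliterate(word)
```

A word with no dictionary entry, such as a brand name, therefore falls through to the character mapping, while a common noun is translated.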
Fig.1: Proposed Overall System Architecture (components: User Interface for Hindi/Marathi Keyword, Simple or Complex, Parsing Module, Stemmer Module, Question/Answer Module, Native-English Dictionary, Translator/Transliteration Module, English Database, Ontology Module)
6.2 Query Formation Rules
Rule 4: Every translated keyword forms an SQL query according to the question/answer module.
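Rule 4 can be illustrated with a small query builder. This is a sketch under assumptions, not the paper's implementation: it uses SQLite, a hypothetical `grain` table with made-up contents, and a hypothetical `build_query` function. The keyword is bound as a parameter, while table and column names are checked against a known schema because identifiers cannot be bound.

```python
import sqlite3

# Hypothetical schema description used to validate identifiers.
SCHEMA = {"grain": {"name", "type", "price"}}


def build_query(table, column, keyword):
    """Form a parameterised SELECT for one translated keyword."""
    if table not in SCHEMA or column not in SCHEMA[table]:
        raise ValueError(f"unknown table or column: {table}.{column}")
    sql = f"SELECT * FROM {table} WHERE lower({column}) = lower(?)"
    return sql, (keyword,)


# Toy database; the row is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grain (name TEXT, type TEXT, price REAL)")
conn.execute("INSERT INTO grain VALUES ('Wheat', 'Completegrain', 30.0)")

sql, params = build_query("grain", "name", "wheat")
rows = conn.execute(sql, params).fetchall()
```

Which table and column to query would, in the described system, follow from the question the user has just answered.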
6.3 Transformation from relational database schema information to ontology
Rule 5: If the entered keyword is a table name, it maps to a class for a concept.
Rule 6: If a relation R does not contain a foreign key, R becomes a concept.
Rule 7: For relations R1 and R2, if R2 contains a foreign key that refers to the primary key of R1, then R1 becomes a concept and R2 a sub-concept.
Rule 8: If the entered keyword is an attribute name of a certain relation R, the relation name becomes a sub-concept or concept, depending on the presence of a foreign key.
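The schema rules can be sketched over a small in-memory description of the relations. The example below is an assumption-laden illustration: the `Relation` class, the function names and the two-table schema are hypothetical, and a relation with several foreign keys is simply attached to the first one, a case the rules do not address.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    primary_key: str
    attributes: list = field(default_factory=list)
    # Maps a foreign-key column to the relation whose primary key it references.
    foreign_keys: dict = field(default_factory=dict)


def map_schema(relations):
    """Rules 6 and 7: assign each relation a role and, if any, a parent."""
    roles = {}
    for rel in relations:
        if not rel.foreign_keys:
            roles[rel.name] = ("concept", None)            # Rule 6
        else:
            parent = next(iter(rel.foreign_keys.values()))
            roles[rel.name] = ("sub-concept", parent)      # Rule 7
    return roles


def role_for_keyword(relations, keyword):
    """Rules 5 and 8: resolve a table name or attribute name to its relation
    and that relation's role; return None if the keyword matches neither."""
    roles = map_schema(relations)
    for rel in relations:
        if keyword == rel.name or keyword in rel.attributes:
            return rel.name, roles[rel.name][0]
    return None


# Hypothetical two-table schema for the grocery domain.
SCHEMA = [
    Relation("grain", "grain_id", ["grain_id", "name"]),
    Relation("completegrain", "cg_id",
             ["cg_id", "grain_id", "type", "price"],
             {"grain_id": "grain"}),
]
```

With this schema, `grain` has no foreign key and becomes a concept, while `completegrain` references it and becomes its sub-concept.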
6.4 Transformation from relational database attribute values to ontological instances
Applying the above rules, an ontological structure can be constructed from the extracted database. Then, to form leaf nodes as instance values for a sub-concept, we apply the following rules.
Rule 9: If the entered keyword is an attribute value, retrieve the relation with its name, attribute name and value from the database. Depending on Rule 4, the hierarchy will form as the ontology.
Rule 10: The extracted value will be a leaf node of the ontology.
7. Example
Consider the sub-domain ‘पूरी छोले’. Let us find the possible ontology terms that describe a knowledge pool which may be useful to the user. The complex keyword ‘पूरी छोले’ is parsed to get two keywords, ‘पूरी’ and ‘छोले’. For each keyword, a separate question/answer module is used to get ontology terms from the database after translation or transliteration. For example, for ‘पूरी’, the question may be ‘क्या आप पूरी के लिए गेहूँ या गेहूँ का आटा पसंद करोगे?’ (Would you prefer wheat or wheat flour for puri?) and the possible answer may be ‘गेहूँ या गेहूँ का आटा’ (wheat or wheat flour), or the user may be prompted to enter an answer. The selected or entered keywords are then stemmed to ‘गेहूँ’ and ‘आटा’; both keywords are translated into English and searched in the database to get the concepts, sub-concepts and relationships for the ontology.
Ontology Representation in English (Hierarchy):
Puri-Chhole
- Puri (Concept)
  - Grain (Concept)
    - Complete-grain (Sub-concept)
      - Wheat (Attribute)
        - Wheat-type, price (Values)
Fig. 4: Language Selection for Ontology
Fig. 2: Ontology for puri chhole
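The hierarchy of Figure 2 can be held as a nested structure and rendered level by level. The sketch below is one possible encoding, assuming each level nests under the previous one as the ancestor chain in Table 1 suggests; the names `ONTOLOGY` and `render` are hypothetical.

```python
# Each node is (name, kind, children); the labels follow the hierarchy above.
ONTOLOGY = ("Puri-Chhole", None, [
    ("Puri", "Concept", [
        ("Grain", "Concept", [
            ("Complete-grain", "Sub-concept", [
                ("Wheat", "Attribute", [
                    ("Wheat-type, price", "Values", []),
                ]),
            ]),
        ]),
    ]),
])


def render(node, depth=0):
    """Return the hierarchy as a list of lines, indented two spaces per level."""
    name, kind, children = node
    label = f"{name} ({kind})" if kind else name
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(ONTOLOGY)))
```

A second branch for ‘chhole’ would be added as another child of the root in the same way.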
Figure 2 shows the possible ontology for ‘puri’, with its hierarchy of concepts, sub-concepts and attributes, where an answer is entered as ‘’ for one of the questions. The possible relationships are ‘needs’, ‘used for’, ‘is-a’ or ‘has’; for example, wheat is-a complete-grain, and wheat has type and price. Table 1 shows the representation of the ontology with relationships; from it we can determine the ancestor or successor of any node in the ontology. This is helpful for inference analysis.
Fig. 5: Q/A module for ‘puri’
8. Results and Conclusions
The following screen shots show the result of building the ontology from the database according to the implementation strategy given in Section 6. They show the selection of Hindi as the natural language (Figure 4), the selection of the recipe as the sub-domain (Figure 4), the Q/A modules for ‘puri’ and ‘chhole’ (Figures 5 and 6 respectively), and finally the resulting ontology (Figure 7). The results show a complete, easy and simple way of building an ontology from a database.
Fig. 6: Q/A module for ‘Chhole’
Fig. 3: Welcome Screen for Ontology
9. References
[1] Abd-Elraman et al., ‘Applying Data Mining for Ontology Building’.
[2] Oscar Corcho et al., ‘Methodologies, tools and languages for building ontologies. Where is their meeting point?’, Data and Knowledge Engineering 46 (2003) 41-64.
[3] Sanjay Kumar Malik et al., ‘Developing University Ontology in Education Domain using Protégé’, International Journal of Engineering, Science and Technology, Volume 2 (9), 2010.
[4] Irina Astrova et al., ‘Storing OWL Ontologies in SQL Relational Databases’, World Academy of Science, Engineering and Technology 29, 2007.
[5] Shufeng Zhou et al., ‘Ontologies Acquisition from Relational Databases’, Computer and Information Science, Vol. 3, No. 1, Feb. 2010.
[6] Manoj Chinnakotla et al., ‘Hindi and Marathi to English Cross Language Information Retrieval at CLEF 2007’.
[7] Ketul B. Patel et al., ‘Knowledge Discovery and Information Retrieval’.
[8] Morteza Rad et al., ‘Concept-Based Information Retrieval with Ontology Approach for Cross-Language Searching’, World Applied Sciences Journal 8 (8): 965-971, 2010.
Fig. 7: Ontology as IR from database