Geometric Transformation of User Queries in ...

The International Arab Conference on Information Technology (ACIT’2013)

Geometric Transformation of User Queries in Information Retrieval on the Web

A. Aggoune (1), A. Bouramoul (2), M-K. Kholladi (3)
(1) Computer Science Department, Labstic Laboratory, University 8 May 45, Guelma, Algeria
(2) Computer Science Department, Misc Laboratory, University of Constantine2, Algeria
(3) Rector of El-Oued University, Misc Laboratory, University of Constantine2, Algeria

Abstract: To improve information retrieval on the web, we propose in this paper an original approach based on geometric queries. Its goal is to reduce the non-selection of relevant documents, an issue known as the silence problem. The approach transforms the user query into a geometric shape and exploits a set of previously executed queries (with relevant responses) to find the one closest to the initial query by means of a similarity measure. We focus on the need to involve a domain ontology in the construction of geometric queries. Finally, the paper describes the implementation performed and draws first conclusions on various perspectives of this approach.

Keywords: geometric query, ontology, similarity measure, semantic search.

1. Introduction

The exponential growth of information exchange on the Web has renewed the difficulty of finding the relevant information an end user wants. In this context, the emergence of semantic information retrieval is one of the main motivations of the Semantic Web. Improving an information retrieval system (IRS) means giving information on the Web a semantic representation, so that the system can answer a user's information need with responses relevant to the query. Ontologies offer a way to interpret the semantics of a query, either by adding new terms related to the concepts referred to by the initial terms [6] [8] or by disambiguating the query terms [5] [16]. Their construction nevertheless raises a problem even when it is assisted by specific tools: automating it is difficult, and precisely locating all instances in a document remains a complex task. Other approaches model the user and integrate the resulting profile into a personalized information access process [14]. However, personalization raises the problem of the evolution of the user profile over time. Contextual information retrieval (RCI) generalizes personalized information retrieval by extending the user information with other contextual information [12]. Recently, an approach based on RCI was proposed for reformulating the initial query via the context of the search [2]. However, all these approaches require additional resources (profile, context) to meet the user's needs.

In this paper we present a new approach that improves the IRS by increasing the number of relevant responses and reducing the number of irrelevant responses without using either the context or the user profile. The principle of our approach is to offer the user approximate responses rather than a set of irrelevant responses. The idea is to perform a geometric transformation of queries, which we then call "geometric queries", and to apply a semantic similarity measure. The construction of this type of query is guided by a domain ontology. Applying a similarity measure between the user query and previously executed queries amounts to calculating the semantic similarity between the geometric shapes that represent these queries. The advantage of using previous queries to answer the initial query approximately is that some queries processed in the past, whose answers are more or less relevant, may turn out to be semantically close to the initial query.

To present our contribution, the following sections cover the decomposition of the user query, the construction of the geometric query and the measurement of semantic similarity between queries. We then detail the operation of the search tool. We finish by mentioning some perspectives for our work.

2. Decomposition Process

The first process of our search tool decomposes the user query into a set of components and the semantic relations between them, obtained via the binary relations of the ontology. This process relies on a linguistic analysis of the query. It first segments the query entered by the user in order to identify its components. It then removes empty words (stop words) and identifies each component in the ontology by exploiting the WordNet linguistic dictionary, which is the simplest tool for disambiguating query terms [11]. We used the ontology from our previous work, "AnimOnto", which allows the annotation of documents about the animal world; this ontology then serves as the basis for constructing geometric queries [1]. Figure 1 shows a part of the ontology 'AnimOnto'.

[Figure 1 node labels as extracted: Animal, Plant, Vertebrate, Invertebrate, Mammal, Bird, Fish, Reptile, Amphibian, Carnivore, Herbivore, Insectivore, Rodent, Cetacean, Mollusk, Coelenterate, Insect, Arachni, Arthropode, Shellfis, Spongia, Myriapod; other labels: To live, Earth, Sea]

Figure 1. Part of the ontology 'AnimOnto' [1]

This process takes as input a user query formulated in natural language and returns a set of triplets of the form (motif1, motif2, degree), where the motifs are the components (search terms) of the initial query and degree is the semantic similarity between the two motifs. We used the measure of D. Lin [10], which has the advantage of being simple to implement and of performing well compared to other similarity measures. This measure combines the shortest path between two motifs with their information content, which reflects the relevance of a motif in the corpus by taking its specificity or generality into account [9]. The semantic similarity formula is as follows:

$$ Degree(motif_1, motif_2) = \frac{2 \cdot IC(ncn(motif_1, motif_2))}{IC(motif_1) + IC(motif_2)} $$

where ncn is the nearest common node of the two motifs motif1 and motif2. The information content (IC) of a motif m is calculated by the following formula:

$$ IC(m) = -\log(p(m)) $$

where p(m) is the probability of the motif in WordNet.

WordNet is also involved when a query term is not part of the ontology: we look up its synonyms in WordNet and select those that belong to the ontology; if none does, the term is automatically deleted from the list of components of the query [7]. The following figure illustrates this process.

Figure 2. Decomposition of initial query process
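The decomposition process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy `PARENT`, the probabilities `P`, the stop-word list and the `SYNONYMS` table (a stand-in for the WordNet lookup) are all hypothetical toy data, not the actual AnimOnto ontology or WordNet statistics. The degree follows Lin's measure, 2·IC(ncn) / (IC(motif1) + IC(motif2)), with IC(m) = −log p(m).

```python
import math
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent), loosely inspired by Figure 1.
PARENT = {
    "vertebrate": "animal", "mammal": "vertebrate", "bird": "vertebrate",
    "carnivore": "mammal", "rodent": "mammal",
    "dog": "carnivore", "cat": "carnivore", "mouse": "rodent",
}
# Hypothetical probabilities p(m); a concept is at least as likely as its children.
P = {
    "animal": 1.0, "vertebrate": 0.6, "mammal": 0.3, "bird": 0.2,
    "carnivore": 0.1, "rodent": 0.08, "dog": 0.04, "cat": 0.04, "mouse": 0.03,
}
STOP_WORDS = {"the", "a", "an", "on", "and", "of", "in"}  # assumed list
SYNONYMS = {"doggy": "dog", "kitty": "cat"}  # stand-in for a WordNet lookup


def ancestors(motif):
    """Return the motif followed by its ancestors up to the root."""
    chain = [motif]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def ncn(m1, m2):
    """Nearest common node: first ancestor of m1 that is also an ancestor of m2."""
    seen = set(ancestors(m2))
    return next(a for a in ancestors(m1) if a in seen)


def ic(motif):
    """Information content IC(m) = -log p(m)."""
    return -math.log(P[motif])


def degree(m1, m2):
    """Lin's similarity between two motifs."""
    denom = ic(m1) + ic(m2)
    return 2 * ic(ncn(m1, m2)) / denom if denom > 0 else 1.0


def decompose(query):
    """Segment the query, drop stop words, map terms to the ontology, and
    return the list of (motif1, motif2, degree) triplets."""
    motifs = []
    for token in query.lower().split():
        token = token.strip(".,;:!?\"'")
        if not token or token in STOP_WORDS:
            continue
        token = SYNONYMS.get(token, token)      # synonym fallback
        if token in P and token not in motifs:  # terms outside the ontology are dropped
            motifs.append(token)
    return [(m1, m2, degree(m1, m2)) for m1, m2 in combinations(motifs, 2)]


triplets = decompose("Dog going on behind the cat")
```

With this toy data, the query yields a single triplet linking `dog` and `cat`, whose nearest common node is `carnivore`. Words such as "going" and "behind" disappear because they have no counterpart in the toy ontology.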

3. Geometric Query Construction Process

The geometric query construction process transforms the results of the previous process into a weighted geometric shape, using the widely used TF*IDF measure [1] for the weights. A geometric query is a cloud of points in which each point represents a weighted motif. The points are the vertices of a polygon, and each side carries the distance between the two motifs it joins; this distance is the degree of relation obtained by the previous process.

Figure 3. Geometric representation of a query
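One possible in-memory representation of such a geometric query is sketched below. The class name `GeometricQuery` and the function `build_geometric_query` are hypothetical, and the motif weights are taken as given (in the paper they come from TF*IDF). The weights reuse the values of the worked example in Section 4, while the degrees on the sides are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class GeometricQuery:
    """Polygon whose vertices are weighted motifs and whose sides carry degrees."""
    vertices: Dict[str, float]  # motif -> weight (e.g. TF*IDF)
    sides: Dict[FrozenSet[str], float] = field(default_factory=dict)


def build_geometric_query(
    weights: Dict[str, float],
    triplets: Iterable[Tuple[str, str, float]],
) -> GeometricQuery:
    """Attach each (motif1, motif2, degree) triplet as a side of the polygon."""
    gq = GeometricQuery(vertices=dict(weights))
    for m1, m2, deg in triplets:
        # Only keep sides whose two endpoints are vertices of the query.
        if m1 in gq.vertices and m2 in gq.vertices:
            gq.sides[frozenset((m1, m2))] = deg
    return gq


# Weights from the worked example; the degrees below are hypothetical.
gq = build_geometric_query(
    {"earth": 0.5, "cat": 0.6, "dog": 0.7},
    [("dog", "cat", 0.7), ("dog", "earth", 0.2), ("cat", "earth", 0.2),
     ("dog", "fish", 0.3)],  # ignored: "fish" is not a vertex
)
```

Using a `frozenset` as the key makes the side between two motifs independent of the order in which they are given, which matches the undirected sides of a polygon.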

The advantage of using geometric shapes is that they make it possible to define relevant notions of similarity between polygons that are invariant to sampling and to the different poses that the objects or beings represented by the data can take [3]. Moreover, two metric spaces that are close in distance represent similar concepts [4]. Therefore, to calculate the semantic similarity between two queries, it is enough to calculate the similarity between the two geometric shapes derived from them.

4. Measuring Semantic Similarity Process

To address the silence problem, that is, the non-selection of relevant documents, our approach exploits previously executed queries whose answers are relevant (or mostly relevant), together with a similarity measure between these queries and the initial query. After the geometric transformation of a query, the semantic similarity process computes this measure. To calculate the similarity between two geometric queries, we use the following simplified formula [15]:

$$ Sim(f_1, f_2) = \frac{1}{1 + Dist(f_1, f_2)} $$

where f1 and f2 are two geometric queries and Dist(f1, f2) is the Euclidean distance between f1 and f2:

$$ Dist(f_1, f_2) = \sqrt{\sum_i |x_i - y_i|^2} $$

with x_i and y_i the i-th motif weights of f1 and f2 respectively.

Table 1. Example of measuring semantic similarity

Query | Geometric query
f1: "Dog going on behind the cat" | Earth = 0.5, Cat = 0.6, Dog = 0.7
f2: "In the early days of the summer I'll buy two dogs and a cat" | Sea = 0.2, Cat = 0.6, Dog = 0.7

Euclidean distance:

$$ Dist(f_1, f_2) = \sqrt{(0.5 - 0.2)^2 + (0.6 - 0.6)^2 + (0.7 - 0.7)^2} = 0.3 $$

Semantic similarity:

$$ Sim(f_1, f_2) = \frac{1}{1 + 0.3} = 0.77 $$
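The worked example of Table 1 can be checked with a short Python sketch. It assumes that the similarity takes the form 1 / (1 + Dist), which is consistent with the values reported in the table (Dist = 0.3, Sim = 0.77), and that the weights of the two queries are paired by position, as the example does when it compares Earth with Sea. The function names are illustrative.

```python
import math


def euclidean_distance(f1, f2):
    """Euclidean distance between two geometric queries given as weight vectors."""
    if len(f1) != len(f2):
        raise ValueError("both queries must have the same number of motifs")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def similarity(f1, f2):
    """Similarity decreasing with distance: 1 / (1 + Dist); assumed form."""
    return 1.0 / (1.0 + euclidean_distance(f1, f2))


# Motif weights from Table 1, paired by position (an assumption).
f1 = [0.5, 0.6, 0.7]  # Earth, Cat, Dog
f2 = [0.2, 0.6, 0.7]  # Sea, Cat, Dog

dist = euclidean_distance(f1, f2)
sim = similarity(f1, f2)
print(round(dist, 2), round(sim, 2))  # prints: 0.3 0.77
```

Two identical queries have distance 0 and therefore similarity 1, and the similarity tends to 0 as the distance grows.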

To calculate the relevance of the documents returned for the submitted query, we used two measures: the system relevance, obtained by applying the TF*IDF formula, and the user relevance, which results from the user's relevance judgments on the documents returned in response to a query. A filtering phase then keeps the relevant responses of the initial query and eliminates the irrelevant ones. Thereafter, a fusion phase merges the relevant responses of the initial query with the approximate answers of the nearest query. Finally, the composition of the three processes defines the general architecture of our system, which provides the query that is semantically closest to the initial query.
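The selection of the nearest previous query, followed by the filtering and fusion phases, could look like the following sketch. Everything here is an assumption made for illustration: the relevance threshold of 0.5, the rule that the higher score wins when a document appears in both result sets, the 1 / (1 + Dist) similarity, and all document identifiers and scores.

```python
import math


def similarity(f1, f2):
    """Assumed similarity 1 / (1 + Euclidean distance) between weight vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    return 1.0 / (1.0 + dist)


def nearest_query(initial, history):
    """Return the (vector, responses) entry of history closest to the initial query."""
    return max(history, key=lambda entry: similarity(initial, entry[0]))


def filter_relevant(responses, threshold=0.5):
    """Filtering phase: keep responses whose relevance reaches the threshold."""
    return {doc: score for doc, score in responses.items() if score >= threshold}


def fuse(initial_responses, approximate_responses):
    """Fusion phase: merge both result sets, keeping the higher score per document."""
    merged = dict(approximate_responses)
    for doc, score in initial_responses.items():
        merged[doc] = max(score, merged.get(doc, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: previously executed queries with their relevant responses.
history = [
    ([0.2, 0.6, 0.7], {"d3": 0.8, "d4": 0.7}),
    ([0.9, 0.1, 0.1], {"d9": 0.9}),
]
initial_vector = [0.5, 0.6, 0.7]
initial_responses = {"d1": 0.9, "d2": 0.2, "d3": 0.6}

kept = filter_relevant(initial_responses)
_, approximate = nearest_query(initial_vector, history)
results = fuse(kept, approximate)
```

Here `d2` is dropped by the filter, the first stored query is selected as the nearest one, and its answers `d3` and `d4` are merged with the responses kept for the initial query.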

5. Our Search Tool

To demonstrate the applicability of the proposed approach, we first present the general architecture of our search tool for dealing with the silence problem. It consists of the three processes detailed in the previous sections. These processes are closely related, in the sense that the outputs of each process are the inputs of the next one.

Figure 4. General architecture of our tool

To clarify how the developed tool operates, the environment offers the user two search modes: (1) semantic search without geometric queries and (2) semantic search with geometric queries. From a user query in natural language, the results of both search modes are displayed below the query form. The first category of results is produced by the first search mode, based on the ontology 'AnimOnto', while the second category contains the approximate answers obtained using geometric queries. Figure 5 shows the implementation of the ontology 'AnimOnto' using the Protégé2000 software [13].

Figure 5. Implementation of ontology 'AnimOnto'

The following figure provides an overview of the main interface of our search tool, called "GeoQS" (Geometric Query System).

Figure 6. The main interface of the prototype

Figure 7 presents the geometric query for the user query "In the early days of the summer I'll buy two dogs and a cat".

Figure 7. Example of geometric query

To validate our approach, Table 2 gives the number of relevant responses for three example queries.

Table 2. Examples of validation of search tool

The results show that our approach based on geometric queries gives a better result than that obtained without geometric queries. We measured the number of relevant responses, and according to the table above the results of the search based on geometric queries are encouraging. However, an evaluation must still be carried out on a corpus of documents associated with a set of queries, relevance judgments indicating which documents are relevant to which query, and evaluation metrics (recall, precision and F-measure).

6. Conclusion

To increase the rate of selection of relevant documents in information retrieval on the web, we proposed an approach based on geometric queries. Several concepts have been used: the domain ontology "AnimOnto", the semantic similarity of D. Lin, the WordNet dictionary and a set of successfully executed previous queries. The geometric transformation of the user query facilitates the measurement of semantic distance to find the query closest to the initial query. An implementation of this approach, called "GeoQS", was presented, and it is necessary in future work to assess

References

[1] A. Bouramoul, M-K. Kholladi and B-L. Doan, "How Ontology Can be Used to Improve Semantic Information Retrieval: The AnimSe Finder Tool". In International Journal of Computer Applications (IJCA) – ISSN: 0975 - 8887, Vol.21, No.9, pp. 48-54, May 2011. FCS – US, 2011.
[2] A. Bouramoul, M-K. Kholladi and B-L. Doan, "Using Context to Improve the Evaluation of Information Retrieval Systems". In International Journal of Database Management Systems (IJDMS), Vol.3, No.2, pp. 22-39, May 2011. AIRCC – US, 2011.
[3] Alfredo Ferreira, Simone Marini, Marco Attene, Manuel J. Fonseca, Michela Spagnuolo, Joaquim A. Jorge, Bianca Falcidieno, "Thesaurus-based 3D Object Retrieval with Part-in-Whole Matching". Int J Comput Vis (2010) 89: 327–347. Springer Science+Business Media, LLC 2009.
[4] Ansary, T. F., Daoudi, M., & Vandeborre, J. P., "A Bayesian 3D search engine using adaptive views clustering". IEEE Transactions on Multimedia, 9(1), 78–88.
[5] B. Pouliquen, M. Kimler, R. Steinberger, C. Ignat, T. Oellinger, K. Blackler, F. Fuart, W. Zaghouani, A. Widiger, A.-C. Forslund, and C. Best, "Geocoding multilingual texts: Recognition, disambiguation and visualization". In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), May 2006.
[6] Charles Petrie, "Possible Ontologies: How Reality Constrains the Development of Relevant Ontologies". IEEE Computer Society 1089 7801/07. 2007.
[7] Christiane Fellbaum, Theory and Applications of Ontology: Computer Applications, Springer Netherlands, 2010.
[8] P. Cimiano, "Ontology Learning and Population from Text: Algorithms, Evaluation and Applications". New York, USA: Springer. 2006.
[9] David Sánchez, Montserrat Batet, "Ontology-based information content computation", Knowledge-Based Systems, Vol.24, Issue 2, pp. 297–30, March 2011.
[10] Dekang Lin, Kenneth Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani, Sushant Narsale, "New Tools for Web-Scale N-grams", LREC, 2010. nlp.cs.nyu.edu.
[11] Elisabeth Niemann, Iryna Gurevych, "Automatic Sense Alignment of Wikipedia and WordNet". In Proceedings of the 9th International Conference on Computational Semantics (IWCS), pp. 205-214, 2011. Oxford, United Kingdom.
[12] Jason J. Jung, "Evolutionary approach for semantic-based query sampling in large-scale information sources", Nature-Inspired Collective Intelligence in Theory and Practice, Vol. 182, Issue 1, pp. 30–39, January 2012.
[13] Kalibatiene Diana, "Protégé-OWL Problems with Launching Family ontology", http://protege.cim3.net/ 2010.
[14] Max Chevalier, Christine Julien, Chantal Soulé-Dupuy, "User models for adaptive information retrieval on the web: Towards an interoperable and semantic model", International Journal of Adaptive, Resilient and Autonomic Systems (IJARAS), Vol.3, pp. 19, 2012.
[15] P. Resnik, "Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language", Journal of Artificial Intelligence Research, Vol. 11, pp. 95-130, 1999.
[16] S. Overell, J. Magalhães, and S. Rüger, "Place disambiguation with co-occurrence models". In A. Nardi, C. Peters, and J. L. Vicedo, editors, CLEF 2006 Workshop, Working notes, September 2006.
