SEARCH INTERFACE BASED ON NATURAL LANGUAGE QUERY TEMPLATES

Sebastian Ryszard Kruk, Krystian Samp, Caoimhin O'Nuallain, Brian Davis, Bill McDaniel, Sławomir Grzonkowski
National University of Ireland Galway, DERI Galway
IDA Business Park, Lower Dangan, Galway, Ireland

ABSTRACT

The Internet, bursting with different kinds of activity and with information being published with every heartbeat around the world, is still a major challenge for information retrieval research. Even Google, the premier search technology, is not always able to bring the most relevant results to the user. Natural language processing (NLP) techniques, still under rapid development, are considered a solution for users trying to answer their questions. However, building a fully-fledged NLP system takes time and effort. In this article we present a very lightweight approach to NLP using query templates, based on the motivations for building a lightweight query answering system originating from the domain of digital libraries.

KEYWORDS

information retrieval, natural language querying.

1. INTRODUCTION

One of the important aspects of developing services for users is making sure they will be used. To achieve this, we examined the main motivational features that get users not only to use the system or service once but to keep using and reusing the facility. We describe the motivational characteristics as:
• Intuitive interface: The interface must motivate users to engage with the system.
• Good feedback: Feedback must be timely and meaningful to the user.
• Engaging material: Pedagogically constructed material is more engaging to an audience.
• Collaboration: Users are encouraged to work together in groups.
• User driven: User-driven curricula aid access and appropriateness to the material.

The initial survey we carried out with library staff determined that systems exist to execute simple and complex queries, but the problem appears to be the lack of user motivation to use the systems in place. These data reflect the situation for first-, second-, third- and fourth-year students in all domains.

Question Answering

The transition from query answering to question answering has already been initiated by many search engines, such as AskJeeves or Yahoo Answers. Direct RDF query access is useful when integrating a digital library with other systems. For the average user, however, direct access is not useful, since not many people know RDF query languages. A solution to this problem is a natural language interface. However, fully-fledged NLP systems are still a field of major research. Semantic digital libraries can provide controlled vocabulary interfaces or a set of templates matching the most frequent user queries. By combining natural language processing techniques with the social and semantic information maintained by digital libraries, users can ask questions they were never able to ask before (see Fig~\ref{fig:nlq}). Although this solution is not as sophisticated as fully-fledged NLP, it allows the full potential of semantic descriptions and social relations in the social semantic digital library to be used.

Methods and Techniques of NLQ

As the basis of the NLQ mechanism we use templates of queries expressed in natural language, with variable parts for substitution. For example, “show me ... written by ...” is a simple template. All places denoted by three dots are changeable and the user has to specify them. The query “show me articles written by Stefan Decker” is an instance of the above template. More advanced templates have more changeable parts.
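A minimal sketch of how such a template could be turned into a regular expression and matched against a query. The function names `template_to_regex` and `extract_slots` are hypothetical and introduced only for illustration; the paper does not prescribe this implementation.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template such as 'show me ... written by ...' into a regex."""
    # The fixed parts of the template must match literally.
    fixed_parts = [re.escape(part.strip()) for part in template.split("...")]
    # Every '...' becomes a non-greedy capture group for one variable part.
    body = r"\s*(.+?)\s*".join(fixed_parts)
    return re.compile(r"^\s*" + body + r"\s*$", re.IGNORECASE)


def extract_slots(template: str, query: str):
    """Return the values of the variable parts, or None if the query is no instance."""
    match = template_to_regex(template).match(query)
    if match is None:
        return None
    return [value.strip() for value in match.groups()]


slots = extract_slots("show me ... written by ...",
                      "show me articles written by Stefan Decker")
print(slots)
```

A query that does not follow the template, for example "list all books", yields `None`, so the caller can try the next template.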

3.1 Template acquisition

Templates can be acquired in two significantly different ways: they can be defined by the authors of the system, or they can be extracted automatically. The first method requires accurate knowledge of which questions users ask most frequently and of the exact formulation of those questions (e.g. the order of words). This is very hard to determine and can give non-optimal results. We therefore propose a method that extracts query templates automatically from collected user queries. By studying the structure and semantics of the collected queries, it is possible to gather statistics about which of them are used most frequently and how they differ from one another. Templates created on that basis correspond to real-life questions and problems.
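One possible way to approximate automatic extraction is to reduce each collected query to a skeleton and count how often each skeleton occurs. The sketch below is an assumption, not the method of the paper: the `STRUCTURAL_WORDS` set, the function names and the sample queries are all made up for illustration.

```python
from collections import Counter

# Hypothetical set of "structural" words that stay fixed in a template.
STRUCTURAL_WORDS = {"show", "me", "all", "written", "by", "from", "year"}


def skeleton(query: str) -> str:
    """Collapse every run of non-structural words into a single '...' slot."""
    parts = []
    for word in query.lower().split():
        if word in STRUCTURAL_WORDS:
            parts.append(word)
        elif not parts or parts[-1] != "...":
            parts.append("...")
    return " ".join(parts)


def extract_templates(queries, min_count=2):
    """Return (template, frequency) pairs for skeletons seen at least min_count times."""
    counts = Counter(skeleton(query) for query in queries)
    return [(template, n) for template, n in counts.most_common() if n >= min_count]


# Made-up sample of collected queries.
collected = [
    "show me articles written by Stefan Decker",
    "show me books written by Jane Doe",
    "show me reports from year 2005",
    "Show me theses written by John Smith",
]
templates = extract_templates(collected)
print(templates)
```

The frequency threshold `min_count` keeps one-off formulations out of the template list; a real system would also need the semantic analysis the text mentions, which this sketch leaves out.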

3.2 Query analysis

The system receives and analyzes a created query (an instance of a template). Regular expressions are used to extract the values of its variable parts, be they words or phrases. Each of them can have many different meanings. If the precise meaning is not known, we propose a three-stage process for narrowing the semantics of the word:
1. Using WordNet, we find all the different meaning groups of the word.
2. In the second stage, we compare the trees of meanings for the different words from the template.
3. The meaning of the word can also be determined using the context, sense and goal of the question.

Such a three-stage process makes it possible to remove a significant part of the tree of meanings or to establish priorities among the individual meanings. The remaining groups return lists of synonyms and other related words. In this way, the word initially typed into the template is replaced with a group of words.
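The three stages can be illustrated with a toy example. The sketch below does not call WordNet: the `SENSES` dictionary is a made-up stand-in for a sense inventory, and comparing "domain" labels is a simplifying assumption in place of comparing trees of meanings.

```python
# Toy sense inventory standing in for WordNet; all entries are made up.
SENSES = {
    "paper": [
        {"gloss": "scholarly article", "domain": {"publication", "research"},
         "synonyms": ["article", "publication"]},
        {"gloss": "sheet material", "domain": {"material", "stationery"},
         "synonyms": ["sheet", "stationery"]},
    ],
    "author": [
        {"gloss": "writer of a work", "domain": {"publication", "person"},
         "synonyms": ["writer", "creator"]},
    ],
}


def narrow(word, other_words, question_domain=None):
    """Return the group of words that replaces `word` after three narrowing stages."""
    # Stage 1: look up every meaning group of the word.
    candidates = SENSES.get(word, [])
    # Stage 2: keep senses sharing a domain with senses of the other template words.
    neighbour_domains = set()
    for other in other_words:
        for sense in SENSES.get(other, []):
            neighbour_domains |= sense["domain"]
    related = [s for s in candidates if s["domain"] & neighbour_domains]
    candidates = related or candidates  # keep everything if nothing overlaps
    # Stage 3: use the context or goal of the question, if it is known.
    if question_domain:
        in_context = [s for s in candidates if question_domain in s["domain"]]
        candidates = in_context or candidates
    # The word is replaced by itself plus the synonyms of the surviving senses.
    group = [word]
    for sense in candidates:
        group.extend(syn for syn in sense["synonyms"] if syn not in group)
    return group


group = narrow("paper", ["author"])
print(group)
```

Here the "sheet material" sense of "paper" is discarded because it shares no domain with "author", so only the publication-related synonyms remain.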

3.3 User-Interface and Interaction

A created query has the form of a question expressed in natural language. The system creates the appropriate queries using the template and the sets of words for each variable part. The syntax of those queries depends on the underlying database system and its query language. The number of generated queries depends on the size of the groups of remaining alternatives for the given words in the template. The interface can support the user in constructing a query by:
1. Filling the variable parts of a template with default values.
2. Graphically distinguishing the variable parts of a template.
3. Annotating relevant parts.
4. Displaying a list of possible or frequently used words while a template is being filled in.
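The relation between the word groups and the number of generated queries can be sketched as follows. The query syntax in `pattern` is invented for illustration only, since the actual syntax depends on the underlying database system; `generate_queries` is likewise a hypothetical name.

```python
from itertools import product
from math import prod


def generate_queries(query_pattern: str, word_groups: dict) -> list:
    """Build one concrete query per combination of alternatives for each variable part."""
    names = list(word_groups)
    queries = []
    for combination in product(*(word_groups[name] for name in names)):
        queries.append(query_pattern.format(**dict(zip(names, combination))))
    return queries


# Hypothetical pattern in a made-up query syntax.
pattern = "FIND {kind} WHERE author = '{author}'"
# Word groups as they might look after query analysis.
groups = {"kind": ["article", "paper", "publication"], "author": ["Stefan Decker"]}

queries = generate_queries(pattern, groups)
# The number of queries is the product of the group sizes.
expected_count = prod(len(words) for words in groups.values())
print(len(queries), expected_count)
```

Because the count grows multiplicatively with each variable part, narrowing the word groups during query analysis directly limits the number of queries sent to the database.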

3.4 Collaboration

Observing the behavior and interaction of users is a source of relevant information that can be used during the construction of the NLQ interface. We propose to do this either by template ranking (collecting statistics on the templates used) or by word ranking (collecting statistics on the words used in individual templates).
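Both kinds of ranking amount to counting. A minimal sketch, assuming usage is recorded each time a template instance is submitted; the class `UsageStats` and its methods are hypothetical names, and the recorded values are sample data.

```python
from collections import Counter, defaultdict


class UsageStats:
    """Collects statistics on used templates and on words used per variable part."""

    def __init__(self):
        self.template_counts = Counter()
        # Keyed by (template, index of the variable part).
        self.word_counts = defaultdict(Counter)

    def record(self, template, slot_values):
        """Register one submitted instance of a template."""
        self.template_counts[template] += 1
        for index, value in enumerate(slot_values):
            self.word_counts[(template, index)][value.lower()] += 1

    def ranked_templates(self):
        """Templates ordered from most to least frequently used."""
        return [t for t, _ in self.template_counts.most_common()]

    def ranked_words(self, template, slot_index, limit=5):
        """Most frequently used words for one variable part of a template."""
        counter = self.word_counts[(template, slot_index)]
        return [w for w, _ in counter.most_common(limit)]


stats = UsageStats()
stats.record("show me ... written by ...", ["articles", "Stefan Decker"])
stats.record("show me ... written by ...", ["articles", "Jane Doe"])
stats.record("show me ... from year ...", ["books", "2005"])
print(stats.ranked_templates())
print(stats.ranked_words("show me ... written by ...", 0))
```

The ranked words could feed the list of frequently used words shown while a template is being filled in.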

3.5 Adaptation

Some level of adaptation is possible during query analysis and when recommending the most frequently used templates and words. We propose the following three solutions:
1. When sorting templates and words for a particular part, the system can use only the statistics (see Section 3.4, Collaboration) from persons who are in some way similar to the current user.
2. When constructing a list of templates or words, properties of the user model can be taken into account.
3. The meaning of particular words can be determined using the stereotype to which the user belongs.
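The first solution can be sketched by restricting the usage statistics to users who share a stereotype with the current user. Treating "similar" as "same stereotype" is an assumption made here for simplicity; the log entries and the name `recommend_templates` are hypothetical.

```python
from collections import Counter

# Hypothetical usage log: (stereotype of the user, template used).
USAGE_LOG = [
    ("student", "show me ... written by ..."),
    ("student", "show me ... written by ..."),
    ("student", "show me ... from year ..."),
    ("librarian", "show me ... from year ..."),
    ("librarian", "show me ... from year ..."),
    ("librarian", "show me ... from year ..."),
]


def recommend_templates(stereotype, log=USAGE_LOG):
    """Rank templates using only the statistics of users sharing the stereotype."""
    similar = Counter(template for s, template in log if s == stereotype)
    # Fall back to the global statistics when no similar user has been seen.
    ranking = similar or Counter(template for _, template in log)
    return [template for template, _ in ranking.most_common()]


print(recommend_templates("student"))
print(recommend_templates("visitor"))
```

A richer user model would replace the equality test with a similarity measure over user properties, which is outside the scope of this sketch.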

3.1. Natural language queries in digital libraries

Semantic digital libraries, such as JeromeDL (Kruk et al. 2005a), are among the systems where semantically rich information is gathered and delivered to end users. The semantic information gathered in JeromeDL conforms to three ontologies: the JeromeDL structure ontology, the MarcOnt bibliographic ontology (Kruk et al. 2005b) and the FOAFRealm ontology (Kruk et al. 2006). Using RDF to combine these three types of information allows readers to explore the database of this digital library by asking questions they could not ask before.

3.2. Natural language templates for user questions

The first step in delivering templates for user questions was to identify typical questions users were likely to ask. Early instances of JeromeDL were delivered as research libraries for DERI. Among questions like “all publications from year 2005 sponsored by XYZ”, we also found one that was very appealing: “all publications written by students supervised by ...”. This particular question was one for which neither typical keyword-based search nor faceted navigation would give a straightforward solution. To process a user question, the query engine matches it against a list of templates defined in the form of regular expressions. Our prototype implementation delivered with JeromeDL (see Fig. 4) allows users to select from a list of suggested templates. Eventually, if the algorithm succeeds, the selected template is filled in with parameters extracted from the user's question (see Fig. 6) and executed by the RDF storage query engine.
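The matching step described above might look roughly as follows. This is a sketch under stated assumptions: the SPARQL-style query patterns, the `ex:` predicates (with a prefix assumed to be declared elsewhere) and the names `TEMPLATES`, `escape_literal` and `build_query` are illustrative and are not taken from JeromeDL or its ontologies.

```python
import re

# Hypothetical registry: each template regex is paired with a query pattern.
# Doubled braces are literal braces after str.format is applied.
TEMPLATES = [
    (re.compile(r"^all publications written by students supervised by (?P<supervisor>.+)$",
                re.IGNORECASE),
     'SELECT ?pub WHERE {{ ?pub ex:author ?s . ?s ex:supervisedBy ?p . '
     '?p ex:name "{supervisor}" }}'),
    (re.compile(r"^all publications from year (?P<year>\d{4}) sponsored by (?P<sponsor>.+)$",
                re.IGNORECASE),
     'SELECT ?pub WHERE {{ ?pub ex:year "{year}" . ?pub ex:sponsor "{sponsor}" }}'),
]


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes so the value fits in a quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(question: str):
    """Match the question against the templates and fill in the first one that fits."""
    question = question.strip()
    for regex, query_pattern in TEMPLATES:
        match = regex.match(question)
        if match:
            params = {name: escape_literal(value.strip())
                      for name, value in match.groupdict().items()}
            return query_pattern.format(**params)
    return None  # no template matched; the interface can suggest templates instead


query = build_query("all publications written by students supervised by Stefan Decker")
print(query)
```

Returning `None` on failure leaves room for the interaction described in the text, where the user selects from a list of suggested templates.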

Evaluation

In order to evaluate our approach, we presented a list of the query templates supported by the prototype to librarians from the local library. We asked the staff (9 people) whether these were questions they were asked, and asked often. Most of the questions supported by the prototype were accepted by the library staff as frequently asked questions (see Tab. 1). Additionally, we received very positive feedback on the system.

Table 1: Results of the evaluation questionnaire

No.  Question                                                 How many of the staff were asked this question often   Percentage of the group asked
1    What are the books concerning X                          9                                                      100%
2    Where would I find books on X                            9                                                      100%
3    Where are the X writing in the last X years              7                                                      78%
4    What books do you have similar to X                      7                                                      78%
5    What are the articles referencing X                      9                                                      100%
6    Show the most X on X                                     9                                                      100%
7    Show me the shortest book in X                           0                                                      0%
8    Show me all publication from X (conference)              9                                                      100%
9    Show me all publication from friends of X (conference)   8                                                      89%

CONCLUSION

In this article we described the vision of a question answering system for digital libraries. We presented a simple yet robust approach to question answering based on question templates. The prototype was built into the JeromeDL system (Kruk et al. 2005a), and the question templates it supports were positively evaluated by the librarians.

ACKNOWLEDGEMENT

This material is based upon works supported by Enterprise Ireland under Grant No. ILP/05/203 and by Science Foundation Ireland (SFI) under the DERI-Lion project (SFI/02/CE1/1131).

REFERENCES

Kruk S. R. et al., 2005a. JeromeDL - Adding Semantic Web Technologies to Digital Libraries. Proceedings of DEXA 2005 Conference, Copenhagen, Denmark.
Kruk S. R. et al., 2005b. MarcOnt - Integration Ontology for Bibliographic Description Formats. Proceedings of the International Dublin Core Conference 2005, Madrid, Spain.
Kruk S. R. et al., 2006. D-FOAF: Distributed Identity Management with Access Rights Delegation. Proceedings of the Asian Semantic Web Conference 2006, Beijing, China.