Constraint-Based Open-Domain Question Answering Using Knowledge Graph Search

Ahmad Aghaebrahimian and Filip Jurčíček

Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University in Prague, Malostranské náměstí 25, 11800 Praha 1, Czech Republic
{Ebrahimian,Jurcicek}@ufal.mff.cuni.cz
Abstract. We introduce a highly scalable approach for open-domain question answering that does not depend on any logical-form-to-surface-form mapping data set or on linguistic analysis tools such as a POS tagger or a named entity recognizer. We define our approach within the Constrained Conditional Models framework, which lets us scale to a full knowledge graph with no limitation on its size. On a standard benchmark, we obtained results competitive with the state of the art on the open-domain question answering task.

Keywords: Question answering · Constrained conditional models · Knowledge graph · Vector representation
1 Introduction
We consider the task of simple open-domain question answering [4], where the answer can be obtained by knowing only one entity (i.e., a popular thing, person, or place) and one property (i.e., an attribute of the entity). The answer to such a question is an entity or a set of entities. For instance, in the question “What is the time zone in Dublin?”, Dublin is an entity and time zone is a property. Our pipeline therefore consists of two modules: property detection and entity recognition. For the first module, we train a classifier to estimate the probability of each property given a question. In the second module, given a question in natural language, we first estimate the distribution over properties using the classifier from the first module. Then, we retrieve entities constrained by some metadata about the properties. We use Freebase [3] as the knowledge graph to ground entities in our experiment. It contains about 58 million entities and more than 14 thousand properties. Hence, the entities which we obtain from the knowledge graph are in many cases ambiguous. We extract the metadata provided in the knowledge graph and integrate them into our system using the Constrained Conditional Models framework (CCM) [13] to disambiguate the entities.

In WebQuestions [1], a data set of 5,810 questions compiled using the Google Suggest API, 86 % of the questions are answerable by knowing only one entity [4]. This suggests that the majority of the questions which ordinary people ask on the Internet are simple ones, and it emphasizes the importance of simple question answering systems. However, the best reported result on this task is 63.9 % path-level accuracy [4], which shows that open-domain simple QA is still a challenging task in natural language processing (NLP). Simple QA is not a simple task: the flexible and unbounded number of entities and their properties in open-domain questions makes entity recognition a real challenge. However, knowledge graphs can help considerably by providing a structured knowledge base of entities. The contributions of this paper are a highly scalable QA model and a high-performance entity recognition model based on knowledge graph search.

We organized the rest of this article in the following way. After a brief survey of the Freebase structure in Sect. 2, we describe our method in Sect. 3. We explain the settings of our experiment and the corresponding results in Sects. 4 and 5. In Sect. 6, we mention related work, and we discuss our approach in Sect. 7 before concluding in Sect. 8.
2 The Knowledge Graph (Freebase)
Knowledge graphs contain significant amounts of factual information about entities (i.e., well-known places, people, and things) and their attributes, such as place of living or profession. Large knowledge graphs cover numerous domains, and they may be a solution for scaling up domain-dependent QA systems to open-domain ones by expanding their boundary of entity and property recognition. Besides, knowledge graphs are instances of linked data technologies; in other words, they can easily be connected to any other knowledge graph, and this increases their domain of recognition.

A knowledge graph is a graph structure in which source entities are linked to target ones through directed and labeled edges. Connecting a source entity to a target one using a directed edge forms the smallest structure in a knowledge graph, which is usually called an assertion. Large knowledge graphs such as Freebase [3] contain billions of such assertions. The entities in an assertion (i.e., the source and target entity) are identified using a unique identifier called a machine ID, or MID. Entities and connecting edges are objects, which means they have attributes called properties. These properties connect source entities to other target entities in the graph.

For the purpose of this paper, it is enough to know about ID, MID, name, alias, type, and expected type. Each entity in the graph has one unique ID and one unique MID. While the MID's value is always a code, the ID's value is sometimes a human-readable string. In other words, in contrast to the MID, which has no meaningful association with the entity, the ID sometimes has significant lexical similarity with the entity's name property (see Fig. 1). name is a surface form of the respective entity, and it is usually a literal in the form of raw text, a date, or a numerical value. alias contains other aliases, or surface forms, for a given entity. A type defines an IS-A relation for each entity. For instance, the entity Dublin has the types /common/topic, /book/book_subject (logical symbols are given in an abbreviated form to save space; the full form of this type is http://rdf.freebase.com/book/book_subject), and many others, which says that Dublin is not only a topic but also the name of a book. The ID, MID, name, type, and alias properties are defined for all entities (Fig. 1(a)). expected type is defined only for edges and indicates what one should expect to get as a target entity when traversing the graph through that property. Each edge has at most one expected type. For instance, /location/location/time_zones (Fig. 1(b)) has /time/time_zone as its expected type, while /film/film/produced_by has no expected type.

Fig. 1. Freebase structure; (a): type, id and name nodes, (b): expected type node
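To make this data model concrete, the following is a minimal sketch (ours, not part of the paper) of how an entity record and an assertion with the properties above could be represented:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    """A Freebase graph node with the properties described in Sect. 2."""
    mid: str                      # machine ID, e.g. "/m/02cft"; always a code
    id: str                       # sometimes human-readable, e.g. "/en/dublin"
    name: str                     # surface form, e.g. "Dublin"
    aliases: List[str] = field(default_factory=list)  # other surface forms
    types: List[str] = field(default_factory=list)    # IS-A relations, e.g. "/common/topic"

@dataclass
class Assertion:
    """The smallest KG structure: source --property--> target."""
    source: Entity
    prop: str                     # labeled edge, e.g. "/location/location/time_zones"
    target: Entity
    expected_type: Optional[str] = None  # at most one per edge, e.g. "/time/time_zone"
```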
3 Our QA Learning Model
The Constrained Conditional Models framework (CCM) [13] provides a means of fine-tuning the results of a statistical model by enforcing declarative, expressive constraints on them. For instance, if one asks “What time is it in Dublin?”, the answer should be directly related to a temporal notion, not something else. Knowing that the question is about time, we can define a declarative constraint stating that the type of the answer should be the same as the expected type of the question's property.

Constraints are essentially Boolean functions which can be generated using the available metadata about entities and properties in Freebase. As we illustrated in Fig. 1, Freebase assigns a set of types to each entity. It also assigns a unique expected type to each property. Intuitively, the type of an answer to a question should be the same as the expected type of the property for that question. A set of properties, each with a different probability and a different expected type, is assigned to each test question. Some of the properties have no expected type, and the types assigned to answers are usually not unique. Therefore, choosing the best property and entity for each question
requires searching in a huge space. Due to the enormous number of entities and their associated types in large knowledge graphs, translating a typical constraint like the one above into a feature set for training a statistical model is practically infeasible. Instead, using some constraints and Integer Linear Programming (ILP), we can make the search much more efficient and manageable. In this way, we simply penalize the results of a statistical model which are not in line with our constraints.

Let us define P as the space of all properties assigned to a test question and E as the space of entities in it. For each question, such as “What is the time zone in Dublin?”, we intend to find the tuple (p, e) ∈ P × E for which the probability of p is maximal given some features and constraints. In our example question, we would like to get /location/location/time_zones as the best property and /en/dublin as the best-matching entity.

We decompose the learning model into two steps, namely property detection and entity recognition. In property detection, we decide which property best describes the question. We model the assignment of properties given questions using the probability distribution in Eq. 1. We use logistic regression to train the model and use the model for N-best property assignment to each question at test time.

    P(p | q) = exp(ω_p^T φ(q)) / Σ_{p_i} exp(ω_{p_i}^T φ(q))        (1)

Given a question q, the aim is to find the N-best properties which best describe the question and generate the correct answer when queried against the knowledge graph. φ in the model is a feature set representing the questions in vector space. ω, the feature weights, are optimized (Sect. 4) using gradient descent. Training questions are accompanied by their knowledge graph assertions, each of which includes an entity, a property, and an answer. Entities and answers are in their MID forms. We also have access to the Freebase knowledge graph [8]. First, we chunk each question into its tokens and compute the features φ(q) by replacing each token with its vector representation. To train our classifier, we assign a unique index to each property in the training data set and use the indexes as labels for the training questions.

The next step in the model is entity recognition, which includes entity detection and entity disambiguation. In entity detection, we distinguish the main entity from irrelevant ones. A typical question usually contains many spans that are available in the knowledge graph, while only one of them is the main focus of the question. For instance, in the question “What is the time zone in Dublin?”, there are eleven valid entities available in the knowledge graph (time, zone, time zone, ..., and Dublin), while the focus of the question is on Dublin. In entity disambiguation, we disambiguate the entities detected in the previous step. Given an entity like ‘Dublin’, we need to know which Dublin (i.e., Dublin in Ireland, Dublin in Ohio, ...) is the focus of the question. To help the system with entity disambiguation, we use a heuristic as a constraint to improve the chance of selecting the correct entities.
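To make the property detection step concrete, the following is a minimal sketch of the Eq. 1 classifier, assuming pretrained 128-dimensional word vectors stored in a plain dict and scikit-learn's logistic regression (which realizes the softmax of Eq. 1); the zero-padded concatenation of token vectors is our assumption about how φ(q) is assembled, not a detail the paper fixes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

MAX_LEN, DIM = 20, 128  # questions longer than 20 tokens are pruned (Sect. 4)

def phi(question, word_vectors):
    """phi(q): replace each token with its embedding and concatenate,
    zero-padding short questions to a fixed length (our assumption)."""
    tokens = question.lower().rstrip("?").split()[:MAX_LEN]
    vecs = [word_vectors.get(t, np.zeros(DIM)) for t in tokens]
    vecs += [np.zeros(DIM)] * (MAX_LEN - len(vecs))
    return np.concatenate(vecs)

def train_property_classifier(questions, property_labels, word_vectors):
    """property_labels: one unique index per property, used as class labels."""
    X = np.stack([phi(q, word_vectors) for q in questions])
    clf = LogisticRegression(max_iter=1000)  # multinomial softmax as in Eq. 1
    clf.fit(X, property_labels)
    return clf

def n_best_properties(clf, question, word_vectors, n=5):
    """Return the N properties with the highest P(p | q) under Eq. 1."""
    probs = clf.predict_proba(phi(question, word_vectors).reshape(1, -1))[0]
    top = np.argsort(probs)[::-1][:n]
    return [(clf.classes_[i], probs[i]) for i in top]
```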
Entity recognition is done only at test time and on the test data. We use an Integer Linear Programming model for assigning the best-matching entity to each test question (Eq. 2).

    best_entity(q) = argmax_{(p_q, e_q) ∈ P_q × E_q} α^T s(p_q, e_q)        (2)
P_q denotes the N-best properties for a given question q, and E_q the valid entities in q. s(p_q, e_q) is a vector of p_q probabilities. α represents a vector of indicator variables which are optimized subject to some constraints. For each question, these constraints fall into two categories:

– Constraints in the first category enforce the type of the answer of (p_q, e_q) to be equal to the expected type of p_q in each question (type constraints).
– Constraints in the second category dictate that the lexical similarity ratio between the values of the name and ID properties connected to an entity should be maximal (similarity constraints).

Type constraints help in detecting the main focus of a given question among the other valid entities. Independently of the assigned properties, each question has |E| valid entities from the knowledge graph; by valid, we mean entities which are available in the knowledge graph. After property detection, N-best properties are assigned to each question, each of which has at most one expected type (possibly none). The product of the N-best properties and the |E| valid entities gives us N × |E| tuples of (entity, property). We query each tuple and obtain the respective answer from the knowledge graph. Each of the answers has a set of types. If the expected type of a tuple's property appears in the set of its answer's types, the type constraint for that tuple is satisfied.

Similarity constraints help in entity disambiguation. As depicted in Fig. 1, each entity has an ID and a name property. Ambiguous entities usually have the same name but different IDs. For instance, the entities /m/02cft and /m/013jm1 both have Dublin as their name, while the ID of the first is /en/dublin and of the second /en/dublin_ohio (plus more than 40 other entities with the same name). name determines a surface form for entities, and this is the property which we use for extracting valid entities in the first place. In our example, the similarity constraint for the entity /m/02cft holds true because, among all the other entities, it has the maximal similarity ratio between its name and ID property values. For some entities, the value of the ID property is the same as the MID. In such cases, instead of the ID, we use the alias property, which contains a set of surface forms for the entity. A sketch of how these constraints can be combined in an ILP objective is given below.
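The following is a minimal sketch of how Eq. 2 and the two constraint categories could be combined in an ILP, using the PuLP solver; the penalty and bonus weights are illustrative assumptions, not values from the paper:

```python
import pulp

def best_tuple(candidates):
    """candidates: list of dicts with keys
         'prob'    - P(p|q) from the property classifier,
         'type_ok' - True if the answer's types contain p's expected type,
         'sim'     - name/ID lexical similarity ratio of the entity.
    Encoding of Eq. 2: one binary indicator alpha_i per (entity, property)
    tuple; exactly one tuple is selected, and constraint violations are
    penalized in the objective (weights below are our assumptions)."""
    prob = pulp.LpProblem("entity_recognition", pulp.LpMaximize)
    alpha = [pulp.LpVariable(f"a{i}", cat="Binary") for i in range(len(candidates))]

    def score(c):
        # Classifier probability, minus a penalty when the type constraint
        # fails, plus a bonus proportional to the similarity ratio.
        return c["prob"] - (0.0 if c["type_ok"] else 1.0) + 0.5 * c["sim"]

    prob += pulp.lpSum(a * score(c) for a, c in zip(alpha, candidates))
    prob += pulp.lpSum(alpha) == 1  # select exactly one (p, e) tuple
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return max(range(len(candidates)), key=lambda i: alpha[i].value())
```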
3.1 Entity Detection
Instead of relying on external lexicons for mapping surface forms in a question to logical forms, we match surface forms and their MIDs directly using the knowledge graph at test time. For entity detection, we extract spans of tokens in questions which correspond to surface forms of entities stored in the knowledge graph. We query a live and full version of Freebase using the Meta-Web Query Language (MQL). MQL is a template-based querying language which uses the Google API service for querying Freebase in real time. We query the entity MID of each span. We have two alternatives for obtaining the initial entity MIDs: greedy and full. In the greedy approach, only the longest valid entities are considered, and their substrings, which may still be valid, are disregarded. In the full approach, however, all the entities are considered. For instance, for a simple span like “time zone”, the greedy approach returns only time zone, while the full approach returns time, zone, and time zone.
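A minimal sketch of span extraction with the two strategies follows; since the live MQL endpoint has since been retired, kg_lookup is a hypothetical stand-in for the MID query, and the maximum span length is our assumption:

```python
def candidate_spans(tokens, max_len=5):
    """All contiguous token spans up to max_len words (the 'full' strategy)."""
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1))]

def detect_entities(question, kg_lookup, greedy=True):
    """kg_lookup(span) -> MID or None stands in for the live MQL query
    against Freebase. In greedy mode, spans contained in a longer valid
    span are discarded; in full mode, all valid spans are kept."""
    tokens = question.lower().rstrip("?").split()
    valid = {s: kg_lookup(s) for s in candidate_spans(tokens) if kg_lookup(s)}
    if not greedy:
        return valid
    # Keep only spans that are not substrings of another valid span:
    # for "time zone", this drops "time" and "zone".
    return {s: mid for s, mid in valid.items()
            if not any(s != t and s in t for t in valid)}
```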
3.2 Entity Disambiguation
The entities detected in the previous step are in many cases ambiguous. Entities in massive knowledge graphs each have different meanings and interpretations. In a large knowledge graph, it is possible to find Dublin as the name of a city as well as the name of a book. Moreover, when it is the name of a city, that name is still not unique, as we saw in the earlier section. We consider the similarity constraint of an entity satisfied if the lexical similarity ratio between its ID and name properties is maximal among the ambiguous entities. This heuristic helps us to obtain the entity which has the highest similarity with the surface form in a given question.
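A minimal sketch of this heuristic, reusing the Entity record from the sketch in Sect. 2; difflib's SequenceMatcher stands in for the paper's unspecified lexical similarity ratio:

```python
from difflib import SequenceMatcher

def id_surface(id_str):
    """'/en/dublin_ohio' -> 'dublin ohio': strip the namespace and underscores."""
    return id_str.rsplit("/", 1)[-1].replace("_", " ")

def similarity(a, b):
    # Our stand-in for the paper's lexical similarity ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def disambiguate(candidates):
    """candidates: Entity records sharing the same name. The similarity
    constraint selects the one whose ID (or an alias, when the ID
    degenerates to the MID) is lexically closest to its name; e.g.
    for name 'Dublin', /en/dublin beats /en/dublin_ohio."""
    def best_ratio(e):
        forms = [id_surface(e.id)] if e.id != e.mid else e.aliases
        return max((similarity(e.name, f) for f in forms), default=0.0)
    return max(candidates, key=best_ratio)
```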
4 Our Experiment
For testing our system, we used the SimpleQuestions data set [4]. SimpleQuestions contains 108,442 questions accompanied by their knowledge graph assertions. The questions in SimpleQuestions were compiled manually based on the assertions of a version of Freebase limited to 2 million entities (FB2M); therefore, the answers to all questions can be found in FB2M. To make our results comparable to those of the SimpleQuestions authors, we used the official data set split into 70 %, 10 %, and 20 % portions for the train, validation, and test sets, respectively.

The inputs to the training step in our approach are the training and validation sets with their knowledge graph assertions. Using the Word2Vec toolkit [11], we replaced the tokens in the data sets with their vector representations to use them as the features φ(q) in our model. We pruned questions with more than 20 tokens (only ten questions in the whole set). Using these features and the model described above, we trained a logistic regression classifier. We used the classifier at test time for detecting the N-best properties for each test question.

We used path-level accuracy for evaluating the system. In path-level accuracy, a prediction is considered correct only if both the predicted entity and the predicted property are correct. Path-level accuracy is the same evaluation metric as used by the data set authors. We obtained our best validation accuracy using the greedy approach for entity recognition and 128-dimensional vectors for property detection. Using the same configuration, we report the accuracy of our system on the test data.
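The metric itself is straightforward; a minimal sketch (ours), with predictions and gold as aligned lists of (entity MID, property) pairs:

```python
def path_accuracy(predictions, gold):
    """Path-level accuracy as used by the SimpleQuestions authors:
    a prediction counts as correct only if both the predicted entity
    and the predicted property match the gold assertion."""
    correct = sum(1 for (e, p), (ge, gp) in zip(predictions, gold)
                  if e == ge and p == gp)
    return correct / len(gold)
```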
5 Results
We used SimpleQuestions to train our system (Table 1). In this setting, with 99 % coverage, we obtained 61.2 % path-level accuracy, which is competitive with the results in [4] when training on the same data set (61.6 %). However, their results are reported on a limited version of Freebase (FB5M). Therefore, as we used the full knowledge graph, we hope that our system can answer every simple question whose answer is available in the full knowledge graph.

Table 1. Experimental results on the test set of the SimpleQuestions data set.

                           Trained on        Knowledge graph   Path accuracy
  Bordes et al. [4]        SimpleQuestions   FB5M              61.6
  Constraint-based (ours)  SimpleQuestions   Full Freebase     61.2
6 Related Work
Domain-specific QA has been studied extensively [7,10,14,17,18] for many domains. In a majority of these studies, a static lexicon is used for mapping the surface forms of entities to their logical forms. As opposed to knowledge graphs, scaling up such lexicons, which usually contain hundreds to thousands of entities, is neither easy nor efficient.

Knowledge graphs have proved beneficial for different tasks in NLP, including question answering. There are plenty of studies on using knowledge graphs for question answering, either through information retrieval approaches [4,15] or through semantic parsing [1,2,5,9]. Unlike our work, these studies tend to use knowledge graphs only for validating their generated logical forms, while for entity recognition they still depend on pre-defined lexicons (e.g., [1,5]). Dependence on pre-defined lexicons limits the scope of language understanding to those lexicons alone. In our approach, we do not use any data set or lexicon for entity recognition. Instead, we obtain valid entities by querying the knowledge graph at test time, and then we apply constraints on the valid entities to get the correct entity for each question.

As regards CCM, it was first proposed by Roth and Yih [13] for reasoning over classifier results and has since been used for various other problems in NLP [6,12]. Clarke et al. [7] proposed a semantic parsing model trained in a question-answering paradigm on Geoquery [16] under the CCM framework. Our work differs from theirs, first, in the number of questions and the size of the knowledge graph, and second, in answering open-domain questions.
7 Discussion
The class of entities in open-domain applications is open and expanding. Training a statistical model for classifying millions of entities is practically infeasible due to the cost of training and the lack of sufficient training data. Since entity decisions for one question have no effect on the next one, the entity model can be optimized for each question at test time, under the constraints particular to that question. In contrast, the class of properties is closed, and property decisions are usually global. Therefore, a property model can be optimized once, at training time, for all questions. In this way, by making decisions on properties first, we greatly reduce the decision space for entities, which makes our approach insensitive to the size of the knowledge graph. As we demonstrated in our experiment, optimizing the entity recognition model on a single question lets the system scale easily to large knowledge graphs.
8 Conclusion
We introduced a question answering system with no dependence on external lexicons or any other tools. Using our system on a full knowledge graph, we obtained results competitive with state-of-the-art systems that use a limited knowledge graph. The 0.5 % decrease in the performance of the system in [4] when scaling from FB2M to FB5M, with 3 million more entities, suggests that QA on a full knowledge graph with more than 58 million entities is a much more difficult task. We showed that, by enforcing expressive constraints on statistical models, our approach can easily scale QA systems up to a large knowledge graph irrespective of its size.

Acknowledgments. This research was partially funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221, core research funding, SVV project number 260 333, and GAUK 207-10/250098 of Charles University in Prague. This work has been using language resources distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013). The authors gratefully appreciate Ondřej Dušek for his helpful comments on the final draft.
References

1. Berant, J., Chou, A., Frostig, R., Liang, P.: Semantic parsing on Freebase from question-answer pairs. In: Proceedings of EMNLP (2013)
2. Berant, J., Liang, P.: Semantic parsing via paraphrasing. In: Proceedings of ACL (2014)
3. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of ACM SIGMOD (2008)
4. Bordes, A., Usunier, N., Chopra, S., Weston, J.: Large-scale simple question answering with memory networks (2015). arXiv preprint arXiv:1506.02075
5. Cai, Q., Yates, A.: Large-scale semantic parsing via schema matching and lexicon extension. In: Proceedings of ACL (2013)
6. Chang, M., Ratinov, L., Roth, D.: Structured learning with constrained conditional models. Mach. Learn. 88, 399–431 (2012)
7. Clarke, J., Goldwasser, D., Chang, M., Roth, D.: Driving semantic parsing from the world's response. In: Proceedings of the Conference on Computational Natural Language Learning (2010)
8. Google: Freebase data dumps (2013). https://developers.google.com/freebase/data
9. Kwiatkowski, T., Choi, E., Artzi, Y., Zettlemoyer, L.: Scaling semantic parsers with on-the-fly ontology matching. In: Proceedings of EMNLP (2013)
10. Kwiatkowski, T., Zettlemoyer, L., Goldwater, S., Steedman, M.: Inducing probabilistic CCG grammars from logical form with higher-order unification. In: Proceedings of EMNLP (2010)
11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR Workshop (2013)
12. Punyakanok, V., Roth, D., Yih, W.: The importance of syntactic parsing and inference in semantic role labeling. Comput. Linguist. 34, 257–287 (2008)
13. Roth, D., Yih, W.: Integer linear programming inference for conditional random fields. In: International Conference on Machine Learning (2005)
14. Wong, Y.-W., Mooney, R.: Learning synchronous grammars for semantic parsing with lambda calculus. In: Proceedings of ACL (2007)
15. Yao, X., Van Durme, B.: Information extraction over structured data: question answering with Freebase. In: Proceedings of ACL (2014)
16. Zelle, J.M.: Using inductive logic programming to automate the construction of natural language parsers. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin (1995)
17. Zelle, J.M., Mooney, R.J.: Learning to parse database queries using inductive logic programming. In: Proceedings of the National Conference on Artificial Intelligence (1996)
18. Zettlemoyer, L., Collins, M.: Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In: Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence (UAI) (2005)