ARISTA Causal Knowledge Discovery from Texts

John Kontos, Areti Elmaoglou, and Ioanna Malagardi
Artificial Intelligence Group, Laboratory of Cognitive Science
Department of Philosophy and History of Science
National and Capodistrian University of Athens
[email protected] , [email protected]

Abstract. A method is proposed in the present paper for supporting the discovery of causal knowledge by finding causal sentences from a text and chaining them by the operation of our system. The operation of our system called ACkdT relies on the search for sentences containing appropriate natural language phrases. The system consists of two main subsystems. The first subsystem achieves the extraction of knowledge from individual sentences that is similar to traditional information extraction from texts while the second subsystem is based on a causal reasoning process that generates new knowledge by combining knowledge extracted by the first subsystem. In order to speed up the whole knowledge acquisition process a search algorithm is applied on a table of combinations of keywords characterizing the sentences of the text. Our knowledge discovery method is based on the use of our knowledge representation independent method ARISTA that accomplishes causal reasoning “on the fly” directly from text. The application of the method is demonstrated by the use of two examples. The first example concerns pneumonology and is found in a textbook and the second concerns cell apoptosis and is compiled from a collection of MEDLINE paper abstracts related to the recent proposal of a mathematical model of apoptosis.

S. Lange, K. Satoh, and C.H. Smith (Eds.): DS 2002, LNCS 2534, pp. 348-355, 2002. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

A method is proposed in the present paper for discovering causal knowledge by finding the appropriate causal sentences from a text and chaining them by the operation of our system for knowledge discovery from texts. The processing of natural language texts for knowledge acquisition was first presented in [1] by the use of the new representation independent method called ARISTA. This method achieves causal knowledge mining “on the fly” through deductive reasoning performed by the system in response to a user's question and is further elaborated in [2]. Our method is an alternative to the traditional “pipeline” method, recent applications of which are presented in [3] for mining and [4], [5] for information extraction. The main advantage of the ARISTA method is that texts are not translated into any representation formalism, and therefore retranslation is avoided whenever new linguistic or extra-linguistic prerequisite knowledge has to be used for improved text understanding.

The operation of the system relies on the search for causal chains, which in turn relies on the search for sentences containing appropriate natural language phrases. The system for knowledge discovery from texts that we are developing, called ACkdT, consists of two main subsystems. The first subsystem extracts knowledge from individual sentences in a way that is similar to traditional information extraction from texts [7,8]. The second subsystem is based on a reasoning process that generates new knowledge by combining “on the fly” the knowledge extracted by the first subsystem, but without the use of a template representation.

In order to speed up the whole knowledge acquisition process, the search algorithm proposed in [6] may be used for finding the appropriate sentences for chaining. The speed-up arises because the cost of each repeated sentence search is made a function of the number of words in the connecting phrases. This number is usually smaller than the number of sentences of the text, which may be arbitrarily large.

2 A First Example of Knowledge Discovery from a Scientific Text

An example text that is an extract from a medical physiology book [9] in the domain of pneumonology, and in particular of lung mechanics, enhanced by a few general knowledge sentences, is used as a first illustrative example of knowledge discovery from texts. Our system is able to answer questions that require the chaining of causal knowledge acquired from this text, and to produce answers that are not explicitly stated in the input text. The example text contains the following sentences:

1. The decrease of the concentration of surfactant increases the surface tension.
2. Elastic forces rise is caused by volume reduction.
3. The alveolar pressure rise forces air out of the lungs.
4. The alveolar pressure rise is caused by elastic forces.
5. Elastic forces include elastic forces caused by surface tension.
6. Elastic forces caused by surface tension increase as the alveoli become smaller.
7. As the alveoli become smaller, the concentration of surfactant increases.
8. The increase of the concentration of surfactant reduces the surface tension.
9. The reduction of the surface tension opposes the collapse of the alveoli.

The knowledge contained in texts like the one above concerns the causal relations between “processes” such as reduction, rise, increase and collapse that apply to “entities” such as elastic forces, alveolar pressure, surface tension, lungs and alveoli. If the user submits the question: “What process of alveoli forces air out of the lungs?” then the answer “the process in which the alveoli become smaller” is produced automatically by our system after searching the text and discovering the proper causal chain that consists of the sentences 3 to 6. The chain of reasoning for answering this question can be symbolized as below:

a.b.s. -causes-> c.s.t. -is_a-> e.f. -causes-> a.p.r. -causes-> a.o.o.l.

where
a.b.s. = alveoli become smaller
c.s.t. = elastic forces caused by surface tension
e.f. = elastic forces
a.p.r. = alveolar pressure rise
a.o.o.l. = air out of the lungs
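The chaining step can be illustrated with a minimal sketch. It assumes that the relations of sentences 3 to 6 have already been extracted and hand-coded as triples, whereas ACkdT derives them from the text at question time; the names RELATIONS and find_chain are hypothetical and do not come from the system itself.

```python
from collections import deque

# Hand-coded relations from sentences 3-6 of the example text (an assumption:
# the real system extracts these from the sentences when a question is asked).
# Each entry is (source, relation, target, sentence number).
RELATIONS = [
    ("alveoli become smaller", "causes",
     "elastic forces caused by surface tension", 6),
    ("elastic forces caused by surface tension", "is_a", "elastic forces", 5),
    ("elastic forces", "causes", "alveolar pressure rise", 4),
    ("alveolar pressure rise", "causes", "air out of the lungs", 3),
]


def find_chain(effect, entity):
    """Search backwards from `effect` for a process that mentions `entity`.

    Returns the process found and the list of relations linking it to
    `effect`, or (None, []) when no chain exists.
    """
    queue = deque([(effect, [])])
    seen = {effect}
    while queue:
        node, chain = queue.popleft()
        if chain and entity in node:
            return node, chain
        for src, rel, dst, num in RELATIONS:
            if dst == node and src not in seen:
                seen.add(src)
                queue.append((src, [(src, rel, dst, num)] + chain))
    return None, []


process, chain = find_chain("air out of the lungs", "alveoli")
print(process)
for src, rel, dst, num in chain:
    print(f"  sentence {num}: {src} {rel} {dst}")
```

The breadth-first search is a design choice of this sketch only; any strategy that follows “causes” and “is_a” links backwards from the effect named in the question would recover the same chain.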


If the user submits the question: "What process of alveoli opposes collapse of alveoli?" then the system gives again the answer "the process in which the alveoli become smaller" but using a different causal chain consisting of the sentences 6 to 9. The processing of the second question requires the definition of “causal polarity” for the proper treatment of verbs like “opposes”. Positive and negative causal polarities have been defined as "+cause" and "-cause" respectively. Both causal chains that are discovered for answering the above two questions have the same starting point i.e. the phrase “the alveoli become smaller” but different end points shown in italic letters.

3 The Combination of Sentences for Knowledge Discovery

Let us consider the answering of the question of the user: "What process of alveoli causes flow of lungs air?" The explanation generated automatically by the system illustrates some combinations:

alveoli become smaller causes increase of elastic forces
because surface tension elastic forces is a kind of elastic forces
and alveoli become smaller causes increase of surface tension elastic forces

alveoli become smaller causes rise of alveolar pressure
because alveoli become smaller causes increase of elastic forces
and elastic forces causes rise of alveolar pressure

alveoli become smaller causes flow of lungs air
because alveoli become smaller causes rise of alveolar pressure
and rise of alveolar pressure causes flow of lungs air

The sentence “The reduction of the surface tension opposes the collapse of the alveoli” uses the verb “opposes” that bears a negative meaning that must be taken into account. After defining the predicates "+cause" and "-cause" for positive and negative relations respectively, the answer to the question: "What process of alveoli opposes collapse of alveoli?" is again "become smaller", while the explanation generated automatically is:

alveoli become smaller +causes reduces of surface tension
because alveoli become smaller +causes increase of surfactant concentration
and increase of surfactant concentration +causes reduces of surface tension

alveoli become smaller -causes collapse of alveoli
because alveoli become smaller +causes reduction of surface tension
and reduction of surface tension -causes collapse of alveoli
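The polarity bookkeeping in the second explanation can be sketched as follows. The sketch assumes that polarities combine like signs under multiplication, which the text does not state explicitly (the example only combines "+cause" links with a single "-cause" link). The links of sentences 7 to 9 are hand-coded, and the names LINKS and chain_polarity are hypothetical.

```python
# Signed causal links from sentences 7-9, hand-coded for illustration.
# +1 stands for "+cause" and -1 for "-cause".
LINKS = {
    "alveoli become smaller":
        ("increase of surfactant concentration", +1),    # sentence 7
    "increase of surfactant concentration":
        ("reduction of surface tension", +1),            # sentence 8
    "reduction of surface tension":
        ("collapse of alveoli", -1),                     # sentence 9
}


def chain_polarity(start, end):
    """Follow the links from `start` to `end` and combine their polarities.

    Assumption of this sketch: polarities multiply like signs. Returns
    "+cause" or "-cause", or None when `end` is not reachable.
    """
    polarity, node = +1, start
    visited = set()
    while node != end:
        if node not in LINKS or node in visited:
            return None
        visited.add(node)
        node, sign = LINKS[node]
        polarity *= sign
    return "+cause" if polarity > 0 else "-cause"


print(chain_polarity("alveoli become smaller", "reduction of surface tension"))
print(chain_polarity("alveoli become smaller", "collapse of alveoli"))
```

Under this assumption the two calls reproduce the polarities of the two conclusions in the explanation above: a positive relation to the reduction of surface tension and a negative relation to the collapse of the alveoli.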

4 The Search Algorithm

The search algorithm consists of two modules. The first is an algorithm that organizes the data and the second is the main algorithm. A few of the words are chosen as keywords that characterize each sentence [6]. The words used as keywords, namely “rise”, “tension”, “elastic” and “smaller”, are denoted by k1, k2, k3, k4 and ordered as k3, k2, k4, k1. The set of keyword combinations, together with their occurrence in either the logical LHS (left hand side) or RHS (right hand side) of a sentence, is given in Table 1. The logical LHS corresponds to the “cause” and the RHS to the “effect”; the numbers refer to the sentences. Table T, which results from our organization algorithm, is given as Table 2.

Table 1. List of keyword sets

rise    = k1    occurring in LHS of 3      and RHS of 4
tension = k2    occurring in LHS of 9      and RHS of 8
elastic = k3    occurring in LHS of 4      and RHS of 5
smaller = k4    occurring in LHS of 6      and RHS of 7
k3k1            occurring in LHS of none   and RHS of 2
k3k2            occurring in LHS of 5      and RHS of 6
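The keyword sets of Table 1 can be held in an index whose lookup cost depends on the number of keywords in a combination rather than on the number of sentences. The sketch below uses nested dictionaries keyed on the ordered keywords. This is an assumption made for illustration: it stands in for the organized table T of [6] and is not a reproduction of it, and the names ORDER, TABLE_1, build_index and lookup are hypothetical.

```python
# Keyword order given in the text: k3, k2, k4, k1.
ORDER = ["elastic", "tension", "smaller", "rise"]

# Table 1 as (LHS sentences, RHS sentences) per keyword combination.
TABLE_1 = {
    ("rise",): ([3], [4]),
    ("tension",): ([9], [8]),
    ("elastic",): ([4], [5]),
    ("smaller",): ([6], [7]),
    ("elastic", "rise"): ([], [2]),
    ("elastic", "tension"): ([5], [6]),
}


def build_index(table):
    """Organize the combinations as nested dictionaries (a trie)."""
    root = {"children": {}, "entry": None}
    for combo, entry in table.items():
        node = root
        for keyword in sorted(combo, key=ORDER.index):
            node = node["children"].setdefault(
                keyword, {"children": {}, "entry": None})
        node["entry"] = entry
    return root


def lookup(index, combo):
    """Return the entry for `combo` and the number of accesses made."""
    node, accesses = index, 0
    for keyword in sorted(combo, key=ORDER.index):
        node = node["children"].get(keyword)
        accesses += 1
        if node is None:
            return None, accesses
    return node["entry"], accesses


index = build_index(TABLE_1)
entry, accesses = lookup(index, ["tension", "elastic"])  # combination k3k2
print(entry, accesses)
```

Looking up k3k2 takes one access per keyword and yields sentence 5 (LHS) and sentence 6 (RHS), however many other combinations the index holds.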

The combinations of keywords characterising the sentences of the text presented in section 2 consist of any of the four keywords “rise, tension, elastic, smaller”. The keyword combinations for each sentence are given below:

Where one symbol stands for the “causes” relation and another for the “is_a” relation [symbols and listing not recoverable from the source].

Table 2. The organized Table T

Where A(Q) denotes the sentences related to the term Q. As was shown in [6], the number of direct accesses to the organised table T is smaller than or equal to the number of keywords of a term, and therefore the retrieval time is independent of the size of the table. We give below the steps of the algorithm executed for locating the sentence characterized by the keyword combination k3k2:

i=0      p=1
i=1      p=T(1+1)=T(2)=6
redo     above
i=2      p=T(6+1)=T(7)=A(k3k2)
return   location of sentences 5 and 6

because i=1 and X=POS(kr1)=POS(k3)=1 because i=1 i.e. i