Some Aspects of the Integration of Connectionist and Logic-Based Systems∗†
Máire Lane and Anthony Karel Seda
Department of Mathematics and Boole Centre for Research in Informatics, University College Cork, Cork, Ireland
Abstract
We discuss the computation by neural networks of semantic operators $\mathcal{T}_P$ determined by propositional logic programs $P$ over quite general many-valued logics $\mathcal{T}$. We revisit and clarify the foundations of the relevant notions employed in approximating both $\mathcal{T}_P$ and its fixed points when $P$ is a first-order logic program.

Keywords: Logic programs, semantic operators, many-valued logics, neural networks, metrics, approximation.
Contents
1 Introduction
2 Preliminaries
  2.1 Artificial Neural Networks
  2.2 Logic Programs
  2.3 General Semantic Operators
3 Finitely Determined Connectives
4 Construction of the Networks
  4.1 Networks for $\Phi_P$ and $\Psi_P$
  4.2 Many-Valued Logics
5 Foundations of Approximation

1 Introduction
∗ This paper is an extended version of the paper: Seda, A.K., Lane, M., On Approximation in the Integration of Connectionist and Logic-Based Systems. In: Li, L., Yen, K.K. (eds), Proceedings of the Third International Conference on Information, Tokyo, November 2004, International Information Institute, 2004, 297–300.
† To appear in: Information, Vol. 9, No. 4, 2006.

An important and interesting aspect of integrating logic-based systems and connectionist systems, or artificial neural networks (ANN), see [4, 9, 10, 11], is the computation by neural
networks of the semantic operators $\mathcal{T}_P$ determined by logic programs $P$. One part of this problem concerns finding algorithms to produce neural networks which compute $\mathcal{T}_P$ exactly when $P$ is propositional, and this question has been settled in [10] for the case of the immediate consequence operator $T_P$ and hence for two-valued classical logic. When $P$ is not propositional, approximation methods are needed. This question has been studied in [10, 11] for $T_P$ and more recently in [9] for certain generalizations of $T_P$. However, the definitions formulated so far depend on embedding all the requisite notions in the real line and hence are not self-contained.

The contributions of this paper are twofold. First, for the case of propositional $P$, we provide algorithms to find neural networks for the computation of $\mathcal{T}_P$ relative to very general logics $\mathcal{T}$ used in defining $\mathcal{T}_P$. To date, all results obtained use classical two-valued logic, and hence our results are wide generalizations of these. Second, we revisit the foundations of the methods of approximation used so far, and present general definitions which are self-contained in that they do not involve the real line. Furthermore, we greatly extend the class of programs for which approximation methods are known to work, from acyclic programs to the class of all definite programs. Since the latter class is computationally adequate, this too is a broad generalization of known results. Due to lack of space, we refer the reader to [14] for full details of all the results relating to approximation that we discuss here.

Throughout, we use the notation of [9, 14, 15, 16] in relation to logics $\mathcal{T}$ and logic programs $P$; in particular, we work with the general semantic operator $\mathcal{T}_P$ due to Fitting [3], see also [16], and note that it includes the important immediate consequence operator $T_P$ when $\mathcal{T}$ is classical two-valued logic.
As far as artificial neural networks are concerned, our notation and terminology are standard, see [4, 8, 9], and we closely follow [9].

Acknowledgement The authors thank the Boole Centre for Research in Informatics at University College Cork for substantial financial support in the preparation of this paper.
2 Preliminaries

2.1 Artificial Neural Networks
Based loosely on the perceived workings of the human brain, connectionist systems, or artificial neural networks (ANN), or simply neural networks for short, are a computing paradigm that has gone in and out of favour over the past forty years or so. The advantages of neural networks as a means of computation are that they can be trained, they deal successfully with noisy input, and they degrade gracefully. Their drawbacks include the fact that there is no clear understanding of how any particular network does the computation it is trained to do; that is, there is no clear semantics associated with ANN in general.

We next briefly summarize the terms and notation we need relating to neural networks, and how we compute with them, following closely the treatment of [9], see also [4, 8]. A neural network is a weighted digraph. Thus, its underlying structure is that of a collection of units or nodes together with weighted directed connections between the units. The units can be of different types, but the basic premise is the same in each case: each unit receives as input the weighted sum of the outputs of the units connected to it. A typical unit (or node) $k$ in this digraph is shown in Figure 1. We let $w_{kj} \in \mathbb{R}$ denote the weight of the connection from unit $j$ to unit $k$ (which may be 0). Then the unit $k$ is characterized, at time $t$, by the following data: its inputs $i_{kj}(t) = w_{kj} v_j(t)$ for $j = 1, \ldots, n_k$, its threshold $\theta_k \in \mathbb{R}$, its potential $p_k(t) = \sum_{j=1}^{n_k} w_{kj} v_j(t) - \theta_k \in \mathbb{R}$, and its value $v_k(t)$. The units are updated synchronously, time becomes $t + \Delta t$, and the output value for $k$, $v_k(t + \Delta t)$, is calculated from $p_k(t)$.

[Figure 1: Unit $k$ in a connectionist network. The inputs $v_1(t), \ldots, v_{n_k}(t)$, weighted by $w_{k1}, \ldots, w_{kn_k}$, together with the threshold $\theta_k$, give the potential $p_k(t)$ and hence the output $v_k(t + \Delta t)$.]

Units are distinguished, mainly, by the nature of the output function $v_k$. A unit $k$ is called binary threshold if $v_k(t + \Delta t) = H(p_k(t))$, where $H(x) = 1$ if $x \geq 0$ and $H(x) = 0$ otherwise; it is called linear if $v_k$ is the identity function, so that $v_k(t + \Delta t) = v_k(p_k(t)) = p_k(t)$, and $\theta_k$ is 0; and it is called sigmoidal if $v_k$ is a squashing function $\phi$, that is, $v_k(t + \Delta t) = \phi(p_k(t))$, where $\phi$ is non-decreasing and such that $\lim_{z \to \infty} \phi(z) = 1$ and $\lim_{z \to -\infty} \phi(z) = 0$.

As far as the architecture of neural networks is concerned, in this paper we will only consider networks where the units can be organized in layers. A layer is a vector of units. An $n$-layer feedforward network $F$ consists of the input layer, $n - 2$ hidden layers, and the output layer, where $n \geq 2$. Here, we will be mainly concerned with $n = 3$. Each unit occurring in the $i$-th layer is connected to each unit occurring in the $(i + 1)$-st layer, $1 \leq i < n$. A connectionist network $F$ is called a multilayer feedforward network if it is an $n$-layer feedforward network for some $n$. Finally, a neural network is called recurrent, or is made recurrent, if the number of units in the input layer is equal to the number of units in the output layer and each unit in the output layer is connected with weight 1 to the corresponding unit in the input layer, see Figure 2. A recurrent network can thus perform iterated computations, because the output values can be returned to the input layer via the connections just described; for example, it can compute the iterates $T_P^k(I)$.
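The unit types and the synchronous update described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not code from the paper or from [9]: the names `potential`, `heaviside`, `linear`, `sigmoid` and `update_layer` are hypothetical, the logistic function is just one possible squashing function, and a layer is represented as a list of weight rows with one threshold per unit.

```python
import math

def potential(weights, values, theta):
    """Potential p_k(t): weighted sum of the incoming values minus the threshold."""
    return sum(w * v for w, v in zip(weights, values)) - theta

def heaviside(x):
    """Output function of a binary threshold unit: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def linear(x):
    """Output function of a linear unit (identity; its threshold is taken to be 0)."""
    return x

def sigmoid(x):
    """One possible squashing function (logistic); assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-x))

def update_layer(weight_rows, thetas, values, output_fn):
    """Synchronously update a layer: unit k outputs output_fn(p_k(t))."""
    return [output_fn(potential(row, values, theta))
            for row, theta in zip(weight_rows, thetas)]

# Example: one binary threshold unit with weights (1, 1) and threshold 1.5
# fires only when both binary inputs are 1.
and_weights, and_thetas = [[1.0, 1.0]], [1.5]
print(update_layer(and_weights, and_thetas, [1, 1], heaviside))  # [1]
print(update_layer(and_weights, and_thetas, [1, 0], heaviside))  # [0]
```

A recurrent network in the sense above would feed the output list of the last layer back in as the next input list, one call chain per time step.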
2.2 Logic Programs
Let $P$ denote a normal logic program with underlying first-order language $\mathcal{L}$. We refer to [13] for notation and basic facts concerning logic programming. Thus, $P$ consists of a finite set of clauses of the form $A \leftarrow L_1, \ldots, L_n$, where $A$ is an atomic formula called the head of the clause, and $L_1, \ldots, L_n$ denotes a conjunction of literals $L_i$ (atoms or negated atoms) called the body of the clause; here, $n \geq 0$, and the case $n = 0$ is, by an abuse of notation, interpreted to mean the empty body, in other words the unit clause or fact $A \leftarrow$. Furthermore, we let $B_P$ denote the Herbrand base of $P$, namely, the set of all ground (or variable-free) atoms formed from the symbols in $\mathcal{L}$.

By the term logic $\mathcal{T}$, we understand a finite set of truth values $\mathcal{T} = \{t_1, \ldots, t_n\}$, together with operations of disjunction $\vee$, conjunction $\wedge$, and negation $\neg$, the latter satisfying $\neg(\neg t) = t$ for all $t \in \mathcal{T}$. In practice, the definitions of the connectives $\vee$, $\wedge$ and $\neg$ are frequently given by truth tables. By the term interpretation we understand a mapping $I : B_P \to \mathcal{T}$ which assigns to each atom a truth value from $\mathcal{T}$. The set of all such interpretations will be denoted by $I_{P,\mathcal{T}}$, or just by $I_P$ if the logic $\mathcal{T}$ is understood. For our purposes in this paper, the most important example of a many-valued logic is Belnap's 4-valued logic $\mathcal{FOUR}$, defined as in Table 1. Finally, we will distinguish the two truth values $t_1$ and $t_n$. In practice, they may be the least and greatest elements of $\mathcal{T}$ in some given orderings, as we see later, or they may be distinguished in some other way. For example, in $\mathcal{FOUR}$, $t_1$ will be taken to be $t$ and $t_n$ to be $f$.

[Figure 2: Sketch of a 3-layered recurrent network (input layer, hidden layer, output layer).]

When considering propositional logic programs, that is, programs with no variable symbols, $B_P$ is finite and can be written as a list $(A_1, \ldots, A_m)$ of $m$ elements. For $n$-valued logics, we can view interpretations $I \in I_{P,\mathcal{T}}$ as vectors of length $m$. For such a vector, the $j$-th entry is itself a binary string of length $n$ holding the truth value of $A_j$, in the sense that this string has a 1 in its $i$-th place if $I(A_j) = t_i$, and a 0 otherwise. Thus, each such binary string contains precisely one 1. For example, taking $\mathcal{FOUR} = \{t, u, b, f\}$, ordered as listed, and supposing that $B_P = (A_1, A_2, A_3)$ has three elements, the vector $(0001, 0010, 1000)$ is a four-valued interpretation that assigns false to $A_1$, both to $A_2$ and true to $A_3$.
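The encoding of interpretations as vectors of binary strings can be made concrete with a short Python sketch. The helper names `encode` and `decode`, and the choice of a dictionary from atom names to truth-value letters as the representation of an interpretation, are our own assumptions for illustration; only the ordering t, u, b, f and the example vector come from the text.

```python
FOUR = ("t", "u", "b", "f")  # truth values, ordered as listed in the text

def encode(interpretation, atoms, truth_values=FOUR):
    """Encode an interpretation as one binary string per atom.

    The string for atom A_j has a '1' in position i exactly when
    I(A_j) is the i-th truth value, and '0' elsewhere.
    """
    vector = []
    for atom in atoms:
        i = truth_values.index(interpretation[atom])
        vector.append("".join("1" if j == i else "0"
                              for j in range(len(truth_values))))
    return tuple(vector)

def decode(vector, atoms, truth_values=FOUR):
    """Inverse of encode; rejects strings that do not contain precisely one '1'."""
    interpretation = {}
    for atom, bits in zip(atoms, vector):
        well_formed = (len(bits) == len(truth_values)
                       and set(bits) <= {"0", "1"}
                       and bits.count("1") == 1)
        if not well_formed:
            raise ValueError(f"not a valid truth-value string: {bits!r}")
        interpretation[atom] = truth_values[bits.index("1")]
    return interpretation

atoms = ["A1", "A2", "A3"]
print(encode({"A1": "f", "A2": "b", "A3": "t"}, atoms))
# ('0001', '0010', '1000')
```

The printed vector is the example from the text: false for A1, both for A2 and true for A3.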
2.3 General Semantic Operators
To define the general semantic operator determined by a logic program $P$, it is first necessary to define the set $P^{**}$ associated with $P$, as described next. We start by adding two atoms true and false to $\mathcal{L}$, and require that $I(\mathrm{true}) = t_1$ and $I(\mathrm{false}) = t_n$ for all $I \in I_P$. (Notice that $t_1$ is not necessarily the truth value $t$, and $t_n$ is not necessarily the truth value $f$, as mentioned above.) Next, we define the set $P^*$ associated with $P$, as follows. First, put in $P^*$ all ground instances of members of $P$ whose bodies are non-empty. Second, if a clause $A \leftarrow$ with empty body occurs in $P$, replace it with $A \leftarrow \mathrm{true}$. Finally, if the ground atom $A$ is not yet the head of any member of $P^*$, add $A \leftarrow \mathrm{false}$ to $P^*$.

Now we define $P^{**}$. First, in $P^*$ replace each ground clause $A \leftarrow L_1, \ldots, L_n$ with $A \leftarrow L_1 \wedge \ldots \wedge L_n$. Next, if there are several clauses $A \leftarrow c_1$, $A \leftarrow c_2$, $\ldots$ in the resulting set having the same head, replace them with $A \leftarrow c_1 \vee c_2 \vee \ldots$.
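For a propositional program, where every clause is already ground and the Herbrand base is finite, the two-step construction just described can be sketched as follows. The representation is our own assumption, chosen for illustration: a clause is a pair (head, body), a body is a list of literal strings, a negated atom is written with a leading "~", and the functions `herbrand_base`, `p_star` and `p_double_star` are hypothetical names.

```python
def atom_of(literal):
    """Strip the negation marker '~' (an assumed notation) from a literal."""
    return literal[1:] if literal.startswith("~") else literal

def herbrand_base(program):
    """All atoms occurring in a propositional program, in sorted order."""
    atoms = set()
    for head, body in program:
        atoms.add(head)
        atoms.update(atom_of(lit) for lit in body)
    return sorted(atoms)

def p_star(program):
    """Facts get the body 'true'; atoms that head no clause get the body 'false'."""
    clauses = [(head, list(body) if body else ["true"]) for head, body in program]
    heads = {head for head, _ in clauses}
    for atom in herbrand_base(program):
        if atom not in heads:
            clauses.append((atom, ["false"]))
    return clauses

def p_double_star(program):
    """Group clauses by head: each head maps to a disjunction (list) of
    conjunctions (lists of literals), i.e. to one pseudo-clause."""
    pseudo = {}
    for head, body in p_star(program):
        pseudo.setdefault(head, []).append(body)
    return pseudo

# Example program:  A <- ;  B <- A, not C ;  B <- C
program = [("A", []), ("B", ["A", "~C"]), ("B", ["C"])]
print(p_double_star(program))
# {'A': [['true']], 'B': [['A', '~C'], ['C']], 'C': [['false']]}
```

In the first-order case the grounding step may produce infinitely many clauses with the same head, so this finite representation covers the propositional case only.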
Table 1: Truth table for the logic $\mathcal{FOUR}$

p   q   ¬p   p∧q   p∨q
t   t   f    t     t
t   u   f    u     t
t   f   f    f     t
t   b   f    b     t
u   t   u    u     t
u   u   u    u     u
u   f   u    f     u
u   b   u    f     t
f   t   t    f     t
f   u   t    f     u
f   f   t    f     f
f   b   t    f     b
b   t   b    b     t
b   u   b    f     t
b   f   b    f     b
b   b   b    b     b
We note that we may have a countably infinite disjunction at this point, and this fact causes us some technical difficulties which we need to overcome, see §3. Now, each ground atom $A$ is the head of exactly one element $A \leftarrow c_1 \vee c_2 \vee \ldots$ of $P^{**}$, and it is common practice to work with $P^{**}$ in place of $P$. Indeed, $A \leftarrow c_1 \vee c_2 \vee \ldots$ may be written $A \leftarrow \bigvee_i c_i$ and referred to as a pseudo-clause with head $A$ and body $\bigvee_i c_i$.

2.1 Definition Let $P$ be a normal logic program. We define $\mathcal{T}_P : I_{P,\mathcal{T}} \to I_{P,\mathcal{T}}$ as follows. For any $I \in I_{P,\mathcal{T}}$ and $A \in B_P$, we set
$$\mathcal{T}_P(I)(A) = I\Big(\bigvee_i c_i\Big),$$
where $A \leftarrow \bigvee_i c_i$ is the unique pseudo-clause in $P^{**}$ whose head is $A$.

By varying the underlying logic, one obtains from this definition very general semantic operators, see [12]. Indeed, Fitting has shown, in [3] and elsewhere, that one can easily recover the well-known (two-valued) single-step operator $T_P$ and the three-valued operator $\Phi_P$, as well of course as the four-valued operator $\Psi_P$, by taking $\mathcal{FOUR}$ as the underlying logic, and it is convenient to briefly consider the details here.

First, recall that we define the single-step operator $T_P : I_{P,2} \to I_{P,2}$ by letting $T_P(I)$ be the set $\{A \in B_P \mid$ there is a ground clause $A \leftarrow \mathrm{body}$ such that $I(\mathrm{body}) = \mathrm{true}\}$. Here, $I_{P,2}$ denotes the set of all two-valued interpretations (the power set $\mathcal{P}(B_P)$ of the Herbrand base $B_P$ of $P$), and the logic in question is classical, that is, the restriction of $\mathcal{FOUR}$ to $t$ and $f$. Next, we take the restriction of $\mathcal{FOUR}$ to $t$, $f$ and $u$, and regard elements $I$ of $I_{P,3}$ as pairs $(I^+, I^-)$ of disjoint subsets of $B_P$. Now we define the operator $\Phi_P : I_{P,3} \to I_{P,3}$ by setting $\Phi_P(I) = (T, F)$, where $T$ is the set $\{A \in B_P \mid$ there is a ground clause $A \leftarrow \mathrm{body}$, where body is true in $I\}$, and $F$ is the set $\{A \in B_P \mid$ for every ground clause $A \leftarrow \mathrm{body}$, we have that body is false in $I\}$. In both cases, infinite disjunctions are defined as in the following section. Thus, $\bigvee_i c_i$ takes the value true $t$ if and only if at least one $c_i$ is $t$; $\bigvee_i c_i$ takes the value false $f$ if and only if all the $c_i$ are $f$; and $\bigvee_i c_i$ takes the value undefined $u$ if and only if no $c_i$ is $t$ and at least one is $u$. One now obtains that $\mathcal{T}_P = T_P$ in the first case, and $\mathcal{T}_P = \Phi_P$ in the second.

We note in passing that it is an important point that one can recover the operators $GL_P$ and $W_P$ from $\mathcal{T}_P$ by means of the results of [18]. Here, $GL_P$ is the well-known operator used by Gelfond and Lifschitz in defining the stable-model semantics [6], and $W_P$ is the operator of [1] that characterizes the well-founded semantics of $P$, see [17]. Specifically, it is shown in [18] that one has $GL_P(I) = T_{fix(P)}(I)$ and $W_P(I) = \Phi_{fix(P)}(I)$, where $fix(P)$ is the fixpoint completion of a logic program as defined by Dung and Kanchanasut in [2].
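To illustrate Definition 2.1 in the propositional case, the following Python sketch evaluates the general semantic operator over the four-valued logic, with the connectives read directly from Table 1. Everything about the representation is an assumption made for illustration: pseudo-clauses are given as a dictionary from heads to lists of bodies, literals are strings with "~" marking negation, the special atoms true and false are mapped to t and f as in the text, and `apply_operator` is a hypothetical name.

```python
from functools import reduce

ORDER = "tufb"  # order of the second argument q within each block of Table 1
AND_ROWS = {"t": "tufb", "u": "uuff", "f": "ffff", "b": "bffb"}  # p AND q, by row p
OR_ROWS = {"t": "tttt", "u": "tuut", "f": "tufb", "b": "ttbb"}   # p OR q, by row p
NEG = {"t": "f", "u": "u", "f": "t", "b": "b"}

def conj(p, q):
    return AND_ROWS[p][ORDER.index(q)]

def disj(p, q):
    return OR_ROWS[p][ORDER.index(q)]

def literal_value(literal, interp):
    """Truth value of a literal; 'true' and 'false' are the two special atoms."""
    if literal == "true":
        return "t"
    if literal == "false":
        return "f"
    if literal.startswith("~"):
        return NEG[interp[literal[1:]]]
    return interp[literal]

def apply_operator(pseudo_clauses, interp):
    """One application of the operator: each head receives the value of the
    disjunction of its (conjunctive) bodies under the interpretation interp."""
    result = {}
    for head, bodies in pseudo_clauses.items():
        body_values = [reduce(conj, (literal_value(lit, interp) for lit in body))
                       for body in bodies]
        result[head] = reduce(disj, body_values)
    return result

# Pseudo-clauses for:  A <- true ;  B <- (A and not C) or C ;  C <- false
pseudo = {"A": [["true"]], "B": [["A", "~C"], ["C"]], "C": [["false"]]}
i0 = {"A": "u", "B": "u", "C": "u"}
i1 = apply_operator(pseudo, i0)
i2 = apply_operator(pseudo, i1)
print(i1)  # {'A': 't', 'B': 'u', 'C': 'f'}
print(i2)  # {'A': 't', 'B': 't', 'C': 'f'}
```

Starting from the everywhere-undefined interpretation is just one illustrative choice; in this small example a further application leaves `i2` unchanged, so `i2` is a fixed point of the operator.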
3 Finitely Determined Connectives
To be able to work with the general semantic operator, we must show how we handle infinite disjunctions in determining the truth values of bodies of pseudo-clauses in $P^{**}$, and we refer to [16] for full details. We write arbitrary countable disjunctions of elements of $\mathcal{T}$ in the form $\bigvee_{i \in M} t_i$, where $t_i \in \mathcal{T}$ for all $i \in M$.

3.1 Definition Let $\mathcal{T}$ be a logic. We say that disjunctions are finitely determined in $\mathcal{T}$ if, for each $t \in \mathcal{T}$, there exist a set $E_t^{\vee}$ of truth values and a finite collection of sets $\{dis_t^j \mid j \in J\}$ of truth values such that $\bigvee_{i \in M} t_i = t$ iff for some $j \in J$ we have (i) $dis_t^j \subseteq \{t_i \mid i \in M\}$, and (ii) for all $i \in M$ we have $t_i \notin E_t^{\vee}$. We call $E_t^{\vee}$ the excluded set for $t$ with respect to disjunction, and each set $dis_t^j$ is referred to as a set of required values for $t$ with respect to disjunction. Furthermore, the elements of $A_t^{\vee} = (E_t^{\vee})^c$ are called the allowable values for $t$ with respect to disjunction, where $(E_t^{\vee})^c$ denotes the complement of the set $E_t^{\vee}$.

In [16], it is shown that if a disjunction is finitely determined, then it is commutative, associative and idempotent. Furthermore, if these three conditions hold, then for any disjunction $\bigvee_{i \in M} s_i$ of truth values, where each $s_i \in \mathcal{T}$ and $M$ is a denumerable set, the sequence $s_1, s_1 \vee s_2, s_1 \vee s_2 \vee s_3, \ldots$ is eventually constant with value $t$, say. Therefore, setting $\bigvee_{i \in M} s_i = t$ gives each infinite disjunction in $\mathcal{T}$ a well-defined meaning which extends the usual meaning of finite disjunctions.

In order to compute $\mathcal{T}_P$, it is not necessary for conjunction to be commutative, since only finite conjunctions occur in the bodies of elements of $P^{**}$. Nevertheless, it will be necessary later on to impose the condition on our logics that conjunction is also finitely determined, in the same sense as disjunctions are finitely determined, as follows.

3.2 Definition Let $\mathcal{T}$ be a logic. We say that conjunctions are finitely determined in $\mathcal{T}$ if, for each $t \in \mathcal{T}$, there exist a set $E_t^{\wedge}$ of truth values and a finite collection of sets $\{con_t^j \mid j \in J\}$ of truth values such that $\bigwedge_{i \in M} t_i = t$ iff for some $j \in J$ we have (i) $con_t^j \subseteq \{t_i \mid i \in M\}$, and (ii) for all $i \in M$ we have $t_i \notin E_t^{\wedge}$. We call $E_t^{\wedge}$ the excluded set for $t$ with respect to conjunction, and each set $con_t^j$ is referred to as a set of required values for $t$ with respect to conjunction. Furthermore, the elements of $A_t^{\wedge} = (E_t^{\wedge})^c$ are called the allowable values for $t$ with respect to conjunction, where again $(E_t^{\wedge})^c$ denotes the complement of the set $E_t^{\wedge}$.

3.3 Example Returning to the logic $\mathcal{FOUR}$, we have the following.
$E_t^{\vee} = \emptyset$ and $dis_t^1 = \{t\}$, $dis_t^2 = \{u, b\}$; $E_t^{\wedge} = \{u, b, f\}$ and $con_t = \{t\}$.
$E_u^{\vee} = \{t, b\}$ and $dis_u = \{u\}$; $E_u^{\wedge} = \{f, b\}$ and $con_u = \{u\}$.
$E_b^{\vee} = \{t, u\}$ and $dis_b = \{b\}$; $E_b^{\wedge} = \{f, u\}$ and $con_b = \{b\}$.
$E_f^{\vee} = \{t, u, b\}$ and $dis_f = \{f\}$; $E_f^{\wedge} = \emptyset$ and $con_f^1 = \{f\}$, $con_f^2 = \{u, b\}$.

The following definition is important for further developments in the paper.

3.4 Definition A logic $\mathcal{T}$ is said to be finitely determined if both conjunction and disjunction are finitely determined.

In the case of a finitely determined logic $\mathcal{T}$, each of the operations of conjunction and disjunction determines a partial order on $\mathcal{T}$. Furthermore, the excluded sets for truth values $t \in \mathcal{T}$ can easily be characterized in terms of these partial orders, and the connections between these partial orders and the excluded sets for truth values in $\mathcal{T}$ will be needed when we construct the neural networks we require.

3.5 Definition Let $\mathcal{T}$ be a logic. We define the ordering $\leq_{\vee}$ on $\mathcal{T}$ by $s \leq_{\vee} t$ if and only if $s \vee t = t$, where $s, t \in \mathcal{T}$. Similarly, we define the relation $\leq_{\wedge}$ on $\mathcal{T}$ by $s \leq_{\wedge} t$ if and only if $s \wedge t = t$, where $s, t \in \mathcal{T}$.

It can be shown, see [16], that $s \in E_t^{\vee}$ if and only if $t$