Cover Letter
BICA 2013 Symposium
Dear BICA chairs,
I am happy to have the opportunity to submit our paper to the BICA 2013 conference.
Sincerely,
Gadi Pinkas, D.Sc.
Gonda Interdisciplinary Brain Research Center, Bar-Ilan University, Israel, and College of Academic Studies, Israel
16/6/2013
Representing, Binding, Retrieving and Unifying Relational Knowledge Using Pools of Neural Binders
Gad Pinkas
Center for Academic Studies, Israel, and Gonda Interdisciplinary Brain Research Center, Bar-Ilan University, Israel
[email protected]

Priscila Lima
Federal Rural University of Rio de Janeiro, Brazil
[email protected]

Shimon Cohen
Center for Academic Studies, Israel
[email protected]
Abstract

For a long time, connectionist architectures have been criticized for having propositional fixation, lack of compositionality and, in general, for their weakness in representing and processing sophisticated symbolic information. This work offers a novel approach that allows full integration of symbolic AI with the connectionist paradigm. We show how to encode and process relational knowledge using artificial neural networks (ANNs), such as Boltzmann Machines. The neural architecture uses a working memory (WM), consisting of pools of "binders", and a long-term synaptic memory that can store a large relational knowledge base (KB). A compact variable binding mechanism is proposed which dynamically allocates ensembles of neurons when a query is clamped, retrieving KB items until a solution emerges in the WM. We illustrate the proposal through non-trivial predicate unification problems: knowledge items are retrieved into the WM only upon need, and unified, graph-like structures emerge at equilibrium as an activation pattern of the neural network. Our architecture is based on the fact that some attractor-based ANNs may be viewed as performing constraint satisfaction, where, at equilibrium, fixed points maximally satisfy a set of weighted constraints. We show how to encode relational graphs as neural activation in the WM and how to use constraints encoded in synapses in order to retrieve and process such complex structures. Both procedural knowledge (the unification algorithm) and declarative knowledge (logic formulae) are first expressed as constraints and then used to generate (or learn) weighted synaptic connections. The architecture has no central control and is inherently robust to unit failures. Contrary to previous connectionist suggestions, this approach is expressive, compact, accurate, and goal directed. The mechanism is universal and has a simple underlying computational principle. As such, it may be further adapted for applications that combine the advantages of both connectionist and traditional symbolic AI, and may be used in modeling aspects of human reasoning.

Keywords: Neuro-symbolic integration, Binding Problem, Unification, Inference, Artificial Neural Networks, Predicate Logic
Introduction

Humans are capable of producing combinatorial structures and reasoning with them. Connectionist systems, however, have been criticized for having propositional fixation (McCarthy 1988) and for lacking the ability to construct combinatorial representations and to perform processes that are sensitive to complex structure (Fodor and Pylyshyn 1988). Exactly how compositionality can occur in a massively parallel network of simple processors (such as the brain) is a fundamental question in cognitive science, and its variable binding aspect has been identified as a key to any neural theory of language (Jackendoff 2002). Nevertheless, even given a method for variable binding that enables a neural network to represent complex structures, it remains unclear how such structures are processed and manipulated in order to achieve a goal. These questions still challenge theories of neurocognition (Marcus 2001, Feldman 2013).

While only a few researchers believe that the brain has logic-like circuitry, it is widely accepted that processing of (grounded) symbols is used in the brain for high-level cognitive tasks such as language processing, reasoning and planning. Reasoning with First Order Logic (FOL) captures the computational essence of symbol processing and is therefore a good test-bed for studying and comparing the abilities of cognitive symbol processing architectures. Fundamental to any logic-based computation is the problem of unification, which involves finding (most general) objects that turn two predicate instances of a symbolic language into one, by means of a set of substitutions. This work presents a massively parallel neural architecture which is capable of retrieving and processing unrestricted FOL structures. Its symbol processing abilities are demonstrated using unification queries; however, the architecture can support other, more complex symbol processing tasks such as inference and planning.

Consider for example a simple reasoning scenario. It is common knowledge that everybody (X) has a mother (m(X)), and that the mother of X is also a parent of X. In FOL, this may be expressed as: for all X, Parent(m(X),X). We also know that Z is a grandparent of X if Z is a parent of some Y and Y is a parent of X: for all X,Y,Z, Parent(Y,X) and Parent(Z,Y) → GP(Z,X). In order to infer that Jon has a grandparent, a "brain-like" device must be able to represent an object (Jon) together with its properties and match this rather complex structure with the more abstract rule that everybody has a mother-parent. This matching must associate Jon with both instances of the variable X in Parent(m(X),X), yielding Parent(m(Jon),Jon). One must also reuse (or duplicate) the same rule and apply it to Jon's mother, concluding that there exists also a mother for the mother of Jon. A third rule then binds the three compound objects Jon, m(Jon) and m(m(Jon)) to the variables X, Y, Z of the grandparent rule, finally deducing that the mother of the mother of Jon is his grandparent. Unification-like matching as in the above is not just a curiosity of the "standard" computing paradigm; it is fundamental to our ability to process rules.

Variable Binding and Compositionality

It is clear that, in order to perform symbolic language processing and reasoning, we must first have a variable binding mechanism that enables us to "glue" together relational knowledge items, such as verbs/predicates, concepts and variables.
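Read operationally, the grandparent derivation above is just a chain of variable-to-term bindings. The following minimal Python sketch makes those bindings explicit; the tuple encoding of terms and the helper name substitute are our illustrative choices, not part of the proposed neural architecture:

    # Terms are nested tuples; by convention, uppercase strings are variables.
    # m(X) is encoded as ('m', 'X'); Parent(m(X), X) as ('Parent', ('m', 'X'), 'X').

    def substitute(term, bindings):
        """Apply a {variable: term} substitution to a (possibly nested) term."""
        if isinstance(term, str):
            return bindings.get(term, term)
        return tuple(substitute(t, bindings) for t in term)

    # Axiom: for all X, Parent(m(X), X).
    parent_axiom = ('Parent', ('m', 'X'), 'X')

    # Binding X to jon yields Parent(m(jon), jon): Jon's mother is his parent.
    step1 = substitute(parent_axiom, {'X': 'jon'})

    # Reusing the very same axiom with X bound to m(jon) yields
    # Parent(m(m(jon)), m(jon)): the mother of Jon's mother is her parent.
    step2 = substitute(parent_axiom, {'X': ('m', 'jon')})

    # Grandparent rule: Parent(Y,X) and Parent(Z,Y) -> GP(Z,X).
    # step1 and step2 match the premises with X=jon, Y=m(jon), Z=m(m(jon)),
    # so the head of the rule may be instantiated accordingly.
    conclusion = substitute(('GP', 'Z', 'X'), {'Z': ('m', ('m', 'jon')), 'X': 'jon'})
    print(conclusion)  # ('GP', ('m', ('m', 'jon')), 'jon')

Note that the second step reuses the same axiom under a different binding; supporting such reuse in a neural substrate is precisely what the pools of binders proposed in this work are for.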
The general binding problem concerns how items that are encoded in distinct circuits of a massively parallel computing device can be combined in complex ways for perception, reasoning or action. Variable binding is a special instance of the general binding problem which arises in language, abstract reasoning and other symbolic processes (Feldman 2013). While it is clear how binding is done in programming languages, it remains a mystery how it is done in the brain. Several attempts have been made to approach the variable binding problem in a connectionist framework (Shastri & Ajjanagadde 1993, Browne & Sun 2000, Zimmer et al. 2006, Van der Velde &
Kamps 2006, Barrett et al. 2008, Velik 2010); yet virtually all of these suggestions present limitations with respect to expressiveness, size and memory requirements, central control demands, lossy information, etc. For example, compositionality and simple queries can be provided using holographic reduced representations (HRRs) (Plate 1995); however, the convolution operation used is lossy, and errors are introduced as structures become more complex or as more operations are performed. Network size is also a concern, as HRRs require many neurons to reduce convolution errors. The blackboard architecture (Van der Velde & Kamps 2006) can form complex structures but does not manipulate those structures to perform cognition, and it needs an unspecified control mechanism. Shastri's temporal binding provides only limited FOL expressiveness, an extremely restricted form of unification, and no mechanism for allocating "temporal binders". Finally, all the above systems need enough neurons to store an entire knowledge base, as opposed to a brain-like architecture where the working memory (WM) is used for retrieving knowledge from long-term synaptic storage upon need. The number of neurons they use is at best linear in the KB size, while some use many more neurons than that. For FOL compositionality in Artificial Neural Networks (ANNs) see (Ballard 1986, Pinkas 1992, Shastri 1993, Lima 2000, Garcez & Lamb 2007). For partial-FOL encodings as Boolean constraints see (Domingo 2008, Clark et al. 2001).

Unification

In conventional computing, unification is a key operation for realizing inference, reasoning, planning and language processing. It is the main vehicle by which conventional symbolic systems match rules with facts, or rules with other rules. In unification, two or more distinct hierarchical entities (terms) are merged to produce a single, unified, tree-like structure that adheres to the constraints of both original entities. Formally, unification is an operation that produces, from two or more logic terms, a substitution that makes the terms identical. Consider the problem of unifying two FOL predicate terms (literals): the first states that for every person X there exists a mother of X who is a parent of X; the second states that some parent Y exists for the mother of Charlie. In formal predicate logic, we can use the predicate P for the Parent relation, the function m(X) for the mother of X, and the constant c for the individual Charlie; thus we have two literals, P(m(X),X) and P(Y,m(c)), which we wish to unify in some reasoning task. The assignment Y=m(m(c)), X=m(c) unifies the two terms; thus the syntactically unified predicate instance is P(m(m(c)),m(c)), meaning, in our example, that the mother of the mother of Charlie is a parent of the mother of Charlie. A solution to the unification problem may be presented as a single directed acyclic graph (DAG), as in Figure 1.
[Figure 1: The DAG representing P(m(m(c)),m(c)). Nodes: 2:P (with slots 1 and 2), 3:m(), 5:m(), 6:m(), 4:c. Children nodes represent terms that are nested in certain slots of their parents. The dashed line (U) states that the two sub-terms must be unified.]
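For concreteness, the unification just described can be reproduced with a standard substitution-based algorithm. The sketch below is a minimal Python rendering of textbook syntactic unification under our own conventions (uppercase strings are variables, compound terms are tuples); it illustrates the operation itself, not the neural mechanism proposed in this paper:

    def walk(term, s):
        """Chase a variable through the substitution s."""
        while isinstance(term, str) and term[0].isupper() and term in s:
            term = s[term]
        return term

    def occurs(var, term, s):
        """Occurs-check: does var appear inside term under s?"""
        term = walk(term, s)
        if term == var:
            return True
        return isinstance(term, tuple) and any(occurs(var, t, s) for t in term)

    def unify(a, b, s=None):
        """Return a most general unifier of a and b, or None on failure."""
        s = {} if s is None else s
        a, b = walk(a, s), walk(b, s)
        if a == b:
            return s
        if isinstance(a, str) and a[0].isupper():
            return None if occurs(a, b, s) else {**s, a: b}
        if isinstance(b, str) and b[0].isupper():
            return unify(b, a, s)
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            for x, y in zip(a, b):
                s = unify(x, y, s)
                if s is None:
                    return None
            return s
        return None

    # Unify P(m(X), X) with P(Y, m(c)):
    print(unify(('P', ('m', 'X'), 'X'), ('P', 'Y', ('m', 'c'))))
    # -> {'Y': ('m', 'X'), 'X': ('m', 'c')}; resolving Y through X gives
    #    Y = m(m(c)), X = m(c), exactly the substitution in the text.

A serial interpreter like this walks the two trees top-down; the architecture proposed here instead lets all the corresponding DAG constraints relax in parallel until an equivalent unified structure emerges in the WM.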
For connectionist approaches specifically dedicated to unification see (Hölldobler 1990, Weber 1992, Komendantskaya 2010). All previous suggestions suffer from one or more limitations: (i) the expressiveness of the FOL language is compromised (e.g., no functions); (ii) the inability to reuse a KB literal more than once with different unifications - "the problem of two" (Jackendoff 2002); and (iii) the need for ad hoc algorithms dependent on a specific KB representation. Contrary to the above,
our architecture can implement a general unification engine that is unrelated to the KB stored in the synaptic weights. It can deal with unrestricted FOL expressions, while multiple instances of the same KB items may be formed dynamically. In addition, no specific (unification) algorithm is wired into the network. Instead, standard logistic or binary threshold units are used, and the same constraint satisfaction process works uniformly in all the units for all the needed tasks, e.g., binding, retrieval or unification. We have implemented a general neural architecture where new pieces of procedural or declarative knowledge may easily be added or changed, either by clamping certain "input" neurons or by changing synaptic weights.

Artificial Neural Networks and Boolean Constraint Satisfaction

Certain attractor-based ANNs may be seen as Boolean constraint satisfaction networks, where neural units stand for Boolean variables and the synaptic weights represent constraints imposed on those variables. More specifically, for ANNs with symmetric weights, such as Boltzmann Machines (BMs) and mean-field-theory-based networks, there exists a simple conversion from any weighted Boolean constraint problem into a symmetric weight matrix. Any set of weighted Boolean formulae (logical constraints) can be compiled into an ANN which performs stochastic gradient descent on an energy function that essentially counts the weighted sum of the unsatisfied logical constraints (see the sketch after Table 1). The fixed points with minimal energy are exactly the maximally satisfying solutions. The size of the generated network is linear in the size of the original set of formulae (including any additional hidden units which may be required).

General Boolean constraints may be specified as propositional logic formulae (Table 1). The conjunction of all these constraints can be converted into conjunctive normal form (CNF), and the CNF formulae can be further converted into an energy function implementable by a symmetrically weighted ANN such as a BM. A CNF is a conjunction (AND) of clauses, where each clause is a disjunction (OR) of positive or negated Boolean variables.

Table 1. Hard and soft Boolean constraint templates for enforcing graph syntax and the unification process. The ∀ replicates the constraint for all quantified indices; ∃ produces an indexed OR. These constraints automatically translate to synaptic weights which cause the WM to converge on a satisfying solution.

KB syntax constraints for P(m(x),x) and P(y,m(c)):
∀i: Item[i,"P(m(x),x)"] → Sym[i,"P"]
∀i: Item[i,"m(x)"] → Sym[i,"m"]
∀i: Item[i,"m(c)"] → Sym[i,"m"]
∀i: Item[i,"c"] → Sym[i,"c"]
∀i: Item[i,"m(x)"] → ∃j: Nest[j,i,1]
∀i: Item[i,"m(c)"] → ∃j: Nest[j,i,1]
∀i: Item[i,"P(m(x),x)"] → ∃j,j': Nest[j,i,1] and Nest[j',i,2]
∀i,j: Item[i,"P(m(x),x)"] and Nest[j,i,1] → Item[j,"m(x)"]
∀i,j: Item[i,"P(m(x),x)"] and Nest[j,i,2] → Item[j,"x"]
∀i,j: Item[i,"m(x)"] and Nest[j,i,1] → Item[j,"x"]
∀i,j: Item[i,"m(c)"] and Nest[j,i,1] → Item[j,"c"]
∀i: Item[i,"P(m(x),x)"] → Root[i,i]
∀i: Item[i,"P(y,m(c))"] → Root[i,i]
DAG Validity Constraints: ∀ i, k
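To make the clause-to-energy translation described above concrete, the following minimal Python sketch shows the principle on a toy weighted CNF; the variables, weights and clauses are invented for illustration (the grounded Table 1 templates would be handled the same way after CNF conversion):

    from itertools import product

    # A clause is a list of (variable, polarity) literals; weights are illustrative.
    weighted_cnf = [
        (2.0, [('a', True), ('b', False)]),   # 2.0 * (a OR NOT b)
        (1.0, [('b', True), ('c', True)]),    # 1.0 * (b OR c)
        (1.5, [('a', False), ('c', False)]),  # 1.5 * (NOT a OR NOT c)
    ]
    variables = ['a', 'b', 'c']

    def energy(state):
        """Weighted sum of violated clauses. Each clause contributes a product
        term that equals 1 exactly when the clause is violated; e.g.
        (a OR NOT b) is violated iff (1 - a) * b == 1."""
        e = 0.0
        for w, clause in weighted_cnf:
            violated = 1
            for var, positive in clause:
                violated *= (1 - state[var]) if positive else state[var]
            e += w * violated
        return e

    # Exhaustive check over all 0/1 states; the stochastic descent dynamics of
    # a symmetric network settle into the same minima without enumeration.
    states = [dict(zip(variables, bits)) for bits in product([0, 1], repeat=3)]
    best = min(states, key=energy)
    print(best, energy(best))  # e.g. {'a': 0, 'b': 0, 'c': 1} 0.0

Expanding the product term of each clause yields a polynomial over the 0/1 unit states; clauses with more than two literals produce higher-order terms, and this is where the additional hidden units mentioned above come in, keeping the resulting weight matrix quadratic while the network remains linear in the size of the formulae.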