COGNITIVE SCIENCE 12, 331-392 (1988)

A Connectionist Approach to Knowledge Representation and Limited Inference

LOKENDRA SHASTRI
Computer and Information Science Department, University of Pennsylvania
Although the connectionist approach has led to elegant solutions to a number of problems in cognitive science and artificial intelligence, its suitability for dealing with problems in knowledge representation and inference has often been questioned. This paper partly answers this criticism by demonstrating that effective solutions to certain problems in knowledge representation and limited inference can be found by adopting a connectionist approach. The paper presents a connectionist realization of semantic networks, that is, it describes how knowledge about concepts, their properties, and the hierarchical relationship between them may be encoded as an interpreter-free massively parallel network of simple processing elements that can solve an interesting class of inheritance and recognition problems extremely fast, in time proportional to the depth of the conceptual hierarchy. The connectionist realization is based on an evidential formulation that leads to principled solutions to the problems of exceptions and conflicting multiple inheritance situations during inheritance, and the best-match or partial-match computation during recognition. The paper also identifies constraints that must be satisfied by the conceptual structure in order to arrive at an efficient parallel realization.
1 INTRODUCTION

There is a growing interest in massively parallel and highly interconnected networks of very simple processing elements. These networks are variously referred to as connectionist networks, parallel distributed processing systems, and neural networks, and are playing an increasingly important role in artificial intelligence (AI) and cognitive science (Feldman, 1985; McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986). Connectionist models have been employed successfully to deal with a variety of problems in low and intermediate level vision, word perception, associative memory, word sense disambiguation, modeling of context effects in natural language understanding, speech production, and learning.

This paper is based in part on the author's doctoral dissertation carried out under the supervision of Jerry Feldman, whose enthusiasm and insights greatly influenced this work. The research was supported in part by National Science Foundation Grants MCS-8209971, IST8208!71, DCR-8405720, DCR-86-07156, and U.S. Army Research Office grant ARO-DAAG29-84-K-0061. Correspondence and requests for reprints should be sent to Lokendra Shastri, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104.

However, for connectionism to be considered a scientific language of choice for expressing solutions to problems in cognitive science and AI, it must be demonstrated that it can be used to represent highly structured knowledge and perform inferences based on such knowledge. A common criticism leveled against connectionism is that although it is appropriate for modeling "low level" and "approximate" memory processes, such as semantic priming and associative recall, it is unsuitable for dealing with problems related to representation and reasoning. The work described in this paper partly answers the criticism by demonstrating that the connectionist approach is extremely effective in solving certain problems in knowledge representation and inference. This paper presents a connectionist realization of semantic networks, that is, it describes how knowledge about concepts, their properties, and the hierarchical relationship between them may be encoded as a connectionist network that can compute principled solutions to inheritance and recognition problems with extreme efficiency. Some salient features of the system are:
(i) The connectionist semantic network uses controlled spreading activation to solve an interesting class of inheritance and recognition problems extremely fast, in time proportional to the depth of the conceptual hierarchy. Thus, the time required to solve these problems does not depend on the number of concepts but simply on the number of levels in the conceptual hierarchy.

(ii) The network operates without the intervention of a central controller and does not require a distinct interpreter: The knowledge as well as mechanisms for drawing limited inferences on it are encoded within the network.

(iii) The proposed network computes the solutions in accordance with an evidential formalization that derives from the principle of maximum entropy. This formalization leads to a principled treatment of exceptions and conflicting multiple inheritance situations during inheritance, and the best-match or partial-match computation during recognition. Thus, we first specify what the network ought to compute and then provide a design to construct a network that would perform the appropriate computations.

(iv) The network can be constructed from a high-level specification of the knowledge to be encoded, and the mapping between the knowledge level and the network level can be carried out automatically by a "compiler."

(v) The solution scales because the design principles are independent of the size of the underlying semantic memory. The number of nodes in the connectionist network is at most quadratic in the number of concepts in the semantic memory.
1.1 The Connectionist Model

The massively parallel model employed in this work is a variation of the connectionist model proposed by Feldman and Ballard (1982). A connectionist network consists of a large number of simple computing elements called units or nodes connected via weighted links. Nodes are computational entities defined by a small number (2 to 3) of states, a real-valued potential in the range [0,1], an output value also restricted to [0,1], a vector of inputs i1, i2, ..., in, together with functions P, Q, and V that define the values of potential, state, and output at time t + 1, based on the values of potential, state, and inputs at time t. A node communicates with the rest of the network by transmitting a single output value to all nodes it is connected to. Nodes receive inputs via weighted links. Each link contributes an input whose magnitude equals the output of the node at the source of the link times the weight on the link. A node may have multiple input sites and incoming links are connected to specific sites. Each site has an associated site function. These functions carry out local computations based on the input values at the site, and it is the result of this computation that is processed by the functions P, Q, and V.

1.2 Representation and Retrieval: An Overview

The following example provides an overview of the connectionist knowledge representation system. It is an oversimplified description of how conceptual knowledge is encoded in network form and how it is accessed in a connectionist fashion. The system's conceptual knowledge is encoded in a connectionist network referred to as the Memory Network. This network is capable of performing inheritance and recognition via controlled spreading activation. A problem is posed to the network by activating relevant nodes in it. Once activated, the network performs the required inferences automatically, without the intervention of any external controller.
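As a rough illustration, the unit model of Section 1.1 can be sketched in a few lines of code. This is a minimal sketch under stated assumptions: the particular choices of P, Q, and V below, the threshold, and the site layout are invented for illustration and are not the paper's actual functions.

```python
class Unit:
    """Illustrative sketch of a connectionist unit; not the paper's implementation."""

    def __init__(self, site_functions):
        # One site function per named input site, e.g. {"evidence": sum}.
        # A site function such as min could model conjunctive (coincidence) behavior.
        self.site_functions = site_functions
        self.inputs = {site: [] for site in site_functions}
        self.potential = 0.0   # real value in [0, 1]
        self.state = "inert"   # one of a small number of discrete states
        self.output = 0.0      # the single value sent on all outgoing links

    def receive(self, site, source_output, weight):
        # Each link contributes (output of source node) * (weight on the link),
        # delivered to a specific input site.
        self.inputs[site].append(source_output * weight)

    def step(self):
        # Site functions perform local computations on the inputs at each site;
        # stand-ins for P, Q, and V then derive the new potential, state, and
        # output from the site results.
        site_values = [f(self.inputs[s]) if self.inputs[s] else 0.0
                       for s, f in self.site_functions.items()]
        self.potential = min(1.0, max(0.0, max(site_values, default=0.0)))  # P
        self.state = "active" if self.potential > 0.5 else "inert"          # Q
        self.output = self.potential if self.state == "active" else 0.0     # V
        self.inputs = {site: [] for site in self.site_functions}            # clear

u = Unit({"evidence": sum})
u.receive("evidence", 0.8, 0.5)   # source node output 0.8, link weight 0.5
u.receive("evidence", 0.9, 0.4)
u.step()                          # potential is about 0.76, state becomes "active"
```

The essential point is only the division of labor: links deliver weighted outputs to sites, site functions combine them locally, and P, Q, and V update the node from those local results.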
At the end of a specified interval, the answer is available implicitly as the levels of activation of a relevant set of nodes. The property of the network whereby nodes achieve an appropriate level of activation derives in part from built-in control mechanisms that carefully regulate the spreading of activation, and in part from the rules by which a node combines incoming activation. In keeping with the connectionist paradigm, the presentation of queries to the Memory Network, as well as the subsequent answer extraction is also carried out by connectionist network fragments called routines. Routines encode canned procedures for performing specific tasks and are represented as a sequence of nodes connected so that activation can serve to sequence through the routine. In the course of their execution, routines pose queries to the Memory Network by activating appropriate nodes in it. The Memory Network in turn returns the answer to the routine by activating response
nodes in the routine. The activation returned by a node in the Memory Network is a measure of the evidential support for an answer. In the current implementation of the model it is assumed that all queries are posed with respect to an explicit set of answers and there is a response node for each possible answer. Response nodes compete with one another and the node receiving the maximum activation from the Memory Network dominates and triggers the appropriate action. Thus, computing an answer amounts to choosing the answer that receives the highest evidence relative to a set of potential answers. The actual answer extraction mechanism explicitly allows for "don't know" as a possible answer. This may happen if there is insufficient evidence for all the choices or if there is no clear-cut dominator.

Figure 1 depicts the interaction between a fragment of an agent's restaurant routine and a part of his Memory Network. In this routine fragment, the task of deciding on a wine results in a query to the Memory Network about the taste of food, and the decision is made on the basis of the answer returned by the Memory Network. Action steps are depicted as oval nodes, queries as hexagonal nodes, and response nodes as circular nodes. The arcs in the Memory Network represent weighted links. The triangular (binder) nodes associate objects, properties, and property values. Each node is an active element and when in an "active" state sends out activation to all the nodes connected to it. The weight on a link modulates the activation as it propagates along the link. While a rectangular node becomes active on receiving activation from any node, a binder node becomes active only on receiving simultaneous activation from a pair of nodes. To find the taste of ham, a routine would activate has-taste and HAM. The binder b1 linking has-taste and HAM to SALTY will receive coincident activation along both its links and become active.
Once active, b1 will transmit activation to SALTY, which will ultimately become active. Similarly, if some routine needs to find an object that has a salty taste it would activate the nodes has-taste and SALTY. This will cause the appropriate binder node to become active and transmit activation to HAM. Eventually, HAM will become active, completing the retrieval. These two cases roughly correspond to how inheritance (finding property values of a specified object) and recognition (identifying an object given some of its attributes) may be processed by the network. None of the above involved any evidential reasoning or inheritance, and was meant solely to give the reader an overview.

To summarize, the knowledge representation system comprises a semantic memory (the Memory Network) and a procedural memory (the collection of routine networks). The Memory Network is capable of computing solutions to inheritance and recognition autonomously, provided its state is initialized appropriately. The query answering operation is a three-step process:
(i) Routines pose queries by activating appropriate nodes in the Memory Network.

(ii) Activation spreads in the Memory Network in a regulated manner according to built-in rules of spreading activation and, eventually, nodes in the network reach an appropriate level of activation.

(iii) The activation of certain nodes in the Memory Network feeds into the response nodes in the answer network of the routine. The response nodes accumulate evidence and compete with one another, and the winning node triggers the appropriate action.

[Figure 1. Connectionist retrieval system: a routine fragment and its answer network interacting with a fragment of the Memory Network.]

1.3 Outline of the Paper

The rest of the paper is organized as follows: Section 2 explicates the motivation for pursuing this work and reviews some related work; Section 3 discusses semantic networks and the significance of inheritance and recognition; Section 4 specifies a knowledge representation language for capturing the evidential information associated with concepts; Section 5 outlines the evidential formulation of inheritance and recognition; the connectionist realization of the Memory Network is specified in Section 6, followed by some simulation results; Section 7 proposes a particular conceptual structure and discusses the constraints that need to be satisfied by the conceptual structure in order to arrive at an efficient connectionist encoding; and finally, Section 8 discusses some related issues.

2 MOTIVATION

In addition to demonstrating the efficacy of the connectionist approach in solving problems in knowledge representation and reasoning, this work was motivated by the belief that connectionism is the appropriate paradigm for addressing the following important issue in knowledge representation and reasoning: Solutions proposed to problems in knowledge representation and inference should be computationally effective, that is, they must satisfy the real-time constraint.
2.1 Computational Effectiveness: Limited Inference and Parallelism

If we analyze human behavior we find that in spite of operating with a large knowledge base, human agents take but a few hundred milliseconds to perform a broad range of cognitive tasks, such as recognizing objects, understanding spoken and written language, and making inferences such as: "Tweety is a bird, therefore it flies." The human performance data indicate that the representation of conceptual information and the cognitive processes that access it are such that not only are relevant facts retrieved spontaneously, but certain kinds of inferences also get drawn with extreme efficiency. Any serious model of cognition will have to provide a detailed computational account of how such nontrivial operations can be performed so effectively.

It may seem reasonable to ignore the issue of computational effectiveness by treating it as an implementation detail. But doing so would be a serious mistake; the computational effectiveness constraint is not an obstacle in the path to understanding intelligence; on the contrary, I believe that the principles underlying the organization and use of information in cognitive systems
cannot be understood unless the question of computational effectiveness is tackled at the very outset.

When discussing inference in the context of an intelligent agent interacting with an environment in real time, it is important to bear in mind that any general notion of inference is semidecidable and hence computationally intractable. Yet we need to explain how humans perform certain inferences with great efficiency. A possible solution to this apparent paradox lies in a synthesis of the limited inference approach and massive parallelism.
Limited Inference. According to the limited inference strategy, one must identify a limited but interesting class of inferences that need to be performed very fast, and develop appropriate knowledge structuring techniques, algorithms, and computational models to perform this class of inference within an acceptable time frame. The critical step in pursuing the limited inference approach is circumscribing such a class of inference. There are several ways of doing so, and in fact, several possibilities have been investigated (Ballard, 1986; Frisch & Allen, 1982; Levesque, 1984). This work focuses on a class of inference that is arguably an interesting component of commonsense reasoning, namely, inheritance and recognition in semantic networks.

Parallelism. The extremely tight constraint on the time available to perform nontrivial inferences suggests that in order to achieve computational effectiveness one will have to resort to parallelism in addition to focusing on a restricted class of inference. The potential for parallelism suggests itself if one recognizes that intelligent behavior requires dense interactions between many pieces of information, and it would therefore seem appropriate to distribute the processing capability across the memory of the computer: A memory cell need not be an inert repository of information, but rather may become an active processing element interacting with other such elements. Such a device would permit the simultaneous occurrence of numerous interactions between pieces of information.

2.2 Connectionism and Parallelism

The fine grain of parallelism supported by connectionism permits one to assign a single processing element to each unit of information. This has the following interesting consequence: Assume that besides enumerating facts about the world, we also identify the important inferential connections or dependencies between these facts.
Now if each piece of information is encoded as a connectionist node (henceforth node), and dependencies between pieces of information are encoded as explicit links between the appropriate nodes, then inference can be viewed as spreading of activation in a connectionist network. The above metaphor has tremendous appeal because it suggests an extremely efficient way of performing inference.
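The metaphor can be made concrete with a toy sketch. The dependency graph below is an invented example, and an unregulated breadth-first spread is used purely for illustration; the actual network regulates the spread, as discussed next.

```python
from collections import deque

def spread(links, sources):
    """Return all nodes activated by spreading activation from the source nodes."""
    active, frontier = set(sources), deque(sources)
    while frontier:
        node = frontier.popleft()
        for neighbour in links.get(node, ()):   # follow dependency links
            if neighbour not in active:
                active.add(neighbour)
                frontier.append(neighbour)
    return active

# Pieces of information encoded as nodes, inferential dependencies as links:
links = {"Tweety": ["Bird"], "Bird": ["Flies", "Has-wings"]}
result = spread(links, ["Tweety"])   # activates Bird, Flies, and Has-wings as well
```

In this picture, "inferring" that Tweety flies is nothing more than activation reaching the Flies node from the Tweety node.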
However, in order to support extremely efficient inference the spreading activation process must converge extremely fast. In fact, we would want the process to converge in a constant number (preferably one) of passes through the network. Such a convergence would ensure that the network can compute a solution in time proportional to the diameter of the network. In the context of inheritance and recognition in a knowledge base, the diameter corresponds to the depth of (i.e., the number of levels in) the conceptual hierarchy, and is typically logarithmic in the number of concepts in the knowledge base. Hence, in principle, it should be possible to design a connectionist network that solves inheritance and recognition problems in time that is only logarithmic in the size of the knowledge base.

But for the connectionist network to compute solutions in a single pass of spreading activation, the dependencies among pieces of knowledge must be acyclic, and achieving purely acyclic dependencies is not always trivial. One way of removing cyclic dependencies, however, is to adopt an extremely fine-grained decomposition of knowledge so as to reduce the density of dependencies¹ in the knowledge base. Doing so may render some cyclic dependencies acyclic. A second way of eliminating cyclic dependencies is to identify suitable constraints on the conceptual structure that rule out certain types of cyclic dependencies. Both these approaches have been exploited in this work in order to arrive at an efficient connectionist encoding.

2.3 Connectionism and Limited Inference

When knowledge is encoded in connectionist networks, the traditional distinction between the representation, the inference engine, and the interpreter gets blurred. In fact, there is no distinct interpreter; the links, the weights on links, and the computational characteristics of nodes encode not just the knowledge but also how the various constituents of knowledge interact during computation.
¹ By density we mean the ratio of the number of dependencies to the number of terms in the knowledge base. Here term is used in a general sense and includes concepts, properties, features, microfeatures, and so forth.

This strong coupling between the structure of knowledge and inference and the absence of an interpreter is a mixed blessing. Although it is true that in any given connectionist knowledge representation system the class of inferences that has been wired in can be performed with extreme efficiency, it is also the case that the remaining inferences can either not be performed at all or can only be "approximated." A similar limitation also exists in traditional implementations of knowledge representation systems. However, in traditional systems it is easier to augment the interpreter by adding appropriate procedures (e.g., LISP code) to the system. Not so in a connectionist
encoding; the complete computational characteristics of nodes and their interconnections depend critically on the nature of the inferences being wired in, and introducing any changes in the basic inferential ability of a system may require a major reorganization of the system. In view of this, it is essential that the class of limited inference selected for special consideration (i.e., hardwired into the system) be chosen with great care and that the nature of the approximations made by the system be made explicit. In this work, inheritance and recognition in a conceptual hierarchy were chosen as the class of limited inference to be hardwired into the connectionist semantic network. This choice was motivated by the belief that inheritance and recognition constitute a form of reasoning that lies at the core of intelligent behavior (see Section 3).

2.4 Connectionism and Reasoning with Incomplete and Uncertain Information

Connectionist networks offer a natural computational model for encoding evidential computations because of the natural correspondence between nodes and hypotheses, activation and evidential support, and potential/site functions and evidence combination rules. A node may be interpreted as representing a hypothesis and the inputs to the node may be viewed as evidence provided to it by the rest of the network. A node's potential may be viewed as the result of combining all the evidence impinging on the node using the evidence combination rule encoded by the site functions and the potential function.

2.5 Related Work on Parallel Encoding of Semantic Networks

The use of spreading activation as a computational primitive in memory models dates back at least to Quillian's work on semantic nets (Quillian, 1968). Since then, numerous models based on spreading activation or marker passing have been proposed in the cognitive science literature (e.g., Anderson, 1983; Charniak, 1983a; Collins & Loftus, 1975).
The two models that are most closely related to this work, however, are those of Fahlman (1979) and Hinton (1981). Fahlman's NETL was the first attempt at encoding semantic networks as a massively parallel network of simple processing elements. NETL elements communicated with one another under the control of a central controller by propagating discrete messages called markers. NETL's use of discrete markers made it incapable of supporting "best match" or "partial match" operations. For example, in NETL recognition amounted to finding a concept that possessed all of a specified set of properties. Furthermore, NETL's solution to the inheritance problem was sensitive to race conditions in the presence of multiple hierarchies. These limitations of marker passing systems are discussed in (Brachman, 1985; Fahlman, 1982).² Finally, NETL did not fully utilize the potential for parallelism because the internode communication depended on instructions issued by a central (serial) controller.

Hinton proposed a "distributed" encoding of semantic networks using parallel hardware. The network encoded a set of triples of the form: [relation, role1, role2]. The proposed system had several interesting properties: Given two components of a triple, the network could determine the third; the network could be programmed using the perceptron convergence rule; and it could perform simple property inheritance. The system, however, lacked sufficient structure and control to handle general cases of inheritance and "partial matching," especially if these occurred in a multilevel semantic network that included multiple hierarchies and conflicting information.

Derthick (1987) is implementing a variant of an existing representation language KL2 (Vilain, 1985), using the Boltzmann machine formulation (Ackley, Hinton, & Sejnowski, 1985). However, the reasoning system being implemented does not admit conflicting information, and therefore, cannot deal with exceptions, conflicting multiple inheritance situations, or partial-match problems.

Work on Bayesian networks (Pearl, 1985) also deals with evidential reasoning in a parallel network. Pearl's results, however, apply only to singly connected networks (networks in which there is only one underlying path between any pair of nodes). More complex networks have to be conditioned to render them singly connected. This is in part due to the unstructured form of the underlying representation language employed by Pearl. The language does not make distinctions such as "concept," "property," and "property value" that we make (see Section 4) and hence its ability to exploit parallelism is limited.
Because of its massively parallel realization and the best-match formulation of recognition and pattern completion, this work is also related to models of parallel associative (or content addressable) memory such as those proposed in (Hopfield, 1982; Kohonen, Oja, & Lehtio, 1981; Palm, 1980). The memory being modeled in this research, however, is much more sophisticated than the memory proposed in each of the above models. The above models view memory as a flat and unstructured set of stable states (each stable state may be viewed as a concept). In contrast, the proposed system views memory as a highly structured collection of concepts; the structure being given by the IS-A hierarchy of concepts and the distinction among concepts, properties, and property values. Furthermore, the inferential ability of the proposed system far exceeds the simple pattern completion or pattern association ability of the above mentioned models (see Section 3.1).

² Although subsequent work by Touretzky (1986) has remedied certain problems with inheritance, the use of discrete markers still precludes dealing with conflicting information during inheritance and the partial match problem during recognition.

3. SEMANTIC NETWORKS, INHERITANCE, AND RECOGNITION

Since their introduction by Quillian (1968), semantic networks have played a significant role in knowledge representation research. Semantic networks express knowledge in terms of concepts, their properties, and the hierarchical sub/superclass relationship between concepts. Each concept is represented by a node and the hierarchical relationship between concepts is depicted by connecting appropriate concept nodes via IS-A or INSTANCE-OF links. Nodes at the lowest level in the IS-A hierarchy denote individuals (Tokens) while nodes at higher levels denote classes or categories of individuals (Types).³ As one moves up the IS-A links, concepts get more abstract. Properties are also represented by nodes, and the fact that a property applies to a concept is represented by connecting the concept and property nodes via an appropriately labeled link. Typically, a property is attached at the highest concept in the conceptual hierarchy to which the property applies, and if a property is attached to a node C it is assumed that it applies to all nodes that are descendants of C.

The term "semantic network" has been used in a far more general sense in the literature. We will, however, only focus on those aspects of semantic networks that have been mentioned above, namely, the description of objects in terms of their properties and the organization of concepts using the IS-A hierarchy. This characterization is broad enough to capture the basic organizational principles underlying frame-based representation languages such as KRL (Bobrow & Winograd, 1977) and KL-ONE (Brachman & Schmolze, 1985). The organization and structuring of information in a semantic network leads to an efficient realization of two kinds of inference, namely, inheritance and recognition.
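The organization just described can be rendered as a small data-structure sketch. The concepts and properties below are invented examples; the point is only that attaching a property at the highest applicable concept means applicability is checked by climbing IS-A links.

```python
# Concept -> parent via IS-A/INSTANCE-OF links (tokens at the bottom):
is_a = {"Tweety": "CANARY", "CANARY": "BIRD", "BIRD": "ANIMAL"}

# A property is attached at the highest concept to which it applies:
has_property = {"BIRD": {"can-fly"}, "ANIMAL": {"is-living"}}

def applies(concept, prop):
    """A property attached to a node C applies to all descendants of C."""
    while concept is not None:
        if prop in has_property.get(concept, set()):
            return True
        concept = is_a.get(concept)   # move up the IS-A hierarchy
    return False

applies("Tweety", "can-fly")    # True: inherited from BIRD
applies("ANIMAL", "can-fly")    # False: can-fly is attached below ANIMAL
```

The economy of the scheme is that "canaries can fly" is never stored explicitly; it follows from the single attachment of can-fly at BIRD.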
³ The distinction between IS-A and INSTANCE-OF links is being suppressed deliberately.

3.1 Significance of Inheritance and Recognition

Inheritance is the form of reasoning that leads an agent to infer properties of a concept based on the properties of its ancestors. For example, if the agent knows that "birds fly," then given that "Tweety is a bird," the agent may infer that "Tweety flies." In general, inheritance may be defined as the process of determining properties of a concept, say C, by looking up properties locally attached to C, and if such local information is not available, by
looking up properties attached to concepts that lie above C in the conceptual hierarchy. Recognition is the dual of the inheritance problem. Unlike inheritance, which seeks a property value of a given concept, recognition seeks a concept that has some specified property values. The recognition problem may be described as follows: "Given a description consisting of a set of properties, find a concept that best matches this description." Notice that the properties of concepts are not necessarily available locally at the concept, and may have to be determined via inheritance. For this reason, recognition may be viewed as a very general form of pattern matching: one in which the target patterns, that is, the set of patterns to which an input pattern is to be matched, are organized in a hierarchy, and where matching an input pattern A with a target pattern Ti involves matching properties that appear in A with properties local to Ti as well as with properties that Ti inherits from its ancestors.

The stance taken in this work is that inheritance and recognition are important forms of limited inference. It can be argued that these two complementary forms of reasoning lie at the core of intelligent behavior and act as precursors to more complex and specialized reasoning processes. Figure 2 is intended to illustrate the role of inheritance and recognition. The top of Figure 2 represents the semantic memory component of an agent's knowledge. The bottom half depicts the two broad classes of queries that are posed to the semantic memory by other cognitive processes. One entry point corresponds to recognition queries. A recognition query corresponds to a request for generalized pattern matching and is initiated whenever a mental process has a partial description of an unknown entity "X" and wants to ascertain the identity of "X" or the class to which "X" belongs. Here, "X" may be a physical object, the precondition part of a production rule, a schema, an action, and so forth.
Entry point β corresponds to inheritance queries. Here some internal process knows the identity or class of "X" but wants to ascertain the value of some property of "X." Diverse cognitive tasks, such as commonsense reasoning, word-sense disambiguation, determination of case-fillers, and enforcement of selectional restrictions, can be shown to require inheritance as an intermediate step. Given the restriction the filler of the instrument case for the verb "to cut" should have sharpness as one of its properties, one can easily see how this restriction may be enforced by posing an inheritance query, "Is sharpness a property of X?", and accepting or rejecting "X" as a filler of the case instrument depending on whether the answer to the query is "yes" or "no." In many situations, some internal process may have a partial description of an unknown entity and may need to determine the value of some property of this entity. This problem may be solved by first performing a recognition step to determine the identity of "X" or its class membership, and then performing an inheritance step to determine the value of the required property of "X."

Figure 2. Significance of inheritance and recognition. [The figure depicts the semantic network (memory) with two entry points: entry point α, which receives recognition queries of the form [partial information about X], and entry point β, which receives inheritance queries of the form [X is-a C].]

Queries that combine a recognition step and an inheritance step perform a generalized form of pattern completion. In addition to their ubiquity, inheritance and recognition are also significant because humans perform these inferences effortlessly and extremely fast, often in a few hundred milliseconds. This suggests that inheritance and recognition are perhaps basic and unitary components of symbolic reasoning, probably the smallest and simplest cognitive operations that (a) produce specific responses, and (b) can be initiated, and have their results accessed, by complex and higher-level symbolic reasoning processes. The speed with which these operations are performed also suggests that they are performed fairly automatically and typically do not require any conscious and attentional control.¹

¹ Atypical situations being those that involve a high degree of ambiguity or those where an answer is unavailable. In such cases, conscious intervention would probably occur.
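An inheritance query such as "Is sharpness a property of X?" amounts to a walk up the IS-A chain. The following is a minimal sequential sketch; the hierarchy, the property sets, and all names in it are invented for illustration (the paper's network computes the same answer in parallel, in time proportional to the depth of the hierarchy).

```python
# Hypothetical IS-A hierarchy and locally stored properties.
IS_A = {"knife": "cutting-tool", "cutting-tool": "instrument"}
LOCAL_PROPS = {"cutting-tool": {"sharpness"}, "instrument": {"has-function"}}

def inherits(concept, prop):
    """Walk up the IS-A chain, looking for prop at the concept or an ancestor.

    The number of steps is bounded by the depth of the conceptual hierarchy."""
    while concept is not None:
        if prop in LOCAL_PROPS.get(concept, set()):
            return True
        concept = IS_A.get(concept)  # None once we pass the root
    return False

print(inherits("knife", "sharpness"))  # a knife inherits sharpness
```

A serial walk like this takes time proportional to the depth of the chain for a single query; the massively parallel encoding described later answers all such queries in that same depth-bounded time without an interpreter.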
3.2 Need for an Evidential Formalization of Semantic Networks
In this section we briefly discuss some of the existing formalizations of semantic networks and point out the need for an evidential formulation. A detailed discussion appears in Shastri (in press).

A formalization of inheritance and recognition is confounded by the nature of knowledge associated with natural concepts. There is a preponderance of situations in which we associate a property with a concept although, strictly speaking, the property may not hold for all members of the class. For example, one considers it natural to associate the property of being able to fly with birds, knowing full well that not all birds fly. Such default properties play an important role in commonsense reasoning. For example, it seems natural to infer "Tweety flies" given that "Tweety is a bird," even though the ability to fly is only a default property of birds.

A second source of complication arises because often there exist several alternate but equally useful hierarchical organizations of concepts in a domain. Consequently, it becomes more natural to organize concepts in the form of multiple hierarchies, wherein a concept may have more than one parent. The above factors lead to exceptions and conflicting multiple inheritance situations. For example, an agent may believe that Birds fly, Penguins are birds, and Penguins do not fly. This leads to an exceptional situation. Similarly, an agent may believe that Quakers tend to be pacifists, Republicans tend to be nonpacifists, Dick is a Quaker, and Dick is a Republican. This leads to an ambiguous case of multiple inheritance because Dick may inherit the property "pacifist" from Quakers and at the same time inherit the property "nonpacifist" from Republicans! Exceptions and conflicting multiple inheritance in semantic networks give rise to nonmonotonicity and ambiguity, neither of which can be handled within first-order predicate calculus (FOPC).
Consequently, FOPC-based formalizations of semantic networks (Charniak, 1981; Hayes, 1979), and many representation languages, such as KL-ONE, that conform to such an interpretation, cannot deal with exceptions or multiple inheritance situations. At the same time, the translation to FOPC does not explain how the information encoded in a semantic network may be used to solve recognition problems. Recognizing the limitations of first-order logic in formalizing semantic networks with multiple hierarchies and exceptions, Etherington and Reiter (1983) proposed a formalization based on default logic (Reiter, 1980). Their proposal handles exceptions, but its treatment of multiple inheritance remains unsatisfactory. With reference to the Quaker example, the proposal amounts to randomly choosing between "Quakers tend to be pacifists" and "Republicans tend to be nonpacifists" and inferring "Dick is a pacifist" or "Dick is a nonpacifist," depending on this choice. Our intuitions, however, suggest that in drawing conclusions about Dick, both the statements:
"Quakers tend to be pacifists" and "Republicans tend to be nonpacifists" are relevant, and hence both must affect the final conclusion; in general, the final conclusion should reflect the combined effect of all the relevant information. The limitation of default logic lies in the assumption that all assertions of the form "Quakers tend to be pacifists" have the same import. This assumption is inappropriate in many cases. For instance, an agent may believe that the tendency of Quakers to be pacifists outweighs the tendency of Republicans to be nonpacifists, in which case it may be appropriate to infer that Dick is a pacifist. The need to combine relevant information and to weigh the relative import of available information becomes even more apparent if we consider the following: suppose we add to the agent's beliefs that "Dick took part in antiwar demonstrations." It now seems even more appropriate to infer that Dick is a pacifist.

Touretzky's (1986) proposal, based on the Principle of Inferential Distance Ordering, provides a precise specification of what inferences should be drawn by an inheritance hierarchy in situations involving exceptions. His formalism is also an improvement on the Etherington and Reiter proposal in that it makes explicit the inferential significance of IS-A links. Touretzky, however, does not solve the problem of combining information from disparate sources, and his system would report an ambiguity in the Quaker example. In order to deal with situations that involve conflicting information, it is necessary to adopt an epistemologically richer representation that allows one to represent the relative import of rules. One possibility is to treat assertions such as "Quakers tend to be pacifists" and "birds fly" as evidential assertions.
Thus, "birds fly" may be interpreted to mean that if "x is a bird" then there is some evidence α that "x flies," and also that if "x flies" then there is some evidence β that "x is a bird." In the general case, if we are told that an object is flying, then we would have varying degrees of evidence that it is a bird, an airplane, a frisbee, and so on. Similarly, if we are told that an object is a bird, then we would have varying degrees of evidence that its preferred mode of transportation is flying, swimming, walking, and so forth.

3.3 An Overview of the Evidential Formulation
Within an evidential formulation, finding solutions to inheritance and recognition problems would amount to choosing the most likely alternative from among a set of alternatives, the computation of likelihood being carried out with respect to the knowledge encoded in the conceptual hierarchy. For example, an inheritance problem would not be posed as "What is the mode of transportation of Tweety?" but rather as "On the basis of the available information, is the mode of transportation of Tweety most likely to be running, swimming, or flying?"
In general, the evidential mode of reasoning may be viewed as decision making that involves choosing from among a set of mutually exclusive hypotheses. The important steps are:

(i) combining the evidence provided by relevant evidential assertions,
(ii) computing the likelihood of competing hypotheses based on the above, and
(iii) choosing the most likely hypothesis.
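The three steps above can be sketched in a few lines, under the simplifying assumption that the evidential strengths contributed by the relevant assertions combine multiplicatively (as they do in the network described later in the paper). The hypothesis names and the numeric evidence values below are invented for illustration.

```python
from math import prod

def most_likely(hypotheses, evidence):
    """evidence maps each hypothesis to the list of evidential strengths
    contributed by the relevant evidential assertions."""
    # (i) combine the evidence per hypothesis, (ii) compute likelihoods
    likelihood = {h: prod(evidence[h]) for h in hypotheses}
    # (iii) choose the most likely hypothesis
    return max(hypotheses, key=likelihood.get)

# Invented evidential strengths for a two-hypothesis decision:
ev = {"pacifist": [7 / 60, 16 / 60], "nonpacifist": [3 / 140, 64 / 140]}
print(most_likely(["pacifist", "nonpacifist"], ev))  # prints "pacifist"
```

The multiplicative rule is only one way to realize step (i); the point of the sketch is the decision structure, not the particular combination function.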
The notion of likelihood can be made more precise as follows: Let a1, a2, ..., an be the competing hypotheses and let K be the agent's knowledge. With each alternative ai, associate a measure of likelihood, li, given by the ratio of the number of interpretations of the world that are consistent with K and that also satisfy ai, to the number of interpretations of the world that are consistent with K. Thus an alternative ai is more likely than aj if, and only if, from among the interpretations of the world that satisfy K, a greater fraction of interpretations satisfy ai than aj. Furthermore, the most likely hypothesis is the one that is consistent with the greatest fraction of interpretations that satisfy K.² In order to reformulate the problems of inheritance and recognition in terms of evidential reasoning, the traditional semantic network representation was extended to include evidential information.

4 AN EVIDENTIAL REPRESENTATION LANGUAGE
The proposed evidential representation language lies between two extremes: traditional semantic networks, which carry no probabilistic information, and representations that encode all conjunctive conditional probabilities. In addition to supporting the specification of necessary properties, the language also allows one to associate default or evidential properties with concepts. This evidential information is encoded in terms of relative frequencies that specify how instances of certain concepts are distributed with respect to certain property values. Even such a limited use of evidential information leads to a principled treatment of exceptions and conflicting multiple inheritance situations during inheritance, and of the partial-match

² The evidential formulation does not suggest that all decisions should be based solely on the likelihoods of possible outcomes, the most likely outcome always being preferred. In many situations the agent may want to take into account the utilities of various outcomes, that is, evaluate the cost and benefit of choosing a particular action. Then again, the agent may choose to take risks or be conservative, or may adopt some other strategy. The contention is that any strategy (unless it simply involves making random choices) will perforce require knowledge of the likelihoods of the possible outcomes.
problem during recognition. Before describing the language, let us briefly consider the underlying intuitions.

4.1 Components of a Conceptual Structure
A cognitive agent interprets the external world in terms of conceptual attributes and their associated values. In addition to conceptual attributes such as "has-color" (with values red, blue, purple, etc.) and "has-texture," knowledge-structuring relations such as IS-A and PART-OF are also considered to be conceptual attributes (henceforth, simply attributes). Although attributes need not be unstructured entities (for instance, "has-color" may be defined in terms of subattributes such as "has-hue," "has-brightness," and "has-saturation"), in our restricted representation language we assume that all attributes are primitive.

Concepts are labeled collections of [attribute, value] pairs. Attribute values are also concepts and hence concepts may be arbitrarily complex. This definition does not imply circularity because some concepts are grounded in perception while some others are assumed to be innate. Attributes may be classified into two broad categories: properties and structural links. Structural links provide the coupling between structure and inference. They reflect the epistemological belief that world knowledge is highly organized and that much of this structure can be factored out to provide general, domain-independent organizational strategies that in turn lead to efficient inference. Each structural link embodies one such organizational strategy. If we map knowledge onto a data structure (or a physical device) so that structural links get represented explicitly as arcs in the data structure (or interconnections in the physical device), then these arcs (or interconnections) provide hard-wired, and hence efficient, inference paths. The most representative structural link is the IS-A link.
One can extend the notion of inheritance to include other structural links such as the is-a-part-of and occurs-during links (Allen, 1983; Schubert, Papalaskaris, & Taugher, 1983). Properties relate to the intrinsic features of concepts and may vary from one domain to another. They correspond to the notion of "roles" in KL-ONE and "role nodes" in NETL. But the interpretation of a property and its associated values differs from any of the above approaches because of the evidential information associated with property values.

We stated that a concept is a labeled collection of [attribute, value] pairs. As it stands, this statement is too undiscriminating and does not indicate which collections of [attribute, value] pairs ought to be labeled a concept. To answer this question we re-examine the notion of concepts. Concepts may be classified into Tokens and Types. An agent interprets the world as consisting of instances, and collections of [attribute, value] pairs that are perceived to correspond to an instance are represented as Tokens in the agent's conceptual structure.
In order to deal with a complex environment an agent must impose some structure on the external world. A way of achieving this is to record similarities between objects and to make suitable generalizations based on these. Once recorded, these generalizations may be exploited to categorize novel objects and to make predictions about their properties. Types serve exactly this purpose. Types are abstractions defined over Tokens that capture useful generalizations about a number of Tokens. These generalizations are represented by appropriate [attribute, value] pairs. For example, the Type ELEPHANT may include the value GRAY for the property has-color to indicate that "most elephants are gray," while the Type APPLE may include the values RED, GREEN, and YELLOW to indicate that "apples may be red, green, or yellow."⁶

Simply associating property values with Types, however, does not suffice. The agent may want to make finer distinctions, such as "An apple is more likely to be red than green" and "a red colored object is more likely to be an apple than a rose," and use such information to recognize things and predict their properties. One way of capturing these distinctions would be to store frequency distributions of concepts with respect to certain property values. Thus, instead of "Apples are red, green, or yellow," an agent's conceptual representation may hold: "60% of all apples are red, 30% are green, and 10% are yellow." In this work we have pursued this possibility in depth. This approach involves an oversimplification, for clearly there are situations in which an agent may have to encode an evidential relationship between a concept and an attribute value without the knowledge of any frequency distributions (see section 8.5).
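Both of the "finer distinctions" mentioned above fall out of stored frequency counts. The sketch below uses the paper's illustrative apple figures (60 red, 30 green, 10 yellow) together with an invented count of red roses; none of the numbers are claims about the real world.

```python
# Stored frequency counts: #CONCEPT[has-color, VALUE]
counts = {
    ("APPLE", "RED"): 60, ("APPLE", "GREEN"): 30, ("APPLE", "YELLOW"): 10,
    ("ROSE", "RED"): 20,   # invented for illustration
}

def n(concept, value):
    """Number of instances of concept with the given has-color value."""
    return counts.get((concept, value), 0)

# "An apple is more likely to be red than green":
assert n("APPLE", "RED") > n("APPLE", "GREEN")

# "A red colored object is more likely to be an apple than a rose":
red_total = n("APPLE", "RED") + n("ROSE", "RED")
print(n("APPLE", "RED") / red_total)  # fraction of red things that are apples
```

Reading the counts in one direction gives predictions about a known concept; reading them in the other direction gives evidence for recognition, which is exactly the symmetry of the evidential assertions discussed in section 3.2.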
For now we will assume that the agent determines evidential strengths solely on the basis of its knowledge of certain frequency distributions of concepts with respect to their property values, and investigate how such information may be used to reformulate the inheritance and recognition problems in order to deal with conflicting and partial information.

The process of abstraction and differentiation applied to Types leads to a hierarchical structure. In general, multiple hierarchies may be defined over the same set of Tokens. For example, one may define a hierarchy over physical objects based on the function they perform, and at the same time one may classify such objects according to their form (i.e., their appearance). The consequence of having multiple hierarchies is that a concept may have multiple ancestors in the conceptual structure.

4.2 Relation Between Concepts and Property Values
An agent could represent the fact that certain objects are red in color by positing a concept RED-THING, which is completely described by {[has-color, RED]}, and making all red objects an instance of this concept. Alternately, the agent could just associate the [property, value] pair [has-color, RED] with concepts that represent red colored objects. The choice of designating any [property, value] pair(s) to be a concept is always available to an agent, but doing so involves a clear-cut tradeoff: on the one hand, it requires a commitment of additional computational resources such as nodes, links, and processing elements, but on the other hand, it makes it easier to draw certain inferences. For example, introducing the concept RED-THING would make it easier to record and retrieve general facts about red objects, but embedding this concept in the conceptual structure would require additional links and nodes.

⁶ This and other descriptive statements made in this paper are meant to be illustrative and do not make claims about the exact nature of the real world.

4.3 A Formal Description of the Representation Language
Formally, an agent's a priori knowledge consists of the sextuple (C, Φ, V, #, δ, ≪).

[... -> PACIFIST] node. The site INV (for inverse) receives inputs to compensate for extraneous activation incident along ↓ links. This is required to ensure that nonlocal information about property values does not affect the computations at a concept when relevant information is available locally. Besides the interconnections described above, all nodes except δ-nodes can have an external link incident at the site QUERY, with a weight of 1.0. δ-nodes do not receive any external inputs; they capture associations between concepts and are internal or hidden nodes.
Figure 12. Encoding of δinh-nodes II. [δ(A,P) and δ(B,P) are known, and there exists no C such that A ≪ C ≪ B and δ(C,P) is known. The figure shows the sites RELAY, CP, HCP, and QUERY on the concept nodes A and B.]
6.2 Computational Properties of Nodes
Each node in the network can be in one of two states: active or inert. The quiescent state of each node is inert. A node switches to the active state under conditions specified below and, in this state, transmits an output equal to its potential. There is a distinction between a node transmitting no output (a nil output) and a node transmitting an output of magnitude 0. The computational characteristics of the various node types are as described below:

C-nodes:
State: the node is in the active state if it receives one or more inputs.
Potential: if there are no inputs at site HCP, then potential = the product of the inputs at sites QUERY, RELAY, CP, and PV, divided by the product of the inputs at site INV; else potential = the product of the inputs at sites QUERY, RELAY, and HCP.
δinh-nodes:
State: the node switches to the active state if and only if it receives all three inputs at site ENABLE.
Potential: if state = active, then potential = 1.0 × the product of the inputs at site EC; else potential = NIL.

δrec-nodes:
State: the node switches to the active state if and only if it receives all three inputs at site ENABLE. Once active, the node remains in this state, even if the input from RECOGNIZE is withdrawn, as long as it continues to receive inputs from the property and value nodes.
Potential: if state = active, then potential = 1.0; else potential = NIL.
φ-nodes, the INHERIT node, and the RECOGNIZE node switch to the active state if they receive input at site QUERY, and in this state their potential always equals 1.0. The networks have the additional property that, unlike other links, which always transmit the output of their source node, the ↑ and ↓ links normally transmit no output; they do so only when they are enabled.
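The C-node rule above is easy to state procedurally. The following is a minimal sketch of that rule only; the site names follow the text, the numeric inputs are invented, and the treatment of an empty site (an empty product, i.e., 1) is an assumption.

```python
from math import prod

def c_node_potential(sites):
    """sites maps a site name ("QUERY", "RELAY", "CP", "PV", "HCP", "INV")
    to the list of inputs arriving there. An absent site contributes an
    empty product, i.e., 1."""
    if sites.get("HCP"):
        # Inputs at HCP take precedence: CP, PV, and INV are ignored.
        used, inv = ("QUERY", "RELAY", "HCP"), 1.0
    else:
        used = ("QUERY", "RELAY", "CP", "PV")
        inv = prod(sites.get("INV", []))  # compensates for extraneous activation
    return prod(x for site in used for x in sites.get(site, [])) / inv

# Invented inputs: no HCP input, so CP and PV are used and INV divides out.
print(c_node_potential({"QUERY": [1.0], "CP": [0.5, 0.2],
                        "PV": [0.3], "INV": [0.1]}))
```

The division by the INV inputs is what lets locally stored distributions override inherited ones, as the GRAPE computation in section 6.4 illustrates.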
[Figure caption: δ(A,P) and δ(B,P) are known, and there exists no C such that A ≪ C ≪ B and δ(C,P) is known.]

[... -> NON-PAC] nodes: 1.0. These nodes are also in the active state because they receive inputs from the QUAK, has-bel, and INHERIT nodes. As these nodes receive no inputs at site EC, their potential equals 1.0.

[PERSON, has-bel -> PAC]: This node also reaches the active state because it receives inputs from the PERSON, has-bel, and INHERIT nodes. As this node also receives inputs at site EC from [QUAK, has-bel -> PAC] and [REPUB, has-bel -> PAC], its potential equals
1.0 × (#QUAK[has-bel, PAC] / #QUAK) × (#QUAK / #PERSON[has-bel, PAC]) × (#REPUB[has-bel, PAC] / #REPUB) × (#REPUB / #PERSON[has-bel, PAC])

= (#QUAK[has-bel, PAC] × #REPUB[has-bel, PAC]) / (#PERSON[has-bel, PAC])²

By symmetry (replacing PAC by NON-PAC), the potential of [PERSON, has-bel -> NON-PAC] is:
(#QUAK[has-bel, NON-PAC] × #REPUB[has-bel, NON-PAC]) / (#PERSON[has-bel, NON-PAC])²

PAC and NON-PAC: The PAC node receives an input from BELIEF at site RELAY, an input from [PERSON, has-bel -> PAC] at site HCP, and inputs from [QUAK, has-bel -> PAC] and [REPUB, has-bel -> PAC] at site CP. However, because site HCP receives an input, the inputs at site CP are ignored. Therefore its potential is:

PAC: output of [PERSON, has-bel -> PAC] × (#PERSON[has-bel, PAC] / #PAC) × output of BELIEF × (#PAC / #BELIEF)
= (#QUAK[has-bel, PAC] × #REPUB[has-bel, PAC]) / (#BELIEF × #PERSON[has-bel, PAC])

NON-PAC: (#QUAK[has-bel, NON-PAC] × #REPUB[has-bel, NON-PAC]) / (#BELIEF × #PERSON[has-bel, NON-PAC]), analogous to PAC.

Ignoring the common divisor, #BELIEF, in the potentials of the nodes PAC and NON-PAC, the potential of the node PAC corresponds to the best estimate of the number of people that are both Quakers and Republicans but subscribe to pacifism, while the potential of the node NON-PAC corresponds to the best estimate of the number of people that are both Quakers and Republicans but subscribe to nonpacifism. Hence a comparison of the two potentials will give the most likely answer to the question: Is Dick a pacifist or a nonpacifist?

The Fruit Example. We illustrate how the network computes a solution to the recognition problem with reference to the network in Figure 15. The network is intended to depict the following information: Fruits and vegetables are kinds of edible things. Grapes and apples are kinds of fruits. Root vegetables are a kind of vegetable, and Beet is a root vegetable. Red and green are two values of has-color, while sweet and sour are two values of has-taste. Edible things have the properties has-color and has-taste associated with them. The distribution for the property has-taste is known for fruits, grapes, and vegetables, while the distribution with respect to the property has-color is known for fruits and beets. The network encodes the above information, except that the δ-nodes and links associating the SOUR and GREEN nodes to appropriate nodes in the hierarchy have been omitted, as they do not play a role in this example and, if included, would make the diagram hopelessly complicated.
Figure 15. An example network for recognition. [RELAY, PV, INV, and E are input sites. Not all sites have been marked.]
Consider the recognition problem: Is a red, sweet object an apple, a grape, or a beet? That is, C-SET = {APPLE, GRAPE, BEET}, REF = ED-THING, DESCR = {[has-color, RED], [has-taste, SWEET]}. Notice that this recognition problem involves local as well as inherited information. For example, information about the color of GRAPE has to be inherited from FRUIT, but specific information about the taste of GRAPE is available locally and must override the more general information available at FRUIT. To solve the above problem, the network is initialized by setting the external inputs of RECOGNIZE, has-taste, has-color, RED, and SWEET to 1.0. After one step the external input to RECOGNIZE is withdrawn, but the external input to ED-THING is set to 1.0 and the ↓ links at ED-THING
are enabled. The potentials of some of the relevant nodes after d + 2 time steps are given below. Recall that a δrec-node must receive all three of its inputs at site ENABLE in order to become active.

ED-THING: 1.0

All δ-nodes shown in Figure 15 will be active and their potential will be 1.0.

FRUIT = potential of ED-THING × (#FRUIT / #ED-THING) × (#FRUIT[has-color, RED] / #FRUIT) × (#FRUIT[has-taste, SWEET] / #FRUIT)
= (#FRUIT[has-color, RED] × #FRUIT[has-taste, SWEET]) / (#FRUIT × #ED-THING)

VEGGIE = potential of ED-THING × (#VEGGIE / #ED-THING) × (#VEGGIE[has-taste, SWEET] / #VEGGIE)
= #VEGGIE[has-taste, SWEET] / #ED-THING

GRAPE = potential of FRUIT × (#GRAPE / #FRUIT) × (#GRAPE[has-taste, SWEET] / #GRAPE) × (#FRUIT / #FRUIT[has-taste, SWEET])
= (#FRUIT[has-color, RED] × #GRAPE[has-taste, SWEET]) / (#FRUIT × #ED-THING)

APPLE = potential of FRUIT × (#APPLE / #FRUIT)
= (#FRUIT[has-color, RED] × #FRUIT[has-taste, SWEET] × #APPLE) / ((#FRUIT)² × #ED-THING)

ROOTV = potential of VEGGIE × (#ROOTV / #VEGGIE)
= (#VEGGIE[has-taste, SWEET] × #ROOTV) / (#VEGGIE × #ED-THING)

BEET = potential of ROOTV × (#BEET / #ROOTV) × (#BEET[has-color, RED] / #BEET)
= (#VEGGIE[has-taste, SWEET] × #BEET[has-color, RED]) / (#VEGGIE × #ED-THING)

Ignoring the common divisor, #ED-THING, in the potentials of the nodes GRAPE, APPLE, and BEET, the potential of the node GRAPE corresponds to the best estimate of the number of red and sweet grapes, the potential of the node APPLE corresponds to the best estimate of the number of red and sweet apples, while the potential of the node BEET corresponds to the best estimate of the number of red and sweet beets. Hence a comparison of the three potentials will give the correct answer to the question: Is a red and sweet edible thing an apple, a grape, or a beet?

6.5 Simulation
In order to illustrate the nature of inferences drawn by the evidential formulation, several examples cited in the knowledge representation literature have been simulated. In each case, the network was generated automatically by a compiler. The input to the compiler specified (a) the set of concepts, (b) the set of properties and their associated values, (c) the partial ordering together with the ratios #A/#B for all pairs A and B such that B is a parent of A, and (d) the distributions δ(C,P) in terms of the #C[P,V]'s.

The first example is an extension of the "Quaker example." Figure 16 depicts the information that is encoded in the network. There are two properties: has-bel (has-belief), with values PAC (pacifist) and NON-PAC (nonpacifist), and has-eth-org (ethnic-origin), with values AFRIC (African) and EURO (European). In broad terms, the information encoded is as follows: Most Persons are nonpacifists. Most Quakers are pacifists. Most Republicans are nonpacifists. Most Persons are of European descent. Most Republicans are of European descent. Most Persons of African descent are Democrats.

As our first query, consider the inheritance question: Is Dick a pacifist or a nonpacifist? If the potentials are normalized so that the highest potential equals 1.00, we obtain: PAC = 1.00 and NON-PAC = 0.66.
Thus, on the basis of the available information, Dick, who is a Republican and a Quaker, is more likely to be a pacifist; the ratio of the likelihoods of his being a pacifist to his being a nonpacifist is 3:2. Similar simulations for Rick, Pat, and Susan lead to the following results.

• Rick, who is a Mormon Republican, is more likely to be a nonpacifist, the ratio of pacifist versus nonpacifist for Rick being 0.39 versus 1.00.
Figure 16. The Quaker world.
Property has-bel; values: PAC, NON-PAC. Property has-eth-org; values: AFRIC, EURO.
#PERSON = 200; #PERSON[has-bel, PAC] = 60; #PERSON[has-bel, NON-PAC] = 140; #PERSON[has-eth-org, AFRIC] = 40
#CHRIST = 60; #CHRIST[has-bel, PAC] = 24; #CHRIST[has-bel, NON-PAC] = 36
#QUAK[has-bel, PAC] = 7; #QUAK[has-bel, NON-PAC] = 3
#REPUB = 60; #REPUB[has-bel, PAC] = 16; #REPUB[has-bel, NON-PAC] = 64; #REPUB[has-eth-org, AFRIC] = 5; #REPUB[has-eth-org, EURO] = 75
#DEMOC = 120; #DEMOC[has-bel, PAC] = 44; #DEMOC[has-bel, NON-PAC] = 76; #DEMOC[has-eth-org, AFRIC] = 35; #DEMOC[has-eth-org, EURO] = 65
[The figure also shows the concepts REL-PER and POL-PER and the token RICK.]
• Pat, who is a Mormon Democrat, is also more likely to be a nonpacifist, but only marginally so, the ratio being 0.89 versus 1.00.
• Finally, Susan, who is a Quaker Democrat, is very likely to be a pacifist, the ratio being 1.00 versus 0.29.
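Dick's inheritance answer can be approximated directly from the counts in Figure 16 using the PAC and NON-PAC potential expressions of section 6.4 (with the common divisor #BELIEF dropped). This sketch bypasses the intermediate CHRIST concept and the rest of the network dynamics, so it reproduces the qualitative ordering reported above, but not the exact normalized values.

```python
# Counts from Figure 16, restricted to the two direct evidence sources.
counts = {
    "PAC": {"QUAK": 7, "REPUB": 16, "PERSON": 60},
    "NON-PAC": {"QUAK": 3, "REPUB": 64, "PERSON": 140},
}

def potential(value):
    """#QUAK[has-bel,v] * #REPUB[has-bel,v] / #PERSON[has-bel,v],
    per the section 6.4 derivation (ignoring #BELIEF)."""
    c = counts[value]
    return c["QUAK"] * c["REPUB"] / c["PERSON"]

print(max(counts, key=potential))  # prints "PAC": Dick is more likely a pacifist
```

Both sources of evidence contribute to each alternative, which is precisely what the default-logic treatment of the Quaker example failed to provide.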
As an example of recognition, consider the queries:

• "Among the following persons, who is most likely to be a pacifist of African descent: Dick, Rick, Susan, or Pat?"
• ". . . who is most likely to be a nonpacifist of European descent?"
These queries lead to the following final potentials, which are shown here after normalization:

[has-bel, PAC], [has-eth-org, AFRIC]
Person    Potential
SUSAN     1.00
PAT       0.57
DICK      0.11
RICK      0.05

[has-bel, NON-PAC], [has-eth-org, EURO]
Person    Potential
RICK      1.00
PAT       0.59
DICK      0.50
SUSAN     0.30
As would be expected, Susan, who is a Quaker and a Democrat, best matches the description “Person of African descent with pacifist beliefs,” while the person least likely to match this description turns out to be Rick. The latter also appears intuitively correct: Democrats correlate well with African origin and Quakers correlate well with pacifism, but Rick is neither a Democrat nor a Quaker. Rick, however, turns out to be the most likely “Person of European descent with nonpacifist beliefs.” This appears to agree with Rick’s being a Republican and a Mormon (i.e., a non-Quaker). In order to explicate how exceptions are handled, the information depicted in Figure 17 was encoded in a network. The information may be paraphrased as follows: Most Molluscs are shell-bearers. All Cephalopods are Molluscs, but most Cephalopods are not shell-bearers. All Nautili are Cephalopods, and all Nautili are shell-bearers.
The property epidermis-type, with values SHELL, SKIN, FUR, and FEATHER, was used to encode the example. The normalized potentials of SHELL and SKIN resulting from the inheritance of the property epidermis-type for MOLLUSC, CEPHALOPOD, and NAUTILUS are as follows (the potentials of FUR and FEATHER were consistently 0.0):

VALUE    MOLLUSC    CEPHAL    NAUTILUS
SHELL    1.00       0.25      1.00
SKIN     0.43       1.00      0.00
Thus, a Mollusc is more likely to be a shell-bearer, a Cephalopod is not likely to be a shell-bearer, and a Nautilus is definitely a shell-bearer. Notice that the likelihood of a Nautilus having an epidermis-type other than shell computes to 0.00, which is exactly what should be expected given that all Nautili are shell-bearers.
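The exception behavior in the Mollusc example comes down to a locally stored distribution overriding an inherited one: a concept uses its own δ(C,P) when one exists, and otherwise inherits the distribution of its nearest ancestor. The Mollusc and Nautilus counts below follow Figure 17; the Cephalopod counts are invented for illustration (chosen only to respect the stated fact that most Cephalopods are not shell-bearers).

```python
PARENT = {"NAUTILUS": "CEPHALOPOD", "CEPHALOPOD": "MOLLUSC"}
DIST = {  # delta(C, epidermis-type), where known
    "MOLLUSC": {"SHELL": 70, "SKIN": 30},
    "CEPHALOPOD": {"SHELL": 10, "SKIN": 40},  # invented counts
    "NAUTILUS": {"SHELL": 5, "SKIN": 0},
}

def epidermis(concept):
    """Nearest stored distribution wins, so exceptions override defaults."""
    while concept not in DIST:
        concept = PARENT[concept]
    d = DIST[concept]
    return max(d, key=d.get)

print(epidermis("NAUTILUS"))  # prints "SHELL": all Nautili bear shells
```

This is a serial caricature of what the network achieves in parallel via the INV site, which cancels the inherited distribution exactly when a local one is available.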
7 CONSTRAINTS ON THE CONCEPTUAL STRUCTURE

This section enumerates and discusses the constraints that must be satisfied by the conceptual structure so as to arrive at an efficient connectionist realization of the evidential formalization presented in section 5. These constraints
Figure 17. The Mollusc world.
#MOLLUSC = 100; #MOLLUSC[epidermis-type, SHELL] = 70; #MOLLUSC[epidermis-type, SKIN] = 30
#NAUTILUS = 5; #NAUTILUS[epidermis-type, SHELL] = 5; #NAUTILUS[epidermis-type, SKIN] = 0
Property: epidermis-type. Values: SHELL, SKIN, FUR, FEATHER. In all δ(C,P)'s, the distributions for FUR and FEATHER are uniformly 0 and hence are not indicated.
will be described in the context of a particular conceptual structure, namely the Multiple Views Organization.

7.1 Multiple Views: A Proposal for Structuring Concepts
In the proposed scheme, concepts are organized in a three-tier structure (see Figure 18).

Figure 18. The Multiple Views Organization. [Tier 1 is the ontological tree, tier 2 consists of views, and tier 3 consists of tokens. t11, ..., tnm are tokens; a token may have multiple parents, but at most one parent per view. w1, ..., wn are leaves of the ontological tree. Hi1, ..., Hiqi are the qi views defined over tokens of ontological type wi.]

The topmost tier consists of a pure taxonomy and is called the ontological tree. This tree classifies the universe of concepts into several distinct ontological types, where any two ontological types represent fundamentally different sorts of things. These may correspond to the categories of Aristotle or the ontological categories suggested by Jackendoff (1983). Keil (1979; see also Sommers, 1965) has argued extensively in support of a hierarchical structure composed of ontological categories such as: Thing, Physical object, Solid, Aggregates, Event, Functional artifact, Animal, Plant, Human, and so forth. Ontological categories are derived using the principle of predicability, which says that different predicates apply to different sorts of things (predicates correspond to properties in our terminology), and one may classify things according to the predicates that apply, or do not apply, to them. Based on earlier work by Sommers, Keil proposes constraints that require that the ontological categories form a strict taxonomy. In the Multiple Views Organization it is envisaged that the leaves of the ontological tree are Types such as: Animal, Instrument, Furniture, Color,
Taste, and so forth. These concepts roughly correspond to the superordinate categories of Rosch (1975), which appear to have the right level of complexity to be the leaves of the ontological tree. It is assumed that the applicability of properties is entirely specified within the ontological tree and no new properties become applicable at tiers II and III. Note that this restriction applies to properties and NOT to property values.

The third, or lowest, tier of the conceptual structure consists of Tokens. The second tier consists of a number of taxonomies called views, where each view is a distinct classification of the underlying Tokens. The root of a view is a leaf of the ontological tree and the leaves of each view are Tokens. Many views may have the same leaf of the ontological tree as their root, and therefore, there may be multiple views that have the same Token as one of their leaves. The latter implies that Tokens may have multiple parents, but each parent must lie in a distinct view. The organization suggested above offers the advantages permitted by tangled hierarchies because it allows Tokens to have multiple parents, but it retains certain tree-like characteristics that help in simplifying the interactions between information represented in the conceptual structure.

7.2 Constraints and their Plausibility

Broadly speaking, the constraints required to arrive at a connectionist encoding fall into four categories. The constraints in the first category identify the distributions (i.e., the δ's) that should be stored by an agent.

WFR-cs-1: If a property value of a concept is exceptional, and if the ability to correctly predict this value is crucial for the agent, then the agent must remember this exception. (This constraint has already been mentioned in section 5.4.)

WFR-cs-2: If distributions for property P are stored at concepts in a view H, then such distributions should be stored at enough concepts in H so that for every Token Ci under H there exists an ancestor, Bi, within H, such that δ(Bi,P) is known.

WFR-cs-3: If a concept w is a leaf of the ontological tree and P applies to w, then the agent should store δ(w,P).
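WFR-cs-2 can be read as a coverage condition on a view: every Token must have some ancestor within the view at which the relevant distribution is known. A sketch of such a check, under the assumption that each view is a tree given by a child-to-parent map (all concept names here are hypothetical):

```python
def satisfies_wfr_cs_2(parent, tokens, known_delta):
    """Check WFR-cs-2 for one view and one property P.

    `parent` maps each node to its parent within the view (the view's
    root has no entry); `known_delta` is the set of concepts B for which
    delta(B, P) is stored. Every token must have an ancestor in that set."""
    for token in tokens:
        node = parent.get(token)               # start from the token's parent
        while node is not None and node not in known_delta:
            node = parent.get(node)            # climb toward the view's root
        if node is None:
            return False                       # no ancestor with a known distribution
    return True

# A toy view: ANIMAL -> {BIRD, MAMMAL}, with tokens tweety and rex.
parent = {"BIRD": "ANIMAL", "MAMMAL": "ANIMAL", "tweety": "BIRD", "rex": "MAMMAL"}
print(satisfies_wfr_cs_2(parent, ["tweety", "rex"], {"ANIMAL"}))  # True
print(satisfies_wfr_cs_2(parent, ["tweety", "rex"], {"BIRD"}))    # False: rex uncovered
```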
The second group of constraints restricts the sorts of recognition queries that may be posed.

WFR-rec-1: All the candidate answers in a recognition query should either be descendants of the same ontological Type, or they should all be ontological Types.

WFR-rec-2: In order to ensure that the network incorporates information from all the views while solving the recognition problem, it is required that the reference concept REF should lie in the ontological tree.
WFR-rec-1 is not unduly restrictive, although it does rule out queries such as: "Is it a cat, water, a story, or red?" WFR-rec-2, however, is quite significant. It implies that the choice of REF determines what information will play a role in solving a recognition problem. With reference to the Multiple Views Organization, the condition states that if the reference concept lies within one of the views then only information from that view will be used in solving the recognition problem; information from other views will be ignored even though it may be relevant. Consider the Quaker example (refer to Figure 16). In this example, one of the views is organized along religious divisions while the other is organized along political affiliations. WFR-rec-2 states that if the recognition problem is posed as: "Name a politician who is a pacifist," then only information at "Democrat" and "Republican" will be used in selecting an answer, and the information at "Quaker," "Mormon," and "Christian" will be ignored. The converse will happen if the problem is posed as: "Name a Christian who is a pacifist." Information from both views will be used only if the question is posed as: "Name a Person who is a pacifist." This constraint may shed some light on why people sometimes ignore relevant information and arrive at anomalous conclusions (see section 8.2).

The third set of constraints places restrictions on properties and their values. These constraints are required to establish convergence of the connectionist network.

WFR-cs-4: Property values, and the concepts they apply to, should belong to distinct ontological Types; that is, a property should not be applicable to its own values.

WFR-cs-5: Distinct properties should have distinct values.
WFR-cs-4 and WFR-cs-5 seem reasonable if we restrict ourselves to perceptual properties and certain concepts that correspond to natural kinds. Consider the values of properties such as has-taste, has-shape, and has-color on the one hand, and the concepts they apply to on the other. It is easy to convince oneself that the values of has-taste (SWEET, SOUR, ...), has-shape (ROUND, SQUARE, ...), and has-color (RED, GREEN, ...) fall into distinct classes and, furthermore, that the values of these properties are not subconcepts or superconcepts of the objects that these properties apply to. Conditions WFR-cs-4 and WFR-cs-5, however, become restrictive if we wish to consider "properties" such as has-father or has-uncle, because the concepts these properties apply to (the sons and nephews) as well as the values of these properties (the fathers and the uncles) belong to the same ontological Type. But for the constraints WFR-cs-4 and WFR-cs-5, the proposed connectionist encoding could handle arbitrary n-ary relations. This can be seen by recognizing that there is a simple correspondence between relations and concepts: Each relation corresponds to a concept and each argument of a
relation corresponds to a property of that concept. For example, the representation of the two-place relation ON may be viewed as a concept with two properties (arguments): on-top and on-bottom. The constraints WFR-cs-4 and WFR-cs-5, however, preclude relations whose arguments have overlapping domains.

The following constraint places restrictions on the partial ordering with reference to δ. This constraint, though directly satisfied by the Multiple Views Organization, is significant and is therefore mentioned here:

WFR-cs-6: For the Memory Network to find the inherited value of property P of concept C, the conceptual structure must be such that the ordering induced by << on C/C,P results in a tree.

The condition WFR-cs-6 is less restrictive than it may appear. In particular, it does NOT require all concepts in C to be organized as a tree. The condition only requires that the ordering graph induced by << on the projection C/C,P be a tree. It should be observed that WFR-cs-6 admits much more complex conceptual organizations than the Multiple Views Organization described above.
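WFR-cs-6 amounts to requiring that, once the conceptual structure is projected onto the concepts relevant to C and P, the induced ordering graph is a tree: each concept sees at most one immediate ancestor, there is a single root, and no cycles. A sketch of such a check over an explicit child-to-parent edge list; the edge representation and the concept names are assumptions for illustration, not the paper's notation:

```python
def induced_ordering_is_tree(nodes, edges):
    """True if the graph given by (child, parent) edges is a tree:
    at most one parent per node, exactly one root, and every upward
    walk terminates at that root (hence no cycles)."""
    parent = {}
    for child, par in edges:
        if child in parent:
            return False                  # two immediate ancestors: not a tree
        parent[child] = par
    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:
        return False
    for n in nodes:                       # walk up from each node, detecting cycles
        seen = set()
        while n in parent:
            if n in seen:
                return False
            seen.add(n)
            n = parent[n]
        if n != roots[0]:
            return False
    return True

nodes = ["C", "B1", "B2", "TOP"]
print(induced_ordering_is_tree(nodes, [("C", "B1"), ("B1", "TOP"), ("B2", "TOP")]))  # True
print(induced_ordering_is_tree(nodes, [("C", "B1"), ("C", "B2"),
                                       ("B1", "TOP"), ("B2", "TOP")]))  # False
```

The second call fails because C has two immediate ancestors in the projection, which is exactly the multiple-inheritance situation WFR-cs-6 rules out for a given C and P.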
8 DISCUSSION

This section discusses some significant features of this work, points out its limitations, and lists some unresolved issues.

8.1 Structure and Effective Inference

Besides resulting in a computationally effective evidential formalization of semantic networks, adopting the connectionist approach made explicit the direct correspondence between the structure of knowledge and effective inference. This correspondence may be stated as follows: Having information is not sufficient to guarantee its utilization; for it to be used effectively in the reasoning process, information must be represented in an appropriate form and occupy an appropriate place in the conceptual structure. Knowledge about property values of concepts may be represented and used very efficiently provided it is expressed in terms of #C[P,V]'s. If significant information about correlations between property values of a concept is available, and if this information is to be used in drawing extremely fast inferences, then new concepts must be created so that this information may be expressed in terms of δ(C,P)'s. For example, in order to make effective use of the information "most red apples are sweet," one must either introduce a node "red apple" and attach the appropriate information about
the taste of such apples to it, or alternately, introduce a node "sweet apple" and attach the information about the color of such apples to it.* The above requirement seems reasonable given that whenever a large amount of significant information about some concept is available, we specialize the concept in order to encode this information better. All specialized domains have multiple concepts to represent that which is represented by a single concept in common parlance. Thus, whereas it may be easy to apply the term "rock" to a large class of "relatively hard naturally formed mass of mineral or petrified matter," a geologist has numerous concepts to capture the subtle distinctions and interrelations in the properties of such substances. It would be difficult to make effective use of these interrelations unless they were encapsulated into appropriate concepts. It follows that a criterion for creating concepts may be based on an information-theoretic measure that evaluates the effectiveness of a conceptual structure vis-a-vis its ability to predict properties of objects (inheritance) and classify objects based on partial information (recognition). For work along these lines refer to Gluck and Corter (1985).

8.2 Structure and Anomalous Inference

In addition to the remarkable speed with which humans draw certain inferences, commonsense reasoning is also interesting in that it often produces anomalous results (Rips & Marcus, 1977; Tversky & Kahneman, 1983). People often ignore relevant information. Thus, one may set out for the post office on Lincoln's birthday, even though one may "know," if asked, that "Post offices are closed on Lincoln's birthday"; and one may state that there are more words ending with "ing" than there are with an "n" in the penultimate position.
I believe that there exists a computational rationale for such behavior: the selective use (or nonuse) of information is not by volition; rather, it is an unavoidable characteristic of any memory/reasoning system that is expected to perform a class of inference with extreme efficiency. In constructing a physical device that performs a class of inference with extreme efficiency, it becomes necessary to make certain commitments about how knowledge will be organized and to impose certain constraints on the conceptual structure. Any such choice entails that the remaining class of inference will either not be derivable by the physical device or, at best, will be "approximated" in various ways. To an external observer, all approximate responses by the system may appear irregular and anomalous. However, some of these answers may be approximate in systematic ways in that they

* Limited amounts of information about correlations between property values may also be encoded and used effectively by making the computational behavior of the nodes in the Memory Network more complex. But soon the network becomes too complex to be plausible.
may be attributable to specific violations of constraints that must be obeyed by the system. In section 7.2 a constraint (WFR-rec-2) was introduced that was required to arrive at an efficient connectionist realization. The constraint stated that the choice of reference concept determines what information will play a role in finding a solution to a recognition query. With reference to the Multiple Views Organization (section 7.1), the condition states that if the reference concept lies within one of the views then only information from that view will be used in solving the recognition problem; information from other views will be ignored even though it may be relevant.

The above is not the only reason why relevant information may not get used during reasoning; there are other reasons as well. The connectionist realization presented in this paper deals effectively with inferences only if they have a specific form in relation to the conceptual structure. For example, consider inheritance queries. These queries inquire about the most likely value of some "property" P of some instance/class C. For the system to provide an optimal (or correct) answer, C must be a concept in the conceptual structure and P must be a property that applies to C. If the "property" and "concept" referred to by the query do not correspond to a property and a concept in the conceptual structure, a correct answer will not be obtained. Such a situation may arise more often than one might expect. Consider the question: Are red things more likely to be sweet or sour? This appears to be a perfectly well-formed inheritance query, but that may not be the case. The agent may not have any concept denoting the set of red-colored objects; the agent may have simply represented information about the color of objects using property values (section 4.2). In this situation, the system will be unable to answer the above inheritance query.
Of course, a more elaborate system may answer the query by using a more complex inferential process. Such a process may answer the query in three steps. In the first step, it can locate concepts that have RED as the value of the property has-color. This can be done by posing the recognition query "[has-color RED]". In the second step, it can pose inheritance queries to retrieve the value of the has-taste property of the concepts selected in the first step. In the third step, it can accumulate the results of the second step to ascertain the final answer. The second step will involve case-by-case (serial) processing, and one would expect the process to focus only on a few major Types selected in the first step and return an answer based on such a partial analysis. Such "approximate" behavior should be expected of any system that has to respond in a limited time, and certainly of a process that does not have access to external memory props. Furthermore, such an elaborate process will probably not execute automatically, and will require attentional intervention.
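The three-step process just described, recognition, then inheritance over the selected concepts, then accumulation, can be outlined as follows. The concepts and counts are hypothetical stand-ins, and the two query stages are reduced to dictionary lookups rather than the network computations of the paper:

```python
# Hypothetical knowledge: which concepts have RED as has-color, and the
# has-taste distribution stored at each concept (illustrative counts).
has_color = {"APPLE": "RED", "CHERRY": "RED", "LIME": "GREEN"}
has_taste = {"APPLE": {"SWEET": 80, "SOUR": 20},
             "CHERRY": {"SWEET": 60, "SOUR": 40},
             "LIME": {"SWEET": 5, "SOUR": 95}}

# Step 1: recognition query "[has-color RED]": find concepts with value RED.
red_concepts = [c for c, v in has_color.items() if v == "RED"]

# Step 2: inheritance queries, retrieving has-taste for each selected concept.
# This is the case-by-case (serial) step the text says would require
# attentional intervention.
taste_evidence = [has_taste[c] for c in red_concepts]

# Step 3: accumulate the evidence and accept the most likely taste.
totals = {}
for dist in taste_evidence:
    for value, count in dist.items():
        totals[value] = totals.get(value, 0) + count
answer = max(totals, key=totals.get)
print(answer)  # SWEET (80 + 60 sweet vs. 20 + 40 sour)
```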
8.3 Structure and Learning

Another advantage of using the connectionist approach is that it suggests how a physical system, consisting of simple processing elements, may extract the information required to solve inheritance and recognition problems from the environment. If one examines the encoding of knowledge described in section 6, one notices that most of the weights on links have a simple explanation. For example, the weights on the links emanating from δ-nodes and incident on other ξ-nodes have the following interpretation: The weight on a link is a measure of how often the source node was also active when the destination node was active.
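Under assumed binary activation records, this interpretation can be written down directly: the weight is the fraction of the destination node's active moments in which the source node was also active. This is a hypothetical estimator for illustration, not the paper's exact weight formula:

```python
def link_weight(source_trace, dest_trace):
    """Estimate a link weight as: of the moments the destination node was
    active, how often was the source node active too? Traces are
    equal-length lists of 0/1 activations."""
    co_active = sum(s and d for s, d in zip(source_trace, dest_trace))
    dest_active = sum(dest_trace)
    return co_active / dest_active if dest_active else 0.0

# The source fires in 3 of the 4 moments the destination fires.
print(link_weight([1, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 1]))  # 0.75
```

Note that the estimate uses only the activity of the two nodes at the ends of the link, which is the "purely local information" point made below.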
This interpretation relates extremely well to a Hebbian interpretation of synaptic weights in neural nets (Hebb, 1949). Furthermore, the weights are based on purely local information. The above explanation may suffice to explain how weights on individual links arise, but it does not indicate how structures such as concepts evolve. A preliminary account of this may be found in (Shastri, 1988). The proposed learning mechanism emphasizes the central role of pre-existing (innate) structure in concept formation, and extends the notions of recruitment and chunking (Feldman, 1982; Wickelgren, 1979). Recently, Warren (1987) has implemented these ideas and shown that not only generalization, but also specialization, of concepts can be achieved by a system built along the lines described in (Shastri, 1988).

8.4 Use of Winner-take-all Networks for Answer Extraction

The use of a winner-take-all network (WTA) to extract answers from the Memory network offers a clean computational account of decision making under uncertainty. In the evidential formulation adopted in this work, certain reasoning tasks are viewed as decision tasks whose final step involves accepting the most likely alternative as the correct answer and acting accordingly: The Memory network computes the evidence for each choice, but it is the WTA in the answer network that encodes such an acceptance procedure; in a WTA the best answer wins and the other answers are subdued. Recently, Goldman (1986) has also argued that the WTA answer extraction mechanism offers an "integrated model of acceptance as well as uncertainty." What makes the use of WTAs even more interesting, however, is that besides the set of possible answers (i.e., those listed in C-SET or V-SET), the WTA mechanism also includes two don't know possibilities, namely no-info and conflict. The WTA returns no-info as an answer if there happens to be insufficient evidence for all the choices.
It returns conflict as the answer if none of the answers is a clear winner.⁹ The explicit use of don't know answers not only helps in modeling indecision, but it also suggests ways of encoding complex reasoning behavior. For example, one may imagine a complex routine that works as follows. At first, it initiates a simple (i.e., quick and dirty) form of reasoning. If a don't know response wins the competition, it either initiates an action that gathers additional information from the environment, or triggers additional, more elaborate, reasoning steps. An example of the latter was seen when inheritance queries were discussed in section 6.3. At first, an inheritance query looks up the locally available property values of a concept; if no activation arrives at nodes in V-SET (i.e., the no-info answer gets selected), the IS-A links are activated so as to "inherit" the values from concepts higher up in the IS-A hierarchy.

⁹ What constitutes a clear winner, and what constitutes insufficient evidence, can be specified by setting the values of certain parameters of the nodes that make up the WTA network.

8.5 Evidential Formulation

In this work it was assumed that all evidential relationships between concepts and their attribute values, in particular the strength of such relationships, are derived from certain frequency distributions. This approach involves an obvious oversimplification. Clearly, there are situations in which an agent may have to encode an evidential relationship between a concept and an attribute value without the knowledge of any frequency distributions. Consider a situation in which an agent is told: "Most Quakers are pacifists." This statement certainly has an evidential import, but it does not indicate what fraction of Quakers are pacifists. So how should the agent establish the strength of the evidential relationship between "Quakers" and "pacifism"? This is an important problem and needs to be addressed.

Another oversimplification concerns sample size and sampling errors. Under the present formulation, if the agent sees one apple (which happens to be green) and 1000 grapes (of which 999 are green), he or she will be more confident that the next apple will be green than that the next grape will be green.
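This behavior is easy to reproduce: a strength defined as a bare relative frequency is blind to how many observations back it up. The sketch below also shows one standard correction, Laplace ("add-one") smoothing, which is not part of the paper's formalism but illustrates the direction a fix might take:

```python
def strength(positive, total):
    """Evidential strength as bare relative frequency: the frequency-based
    formulation discussed in the text, blind to sample size."""
    return positive / total

def smoothed_strength(positive, total, num_values=2):
    """Laplace (add-one) smoothing over `num_values` possible outcomes:
    a standard correction that tempers estimates from small samples.
    Shown only as an illustration; it is not the paper's proposal."""
    return (positive + 1) / (total + num_values)

apple = strength(1, 1)          # one apple seen, and it was green
grape = strength(999, 1000)     # 999 of 1000 grapes were green
print(apple > grape)            # True: more "confident" about the apple

# With smoothing, the better-sampled grape estimate wins instead.
print(smoothed_strength(1, 1) < smoothed_strength(999, 1000))  # True
```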
Such a response seems counterintuitive. To deal with these issues, a way of recording the degree of confidence an agent has in a set of observations is needed. I hope to address some of these issues in the future.

8.6 Extended Inference

So far, only "atomic" queries have been considered: queries that involve a single stage of processing. Many seemingly simple queries require multiple stages that must be processed in sequence. Consider the query: What is the color of the block on top of a blue block? Solving this query will require three subqueries: (1) find a blue block, say X, (2) find a block, say Y, which is on top of X, and (3) find the color of Y. The key technical problem is to develop techniques for communicating the results of a query posed by one routine to another routine, which may then initiate a second query. (Routines were described in brief in section 1.2.) A solution to this problem would require designing "parameterized" routines, where the parameters may be bound to answers extracted by other
routines. Clearly, a solution to this problem is subsumed by a solution to the "variable binding" or "dynamic connection" problem (Feldman, 1982). In the future, I hope to formulate this problem as a special case of the general variable binding problem.

9 CONCLUSION

This work demonstrates that an interesting class of inference that people seemingly perform with extreme facility may be realized with equal facility using a computational architecture that derives many of its features from the properties of biological hardware. The connectionist encoding described here is not only efficient, but it also embodies a model of reasoning that prescribes how an agent should perform inheritance and recognition based on partial knowledge so that the conclusions are optimal in a certain sense. The work also stipulates certain constraints on the organization of conceptual information that, if satisfied, lead to an efficient connectionist implementation. It has often been argued that a deep understanding of what intelligence is, why we view the world to be structured the way we do, and why we are proficient at certain tasks and not at others, will accrue only if we adopt an integrated approach that synthesizes computational, behavioral, and neurobiological issues. It is hoped that the work described in this paper is a small step in this direction.
Original Submission Date: January 1987; Resubmission: June 1987; Accepted: October 1987.

REFERENCES
Ackley, D.H., Hinton, G.E., & Sejnowski, T.J. (1985). A learning algorithm for Boltzmann Machines. Cognitive Science, 9(1), 147-169.
Allen, J.F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26, 832-843.
Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Ballard, D.H. (1986). Parallel logical inference and energy minimization. Proc. AAAI-86, Philadelphia, PA.
Bobrow, D.G., & Winograd, T. (1977). An overview of KRL: A Knowledge Representation Language. Cognitive Science, 1(1), 3-46.
Brachman, R.J. (1985). I lied about the trees. AI Magazine, 6(3), 80-93.
Brachman, R.J., & Schmolze, J. (1985). An overview of the KL-ONE Knowledge Representation System. Cognitive Science, 9(2), 171-216.
Charniak, E. (1981). A common representation for problem solving and language comprehension information. Artificial Intelligence, 16, 225-255.
Charniak, E. (1983a). Passing markers: A theory of contextual influence in language comprehension. Cognitive Science, 7(3), 171-190.
Charniak, E. (1983b). The Bayesian basis of common-sense medical diagnosis. Proc. AAAI-83, 70-73.
Collins, A.M., & Loftus, E.F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407-428.
Derthick, M. (1987). A connectionist architecture for representation and reasoning about structured knowledge. Proc. Cognitive Science Conference, Seattle, WA.
Etherington, D.W., & Reiter, R. (1983). On inheritance hierarchies with exceptions. Proc. AAAI-83, Washington, DC.
Fahlman, S.E. (1979). NETL: A system for representing and using real-world knowledge. Cambridge, MA: MIT Press.
Fahlman, S.E. (1982). Three flavors of parallelism. Proc. Canadian Society for Computational Study of Intelligence-82, Saskatoon, Canada, 230-235.
Feldman, J.A. (1982). Dynamic connections in neural networks. Biological Cybernetics, 46, 27-39.
Feldman, J.A. (Ed.). (1985). Special issue on connectionism. Cognitive Science, 9(1).
Feldman, J.A., & Ballard, D.H. (1982). Connectionist models and their properties. Cognitive Science, 6(3), 205-254.
Frisch, A.M., & Allen, J.F. (1982). Knowledge retrieval as limited inference. In D.W. Loveland (Ed.), Lecture Notes in Computer Science: Sixth Conference on Automated Deduction. New York: Springer-Verlag.
Goldman, A.I. (1986). Epistemology and cognition. Cambridge, MA: Harvard University Press.
Gluck, M.A., & Corter, J.E. (1985). Information and category utility. Proc. of the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA.
Hayes, P.J. (1979). The logic of frames. In D. Metzing (Ed.), Frame conceptions and text understanding. Berlin: Walter de Gruyter.
Hebb, D.O. (1949). The organization of behavior. New York: Wiley.
Hinton, G.E. (1981). Implementing semantic networks in parallel hardware. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory. Hillsdale, NJ: Erlbaum.
Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.
Jaynes, E.T. (1957). Information theory and statistical mechanics. Part I, Physical Review, 106, 620-630; Part II, 108, 171-191.
Jaynes, E.T. (1979). Where do we stand on maximum entropy? In R.D. Levine & M. Tribus (Eds.), The maximum entropy formalism. Cambridge, MA: MIT Press.
Keil, F.C. (1979). Semantic and conceptual development. Cambridge, MA: Harvard University Press.
Kohonen, T., Oja, E., & Lehtio, P. (1981). Storage and processing of information in distributed associative memory systems. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory. Hillsdale, NJ: Erlbaum.
Kyburg, H.E., Jr. (1983). The reference class. Philosophy of Science, 50, 374-397.
Levesque, H.J. (1984). A fundamental tradeoff in knowledge representation and reasoning. Proc. CSCSI-84, London, Ontario, Canada.
McClelland, J.L., & Rumelhart, D.E. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2). Cambridge, MA: Bradford Books.
Palm, G. (1980). On associative memory. Biological Cybernetics, 36, 19-31.
Pearl, J. (1985). Bayesian networks: A model of self-activated memory for evidential reasoning. Proc. 7th Cognitive Science Conference, Irvine, CA.
Quillian, M.R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing. Cambridge, MA: MIT Press.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13, 81-132.
Rips, L.J., & Marcus, S.L. (1977). Supposition and the analysis of conditional sentences. In M.A. Just & P.A. Carpenter (Eds.), Cognitive processes in comprehension. Hillsdale, NJ: Erlbaum.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-233.
Rumelhart, D.E., & McClelland, J.L. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: Bradford Books.
Schubert, L.K., Papalaskaris, M.A., & Taugher, J. (1983). Determining type, part, colour, and time relationships. IEEE Computer, 16, 55-60.
Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.
Shastri, L. (1988). Semantic networks: An evidential formalization and its connectionist realization. London: Pitman / Los Altos, CA: Morgan Kaufmann.
Shastri, L. (in press). Default reasoning in semantic networks: An evidential formalization. Artificial Intelligence.
Shastri, L., & Feldman, J.A. (1986). Neural nets, routines, and semantic networks. In N.E. Sharkey (Ed.), Advances in cognitive science (Vol. 1). New York: Wiley.
Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: Bradford Books.
Sommers, F. (1965). Predicability. In M. Black (Ed.), Philosophy in America. Ithaca, NY: Cornell University Press.
Touretzky, D.S. (1986). The mathematics of inheritance systems. London: Pitman / Los Altos, CA: Morgan Kaufmann.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293-315.
Vilain, M. (1985). An approach to hybrid knowledge representation. Proc. IJCAI-85, Los Angeles, CA.
Warren, C. (1987). Hierarchical learning in a massively parallel machine. Senior thesis, University of Pennsylvania.
Wickelgren, W.A. (1979). Chunking and consolidation: A theoretical synthesis of semantic networks, configuring in conditioning, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological Review, 86(1), 44-60.