A Neural Net Model for Mapping Hierarchically Structured Analogs Graeme S. Halford, School of Psychology, University of Queensland, Queensland 4072 Australia,
[email protected]
William H. Wilson, Computer Science and Engineering, University of New South Wales, Sydney NSW 2052 Australia,
[email protected]
Brett Gray, School of Information Technology, University of Queensland, Queensland 4072 Australia,
[email protected]
Steven Phillips, Information Science Division, Electrotechnical Laboratory, 1-1-4 Umezono, Tsukuba 305 Japan,
[email protected]
Neural net models of human analogical reasoning need to incorporate realistic limitations on the capacity to process information in parallel. For this reason the STAR2 model represents complex analogies as a hierarchy of levels, with parallel processing within any one level and serial processing between levels. The major components of the model are two constraint satisfaction networks. The focus selection network selects a proposition in the base and a proposition in the target for mapping. These propositions are loaded into a mapping network which finds the best mapping. The most complex structure mapped in any one step is a quaternary relation, consistent with human capacity limitations. The mapping is further constrained by a set of parallel-acting constraints, including consistency with previous mappings, salience of mapped elements, element and relational similarity, and structural correspondence.
1. Introduction
Human analogical reasoning can be successfully modeled with parallel processing architectures (Halford, Wilson, Guo, Gayler, Wiles & Stewart, 1994; Hummel & Holyoak, in press; Holyoak & Thagard, 1989), but some analogies involve complex, hierarchically structured knowledge representations that entail more information than humans can process in parallel. The development of psychologically realistic models requires human limitations on parallel processing to be respected. Even a relatively simple analogy, such as that between heat-flow and water-flow shown in Figure 1, has a structure that is too complex to be processed entirely in parallel by humans. A metric for quantifying the complexity of structures that can be processed in parallel is required, and it is also necessary to explain how problems that exceed this capacity are processed. We have previously proposed that a metric in which the complexity of a relation is quantified by its number of arguments is the best for this purpose (Halford et al., 1994).
1.1 Capacity and Complexity
Each argument of a relation provides a source of variation, or dimension, and thereby contributes to the complexity of the relation. An N-ary relation can be thought of as a set of points in N-dimensional space. Relations of higher dimensionality (more arguments) impose higher processing loads. A unary relation is the least complex, having one argument; it corresponds to a set of points in unidimensional space. A binary relation (e.g. BIGGER-THAN) has two arguments, and a ternary relation has three (e.g. love-triangle is a ternary relation whose arguments comprise three people, two of whom love a third). The working memory literature, plus some specific experimentation, has led to the conclusion that adult humans can process a maximum of four dimensions in parallel, equivalent to one quaternary relation (Halford, 1993; Halford et al., 1994; Halford, Wilson & Phillips, submitted). Structures more complex than this must be processed by either conceptual chunking or segmentation. Conceptual chunking is recoding a concept into fewer dimensions. Conceptual chunks save processing capacity, but the cost is that some relations become temporarily inaccessible. Segmentation is decomposing tasks into steps small enough not to exceed processing capacity, as in serial processing strategies.
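To make the metric concrete, the following minimal sketch (our illustration, not part of the model) treats an N-ary relation as a set of N-tuples, so that arity equals the dimensionality of the space the relation occupies, and shows conceptual chunking as recoding a proposition into a single element:

```python
# A minimal illustration (not from the paper): an N-ary relation as a set of
# N-tuples, so its arity equals the dimensionality of the space it occupies.

bigger_than = {("dog", "cat"), ("cat", "mouse")}      # binary: 2 dimensions
love_triangle = {("tom", "mary", "john")}             # ternary: 3 dimensions
flow = {("vesselA", "vesselB", "water", "pipe")}      # quaternary: 4 dimensions

def dimensionality(relation):
    """Arity of the relation = number of arguments = dimensions to process."""
    return len(next(iter(relation)))

assert dimensionality(flow) == 4  # the proposed upper limit for parallel processing

# Conceptual chunking recodes a proposition into a single element (one
# dimension); its internal relations become temporarily inaccessible.
water_flow = "flow(vesselA, vesselB, water, pipe)"  # one chunked dimension
```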
[Figure 1 shows the two relational domains side by side with the mapping between them. Water-flow: CAUSE(GREATER(pressure(vessel A), pressure(vessel B)), flow(vessel A, vessel B, water, pipe)), plus liquid(water), flat_top(water), and GREATER(diameter(vessel A), diameter(vessel B)). Heat-flow: CAUSE(GREATER(temperature(coffee), temperature(ice)), flow(coffee, ice, heat, conductor)), plus liquid(coffee) and flat_top(coffee). Mapping: cause of water-flow ↔ cause of heat-flow; CAUSE ↔ CAUSE; pressure difference ↔ temperature difference; water-flow ↔ heat-flow; GREATER ↔ GREATER; pressure of vessel A ↔ temperature of coffee; pressure of vessel B ↔ temperature of ice; flow ↔ flow; vessel A ↔ coffee; vessel B ↔ ice; water ↔ heat; pipe ↔ conductor; pressure ↔ temperature.]
Figure 1. Water-flow and Heat-flow relational domains with the analogical mapping between them.
1.2 Analogical Mapping
An analogy is a structure-preserving map from a base or source to a target (Gentner, 1983). The structures of the base and target are coded in the form of one or more propositions. Each proposition consists of a relation-symbol (e.g. bigger-than or CAUSE) and a number of arguments (e.g. bigger_than(dog, cat) or CAUSE(pressure-difference, water-flow)). Each argument is either an element, representing a basic object (e.g. cat), or a chunked proposition (e.g. water-flow is a chunked representation of the proposition flow(vesselA, vesselB, water, pipe)). The height of a proposition in a hierarchy indicates how much chunked structure it contains. A proposition that has only elements as arguments is of height 2 and is called a lower-order proposition; the height of a proposition with chunked propositions as arguments is the height of its highest argument, unchunked, plus one, and such a proposition is called a higher-order proposition.
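The height definition can be stated compactly in code. The following sketch is our illustration (the class and function names are ours, not the paper's): elements have height 1, so a lower-order proposition whose arguments are all elements has height 2, and each level of chunked-proposition arguments adds one.

```python
# A sketch of proposition height as defined above (names are ours).

class Proposition:
    def __init__(self, symbol, *args):
        self.symbol = symbol  # relation-symbol, e.g. "CAUSE"
        self.args = args      # elements (strings) or chunked Propositions

def height(entity):
    if isinstance(entity, str):   # a basic element such as "water"
        return 1
    return 1 + max(height(arg) for arg in entity.args)

water_flow = Proposition("flow", "vesselA", "vesselB", "water", "pipe")
pressure_diff = Proposition("GREATER",
                            Proposition("pressure", "vesselA"),
                            Proposition("pressure", "vesselB"))
cause = Proposition("CAUSE", pressure_diff, water_flow)

print(height(water_flow))  # 2: a lower-order proposition
print(height(cause))       # 4: a higher-order proposition
```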
An analogical mapping between base and target consists of a mapping of the propositions, relation-symbols and elements in the base to the propositions, relation-symbols and elements in the target. In previous work, we presented the Structured Tensor Analogical Reasoning (STAR) model (Halford et al., 1994), which included a tensor product representation of relational knowledge. The representation handles all of the properties of relational knowledge and provides a natural explanation of the quaternary relation complexity limitation in terms of the number of units required to represent relations of different dimensionality (Halford, Wilson & Phillips, submitted). The STAR model also demonstrated how simple proportional analogies involving a single proposition from the base and target could be processed.
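For concreteness, here is a minimal sketch of tensor product binding in the spirit of STAR; the vector dimension and random encodings are our assumptions rather than the model's actual representations:

```python
# A minimal sketch of STAR-style tensor product binding: a relational
# instance is the outer product of the relation-symbol vector and the
# argument vectors. The random vectors here are stand-ins.
import numpy as np

d = 8
rng = np.random.default_rng(0)
vec = {name: rng.standard_normal(d) for name in ("bigger_than", "dog", "cat")}

# bigger_than(dog, cat): a rank-3 tensor (relation-symbol x arg1 x arg2).
instance = np.einsum("i,j,k->ijk", vec["bigger_than"], vec["dog"], vec["cat"])

# An n-ary relation needs a rank-(n+1) tensor, so the number of units grows
# as d**(n+1); this geometric growth is the proposed basis of the quaternary
# complexity limit.
print(instance.size)  # 512 = 8**3 units for a binary relational instance
```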
2. The STAR2 Model
We present the STAR2 model, which forms mappings between domains containing multiple propositions while conforming to the limitation of mapping only a single pair of quaternary propositions at a time. To do this, the model sequentially selects corresponding pairs of propositions from the base and target, and then forms mappings between the relation-symbols and arguments of the current pair of propositions in parallel. The sequential selection of pairs of propositions can be seen as a form of segmentation, sequentially focusing on propositions of acceptable dimensionality in order to form a mapping between higher dimensional concepts (e.g. the heat-flow and water-flow domains). Both the parallel mapping of arguments and relation-symbols and the sequential selection of proposition pairs are performed by constraint satisfaction networks, indicating a degree of computational similarity between the sequential focus selection and the parallel mapping processes.
The model consists of three main structures: the focus selection network, the argument mapping network, and the information storage structures. The focus selection network is responsible for selecting a pair of propositions to be mapped. Once a pair is selected, the argument mapping network is loaded with a representation of the propositions selected from each domain. The base proposition is mapped onto the target proposition, and the mapping is stored in a map storing network. The system then returns to the focus selection network, which selects a new pair of propositions to form a new focus, and the procedure is repeated until the domains are mapped.
2.1 Argument mapping network
This is a constraint satisfaction network (constraint satisfaction networks have been used in a number of PDP analogy models since they were first introduced in ACME; a detailed explanation of their operational mechanics, and of ACME itself, can be found in Holyoak & Thagard, 1989), in which the rows represent a base relational instance and the columns represent a target relational instance. An example of the network is shown in Figure 2, mapping the top proposition of water-flow to heat-flow; that is, it maps the causal relation between pressure-difference and water-flow to the causal relation between temperature-difference and heat-flow. The first row and column represent the relation-symbols of the base and target respectively. The other rows and columns represent the arguments. The nodes within the matrix represent potential mappings of base and target elements, and each node has an associated activation value. Running the network involves repeatedly updating the activation value of each node. Each node receives excitatory and inhibitory input from each of the nodes it is connected to, and this input moves the node's activation value toward a maximum or minimum value. After many iterations the activation settles to a stable state in which a number of nodes have significantly greater activation than the others. These are considered winning nodes, and indicate the mapping adopted. The shaded nodes in Figure 2 show the winning nodes for this example: they represent the mapping of the relation-symbol and arguments of the base to the relation-symbol and arguments of the target. Inhibitory connections between all nodes in the same row or column tend to make mappings unique; that is, each base element is mapped to at most one target element and vice versa, because the winning node of each row or column provides enough inhibitory input to the other nodes of the same row or column to stop them developing significant activation. Excitatory connections exist between all nodes not in the same row or column to allow a stable growth of activation. (A schematic sketch of these settling dynamics appears after the list of heuristics below.)
Mapping heuristics. A number of heuristics are implemented through influencing nodes that provide constant excitatory or inhibitory input to the mapping nodes, biasing them towards or away from becoming winning nodes. The heuristics implemented include the following biases:
• Corresponding argument positions - an advantage is given to nodes that map arguments in corresponding positions, so the first argument of the base tends to be mapped to the first argument of the target, and so on.
• Similarity - there is a bias to map identical or similar entities (similarity and salience of items, along with the propositions, are specified as initial input to the model).
• Type - items are initially specified with a type, and there is a bias to map items of identical or previously mapped types.
• Salience - there is a bias towards mapping pairs of items with higher salience and a bias away from mappings between items that differ in salience.
• Consistency - there is a bias towards mappings that are consistent with previous mappings, and a bias against mapping elements to propositions or relation-symbols, and propositions to relation-symbols.
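The following sketch illustrates settling dynamics of the kind described above. The update rule and parameter values are our assumptions, not the model's actual equations, and the heuristics are folded into a single constant bias term standing in for the influencing nodes:

```python
# A hedged sketch of the mapping network's settling dynamics (update rule
# and parameters are our assumptions): inhibition within each row/column,
# excitation between all other node pairs, plus a constant heuristic bias.
import numpy as np

def settle(bias, steps=200, excite=0.1, inhibit=-0.2, rate=0.1):
    """bias[i, j]: net heuristic input for mapping base term i to target j."""
    act = np.zeros_like(bias)
    for _ in range(steps):
        row_sum = act.sum(axis=1, keepdims=True)
        col_sum = act.sum(axis=0, keepdims=True)
        same_line = row_sum + col_sum - 2 * act   # other nodes in same row/column
        others = act.sum() - act - same_line      # all remaining nodes
        net = inhibit * same_line + excite * others + bias
        # push activation toward +1 for positive net input, -1 for negative
        act += rate * np.where(net > 0, net * (1 - act), net * (act + 1))
        act = np.clip(act, -1.0, 1.0)
    return act

# 3x3 example: base terms CAUSE / pressure-difference / water-flow against
# the corresponding target terms, with heuristics favouring the diagonal
# (corresponding argument positions).
bias = np.eye(3) * 0.1
winners = settle(bias).argmax(axis=1)
print(winners)  # [0 1 2]: each base term maps to the corresponding target term
```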
Figure 2. Argument mapping network. Rows correspond to base terms (CAUSE, pressure difference, water-flow) and columns to target terms (CAUSE, temperature difference, heat-flow); influencing nodes provide constant bias input. The dotted lines represent inhibitory connections while the full lines represent excitatory connections.
Figure 3. Focus selection network, comprising Layer 1 and Layer 2. The dotted lines represent inhibitory connections while the full lines represent excitatory connections.
2.2 Focus selection network
The focus selection network has two layers, both with a structure similar to the argument mapping network (see Figure 3); the main difference is that the rows and columns represent the chunked propositions that represent the base and target domains, rather than a relation-symbol and arguments. The lower layer is influenced by many of the same sources of activation as the mapping network, with additional heuristics biasing it towards selecting propositions with similar height, a similar number of arguments, and corresponding relation-symbols and arguments. Excitatory connections are placed between mapping nodes where the propositions represented by one node are arguments of the propositions represented by the other node; this provides a bias towards selecting similarly-shaped tree structures. The connectivity results in Layer 1 settling to a state in which a group of nodes develops strong activation, representing a group of consistent base/target proposition pairs to potentially be mapped. Layer 2 is designed to select a single winning pair of propositions from the group arising from Layer 1. Its structure is similar to Layer 1 except that there are strong inhibitory connections between all nodes, ensuring selection of a single focus. There are unidirectional excitatory connections from units in Layer 1 to the corresponding units in Layer 2. When a focus is selected, the connectivity between the two layers is set to 0 to ensure the pair of terms is not re-selected.
2.3 Information storage structures
Information storage structures are used to store information about the entities (propositions, elements and relation-symbols) in the base and target. The information is stored in a number of tensor networks, storing the similarity between pairs of entities, the salience of entities, entity-type associations, and chunked proposition - unchunked proposition associations. In addition to these networks, a rank two tensor is used to store the mappings between entities as they are formed by the argument mapping network (see Halford et al., 1994, submitted, for a description of tensor networks). Because different mappings are formed in different focuses, it is possible for a number of target (base) items to be mapped to a single base (target) item. Mapping scores, which determine the consistency of a particular mapping between entities, can be calculated from the network.
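As an illustration of the map storing tensor, the following sketch (our construction; the vectors are random stand-ins, not the model's encodings) superimposes an outer product for each stored mapping on a rank-two tensor and reads out a mapping score via a bilinear form:

```python
# A sketch of a rank-two map-storing tensor: each base/target mapping is
# stored by superimposing the outer product of the two entity vectors, and a
# mapping score is read out as vec(base) . M . vec(target).
import numpy as np

d = 64
rng = np.random.default_rng(1)
vec = {name: rng.standard_normal(d) for name in
       ("vesselA", "vesselB", "coffee", "ice")}

# Store the mappings vesselA -> coffee and vesselB -> ice.
map_store = (np.outer(vec["vesselA"], vec["coffee"]) +
             np.outer(vec["vesselB"], vec["ice"]))

def mapping_score(base, target):
    """Consistency of a candidate mapping with everything stored so far."""
    return vec[base] @ map_store @ vec[target]

print(mapping_score("vesselA", "coffee"))  # large positive: consistent
print(mapping_score("vesselA", "ice"))     # much smaller: inconsistent
```

Superposition means several mappings share one tensor, which is why a single base item can accumulate mappings to multiple target items across focuses; the score then ranks them by consistency.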
3. Analogies Solved
The model has been applied to a number of complex analogies from the literature. We provide a brief overview of how the model mapped heat-flow to water-flow (Falkenhainer, Forbus & Gentner, 1989) and then mention the other analogies that have been solved.
3.1 Mapping heat-flow/water-flow
The mappings are shown in Figure 1. Initially the information storage structures are established to represent the water-flow and heat-flow domains. This information is used to initialize the focus selection network, which selects the causal relation between the chunked representations of pressure-difference and water-flow in the base, and the causal relation between the chunked representations of temperature-difference and heat-flow in the target. The argument mapping network is loaded with these two propositions and converges to a solution in which the CAUSE relation-symbols in base and target are mapped, the chunked proposition pressure-difference is mapped to the chunked proposition temperature-difference, and the chunked proposition water-flow is mapped to the chunked proposition heat-flow. These mappings are stored in the map storing network, and control returns to the focus selection network, which selects the chunked propositions pressure-difference and temperature-difference. These propositions are then loaded into the argument mapping network, which maps the relation-symbols GREATER in base and target; the pressure-vessel-A chunked proposition is mapped to the temperature-coffee chunked proposition, and the pressure-vessel-B chunked proposition is mapped to the temperature-ice chunked proposition. The mappings are stored in the map storing network. This process of repeatedly selecting base/target pairs of propositions and mapping their components continues until all possible corresponding structure has been mapped. This includes all the mappings shown in Figure 1, plus additional mappings of the irrelevant propositions. At the end of the process, the map storing network contains all the mappings that have been stored.
3.2 Other Analogies Tested
The Rutherford analogy (Falkenhainer, Forbus & Gentner, 1989) between the structure of the solar system and the structure of the hydrogen atom has more propositions than heat-flow/water-flow and a more complex structure, but is successfully handled by the model in a similar fashion.
The jealous animals story analogy is an analogy between isomorphic children's stories in which animals play the roles. A number of versions were tested, varying the similarity of the corresponding animals as well as the presence of higher-order propositions (see Holyoak & Thagard, 1989, for details of these analogies). The model correctly solved most versions, but made incorrect mappings on versions that are difficult for humans (e.g. where animal similarity worked against the structurally correct mappings).
Addition/union is an analogy between the properties of associativity, commutativity and the existence of an identity element for numeric addition and for set union (Holyoak & Thagard, 1989). These properties are represented in a way that is isomorphic and involves higher-order propositions, but there are no common relation-symbols or elements between the two domains. The model solves this analogy despite the lack of common items.
The boy-dog analogy is an analogy in which the base and target have isomorphic structure, no higher-order propositions and no common relation-symbols or elements. Several versions were tested: the basic version could be considered too hard for humans to solve, but each other version had additions that would make the analogy solvable (see Hummel & Holyoak, in press, for details of the versions). In accordance with human performance, the model failed the basic version but passed all the other versions tested.
4. Conclusion
The STAR2 model of analogical mapping maps complex analogies through a combination of serial and parallel processing. Base/target pairs of propositions are selected sequentially, while mappings between the components of the propositions are formed in parallel. This corresponds to a form of segmentation over capacity-limited relational domains, and thus conforms to observed psychological limitations on the complexity of relations that can be processed in parallel. The model has been tested on five analogies and displays a correspondence with psychological results.
References
Falkenhainer, B., Forbus, K. D., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41, 1-63.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155-170.
Halford, G. S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Erlbaum.
Halford, G. S., Wilson, W. H., Guo, J., Gayler, R. W., Wiles, J., & Stewart, J. E. M. (1994). Connectionist implications for processing capacity limitations in analogies. In K. J. Holyoak & J. Barnden (Eds.), Advances in connectionist and neural computation theory, Vol. 2: Analogical connections (pp. 363-415). Norwood, NJ: Ablex.
Halford, G. S., Wilson, W. H., & Phillips, S. (submitted). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13(3), 295-355.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps. Cambridge, MA: MIT Press.
Hummel, J. E., & Holyoak, K. J. (in press). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review.
Acknowledgements: This work was supported by grants from the Australian Research Council and the Science and Technology Agency (Japan).