IEICE TRANS. INF. & SYST., VOL.E88–D, NO.10 OCTOBER 2005
2389
PAPER
A Computational Model for Taxonomy-Based Word Learning Inspired by Infant Developmental Word Acquisition
Akira TOYOMURA†a), Nonmember and Takashi OMORI††, Member
SUMMARY To develop human interfaces such as home information equipment, a highly capable word-learning ability is required. In particular, to realize user-customized and situation-dependent interaction using language, an advanced human interface needs a function that can build new categories online in response to presented objects. At present, however, there are few basic studies that address language acquisition together with category formation. In this study, taking hints from an analogy between machine learning and infant developmental word acquisition, we propose a taxonomy-based word-learning model using a neural network. Through computer simulations, we show that our model can build categories and find the name of an object based on categorization. key words: word acquisition, shape bias, taxonomic bias, human interface, neural network, PATON
1. Introduction
In order to realize high-level human interfaces, a function that acquires word meaning from word use and simultaneous sensory input is important for realizing user-customized and situation-dependent interaction. Unfortunately, however, word-sensory event association is not a simple problem. As an example, consider a situation in which a system is shown an object, such as a banana, and searches for an appropriate name for it from its knowledge. It is possible that the system will soon find and output the word “banana” corresponding to shape information about the object. However, in some situations the category name “fruit” may be more appropriate. The choice “banana” comes from a sensory-feature-based similarity between the input and internal knowledge of “banana,” whereas the choice “fruit” is based on a categorical similarity with the knowledge of the word “fruit.” Here, we assume that the categorical similarity is decided by a set of features that are not directly observed from the sensory input. For example, the category “tool” is defined by how objects are used by humans, and cannot be defined by their appearance. Clearly, this kind of indirectly observed and acquired feature must be extracted and used for object naming. The distinction between using “banana” and “fruit,” depending on the situation, becomes a problem when we search for the proper name of an observed object. The context-dependent, proper use of
Manuscript received July 20, 2004.
Manuscript revised February 14, 2005.
† The author is with the Graduate School of Engineering, Hokkaido University, Sapporo-shi, 060–8628 Japan.
†† The author is with the Graduate School of Information Science and Technology, Hokkaido University, Sapporo-shi, 060–0814 Japan.
a) E-mail: [email protected]
DOI: 10.1093/ietisy/e88–d.10.2389
these words reflects the intelligence level of the realized system. Of course, we can build the category name “fruit” into the system in advance if we know it is necessary. The problem here, however, is that humans often create categories according to necessity [1]. Therefore, a function that builds a new category online in response to a situation is needed, especially for an advanced human interface with word-learning ability, and to achieve this, a computational model of the category formation function for a word-learning machine is necessary. At present, however, there are few computational-model studies on language acquisition with category formation. Similar name-assigning behavior is observed in the infant word-learning scene. It is known that infants tend to assign a name to a new object based on object-shape similarity at the initial stage of their language learning [2]. Such behavior is called “shape bias” [3], where the word “bias” means a tendency to use a specific strategy in a specific situation. Furthermore, it is also known that infants later begin to assign category names to new objects [4]. This behavior is called “taxonomic bias” [5], [6]. The change in infant strategy from shape bias to taxonomic bias implicitly shows a change in the internal structure of human category-dependent object-naming processing. To further develop the advanced human interface, it is worthwhile taking a hint from this phenomenon. Computational modeling of this phenomenon is one possible approach, but to date few studies have attempted to model the information processing of the infant word-acquisition process. In this study, we propose a computational model for the change in word-learning strategy from shape bias to taxonomic bias in children’s language acquisition, using a brain structure-based neural network.
Here, our assumption is as follows: children find the name of a given object by judging the sameness between the given object and knowledge of objects in memory. In the first stage, they have no categorical knowledge, so they judge sameness based only on sensory information. After children acquire categorical knowledge through experience, they can recall that knowledge and come to use the name associated with the category. However, this assumption cannot explain how a category is acquired or how it is embedded in the process of name judgment. Unless we clarify these points from the viewpoint of brain computation, we cannot realize an algorithm for the actual word-acquisition scene. This study is a case study for that purpose. With this model, we show a structure of one-to-many object-naming processing in relation to categorical knowledge in the brain, that is, the indirectly observed and acquired features. The model will give a fundamental theory for advanced human-machine interaction.
Categorization of an object depends on the attribute that is extracted and focused on at a given moment. Because various classification keys can be used to categorize objects under taxonomic bias, the model must be flexible enough to make use of any attributes that are additionally acquired through the infant's experience. To realize this feature, the model must possess the abilities of multimodal concept representation, incremental attribute manipulation and similarity computation. Neural networks are often used for word-learning tasks, but these features are difficult to realize with a multi-layered learning network. One model that does satisfy the above requirements is PATON [7]–[10], a multi-module brain memory model (see Sect. 3 for details). Because of PATON's suitability, in this paper we use it as the base brain model that has the required features described above. We can realize the inclusion of category attributes by adding new attribute areas to PATON's memory network. In our model, an object is allocated a name, for example “apple,” based on sensory attributes in the early stages; in later stages, however, the model acquires category attributes, classifies the object with them, and allocates a higher-order category name like “fruit” to the object based on similarity within the category. Though we can assume various possible keys for categorization, we use the contextual similarity of word use in a sentence as an example. That is, when words are used similarly in sentences, we can expect those words to be similar.
Copyright © 2005 The Institute of Electronics, Information and Communication Engineers
In this study, we employ an RNN (Recurrent Neural Network) as a contextual-similarity extraction tool, and use activity patterns of the RNN internal layer for target words as the categorization key. By combining PATON and the RNN, we can realize internal processes for both similarity-based and taxonomy-based object naming in the brain. This paper is organized as follows. In the next section, we explain word-learning behavior in infants and its computational model. In Sect. 3, we introduce the base brain model PATON and the RNN for context extraction, and then explain the details of our model. In Sect. 4, we show computer simulations, first on a case of sensory attribute-based object naming and later on a categorical similarity-based case. In Sect. 5, we discuss the adequacy and extensibility of our model, and its implications for a real-world object-naming problem. Finally, we conclude this paper.
2. Information Processing in Object Naming
2.1 Word-Learning Biases of Infants
A fair amount of work in developmental psychology [2] reports several types of word-learning bias. Typical examples include shape bias and taxonomic bias. Shape bias explains the phenomenon that infants use object shape information to
determine similarity between objects, while taxonomic bias explains the phenomenon that infants use categorical information for the similarity decision. Imai et al. [4] reported that infants change their object-naming strategy from shape bias to taxonomic bias as they develop. In their behavioral experiment, three- and five-year-old children were tested on how they extend word meaning. First, they were shown a standard object, after which they were shown three object choices, and were asked, “See? This is a fep in dinosaur talk. Can you help Jojo find another fep?” In this test, a baby dinosaur named “Jojo” was used to keep the children engaged in the task. Each child chose one of the three objects as the object that had the name “fep.” When they extend the meaning of fep to an object that has a similar visual feature, they are assumed to adopt shape bias. When they take an object that belongs to the same category as the standard object for fep, they are thought to possess taxonomic bias. Another object is used as a control. One important result of this experiment was that the similarity judgment between two objects gradually changed from a shape-based one to a taxonomy-based one as the children aged. This result can be explained as follows. At the initial phase of word learning, infants do not have knowledge about word categories; therefore they judge similarity by using only the available perceptual information. After infants grow and acquire some knowledge about object categories, they begin to use it to judge similarity. Acquisition of categorical knowledge is the key to the change in word-learning strategy. However, when we consider the information process the brain employs for this learning, we notice that we know almost nothing about the computational process, or the internal process change, behind the learning biases.
2.2 Computation of Shape Bias- and Taxonomic Bias-Like Processing
When considering a model for shape bias, we can assume the following computational processing to realize shape bias-like behavior. After the first shown object, say A, is given a name, the infant memorizes its perceptual feature in some specific modality, P_s(A). Then, another object, say B_i, is presented. The brain circuit subsequently compares the corresponding attributes P_s(A) and P_s(B_i), and selects objects that have the same feature as A (Fig. 1, left). In the case of shape-bias processing, the system chooses a shape attribute to compare, and judges the level of similarity between objects A and B_i with respect to that attribute. In this processing, the shown object is not necessarily known in advance, and the infant learns new words by associating object A's features with the name given at first. Currently, the reason the shape attribute is used for word learning is not known. However, it is often said that the shape feature is useful in determining the sameness of objects in the real world, and that infants discover this through experience. In contrast, in the case of taxonomic bias, we can assume the following computational process. When the first
2.3 A Model for Processing Change from Shape Bias to Taxonomic Bias
Fig. 1 Processing image of shape bias and taxonomic bias. Sensory attributes of objects are compared directly under shape bias. A category-feature extraction, which determines the taxonomic key, is used in taxonomic-bias processing.
object A is given a name, the infant extracts a category attribute P_cat(A) from the sensory input by some method, and that attribute is memorized. Next, when another object B_i is shown, the same category-attribute extraction process works, and the extracted attribute P_cat(B_i) is compared with P_cat(A) to judge the similarity (Fig. 1, right). Here, the category attributes P_cat(A) and P_cat(B_i) are the indirectly observed and acquired features that define the categories. The key processing here is the extraction of the category attribute from the sensory input. Some methods, such as associative recall from a conceptual memory or precise pattern recognition, are possible for this purpose. For each object category, we assume that a set of features representing the category is associated in the brain memory system. The association between the object's sensory input and the category-representing features forms the conceptual memory. There are so many categories in the world, and so much diversity among them, that it is implausible that so much categorical knowledge could be acquired in a short time. Therefore, there must be two stages in acquiring taxonomic bias. The first is the moment at which an infant notices the importance of a category attribute for the similarity judgment in the naming task. After that moment, the infant begins to adopt the taxonomic strategy for naming and uses the category knowledge that he/she knows at that instant. The second is after learning a new category: every time the infant acquires a new category, he/she immediately applies the taxonomic strategy to a new object using the new category. One of the most powerful methods for teaching new categories to infants is the naming of an object, since it is natural to expect infants to know, after some amount of word learning, that when a name is given, it usually represents a category. Imai et al. [4] observed the second phenomenon in their experiment.
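The contrast between the two processes in Fig. 1 can be sketched in a few lines of code. The sketch below is only illustrative: the objects, their sensory vectors P_s, and their category vectors P_cat are invented, and a simple bit-match score stands in for the brain's similarity judgment.

```python
import numpy as np

def similarity(p_a, p_b):
    """Fraction of matching feature bits between two binary attribute vectors."""
    return float(np.mean(p_a == p_b))

def choose(standard, targets, extract):
    """Pick the target whose extracted attribute best matches the standard's.
    `extract` selects which attribute is compared (sensory or categorical)."""
    scores = [similarity(extract(standard), extract(t)) for t in targets]
    return int(np.argmax(scores))

# Toy objects: each has a sensory (shape) vector P_s and a category vector P_cat.
# All feature values are invented for illustration, not taken from the paper.
apple   = {"shape": np.array([1, 1, 0, 0]), "cat": np.array([1, 0])}
balloon = {"shape": np.array([1, 1, 0, 1]), "cat": np.array([0, 1])}  # similar shape
grape   = {"shape": np.array([0, 0, 1, 0]), "cat": np.array([1, 0])}  # same category

# Shape bias compares P_s; taxonomic bias compares P_cat.
shape_choice = choose(apple, [balloon, grape], lambda o: o["shape"])
tax_choice   = choose(apple, [balloon, grape], lambda o: o["cat"])
print(shape_choice, tax_choice)  # shape bias picks the balloon (0), taxonomic bias the grape (1)
```

The only difference between the two biases in this sketch is the `extract` function, which mirrors the paper's point that the change of strategy is a change of the attribute chosen for comparison.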
When we observe those learning biases from the perspective of processing, the choice of focus for the comparison is crucial. Infants initially use the perceptual attribute, employing the categorical attribute later. However, since categories are diverse and incremental, the brain's processing system must change its internal processes as the number of categories increases. Moreover, the brain processing that extracts category information from sensory inputs, i.e., the real body of the categorical knowledge, can be realized in various ways. For example, the same function may be realized by access to memory, or by new attribute learning in the feature-extraction process. Brain circuit maturation through development, or an automatization of conscious processing by practice, may change the functional allocation in the brain. That is, superficially similar behavior does not necessarily mean the same internal process [11]. The brain's processing model for word-learning bias acquisition should be flexible enough to address such a wide range of processing changes and variations. From the discussion above, we can list the functions required of a computational model of word-learning bias as follows: (1) the ability to add a new category based on various criteria; (2) the ability to find new internal processing corresponding to the new category; and (3) the ability to switch the processing depending on the situation. These are rather new concepts in comparison to conventional single-function neural networks, and they are not easily acquired through simple learning of a neural network. For this purpose, we adopt the brain memory model PATON, which may possess properties that can assist with those processes, and we attempt to model the change in word-learning bias through development.
3. Realization of the Word-Learning Bias by Brain-Like Computation
3.1 Brain Memory System Model PATON
PATON [7]–[10] is a model of the brain's macroscopic memory system that can express the usage process of multimodal conceptual memory. The model is constructed on two hypotheses about brain processing: dual coding in the memory system and sequential activity control of brain areas [9]. Dual coding means that an outer-world object is represented both by the attribute set that represents its features and by a recognition that represents its identity. Activity control of brain areas means the existence of a control system that activates parts of the brain system to enable an internal sequential computation process. The former hypothesis suggests that the brain encodes pattern information in cortical sensory modality-specific areas and a discrete representation in the hippocampal/entorhinal
Fig. 2 Basic structure of the PATON model. The pattern layer is composed of multiple attribute areas P_k that represent sensory information. The symbol layer S represents the identity of outer-world objects. The attention system generates internal signals ap_k, cp_k, as_i, cs_i to control the activity of the attribute areas P_k and the symbol neurons S_i. See text for the detailed behavior of the model.

system. These two coding systems represent different aspects of the outer world and are activated dynamically corresponding to outer-world situations. By switching those representations, we can recognize and utilize continuous information like shape or color attributes in some situations and discrete representations in other cases. For example, in the case of a banana, the continuous attributes “yellow” or “long” can be recognized, or the discrete recognition “banana” is realized, depending on the task. The latter hypothesis suggests that a sequence of internal attention controls the activity of cortical areas, depending on the task. Though the mechanism of control in the brain is not yet known, evidence from studies on the human brain clearly shows that each of the brain's cortical areas receives some form of activity control. From the viewpoint of an information-processing model, sequential activation of cortical areas realizes a flow of neural activity within those areas; thus a neural computation sequence is realized. Internal attention is one possible source of this processing switching.
As a structure for implementing these two hypotheses, PATON features a memory network with a pattern layer and a symbol layer, as well as an attention system that generates the attention vector (Fig. 2). A set of attribute information that represents an outer event is given to the pattern layer and memorized. The memorized patterns are manipulated by the internal attention. The pattern layer is composed of multiple attribute areas P_k = (P_kj | k = 1, ..., N, j = 1, ..., M_k), each of which represents a different sensory modality. The attribute areas receive an input I_k = (I_kj | k = 1, ..., N, j = 1, ..., M_k) from corresponding preprocessing systems. Each of the S-layer neurons S_i (i = 1, ..., L) has a bilateral connection W_kij (k = 1, ..., N, j = 1, ..., M_k, i = 1, ..., L) with the attribute-area neuron P_kj. It associates the input pattern that corresponds to outer-world events with the symbol neuron S_i and forms a memory. In the symbol layer, the symbol neurons have mutual inhibitory connections to realize single-neuron activity. The attention system generates the control signal vector A(d) = {ap_k, cp_k, as_i, cs_i, cw | k = 1, ..., N, i = 1, ..., L}, where each element of A(d) takes the value 1 or 0. By generating this attention vector in sequence, the system can activate the appropriate areas in sequence for a certain task. Neural activity in the model is described by the following equations, where Φ(x) is the output function of the neurons:

\tau \, dp_{kj}/dt = \bigl( -p_{kj} + c_1 \sum_i S_i W_{kij} + c_2 I_{kj} \bigr) \, cp_k   (1)

P_{kj} = ap_k \, \Phi(p_{kj})   (2)

\Phi(x) = \frac{1}{1 + \exp(-x/T)}   (3)

\tau \, ds_i/dt = \bigl( -s_i + c_3 \sum_{j,k} W_{kij} P_{kj} - c_4 \sum_{q \neq i} s_q \bigr) \, cs_i   (4)

S_i = as_i \, \Phi(s_i)   (5)

\tau \, dW_{kij}/dt = \alpha S_i (P_{kj} - W_{kij}) \, cw   (6)

In these equations, the coefficients cp_k, ap_k, cs_i, as_i and cw take the value 1 or 0, and they control the dynamics of each equation by modifying the effect of the corresponding term. For example, in the attribute area, if cp_k equals 0, p_kj is not updated as long as cp_k keeps that value. In the same manner, the signal ap_k controls the output gain of the attribute neuron P_kj. In the symbol layer, the control signals as_i and cs_i independently control the output gain of each symbol neuron S_i and the update of its internal state s_i. Learning of the connection W_kij is controlled by the signal cw: if cw equals 0, the connection W_kij is not updated; when cw is changed to 1, W_kij is updated according to Eq. (6). These mechanisms provide the ability to change network behavior immediately by changing those parameters. Operation of the model mainly consists of four steps: input to the attribute area (ItoP), activity propagation from the attribute area to the symbol layer (PtoS), activity propagation from the symbol layer to the pattern layer (StoP), and learning. The selection of attributes and symbols in those steps depends on the task and context of the moment. When we select proper attributes and symbols in these steps, PATON can perform the higher-level elemental behaviors of memorization, recognition, recollection and association [9]. These operations are realized by sequentially organizing the basic operations, and the organization itself is actualized by a sequence of the control signal A(d). With this mechanism, we can attain the situation-dependent switching of input similarity-judgment processes for the object-naming task. However, we cannot simply apply the PATON model to
Fig. 3 Structure of the SRN-based word-predicting network.
explain the mechanism of the word-learning bias, because the taxonomic bias does not depend on sensory attribute information. Instead, we require an additional ability to extract a non-sensory attribute and to compare it with the corresponding attribute of other objects. In this study, therefore, we add non-sensory category-attribute learning and a comparison function between the pattern representations.
3.2 Category Pattern Formation by Word-Use Context
To realize taxonomic-bias processing, we need a mechanism that extracts category information from the sensory input. In this study, we use a word category derived from sentence context and associate it with the corresponding word in the PATON memory system. Of course, we do not know whether the real taxonomic bias is extracted from sentence context in the brain. With this mechanism, however, we can provide our model with a function equivalent to taxonomic bias. Our idea is based on the fact that category information is embedded in the sentence. We know that the serial position, or order, of words in the sentences of a language is fixed by syntax. Therefore, if the system can encode the information that “apple” follows “eat” and “grape” also follows “eat,” for example, the system can guess that “apple” and “grape” share an attribute. One learning system that learns and extracts syntactic context in a sentence is the SRN [12], [13] (Fig. 3). In the learning process of the SRN, the words of a simple sentence are given one by one to an input layer, and the next word in the sentence is given to the output layer as a teacher signal. The SRN thus learns to predict the word that will arrive next from the sequence of words in the sentence. It is known that the intermediate layer of the SRN acts to encode the context of a word sequence in a sentence.
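As a concrete illustration of this mechanism, the following minimal Elman-style SRN learns next-word prediction on a toy corpus with one-step truncated backpropagation. The vocabulary, sentences, layer sizes and learning schedule are all our own illustrative choices, not the paper's; the point is only that intermediate-layer activity acquired this way can serve as the context pattern discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["eat", "apple", "grape", "see", "balloon", "."]
idx = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 8          # vocabulary size and hidden (intermediate) size

Wxh = rng.normal(0, 0.1, (H, V))   # input -> intermediate layer
Whh = rng.normal(0, 0.1, (H, H))   # context (previous hidden) -> intermediate
Why = rng.normal(0, 0.1, (V, H))   # intermediate -> next-word prediction

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# "apple" and "grape" occupy the same slot after "eat"; "balloon" does not.
sentences = [["eat", "apple", "."], ["eat", "grape", "."], ["see", "balloon", "."]]

lr = 0.1
for _ in range(500):
    for s in sentences:
        h = np.zeros(H)
        for w, nxt in zip(s[:-1], s[1:]):
            x = one_hot(idx[w])
            h_prev = h
            h = np.tanh(Wxh @ x + Whh @ h_prev)
            y = softmax(Why @ h)
            # one-step truncated backpropagation of the prediction error
            dy = y - one_hot(idx[nxt])
            dh = (Why.T @ dy) * (1 - h ** 2)
            Why -= lr * np.outer(dy, h)
            Wxh -= lr * np.outer(dh, x)
            Whh -= lr * np.outer(dh, h_prev)

# After training, the prediction following "eat" concentrates on the food words.
h = np.tanh(Wxh @ one_hot(idx["eat"]))
y = softmax(Why @ h)
print(y[idx["apple"]] + y[idx["grape"]])
```

Because “apple” and “grape” both follow “eat,” their hidden patterns receive the same context contribution through the recurrent weights; this shared component is what PATON's contextual area would receive as a category cue.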
Although we cannot say clearly whether the SRN is biologically plausible, an SRN-type local neural circuit is plausible in the brain, and we can expect the brain to extract category information from a sentence by processing of this type. Because PATON includes several sensory attribute areas, we modify one of them to memorize the contextual information of a sentence. One way to realize this is to employ the SRN as a preprocessing system for sentence-context extraction and make one attribute area of PATON, called the contextual area, receive input from the intermediate layer of the SRN
Fig. 4 Structure of the proposed model for the taxonomic bias processing.
(Fig. 4). The SRN learns the sequence of words in a sentence independently from the other parts of PATON. The connection between the contextual area and the symbol layer learns the co-occurrence relation of an object and the context of its appearance in a sentence; PATON can then recollect the category information by recollecting the activity of the contextual area from the symbol layer. We expect this activity to be used as the category attribute in the name-assigning scene.
3.3 Similarity Judgment Procedure in PATON
Once PATON has acquired the category information, it is used to categorize the input object, just as seen in the object-naming scene. In the case of naming by sensory similarity, one of the available sensory inputs is compared with the corresponding sensory pattern of a known object. In the case of taxonomy-biased naming, however, the contextual pattern associated with the object in PATON's memory is used for the comparison. In order to judge the similarity of the input pattern with the memorized one in the attribute areas, we introduce a connection M^k_{mn} (m, n = 1, ..., M_k) between the cells p_km and p_kn in each pattern area k to form a Hopfield-type associative memory. To compare two activity patterns, say x^1 = (x^1_1, ..., x^1_{M_k}) and x^2 = (x^2_1, ..., x^2_{M_k}), one of them, x^1, is memorized:

M^k_{mn} = (x^1_m - 0.5)(x^1_n - 0.5)   (7)

Because the input pattern x^1 has value 1 or 0, the sign of M^k_{mn} becomes positive when the two cells within the input pattern are the same, and negative when they are not. As a result, the connection M^k_{mn} memorizes the input pattern and affects the dynamics of p_kj when the next input is given. The other pattern x^2 is then given as an input to the network, and the activity of the attribute area is computed by the following memory-recollection equation. This equation is the same as Eq. (1), but a term using M^k_{mn} is added and affects the dynamics of p_kj so as to distinguish a learned pattern from an unlearned one:

\tau \, dp_{kj}/dt = \bigl( -p_{kj} + c_1 \sum_i S_i W_{kij} + c_2 x^2_j + c_5 \sum_i M^k_{ji} p_{ki} \bigr) \, cp_k   (8)

The quality of memory recollection, which corresponds to the degree of matching between the two patterns above, is evaluated using the energy of the Hopfield-type memory-recollection state. This energy becomes smaller when the two patterns are similar and larger when they are not:

E_k = -\frac{1}{2} \sum_{i,j} M^k_{ji} (p_{ki} - 0.5)(p_{kj} - 0.5)   (9)

Using these equations, we obtain a low energy value when the patterns are alike and a high energy value when they are not. In the following simulations, some objects are shown to PATON in turn, but the order of the presented objects does not affect the computation or the energy state of Eq. (9). Thus, in the simulation, we do not have to fix the order of the presented objects.
3.4 A Model for the Shape-to-Taxonomic Change in PATON
In this section we introduce category feature-extraction abilities to PATON, and model the processes of shape bias and taxonomic bias using PATON's attention vector sequence.
3.4.1 Procedure of the Shape Bias
Shape bias can be realized without any knowledge of the category. In the experiments by Imai et al. [4], as introduced in Sect. 2, children are shown a standard object first, followed by three target objects. In the case of shape bias, this choice behavior is realized by comparing the shape similarity of each target object with the standard object. From the perspective of PATON's internal processing, the similarity computation can be actualized by the following steps: first, the shape feature of the standard object is memorized within the shape area of the pattern layer; next, the shape feature of a target object is provided as an input and the associative memory M^k_{mn} is activated. After the association converges to a stable state, the energy value of the state, representing the degree of their similarity, is calculated.
Then, the energy value is determined for each of the target objects, and if the energy value of one of the targets is lower than a predetermined threshold, that target is chosen as the object with the same name as the standard object. In this case, PATON does not need to know any information about the standard object or the target objects in advance. The procedure can be realized only by having a threshold with which the similarity can be judged among the objects. Whether PATON memorizes the name of the object is optional, depending on the task's requirements; it is a separate problem from judging the similarity for shape bias.

Fig. 5 Formation of a word concept on PATON. The name of object A, a sensory input and its context in a sentence are presented simultaneously. These patterns are associated as a single concept memory in the PATON network. See text for an explanation of the numbers.

3.4.2 Acquisition of the Word Concept in PATON Memory
To acquire a category, it is necessary to extract the corresponding category features from a set of experiences within a certain attribute. As explained in Sect. 3.2, we use a set of sentences and the SRN to extract features of word usage. The extracted word-usage information (actually, an activation pattern of the SRN's intermediate layer) and the other sensory inputs are associated with a symbol-layer neuron in order to form a word concept in the PATON memory system. As a result, the connection W_kij, which forms the association between the object's identity recognition and the object's attribute sets, represents the concept of the word. It is a memory in the PATON model and can be manipulated using the PATON attention vector sequence. The process of concept-memory formation is as follows (see also Fig. 5).
1. When a word is presented to the input layer of the SRN (right-hand side of Fig. 5) in the course of sentence presentation, the intermediate layer of the SRN represents the sentence's corresponding context.
2. PATON's contextual area receives the activation of the SRN's intermediate layer. The state of the contextual area changes in accordance with Eq. (1).
3. Simultaneously, the sensory-dependent pattern areas of PATON receive visual information (shape, color, etc.) about the object corresponding to the word.
4. A symbol neuron S_i is excited by the activity of the sensory-dependent pattern areas in accordance with Eq. (4).
5. The connection W_kij between the contextual area and the symbol neuron S_i is learned using Eq. (6).
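The five steps above can be condensed into a sketch of the learning rule of Eq. (6) in discrete time. The area names, sizes, and attribute patterns below are hypothetical, invented for illustration, and only the single symbol neuron assumed to be active is updated.

```python
import numpy as np

def learn_concept(W, symbol, patterns, alpha=0.5, steps=20):
    """Discrete-time reading of Eq. (6): the connection between each attribute
    area k and the active symbol neuron moves toward the presented pattern.
    W: dict area -> (L x M_k) weight matrix; symbol: index of the active S_i;
    patterns: dict area -> attribute pattern P_k presented simultaneously."""
    for _ in range(steps):
        for k, P in patterns.items():
            # dW_kij = alpha * S_i * (P_kj - W_kij); only S_symbol = 1 is active
            W[k][symbol] += alpha * (P - W[k][symbol])
    return W

L = 3                                        # number of symbol neurons
areas = {"shape": 4, "color": 3, "context": 5}
W = {k: np.zeros((L, m)) for k, m in areas.items()}

# Hypothetical attribute patterns for the word "apple" (values illustrative);
# the "context" pattern stands in for SRN intermediate-layer activity.
apple = {"shape":   np.array([1., 1., 0., 0.]),
         "color":   np.array([1., 0., 0.]),
         "context": np.array([0., 1., 1., 0., 0.])}

W = learn_concept(W, symbol=0, patterns=apple)
# After learning, recalling from symbol 0 reproduces the context pattern,
# which is what the taxonomic-bias procedure later relies on.
recalled = W["context"][0]
print(np.round(recalled, 2))
```

After enough update steps the weights converge to the presented patterns, so symbol-layer activity alone can recollect the contextual-area pattern, exactly the recollection step used in the next subsection.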
Table 1 An example of the PATON attention vector sequence corresponding to the taxonomic-bias process shown in Fig. 6.

step  ap1  ap2  ap3  cp1  cp2  cp3  as1-5  cs1-5
  1    0    0    0    1    1    0     0      0
  2    1    1    0    0    0    0     0      1
  3    0    0    0    0    0    1     1      0
  5    1    1    0    1    1    0     0      1
  6    0    0    0    0    0    1     1      0
4. Computer Simulation
Fig. 6 PATON’s recognition process for taxonomic bias. Each number corresponds to an explanation in the text.
3.4.3 Procedure of the Taxonomic Bias
Once PATON has acquired the word concept associated with category information, it can use that information whenever necessary. The recognition process for objects using contextual similarity is as follows.
1. Some pattern areas receive the sensory input of object A, such as shape and color, in accordance with Eq. (1).
2. Activation of the pattern areas propagates to the symbol layer, where a symbol cell S_A activates and recognizes the input in accordance with Eq. (4). This activation corresponds to recognition of object A based on sensory similarity.
3. The activation of S_A propagates to the contextual area, resulting in the recollection of the contextual pattern associated with object A in accordance with Eq. (1).
4. The contextual pattern is memorized and held by the connection M^k_{mn} in the contextual area in accordance with Eq. (7).
5. The sensory input of target object B is passed to the input area and is recognized by a symbol cell S_B in accordance with Eqs. (1) and (4), just like object A.
6. The S_B activation propagates to the contextual area, after which the contextual pattern for object B is given to the contextual area as an input to the associative memory within that area, in accordance with Eq. (1). Here, the comparison between the object A context pattern and the object B context pattern starts.
7. The energy of the memory-recollection state in the contextual area is calculated to identify objects A and B in accordance with Eq. (9). When objects A and B belong to the same category, i.e., have the same contextual pattern, the energy takes a low value.
8. Steps 5–7 are iterated for each of the target objects.
Figure 6 graphically depicts this procedure, while Table 1 shows the corresponding attention vector of PATON.
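Steps 3–7 can be sketched as follows, assuming the contextual patterns recalled from the symbol layer are already available; the patterns and the small W_ctx table are invented for illustration, not taken from the paper.

```python
import numpy as np

# Each row plays the role of the contextual-area pattern recalled through
# the learned connections W_kij when that object's symbol neuron fires.
W_ctx = np.array([[1., 0., 1., 0., 1.],    # symbol 0: "apple"  (fruit-like context)
                  [1., 0., 1., 0., 1.],    # symbol 1: "grape"  (same context)
                  [0., 1., 1., 1., 0.]])   # symbol 2: "balloon" (different context)

def recall_context(symbol):
    """Steps 2-3: symbol-layer activity recollects the contextual pattern."""
    return W_ctx[symbol]

def store(x):
    """Eq. (7): Hopfield-type connection memorizing a binary pattern."""
    q = x - 0.5
    return np.outer(q, q)

def energy(M, x):
    """Eq. (9): low when x matches the memorized pattern."""
    q = x - 0.5
    return -0.5 * q @ M @ q

M = store(recall_context(0))                          # step 4: standard "apple"
e = [energy(M, recall_context(s)) for s in (1, 2)]    # steps 5-7 for each target
print(e[0] < e[1])   # the same-category grape yields the lower energy
```

The grape shares the apple's context pattern, so its energy under Eq. (9) is lower than the balloon's, reproducing the taxonomic choice.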
To understand the model more easily, we first implement a case in which the model compares objects by shape similarity, and then the case of taxonomic similarity. In these simulations, in order to clarify the internal procedure of each bias, we prepare an appropriate attention vector that drives each bias.

4.1 A Case of Shape-Dependent Choice: Shape Bias

In this computer simulation, we use an apple as the standard object A, a balloon as a perceptually similar object B1, a grape as a taxonomically similar object B2, and a knife as a thematically related object B3. Before the experiment, we assume that the model already holds concept memories of the objects used as input. This memory is not necessary for shape-bias processing, though it is essential for the taxonomic-bias processing described later. In shape-bias processing, the model selects the perceptually similar balloon as the object most similar to the standard apple, since similarity between objects depends only on their shapes. The energy is calculated in the shape-pattern area. The recognition process is as follows.

1. The model sees the standard object.
   (a) Visual shape information about the apple is given to the shape-input area, and the model recognizes it as an "apple."
   (b) The shape-attribute area memorizes the pattern by the connection M in the shape area.
2. The model examines the target objects.
   (a) The shape pattern of a target object is provided to the shape-pattern area.
   (b) Starting from the apple pattern, which is recollected through the connection M, the activity of the area converges to a stable memory recollection state under the given input.
   (c) The energy of the shape-pattern-area activity is calculated.
   (d) Steps 2(a)–2(c) are repeated for each of the target objects.
   (e) The object with the lowest energy is selected.
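The shape-bias procedure above can be sketched in the same spirit, this time using the binary shape patterns of Table 2 mapped to bipolar values. The convergence rule and the energy with a clamped-input term are Hopfield-style stand-ins for Eqs. (1) and (9), not PATON's actual dynamics.

```python
import numpy as np

# Shape patterns from Table 2, mapped from {0,1} to bipolar {-1,+1}.
def bipolar(bits):
    return np.array([1.0 if b == "1" else -1.0 for b in bits])

shapes = {
    "apple":   bipolar("110000000"),
    "grape":   bipolar("000110000"),
    "balloon": bipolar("011000000"),
    "knife":   bipolar("000001100"),
}

# Step 1(b): memorize the apple pattern through a one-shot Hebbian connection M.
a = shapes["apple"]
M = np.outer(a, a)
np.fill_diagonal(M, 0.0)

def converge(x, n_sweeps=5):
    """Step 2(b): settle to a stable recollection state with the input clamped."""
    s = x.copy()
    for _ in range(n_sweeps):
        s = np.sign(M @ s + x)
    return s

def energy(s, x):
    """Step 2(c): Hopfield energy, with the clamped input acting as an external field."""
    return -0.5 * s @ M @ s - s @ x

# Steps 2(a)-2(e): show each target object and select the lowest-energy one.
E = {name: energy(converge(p), p) for name, p in shapes.items() if name != "apple"}
print(min(E, key=E.get))  # balloon: the most shape-similar target
```

Because the balloon's shape pattern overlaps the memorized apple pattern most strongly, its recollection state agrees best with both the memory and the input, giving the lowest energy; recognition of the standard object plays no role in this comparison, just as noted below.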
Table 2  The shape and color patterns for each object. Similar shape patterns are assigned to the apple and the balloon.

Object    Shape       Color
Apple     110000000   110000000
Grapes    000110000   001100000
Balloon   011000000   000011000
Knife     000001100   000000110

Table 3  Words for SRN learning.

Category    Examples
NOUN-HUM    Mary
NOUN-FOOD   apple, grape
NOUN-TOY    balloon, doll
NOUN-TOOL   knife, scissors
VERB-EAT    eat
VERB-PLAY   play with
VERB-USE    use

Table 4  Words and rules for SRN learning. Six sentences are possible with the words in Table 3. One of them is chosen randomly and given to the SRN to learn the category feature.

WORD1      WORD2      WORD3
NOUN-HUM   VERB-EAT   NOUN-FOOD
NOUN-HUM   VERB-PLAY  NOUN-TOY
NOUN-HUM   VERB-USE   NOUN-TOOL

Fig. 7  Computer simulation of shape-biased object naming. The upper graph shows the activation of the symbol-layer cells, and the lower graph shows the energy of the shape-attribute area. The standard object, an apple, is shown for the initial 500 steps, and its shape pattern is memorized in the shape-attribute area at the moment indicated by an arrow. Then the other target objects, a grape, a balloon, and a knife, are shown sequentially, and the energy is calculated for each of them.

For the simulation, we prepared sensory input patterns for each object, as shown in Table 2. These patterns are given to the corresponding pattern areas; since the visual shape patterns of the apple and the balloon are similar, they represent a common shape feature. Figure 7 illustrates the results of the computer simulation: the object with the lowest energy was the balloon, reflecting the shape attribute used for the energy calculation. Shape-biased selection of similar objects can thus be realized by a rather simple computation in the brain. Note that recognition of the standard object, the apple, was not necessary for the similarity judgment itself.

4.2 Word-Category Learning from Word Usage in a Sentence

To perform context-extraction learning with the SRN, we prepared a set of sentences using words corresponding to the objects in the above experiment, in addition to some other words (Table 3). From these words, the sentence set was synthesized using simple rules (Table 4). These sentences are far simpler than those of actual word-acquisition scenes; in this study, however, we focus on the possibility of a computational model acquiring a category from context. The SRN learned to predict the word arriving next by forming a contextual representation in its intermediate layer. It was trained on the next-word prediction task 80,000 times with a BP learning coefficient of 0.05.
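The SRN training just described can be sketched as a minimal Elman network in NumPy. The layer sizes, the end-of-sentence token, the collapse of "play with" to "play", and the reduced number of presentations are assumptions of this sketch, not the paper's settings; backpropagation is truncated at the context copy, as in standard Elman training.

```python
import numpy as np

rng = np.random.default_rng(0)

# The six sentences generated by the rules of Table 4 from the words of Table 3
# ("play with" is collapsed to "play"; "." marks the end of a sentence).
vocab = ["mary", "eat", "play", "use", "apple", "grape",
         "balloon", "doll", "knife", "scissors", "."]
idx = {w: i for i, w in enumerate(vocab)}
sentences = [["mary", "eat", "apple"], ["mary", "eat", "grape"],
             ["mary", "play", "balloon"], ["mary", "play", "doll"],
             ["mary", "use", "knife"], ["mary", "use", "scissors"]]

V, H, lr = len(vocab), 12, 0.05     # the paper's learning coefficient of 0.05
Wx = rng.normal(0.0, 0.3, (H, V))   # input layer -> intermediate layer
Wc = rng.normal(0.0, 0.3, (H, H))   # context layer -> intermediate layer
Wo = rng.normal(0.0, 0.3, (V, H))   # intermediate layer -> output layer

def step(word, c):
    """One SRN step: read a word, update the hidden state, predict the next word."""
    x = np.zeros(V); x[idx[word]] = 1.0
    h = np.tanh(Wx @ x + Wc @ c)
    p = np.exp(Wo @ h); p /= p.sum()    # softmax over the next word
    return x, h, p

# Train on randomly chosen sentences (20,000 here; the paper uses 80,000),
# backpropagating only through the current step.
for _ in range(20000):
    s = sentences[rng.integers(len(sentences))] + ["."]
    c = np.zeros(H)
    for t in range(len(s) - 1):
        x, h, p = step(s[t], c)
        d_out = p.copy(); d_out[idx[s[t + 1]]] -= 1.0   # cross-entropy gradient
        d_h = (Wo.T @ d_out) * (1.0 - h * h)
        Wo -= lr * np.outer(d_out, h)
        Wx -= lr * np.outer(d_h, x)
        Wc -= lr * np.outer(d_h, c)
        c = h                           # copy the hidden layer to the context layer

# After "mary eat", the food words should dominate the next-word prediction.
c = np.zeros(H)
for w in ["mary", "eat"]:
    _, c, p = step(w, c)
food = p[idx["apple"]] + p[idx["grape"]]
others = sum(p[idx[w]] for w in ["balloon", "doll", "knife", "scissors"])
print(food > others)
```

The intermediate-layer activation `h` at the moment a target word is read is the contextual representation that, in the model, is subsequently associated with the word-concept memory.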
After the SRN had learned the sentences, it was given the six possible sentences, and the contextual-layer activation at the moment of each target word's input was associated with the corresponding word-concept memory. The association was realized by learning the connection between the contextual area and the symbol layer.

4.3 Recognition Process of Taxonomic Relations

Here we show a case in which the model selects a taxonomically similar object. This choice requires taxonomic knowledge; that is, the model must know that an apple and a grape belong to the same category. Although it is natural to assume that taxonomic knowledge is composed of many factors, we assume here that it is the contextual knowledge obtained from the sentence learning, and we employ the concept memory of objects in PATON for that purpose. The procedure for calculating the energy of each object, and the sequence of the attention vector, are described in Sect. 3.4.3. Figure 8 illustrates the results of the computer simulation, indicating that the object with the lowest energy is the grape; that is, the contextual pattern of a grape is similar to that of an apple. This result indicates that once our model acquires a module encoding the contextual information of a word, it becomes able to recognize the taxonomic relations of known objects, i.e., to make taxonomically biased choices.

5. Discussion

5.1 Adequacy of Our Computational Approach

We proposed a computational model to represent shape bias and taxonomic bias. Even though our model may not exactly explain the word-acquisition biases of infants, it is effective for elucidating the problems that occur when naming objects. The behavior of our model consists of: (a) acquisition of a new category; (b) judgment processing using the
category; and (c) an attention vector that represents the sequence of internal processing. However, our model accomplishes only part of the requirements listed in Sect. 2.3. Requirement (1), the ability to add a new category under various criteria, can be realized by the PATON model when we explicitly designate an attribute to be added. Requirements (2), the ability to find new internal processing corresponding to a new category, and (3), the ability to switch processing depending on the situation, we could not realize, although we recognize their necessity. To realize a flexible naming system with context dependence and incremental features, we need both the ability to learn categories and the ability to generate internal procedures that utilize a newly acquired category. The computational strategy employed by humans to name objects differs greatly from the conventional one-to-one correspondence strategy between sensory patterns and names.

In addition, we prepared the attention vector for each bias by hand. However, the ability to acquire and change the bias processing is necessary for a developing system, and more study is needed on this point. We showed through a computer simulation that the use of information embedded in a sentence is effective for extracting category information. However, the sentences we used in the simulation were far simpler than those of actual word-acquisition scenes, in which infants use many more words per sentence. Our result therefore shows only a possible computational principle of taxonomy-based word learning; to evaluate the applicability of the principle, we have to apply it to practical word-acquisition scenes. In exploring further applications, we should note some other important features of child word acquisition. For example, Shimotomai et al. [14] reported that in the case of verb acquisition, there are learning biases different from those for nouns. If we can model those processes, there may be further possibilities for more advanced human interfaces with quick verb learning.

Fig. 8  Computer simulation result of the taxonomically biased similarity judgment. The upper graph shows the activation of the symbol layer, and the second shows the energy of the contextual area. Of the four lower graphs, the upper two show the history of attention-signal activity in the pattern layer, and the lower two show that in the symbol layer; the bold lines depict the relevant symbols. The standard object, an apple, is shown and recognized in the initial 500 steps, and its corresponding category pattern is recollected and memorized into the memory M in the contextual pattern area at the 500th time step (indicated by an arrow). From time step 500 to 1,000, the target object, a grape, is shown and its corresponding category pattern is recollected. The associative memory M converges on the category pattern between time steps 1,000 and 1,500, and the energy value is obtained at time step 1,500. The same procedure is iterated for the balloon and the knife, and the energy value for each object is obtained.

5.2 What is Changed in Development?

What is the essential difference between shape bias and taxonomic bias? One clear difference is the attribute used for the similarity judgment. Comparing the realized procedures directly, PATON did not need the concept memory for shape-bias processing; it could realize that procedure by simple processing. In the case of taxonomic bias, on the other hand, categorical-information access was added to the shape-bias processing as a procedure for recollecting the context attribute. To realize this, it was necessary to develop a process that extracts category information from sensory information; in our taxonomically biased implementation, the conceptual memory of PATON was indispensable. In the course of development, an increase in the number of such processing functions, such as category-information extraction, appears to be a key factor in the emergence of intellectual behavior.

Processing for shape bias and taxonomic bias must be carried out properly according to the situation. For this purpose, it is necessary to switch internal behaviors depending on the situation; in this study, these behaviors were the attention sequences of our PATON realization. For a new situation, it is necessary to acquire new internal processing corresponding to it. Although we do not know how our brain acquires new internal processes, the acquisition of new attention sequences in PATON may suggest a possible research direction.

5.3 What Must be Solved for Proper Object Naming?

In the introduction, we discussed the problem of suitable
naming for perceived objects. Many conventional word-acquisition models use only sensory information when they learn, which corresponds to the shape-bias processing in this paper. In human word acquisition, however, children implicitly learn categories along with each new word, forming the basis of the subsequent taxonomic bias. We used the contextual information extracted by the SRN as categorical information; the reason we used an SRN in this study is that it is efficient for contextual-information extraction. For the acquisition of categorical information in general, an appropriate method should be chosen for each categorical feature. What is important in this study is that a system that can recognize a category and change its processing according to that recognition is necessary for object naming; here, we have shown a basic computational architecture using PATON and given a computational explanation. Another specific feature of taxonomic bias is the selection of the proper category and internal processing for the current task. In this paper, we used sentence context as the category information and did not address that selection process; it remains an unsolved problem in object naming.
6. Conclusion

In this paper, we explained the necessity of a computational model for naming new objects and proposed such a model, taking hints from developmental word acquisition. Furthermore, we demonstrated the model's processing through computer simulations. A brain-like memory model called PATON was used to realize the internal procedures. However, it is still not understood how the procedures for the biases are acquired in the brain. The conventional approach, in which designers develop programs one by one for each situation, may not be suitable for the next generation of human-interface development, because various unexpected situations occur in real interface scenes, and we simply cannot prepare programs for all possible situations in advance. The development of an autonomous method for discovering internal processes through interaction with the environment will be necessary.

References

[1] L.W. Barsalou, "Ad hoc categories," Memory & Cognition, vol.11, pp.211–227, 1983.
[2] M. Tomasello and E. Bates, eds., Language Development, Blackwell, Malden, MA, USA, 2001.
[3] B. Landau, L.B. Smith, and S.S. Jones, "The importance of shape in early lexical learning," Cognitive Development, vol.3, pp.299–321, 1988.
[4] M. Imai, D. Gentner, and N. Uchida, "Children's theories of word meaning: The role of shape similarity in early acquisition," Cognitive Development, vol.9, pp.45–75, 1994.
[5] E.M. Markman and J.E. Hutchinson, "Children's sensitivity to constraints on word meaning: Taxonomic versus thematic relations," Cognitive Psychology, vol.16, pp.1–27, 1984.
[6] E.M. Markman, "Constraints children place on word meanings," Cognitive Science, vol.14, pp.57–77, 1990.
[7] T. Omori and N. Yamagishi, "PATON: A model of concept representation and operation in brain," Proc. Int. Conf. Neural Network, vol.94, pp.2227–2232, 1994.
[8] T. Omori and A. Mochizuki, "PATON: A model of context dependent memory access with an attention mechanism," in Brain Processes, Theories and Models, ed. R. Moreno-Diaz and J. Mira-Mira, pp.134–143, MIT Press, Cambridge, MA, 1996.
[9] T. Omori, A. Mochizuki, K. Mizutani, and M. Nishizaki, "Emergence of symbolic behavior from brain like memory with dynamic attention," Neural Netw., vol.12, no.7-8, pp.1157–1172, 1999.
[10] T. Omori and M. Nishizaki, "Representation and learning of word meaning using memory model with dynamic attention," Second Conf. Cogn. Sci., vol.99, pp.189–194, 1999.
[11] J.C. Price and K.J. Friston, "Degeneracy and cognitive anatomy," Trends Cogn. Sci., vol.6, no.10, pp.416–421, 2002.
[12] J.L. Elman, "Finding structure in time," Cogn. Sci., vol.14, pp.179–211, 1990.
[13] J.L. Elman, "Learning and development in neural networks: The importance of starting small," Cognition, vol.48, pp.171–199, 1993.
[14] T. Shimotomai and T. Omori, "A model of word meaning inference development in child," Proc. 9th Int. Conf. Neural Information Processing, pp.1236–1240, 2002.

Akira Toyomura received the PhD degree in Engineering from Hokkaido University, Sapporo, Japan, in 2005. He is now a Postdoctoral Fellow at the Research Institute for Electronic Science, Hokkaido University. His research activities are in the areas of Cognitive Science and Brain Science. He is a member of the Japanese Cognitive Science Society.

Takashi Omori received the PhD degree in Mathematical Engineering and Information Physics from the University of Tokyo. He is now a professor in the Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan. His research activities are in the areas of Brain Science and Cognitive Science; his current interests lie in brain-processing modelling of thinking and memory. He is a member of the Japanese Cognitive Science Society and the Japanese Neural Network Society.