Towards Artificial Forms of Surprise and Curiosity

Luís Macedo
Instituto Superior de Engenharia de Coimbra / Centro de Informática e Sistemas da Universidade de Coimbra, Quinta da Nora, 3030 Coimbra, PORTUGAL
+351 39 790321
[email protected]

Amílcar Cardoso
Departamento de Engenharia Informática da Universidade de Coimbra / Centro de Informática e Sistemas da Universidade de Coimbra, Pinhal de Marrocos, 3000 Coimbra, PORTUGAL
+351 39 790000
[email protected]

ABSTRACT

This paper addresses the issue of modelling human forms of surprise and curiosity in an artificial perceptual agent. The following statements support our approach: humans are surprised when they perceive something that they did not expect; humans feel a desire to learn more (i.e., feel curiosity) about novel (and possibly unexpected) objects, usually manifested by focusing the senses on those objects in order to study and analyse them. We describe how the external world is internally represented in our model through graph-based representations. To accomplish the task of modelling human surprise and curiosity, we describe two main measures: the measure of the difference (or novelty) of an object and the measure of the degree of not expecting an object. Based on these measures, we propose approximate mathematical functions for surprise and curiosity. The approach is illustrated with an example of a perceptual robot that autonomously explores and studies its environment.

Keywords: Evaluation of Creativity, Surprise, Curiosity.

INTRODUCTION

Artificial Intelligence, one of the disciplines of Cognitive Science, attempts to understand and build intelligent agents. In particular, one approach within Artificial Intelligence aims to understand and build artificial agents that act and think like humans (Russell & Norvig, 1995). To accomplish this task, in addition to other human features, such an agent should become surprised by parts of its environment that it did not expect, and it should become curious about unknown (new) things in the world. These unknown things trigger actions in the agent aimed at studying them and learning more about them.


The basic definition of surprise says: "to encounter suddenly or unexpectedly"; "to cause to feel wonder, astonishment, or amazement, as at something unanticipated". This means that something unpredictable, unanticipated or unexpected causes one to feel surprise. Moreover, surprise has been related to the phenomenon of creativity (e.g., Boden, 1992; Boden, 1995; Macedo et al., 1998; Macedo & Cardoso, 1998). It has most often been pointed to as a consequence of perceiving a creative object. Considering that a creative product has been described as comprising originality (previously defined as unexpected novelty) and appropriateness (defined as usefulness, aesthetic value, etc.), in this context surprise happens when an original and appropriate object is perceived (Macedo & Cardoso, 1998). Curiosity is defined in a dictionary as "the desire to know or learn about an object that arouses interest, as by being novel or extraordinary". This means that novel and possibly unexpected objects stimulate actions to acquire knowledge of those objects. These actions usually begin with focusing the senses on the unknown object. For instance, humans usually focus their eyes on the new objects of an environment. When faced with a set of objects, they are more attracted by new objects, and even more so if those objects are both new and appropriate. Objects that are familiar do not attract them as new ones do, at least for a few moments.

The world comes into the human mind through the senses. The sensory world is somewhat transformed and represented in the mind, in what is called the mental representation of the world (Eysenck & Keane, 1990). How the human mind represents the world is one of the big questions that have challenged psychologists, philosophers, linguists, etc., for centuries. Nevertheless, several approaches to mental representation have been proposed. They play a central and essential role in knowledge-based intelligent agents, and consequently when modelling them, as for example in reasoning, problem solving and thinking (Aitkenhead & Slack, 1985; Eysenck & Keane, 1990). One of those approaches makes use of graph-based representations (e.g., Messmer & Bunke, 1998; Macedo & Cardoso, 1998).

In this paper, we focus on the issue of modelling forms of human surprise and curiosity in an artificial perceptual agent. We use graph-based mental representations. The model relies heavily on two measures: the measure of the difference (or novelty) of an object and the measure of the degree of not expecting an object. The former consists of a mathematical function that computes the distance between the graph-based representations of two objects, which corresponds, to some extent, to computing the degree of non-isomorphism between those two graph-based represented objects. The latter involves a mathematical function that computes the improbability of the existence of a given (perceived) object. While the simulation of curiosity is based on the measure of the difference of a perceived object, the simulation of surprise is heavily supported by the measure of the degree of not expecting an object. Thus, while the level of curiosity caused by a perceived object is a function of its difference relative to the other known objects (objects in the memory of the perceptual agent), the level of surprise caused by a perceived object is a function of the degree of not expecting it, given the current probabilities of the objects in the memory of the agent.

The next section presents the approach to mental representation used in our model. Then we introduce the measures of difference and of the degree of not expecting an object. After showing how surprise and curiosity may be modelled in an artificial agent, we illustrate the model with an example of the behaviour of a robot in an environment where it autonomously becomes curious to study new objects, and becomes surprised when seeing unexpected objects. Finally, we relate our work with others', and present conclusions and further work.

MENTAL REPRESENTATION

In our model, mental representation is of both semantic and episodic kinds, stored, respectively, in the semantic and episodic memories. Thus, knowledge comprises both episodic knowledge (involving episodes or cases) and semantic knowledge (theoretical knowledge) (Tulving & Donaldson, 1972). Semantic knowledge may be acquired by abstracting cases (Schank, 1982) or simply by storing the ontology of a domain in the memory of the system (Lenat & Guha, 1990). Both types of knowledge are graph-based represented (Macedo & Cardoso, 1998; Messmer & Bunke, 1998), being networks of spatially (e.g., meets, during, overlaps, etc. (Coulon, 1995)), temporally (e.g., meets, during, overlaps, etc. (Allen, 1985)), causally/explanatively (Schank, 1982), or hierarchically (e.g., Mandler, 1984) related parts (represented by nodes). Figure 1 shows an example of two graph-based representations of two physical objects.

Figure 1 - Graph-based representations and respective graphic representations of the shape of two doors. For the sake of simplicity: v1 = vertical line 1, hl1 = horizontal line 1, cl1 = curve line 1, δ1 = meets, δ2 = equal size, δ3 = supports.
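To make this representation concrete, here is a minimal Python sketch of how one of the doors of Figure 1 could be encoded as a labelled graph. The exact topology of Figure 1 is not fully recoverable from the caption, so the arrangement of nodes and relations below is an assumption for illustration only.

```python
# A hypothetical encoding of the rectangular door of Figure 1 as a labelled
# graph: nodes are the object's parts; edges carry the spatial relations
# delta1 = meets, delta2 = equal size, delta3 = supports. The particular
# wiring of relations is assumed, not read off the (lost) figure.
door_rectangular = {
    'nodes': {'v1a': 'vertical line', 'v1b': 'vertical line', 'hl1': 'horizontal line'},
    'edges': [
        ('v1a', 'meets', 'hl1'),        # delta1
        ('hl1', 'meets', 'v1b'),        # delta1
        ('v1a', 'equal size', 'v1b'),   # delta2
        ('v1a', 'supports', 'hl1'),     # delta3
        ('v1b', 'supports', 'hl1'),     # delta3
    ],
}
```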


MEASURING THE DIFFERENCE AND THE DEGREE OF NOT EXPECTING AN OBJECT

Measure of the Difference of an Object

Roughly speaking, the measure of difference, denoted throughout this paper as Difference(G1,G2), is described as follows: relying on error-correcting code theory (Hamming, 1950), the function computes the distance between two graph-based represented objects G1 and G2, counting the minimal number of changes (insertions and deletions of nodes and edges) required to transform G1 into G2 (e.g., Messmer & Bunke, 1998). This is achieved by performing the following steps: (i) representation of each graph in a shape matrix common to the two graphs; (ii) extraction of a numerical code (e.g., binary or hexadecimal) from the matrix representation of each graph; (iii) computation of the Hamming Distance (Macedo & Cardoso, 1999) between those codes. Dividing this Hamming Distance by the maximal possible Hamming Distance of those codes, which is given by the sum of the numbers of pieces (edges and nodes) of the two graphs, we obtain the difference of G1 relative to G2. To compute the difference of a given object relative to a set of objects, we apply the above procedure to each pair formed by the given object and an object from the set. The minimum of those differences is the difference of the given object relative to the given set of objects. To illustrate this error correcting-based distance, consider, for instance, the two graph-based representations of the two doors of Figure 1. One is obtained from the other by deleting the node hl1 and then inserting cl1, and also inserting two δ2 edges. This means those doors are 4 changes apart out of a maximum of 18, and therefore the relative difference between them is 4/18.
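The following Python sketch reproduces this door computation under simplifying assumptions: the shape-matrix and code-extraction steps are abstracted away by assuming both graphs are already laid out over shared slots (the slot names are hypothetical), so that counting differing slots plays the role of the Hamming Distance between the extracted codes.

```python
# A minimal sketch of Difference(G1, G2). Each graph is flattened to a dict
# mapping slots of a shared shape matrix to piece labels; the paper derives
# this layout from the matrix/code machinery of Macedo & Cardoso (1999).

def difference(m1, m2):
    """Relative difference: minimal insertions/deletions over the shared
    slots, divided by the maximal distance (total pieces of both graphs)."""
    changes = 0
    for slot in m1.keys() | m2.keys():
        a, b = m1.get(slot), m2.get(slot)
        if a != b:
            changes += 2 if (a is not None and b is not None) else 1  # replace = delete + insert
    return changes / (len(m1) + len(m2))

# The two doors of Figure 1: 8 and 10 pieces; transforming one into the
# other deletes hl1, inserts cl1, and inserts two "equal size" edges.
door1 = {'n0': 'v1', 'n1': 'hl1', 'n2': 'v1', 'e1': 'meets', 'e2': 'meets',
         'e3': 'supports', 'e4': 'supports', 'e5': 'equal size'}
door2 = dict(door1, n1='cl1', e6='equal size', e7='equal size')
print(difference(door1, door2))   # 4 changes out of 18 pieces = 4/18 ~ 0.222
```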

Measure of the Degree of not Expecting an Object

The degree of expecting that an event X occurs in a random experiment is given by its probability P(X) (Bogart, 1988). According to the frequency definition of probability, in r repetitions of the experiment, this value P(X) is approximated by the number of times X occurred in those r repetitions. This value is given by Sr(X)/r, where Sr(X) denotes the absolute frequency of X (i.e., the number of times X occurred in the r repetitions of the experiment). As r increases, Sr(X)/r converges to P(X). For instance, consider the experiment of rolling a die: the probability of "the upward side is the one bearing 6" is 1/6, because rolling it infinitely many times we will get the upward side bearing 6 approximately 16.6% (1/6 = 0.166) of the times. However, this value may not be 1/6 after a low number of repetitions of the experiment. For instance, after the 5th experiment, the number of times "the upward side is the one bearing 6" occurred could be 4. In this case, if we did not know the law underlying the probability of this experiment, we would take P("the upward side is the one bearing 6") = 4/5. After a large number of experiments, it would be confirmed that this is incorrect. When the probability of an event X depends on the occurrence of other events Ei (where i = 1, 2, ..., n), it is called a conditional probability and is given by:

$$P\left(X \mid \bigcap_i E_i\right) = \frac{P\left(X \cap \bigcap_i E_i\right)}{P\left(\bigcap_i E_i\right)}$$

According to the frequency definition of probability, it can be shown, when r is big, that:

$$P\left(X \mid \bigcap_i E_i\right) \approx \frac{S_r\left(X \cap \bigcap_i E_i\right)}{S_r\left(\bigcap_i E_i\right)}$$
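As an illustration of the convergence of Sr(X)/r, the short simulation below, a sketch with an arbitrary seed and arbitrary sample sizes, estimates the probability of rolling a 6 from frequencies alone.

```python
import random

# Frequency estimate Sr(X)/r for "the upward side bears 6", converging to 1/6.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(10000)]
for r in (5, 100, 10000):
    s_r = sum(1 for outcome in rolls[:r] if outcome == 6)
    print(r, s_r / r)   # early estimates may be far from 1/6; later ones approach 0.1666...
```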

Consider once more the experiment of rolling a die, but this time suppose the sides bearing the numbers 1, 4 and 6 are painted red, whereas the remaining three sides are painted black. After the rth repetition of the experiment, given that "the upward side is the one painted red" (denoted by Red), the probability of "the upward side is the one bearing 6" (denoted by B6) is given by Sr(B6 ∩ Red)/Sr(Red), i.e., the number of times B6 and Red occurred together, out of the total number of times Red occurred. Considering that the probability of an event X is the quantification of expecting that event, the improbability of X defines the quantification of not expecting X: I(X) = P(¬X) = 1 − P(X). Therefore, the mathematical function that measures the degree of not expecting an event X, when the set of events Ei (where i = 1, 2, ..., n) is given, and when r is big or the probability law is unknown, is:

$$I\left(X \mid \bigcap_i E_i\right) = 1 - P\left(X \mid \bigcap_i E_i\right) \approx 1 - \frac{S_r\left(X \cap \bigcap_i E_i\right)}{S_r\left(\bigcap_i E_i\right)}$$
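The painted-die example can be reproduced directly from counts. A minimal sketch, assuming a hypothetical run of rolls:

```python
from fractions import Fraction

def improbability(outcomes, x, evidence):
    """I(X | E) = 1 - Sr(X and E)/Sr(E): frequency-based degree of not expecting X."""
    s_r_joint = sum(1 for o in outcomes if x(o) and evidence(o))
    s_r_e = sum(1 for o in outcomes if evidence(o))
    return 1 - Fraction(s_r_joint, s_r_e)

# Die with sides 1, 4 and 6 painted red: how unexpected is "6" given "red"?
rolls = [1, 3, 6, 4, 2, 6, 5, 1, 6, 4]   # a hypothetical run of r = 10 rolls
print(improbability(rolls, lambda o: o == 6, lambda o: o in {1, 4, 6}))  # 4/7
```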

We may extend these ideas to the experiment of perceiving objects. Consider a graph-based represented object Π. As described above, it comprises a combination of several pieces (nodes and edges). Let W = {x: x is a piece of Π}. Consider a piece p occupying a given place in Π, and let: Πi denote the graph-based representation of the object obtained from Π by deleting piece p from that place, i.e., Πi is Π without p; E denote the event "perceived object is Π"; Ei denote the event "perceived object is Πi"; Xp denote the event "perceived object has piece p at that place". From these considerations we may infer that E = Ei ∩ Xp. Then the degree of not expecting Xp, given that Ei has occurred, is:

$$I(X_p \mid E_i) = 1 - P(X_p \mid E_i) \approx 1 - \frac{S_r(X_p \cap E_i)}{S_r(E_i)} = 1 - \frac{S_r(E)}{S_r(E_i)}$$

This formula is correct only when Sr(Ei) ≠ 0, i.e., when there are previous occurrences of Πi. However, when Sr(Ei) = 0, it is possible that there is a set S of other events Es denoting the perception of other objects Πs that are at a maximum distance (difference) T ∈ ℜ from Π. Formally, S = {Es: Es denotes the event "perceived object is Πs" ∧ s ∈ O ⊂ N ∧ Difference(Πs, Π) ≤ T}. Then it is intuitively reasonable to approximate the value of P(Xp|Ei) in the above formula by the mean of the values of all P(Xp|Es) for s ∈ O. This is based on the main assumption underlying Case-Based Reasoning, which says that a problem P1 with features similar to an earlier one P2, solved with solution S2, is likely to have the same solution S2. Moreover, if in the previous several occurrences of P2 different solutions Si (with i ∈ Ν) were assigned to it, with probabilities P(Si|P2) respectively, we may infer that P1 is likely to have the same set of solutions Si with the same probabilities P(Si|P2). Formally, this may be denoted by P(Si|P1) = P(Si|P2). Using probability theory, Faltings (1997) has previously shown that although this can be wrong for particular instances, it is guaranteed to be correct on average. Therefore, considering Xp as the solution of the problem Ei, we may approximate P(Xp|Ei) by P(Xp|Es1), with s1 ∈ O ⊂ Ν, since Ei and Es1 have parts in common. However, it is also intuitive, and implicit in the Case-Based Reasoning principle, that the more distant Es1 is from Ei, the less certainty there is in the approximation of P(Xp|Ei) by P(Xp|Es1). Therefore, such an approximation should reflect the difference between Ei and each Es1, which may be achieved by multiplying P(Xp|Es1) by the uncertainty factor 1 − Difference(Es1, Ei). Since there might be several Es (with s ∈ O), we compute the mean of the P(Xp|Es):

$$P(X_p \mid E_i) \approx \frac{\sum_{s \in O} P(X_p \mid E_s) \times \left(1 - \mathit{Difference}(E_s, E_i)\right)}{|O|}$$

Considering the frequency definition of probability, this becomes:

$$I(X_p \mid E_i) \approx 1 - \frac{\sum_{s \in O} \frac{S_r(X_p \cap E_s)}{S_r(E_s)} \times \left(1 - \mathit{Difference}(E_s, E_i)\right)}{|O|}$$

Computing the value of I(Xp|Ei) for all the pieces p of Π as described above, then summing them and dividing the result by the number of pieces of Π, we obtain an approximation for the degree of not expecting the event E given a set of |O| events Es:

$$\mathit{Degree\_of\_not\_Expecting}(E) = \frac{\sum_{p \in W} I(X_p \mid E_i)}{|W|}$$
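A compact Python sketch of the whole measure follows. It assumes objects are represented as frozensets of pieces and that memory is a list of previously perceived objects (with repetitions standing in for the absolute frequencies Sr); the superset test used to count occurrences of Πi and the set-based stand-in for Difference are simplifying assumptions, not the paper's matrix machinery.

```python
def difference(a, b):
    """Set-based stand-in for Difference: insertions/deletions needed to
    turn a into b, normalised by the total number of pieces of both."""
    return len(a ^ b) / (len(a) + len(b))

def degree_of_not_expecting(obj, memory, T=0.5):
    """Mean over all pieces p of obj of I(X_p | E_i), with the CBR-style
    fallback on similar objects when Pi_i was never perceived before."""
    total = 0.0
    for p in obj:
        pi_i = obj - {p}                               # Pi without piece p
        s_r_e_i = sum(1 for m in memory if pi_i <= m)  # occurrences covering Pi_i
        if s_r_e_i > 0:
            s_r_e = sum(1 for m in memory if obj <= m) # occurrences covering Pi itself
            total += 1 - s_r_e / s_r_e_i
        else:
            # fallback: similar past objects E_s within distance T, each
            # weighted by the uncertainty factor 1 - Difference(E_s, E_i)
            similar = [m for m in memory if difference(m, obj) <= T]
            if similar:
                est = sum((p in m) * (1 - difference(m, obj))
                          for m in similar) / len(similar)
                total += 1 - est
            else:
                total += 1.0                           # nothing comparable at all
    return total / len(obj)

# Hypothetical usage: houses as piece sets; triangular roofs are ubiquitous,
# arched doors are not.
memory = [frozenset({'roof_tri', 'door_rect'})] * 40 + \
         [frozenset({'roof_tri', 'door_sq'})] * 30
new_house = frozenset({'roof_tri', 'door_arched'})
print(degree_of_not_expecting(new_house, memory))
# 0.75: the arched door is fully unexpected (1.0), and even the ubiquitous
# roof gets 0.5 because only similar (not identical) houses support it.
```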

Object description               Absolute frequency
House with a rectangular door    40
House with a squared door        30
Church with an arched door       15
Church with a triangular door     5
Shop with a pentagonal door       9
Shop with a hexagonal door        1

Figure 2 - Descriptive example of the previous 100 perceptions of an agent.

Let us illustrate the application of these formulas with an example. Suppose an agent has previously perceived 100 objects, of which 70 were houses, 20 were churches and 10 were shops. Of the 70 houses, 40 had rectangular doors and 30 had squared doors. Of the 20 churches, 15 had arched doors and 5 had triangular doors. Of the 10 shops, 9 had pentagonal doors and 1 had a hexagonal door. See Figure 2 for a description of these objects. Suppose the agent has stored these experiences in memory. When confronted with another house, and considering the set S described above as comprising all the houses, the agent will expect it to have a rectangular door with 57% probability (40/70 = 0.57), a squared door with 43% probability (30/70), an arched door with 0% probability (there is no previous house in its memory with an arched door, and therefore 0/70 = 0), and so on. Thus, for instance, if the agent sees a house with an arched door it will be surprised, because such a door has an improbability of 100% (1 − 0 = 1). Notice that this is only the improbability of the door, not the improbability of the entire house (which is obtained by computing the mean of the improbability contributions of all the pieces that belong to the house). Notice also that the agent confines the support of the probability computation to its previous perceptual experience. Suppose now that we want to measure the degree of not expecting a roof with a triangular shape in a house that has an arched door. Notice that a roof with this shape occurs in all the houses. Since there is no house with an arched door in the memory of the agent, the uncertainty factor will be less than 1, and so the degree of not expecting the triangular roof will be different from 0. This fact seems contradictory, because all the houses have that roof. The problem is that the agent knows that the object is a house (because it is similar to previous houses) and this is a conditional event. Thus, when the agent knows the category (represented by a prototype in the semantic memory) to which the perceived object belongs, a method to avoid this problem may be to confine the computation of the degree of not expecting a piece to those pieces that are not present in that prototype. The degree of not expecting the remaining pieces (those present in the prototype) is obviously set to 0. Notice that, in this case, the degree of not expecting an object depends on the category of objects we consider it to belong to.
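The door probabilities above follow directly from the Figure 2 counts; a minimal check in Python:

```python
# Frequencies of door shapes among the 70 previously perceived houses (Figure 2).
houses = {'rectangular': 40, 'squared': 30}
total = sum(houses.values())                   # 70

p_rect = houses['rectangular'] / total         # 40/70 ~ 0.57
p_squared = houses['squared'] / total          # 30/70 ~ 0.43
p_arched = houses.get('arched', 0) / total     # 0/70 = 0.0

print(1 - p_arched)   # improbability of an arched door: 1.0 (maximally surprising)
```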

ARTIFICIAL SURPRISE AND ARTIFICIAL CURIOSITY

From what we have been saying, we may propose the functions below to model human curiosity and human surprise in an artificial agent. The curiosity induced in an agent Agt by an object Obj is a function of the difference of Obj relative to the set of objects present in the memory of Agt:

Curiosity(Agt,Obj) = f1(Difference(Obj, Agt(Memory)), arg1, ..., argn)

The surprise provoked in an agent Agt by an object Obj is a function of the degree of not expecting Obj, considering the set of objects present in the memory of Agt:

Surprise(Agt,Obj) = f2(Degree_of_not_Expecting(Obj, Agt(Memory)), arg1, ..., argn)

f1 and f2 are the exact functions of curiosity and surprise, respectively, and might take into account parameters (represented by argi, with i = 1, ..., n) other than the measure of difference and the measure of the degree of not expecting an object. For instance, according to previous definitions of surprise given in the literature (see Introduction), f2 might be obtained as follows:

Surprise(Agt,Obj) = f2(Originality(Obj, Agt(Memory)), Appropriateness(Obj)) = f2(h(Degree_of_not_Expecting(Obj, Agt(Memory)), Difference(Obj, Agt(Memory))), Appropriateness(Obj))

where h is a possible function for originality.
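As a minimal sketch, assuming the simplest instantiation f1 = f2 = identity on the first argument (the paper deliberately leaves f1, f2 and the extra parameters arg1, ..., argn open):

```python
def curiosity(obj, memory, difference_fn):
    """Curiosity(Agt, Obj) = f1(Difference(Obj, Memory)) with f1 = identity.
    The difference w.r.t. a set is the minimum over its members, as defined above."""
    return min(difference_fn(obj, known) for known in memory)

def surprise(obj, memory, degree_of_not_expecting_fn):
    """Surprise(Agt, Obj) = f2(Degree_of_not_Expecting(Obj, Memory)) with f2 = identity."""
    return degree_of_not_expecting_fn(obj, memory)
```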

AN EXAMPLE

Let us present an example that shows the importance of providing a robot with surprise and curiosity functions. We conceived a scene where a robot agent is in an environment with five objects: a house with a rectangular door; a church with a triangular door; a house with a squared door; a shop with a hexagonal door; and a building with a pentagonal shape and a pentagonal door (Figure 3). The last building is new to the robot, considering that its current knowledge comprises the objects of Figure 2. The robot agent starts by focusing its attention on the new building, which is the one that causes the most curiosity and the most surprise. Programmed to approach and analyse unknown objects, the robot agent moves towards the new building, ignoring the other ones because they are not new to it. Suppose now that we hide the pentagonal building. None of the other objects is new, and hence, at first glance, none deserves the attention of the robot agent. However, the shop with a hexagonal door is the least expected object of all, according to the improbability values computed for them. Therefore, this object is the one that causes the most surprise to the robot, and it is thus the object that deserves the most attention from the robot agent. This example shows that, having perceived an object many more times than another, an agent provided with a surprise function becomes less conscious of the first than of the second. It also shows, for the same reasons, that even when all the objects are already known, there might be one (or more) that deserves more attention because it is less expected than the others.

Figure 3 - A robot agent in an environment comprising different types of buildings.
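A sketch of the attention policy just described, with illustrative (hypothetical) scores standing in for values computed by the full measures:

```python
def choose_focus(objects, difference_fn, surprise_fn, novelty_threshold=0.0):
    """Attend to the most novel object if any is new; otherwise to the
    least expected (most surprising) one."""
    novel = [o for o in objects if difference_fn(o) > novelty_threshold]
    if novel:                                    # curiosity dominates: study new objects
        return max(novel, key=difference_fn)
    return max(objects, key=surprise_fn)         # else attend to the least expected one

# Hypothetical scores for the buildings of Figure 3 (pentagonal building is new).
objects = ['house_rect', 'church_tri', 'house_sq', 'shop_hex', 'pentagon_bldg']
diff = {'house_rect': 0.0, 'church_tri': 0.0, 'house_sq': 0.0,
        'shop_hex': 0.0, 'pentagon_bldg': 0.4}.get
surp = {'house_rect': 0.60, 'church_tri': 0.95, 'house_sq': 0.70,
        'shop_hex': 0.99, 'pentagon_bldg': 1.0}.get

print(choose_focus(objects, diff, surp))        # -> 'pentagon_bldg' (new object)
print(choose_focus(objects[:-1], diff, surp))   # -> 'shop_hex' (hidden: least expected wins)
```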

RELATED WORK

The issue of measuring difference, similarity, distance or novelty has been extensively covered in the literature (see Forbus, Gentner & Law, 1995). In particular, when knowledge is represented by graphs, the required comparison usually consists of graph isomorphism detection (Gati, 1979). Indeed, most of the approaches to measuring the distance between graphs that have been applied in several areas (Case-Based Reasoning, Pattern Recognition, Computer Vision, etc.) rely on the computation of the maximal common subgraph (e.g., Wong, 1992; Börner et al., 1996). Messmer & Bunke (1998) compute the distance between two graphs by counting the minimal number of changes (deletions, insertions) needed to convert one graph into another. This approach is similar to our own, but we manipulate numerical codes of the graphs instead of the graphs themselves. Our approach appears to be faster, considering that computing the Hamming Distance between two binary numbers (the codes of the graphs) simply requires applying the XOR operation to them: the number of 1's in the result is the absolute difference between the two graphs, as sketched below. Although there is a large amount of work on distance metrics, as far as we know, none has made use of them to attempt to simulate human curiosity in an artificial cognitive agent.
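For instance (a trivial sketch; `int.bit_count` needs Python 3.10+):

```python
def hamming_distance(code1: int, code2: int) -> int:
    """Hamming distance between two binary codes: XOR marks the differing
    bit positions with 1s, which are then counted."""
    return (code1 ^ code2).bit_count()   # or bin(code1 ^ code2).count('1')

assert hamming_distance(0b101101, 0b100111) == 2   # bits 1 and 3 differ
```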

Concerning the emotion of surprise, Peters (1998) has directly attempted to measure it, although his work was confined to the kind of surprise caused by unexpected movements of the objects in the environment of the cognitive agent. He previously modelled and ran an experiment in which a perceptual robot focused its eyes on the parts of the environment where changes occurred. The environment of the perceptual robot comprised two discs. If the external signal received from the environment was constant, then the visual focus of the robot remained where it was; but if, for example, the rotation speed of one disc changed, the robot focused its eyes on it. Our approach is more general since, besides measuring the surprise caused by static objects, it also covers the surprise caused by unexpected movements, if we consider the environment of the agent at a specific time as an object. If changes caused by movements are produced in that environment, then the next environment is no longer the previous one but a new environment, and therefore a new object. Successive changes lead to successive environments and consequently to successive objects. The parts (sub-objects) of those objects (environments) that are improbable are surprising, and the attention of the agent then focuses on them.

An alternative approach to measuring the degree of not expecting an object could be based on Bayes' formula (Lee, Grize & Dehnad, 1987). However, drawbacks of that approach, such as the complexity of the computations and the problem of the independence of the conditional events, weighed on our decision not to use it.

FURTHER WORK

This paper presents ongoing work on modelling human curiosity and surprise. Therefore, although the examples show evidence of some correctness of the model, they are too rudimentary to support definite conclusions, and experimental tests are needed to evaluate the proposed model with more accuracy. In particular, we think it would be important to compare the behaviour of the artificial agent with the behaviour of a sample of humans. Furthermore, the function that measures the degree of not expecting an object is to some extent intuitive and based on the CBR principle; a more formal one might be defined based on probability theory. In addition, parallel work is being done using the same surprise function presented here in the process of constructing creative solutions in our system. This way, the process involved is surprise-guided and not only novelty-guided, as in most previous work on creativity.

CONCLUSIONS

We have presented an approach to simulating curiosity and surprise in artificial perceptual agents. The simulation of curiosity provides artificial agents with the faculty of autonomously seeking new knowledge and performing learning. On the other hand, the simulation of surprise may contribute to providing artificial agents with the ability to evaluate the impression that objects make on them.

REFERENCES

Aitkenhead, A. M., & Slack, J. M. (Eds.). (1985). Issues in Cognitive Modelling. Hillsdale, NJ: Lawrence Erlbaum Associates.

Allen, J. (1985). Maintaining Knowledge about Temporal Intervals. In R. Brachman & H. Levesque (Eds.), Readings in Knowledge Representation. Los Altos, CA: Morgan Kaufmann.

Boden, M. (1992). The Creative Mind: Myths and Mechanisms. New York: Basic Books.

Boden, M. (1995). Creativity and unpredictability. SEHR, 4(2).

Bogart, K. (1988). Discrete Mathematics. Lexington, MA: Heath and Company.

Börner, K., Pippig, E., Tammer, E., & Coulon, C. (1996). Structural Similarity and Adaptation. Proceedings of the 3rd European Workshop on Case-Based Reasoning (pp. 58-75). Berlin: Springer Verlag.

Coulon, C. (1995). Automatic Indexing, Retrieval and Reuse of Topologies in Architectural Layouts. Proceedings of the International Conference on Computing in Civil and Building Engineering.

Eysenck, M. W., & Keane, M. T. (1990). Cognitive Psychology - A Student's Handbook. East Sussex, United Kingdom: Lawrence Erlbaum Associates.

Faltings, B. (1997). Probabilistic Indexing for Case-Based Prediction. Proceedings of the 2nd International Conference on Case-Based Reasoning. Berlin: Springer Verlag.

Forbus, K., Gentner, D., & Law, K. (1995). MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19, 144-205.

Gati, G. (1979). Further annotated bibliography on the isomorphism disease. Journal of Graph Theory, 3, 96-109.

Hamming, R. (1950). Error Detecting and Error Correcting Codes. The Bell System Technical Journal, 29(2), 147-160.

Lee, N., Grize, Y., & Dehnad, K. (1987). Quantitative Models for Reasoning under Uncertainty in Knowledge-Based Expert Systems. International Journal of Intelligent Systems, 2, 15-38. John Wiley & Sons, Inc.

Lenat, D., & Guha, R. (1990). Building Large Knowledge-Based Systems: Representation and Inference in the CYC Project. Reading, Massachusetts: Addison-Wesley.

Macedo, L., Pereira, F., Grilo, C., & Cardoso, A. (1998). A Computational Model for Creative Planning. In U. Schmid, J. Krems, & F. Wysotzki (Eds.), Mind Modelling: A Cognitive Science Approach to Reasoning, Learning and Discovery. Berlin: Pabst Science Publishers.

Macedo, L., & Cardoso, A. (1998). Nested Graph-Structured Representations for Cases. Proceedings of the 4th European Workshop on Case-Based Reasoning (pp. 1-12). Berlin: Springer Verlag.

Macedo, L., & Cardoso, A. (1999). Labelled Adjacency Matrices for Labelled, Directed Multigraphs: their Algebra and Hamming Distance. Proceedings of the 2nd European Workshop on Graph-Based Representations. Vienna: Austrian Computer Society.

Mandler, J. (1984). Stories, Scripts, and Scenes: Aspects of Schema Theory. Hillsdale, NJ: Lawrence Erlbaum Associates.

Messmer, B., & Bunke, H. (1998). A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5).

Peters, M. (1998). Towards Artificial Forms of Intelligence, Creativity, and Surprise. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 836-841). Madison, Wisconsin.

Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.

Schank, R. (1982). Dynamic Memory. Cambridge: Cambridge University Press.

Tulving, E., & Donaldson, W. (Eds.). (1972). Organization of Memory. New York: Academic Press.

Wong, E. (1992). Model matching in robot vision by subgraph isomorphism. Pattern Recognition, 25(3), 287-304.