Experiments with a Creativity-Support System based on Perceptual Similarity
Bipin INDURKHYA, Kailash KATTALAY, Amitash OJHA, Pradhee TANDON
International Institute of Information Technology, Gachibowli, Hyderabad-500 032, INDIA
email:
[email protected]
Abstract: Metaphors based on perceptual similarity are considered to be one of the hallmarks of creativity. We propose here an image-based retrieval system for generating pairs of perceptually similar images. A number of examples of images paired by the system are presented and discussed. Often, these images are conceptually very different, and hence can serve as anchors for perceptual metaphors that create new conceptual similarities.
Keywords: creativity, perceptual metaphor
1. Introduction

Consider the following verse from Stephen Spender’s well-known poem Seascape:

There are some days the happy ocean lies
Like an unfingered harp, below the land.
Afternoon gilds all the silent wires
Into a burning music for the eyes.
On mirrors flashing between fine-strung fires
The shore, heaped up with roses, horses, spires
Wanders on water tall above ribbed sand.

Here the poet compares the ocean to an unfingered harp. In reading the poem, one’s attention is invariably drawn to the way the sunlight reflects on the ripples of a calm ocean, making them look like the strings of a harp. A kind of perceptual resonance or synergy is created between the two images that is difficult, if not impossible, to describe verbally: the verbal explanations, if possible at all, are long and tortuous, and hopelessly inadequate at times. One needs to imagine, in concrete detail, what an ocean looks like on a calm day, as well as what a harp looks like before it is strummed. The crucial thing here is that the imagining must involve perceptual details. Such synergy of perceptual images, which is essential in understanding the meaning of the poem, is very difficult, if not impossible, to obtain from conceptual analysis alone. The reason is that when we see or describe a thing conceptually, many perceptual details are lost. If you see or conceptualize a harp, you immediately
consider it to be a musical instrument, and all sorts of music-related concepts are activated. It is only when you imagine it concretely, by evoking harp-related percepts and creating some sort of perceptual simulation, that the possibilities of focusing on its shape, its color, how the light reflects off it, etc. open up. (See also Indurkhya 2006.) Evoking these images, one becomes aware of a perceptual resonance between the two: the light reflecting on the strings of a harp and the sunlight reflecting on the ocean waves each form a similar visual pattern. It is this perceptual association that carries the metaphor and renders it meaningful.

This ability to see perceptual similarities that are lost or are not apparent in the conventional conceptualization is considered a hallmark of creativity. For example, one of the techniques for creating new meanings (riddles, in this particular case) proposed by Gianni Rodari (1996) in his wonderful book The grammar of fantasy: An introduction to the art of inventing stories is deconceptualization. It consists of three steps: 1) Estrangement: describe the object as though seen for the first time; 2) Association: the ‘clear surface’ of the description opens up the way for other meanings through images, so seek such images through associations; and finally 3) Metaphor: form a metaphor using the images from the last step (Rodari 1996, p. 29). The main objective here is to move away from the existing conceptualization of the object, and the method focuses on forcing the cognitive agent to evoke a perceptual image of the object so that new associations may be found. Using this method, Rodari shows how one can come up with the following new way of looking at a pen (in the form of a riddle): “What’s black and needs white, to make its mark look bright”.
Lest one dismiss all such examples of poetry and riddles as fanciful recreational activities that do not serve any practical purpose, we note that research on creative problem solving has also shown that perceptual similarities play a key role in generating insights when solving difficult real-world problems. This is perhaps best illustrated by a case discussed in Schön (1963, pp. 74–76). A product-development team was given the task of improving the performance of synthetic-fiber paintbrushes, which left the painted surface gloppy and uneven, as opposed to the smooth and continuous finish of natural-fiber paintbrushes. Initially, their understanding of the painting process, and the role of the paintbrush in it, was based on seeing it as a smearing process, whereby the paint sticks to the fibers when the brush is dipped in it, and when the brush is moved back and forth on the surface, some of this paint sticks to the surface. From this perspective, when the product-development team compared the performances of the natural-fiber brush and the synthetic-fiber brush, they could find no differences. Yet the overall result, in terms of the appearance of the painted surface, was different. The breakthrough came when a member of the team had a flash of insight suggesting that the painting process be viewed as pumping. In doing so, the concept of painting, as well as the role of the paintbrush in it, was completely transformed. Now the paint is sucked up into the space between the fibers by capillary action as the brush is dipped in the can of paint. When the brush is moved back and forth on the surface, the increased pressure on the inner side of the curved fibers pumps out the paint through the space between the fibers towards the outer side, generating a thin spray at the place where the fibers bend away from the surface.
When the performances of the natural fiber paintbrush and the synthetic fiber paintbrush were compared from this perspective, it was found that the natural fibers formed a smooth curve when the paintbrush was pressed against a wall, while the synthetic fibers bent at a sharp angle. This observation led to a number of innovative suggestions for how gradual-bending synthetic fibers
might be fabricated, some of which resulted in bristles that curved smoothly; the paintbrush made from these fibers did, indeed, produce a smooth painted surface.

The main thing to emphasize about this example is that the new representation or perspective that was the key to solving the problem requires a perceptual reorganization of the painting process. Though there may be wide individual variations in the imagery employed during this reorganization, it seems obvious that some kind of perceptual simulation of the painting process is necessary, which is then conceptualized anew. Only then can a conceptual organization that is radically different from the existing one emerge. Even the causal structure that determines why paint sticks to the brush when dipped in a can of paint, and why it gets coated on the surface when the brush is moved back and forth across it, is completely transformed in the new conceptualization of the painting process.

It is generally accepted that the cognitive mechanisms of deconceptualization and generative metaphor are difficult, in that it requires considerable cognitive effort to move closer to perception, and to see an object through perceptual images instead of via familiar conceptualizations. In fact, it is because of this difficulty that creativity researchers have developed various techniques to help people break away from the conventional conceptualization (de Bono 1975; Gordon 1961; Koestler 1964; Rodari 1996; Schön 1963). It should also be noted in this connection that in learning to draw or paint, one must consciously learn how to see blotches of color and patterns of shadow instead of trees and fields, and to see lines and shapes instead of familiar faces. So we next consider how computer systems can be of assistance in these mechanisms.

2. Computers and Creativity

Computers are by their very nature algorithmic, and the mechanical nature of algorithms is often considered anathema to creativity.
How can any creative insight result from following a predetermined set of instructions in a blind and seemingly mindless manner? Elsewhere (Indurkhya 1998) we have argued that computer systems can be designed that have an inherent advantage over humans in generating creative associations. The argument can be summarized as follows: For creativity, a cognitive agent needs to break conventional conceptual associations consciously and deliberately. This task is difficult for people because we are constrained by the associations of the concept networks that we inherit and learn in our lifetime, and it requires a significant amount of cognitive effort to break away from these associations. The conceptual associations that one so dearly needs for all commonsense reasoning become a major stumbling block when it comes to creativity. Computers, on the other hand, do not have such conceptual associations. In fact, artificial intelligence research has spent a great deal of time and effort in modeling these conceptual associations, for they are a key to commonsense reasoning. Semantic networks, frames, scripts, etc. are formalisms developed to capture these associations. So it follows that it must be easier for computers to break away from conceptual associations, simply because they do not have them to begin with. Therefore, we may conclude that computers are naturally predisposed towards incorporating creativity.1

1 Indeed, precisely this reasoning underlies the way in which computers are used to create art that, one could argue, transcends human art at Remko Scha’s Institute of Artificial Art in Amsterdam (Harry 1992, 1997), as exemplified in the following passage: “[I]t is practically impossible for human artists to create works of art that live up to the aesthetic ideals of philosophers like Immanuel Kant. Human artists always have rather selfish goals that usually involve money, fame and sex. Anyone who is aware of this, will become much too embarrassed to be able to engage in a disinterested process of aesthetic reflection. Machines are in a much better position to create objects of serene beauty; and computers will finally be able to create endless amounts of such objects, in infinite variety.” (Harry 1997)

3. An Image-Based Search System to Generate Perceptual Metaphors

Let us now see if we can apply this general idea to design a computer system that can generate creative perceptual metaphors, that is, pairs of images that are perceptually similar but are conceptually very different (like the ocean and a harp). In fact, image-based search is an active topic of research these days, as the repository of images is growing rapidly and people need fast, efficient and effective tools to search through them. Though most image retrieval systems use some kind of linguistic tags to match images (Al-Hawamdeh et al. 1991; Bertino et al. 1988), a few researchers are also beginning to develop purely image-based matching systems (Datta et al. 2008; Flickner et al. 1995). The architecture of one such image-matching-based image retrieval system, called FISH (Fast Image Search in Huge Databases), which was developed by one of us (Tandon et al. 2008), is shown in Fig. 1.

Figure 1: FISH System Architecture

The system processes each query image into an internal representation and searches for similar images in a large database. Search for similar images is accomplished in sub-second times using an appropriate index structure. Each image X in the system is represented as a vector of numeric feature values [X1, X2, ..., XD]. The
space of possible vectors constitutes a D-dimensional space in which each image is a point. The general features used in any image retrieval system are color, texture and shape descriptors. Color descriptors, though weak in descriptive power, allow flexibility in use through variations ranging from the global histogram to color layout descriptors. Texture is generally highly dependent on the homogeneity and regularity of the patterns in pixels. Shape is difficult to extract and represent. In FISH we have experimentally selected a weighted combination of generic descriptors that are widely accepted by the research community. The combination includes mean, variance and skew color moments, the MPEG-7 Color Layout Descriptor (CLD), the MPEG-7 Color Structure Descriptor (CSD) and the MPEG-7 Texture Browsing Descriptor (TBD) (Martínez 2002). We use weights to counter the variation in numeric scales across these characteristics. This numeric vector is then used as the signature for the image. The generic nature of the descriptors avoids any implicit bias towards any concept. As a result, the system is able to retrieve images perceptually similar to the query, unconstrained by the concept being queried for.

The front end is a web interface designed with a strong emphasis on ease of use. The design minimizes the effort the user has to make while interacting with the system. We have strictly avoided low-level query specifications, such as textual or numeric descriptors, from the user to allow unbiased perceptual comparison. The user is required to provide the query as an example image. Once the query image is given, the system extracts the numeric signature, which can then be used for comparisons with the images in the database. Based on the signatures, the system retrieves images from the database that it judges to be perceptually similar to the query. We use the Mahalanobis metric over the respective signatures to compare the query with images in the database.
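The signature-and-compare step described above can be illustrated with a minimal sketch. It is not the FISH implementation: it assumes that a signature consists only of the nine color moments (mean, variance and standardized skewness per RGB channel), omits the MPEG-7 descriptors and the weighting, estimates the covariance matrix from the database signatures with a small ridge term for numerical stability, and scans the database linearly instead of using an index structure. All function names (color_moments, covariance, mahalanobis, rank) and the synthetic random images are hypothetical and chosen for illustration only. For signatures x and y and covariance matrix S, the distance computed is sqrt((x - y)^T S^-1 (x - y)).

```python
import math
import random

def color_moments(pixels):
    """Mean, variance and standardized skewness per channel of (r, g, b) pixels."""
    n = len(pixels)
    sig = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        skew = third / var ** 1.5 if var > 0 else 0.0
        sig.extend([mean, var, skew])
    return sig

def covariance(rows, ridge=1e-6):
    """Sample covariance of the signatures; the ridge term is an assumption."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis(x, y, cov):
    """Mahalanobis distance between two signatures under covariance cov."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = cov^-1 * diff, without forming the inverse
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))

def rank(query_sig, database, cov):
    """Database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda k: mahalanobis(query_sig, database[k], cov))

# Synthetic stand-ins for real images: 64 noisy pixels around a random base color.
random.seed(0)

def random_image(n_pixels=64):
    base = [random.randint(0, 255) for _ in range(3)]
    return [tuple(min(255, max(0, b + random.randint(-40, 40))) for b in base)
            for _ in range(n_pixels)]

database = {f"img{i}": color_moments(random_image()) for i in range(30)}
cov = covariance(list(database.values()))
query = database["img7"]
print(rank(query, database, cov)[:3])  # the three nearest images to the query
```

Because the Mahalanobis distance rescales each feature by the observed covariance, features with large numeric ranges (such as the variances) do not dominate features with small ranges (such as the skewness), which is the same concern the weights mentioned above are meant to address.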
A subset of the perceptually similar images is then shown to the user as results for the query. FISH allows the user to mark, from among the displayed images, the ones that are indeed similar to the query. If the user so wishes, FISH uses this feedback to refine the retrieval conceptually; a discussion of this feature is outside the focus of our work.

4. Experimental Results and Discussion

In this section we present the results of some experiments we conducted in generating perceptual metaphors using the FISH system. When we query with a picture, a number of perceptually similar images are retrieved from the database. For each query, we have selected one of the retrieved images that we felt was particularly interesting, in the sense that it was conceptually very different from the query image, and we comment on the perceptual similarities between the two images.
Query 1: Taj Mahal
Result: Bottles
When we queried with a picture of a well-known structure, the Taj Mahal, one of the images that was retrieved was that of wine bottles. The Taj Mahal, as a concept, is very different from wine bottles, but in these pictures they are perceptually similar: the towers of the Taj Mahal can be seen as bottles. Thus, the structural aspect of the Taj Mahal as having five vertical lines is emphasized.
Query 2: Aishwarya Rai
Result: Buffalo
One of the most interesting results we found was when we queried with a picture of the former Miss World and current Bollywood actress Aishwarya Rai. One of the retrieved images was that of a water buffalo. The juxtaposition seems strange at first, but a careful look reveals that the hair of Aishwarya Rai can be compared with the black body of the buffalo. The face can be imagined below, surrounded by the black legs.
Query 3: Tree
Result: Helicopter
In another example, an image of a tree was found to be perceptually similar to that of a helicopter. The leaves of the tree can be compared with the body of the helicopter, as both of them are big circular regions in the center of the image, though each has a different color. The legs of the helicopter can be seen as the trunk of the tree.
Query 4: Shahrukh
Result: Horse
For the next example, the query is the picture of a well-known Bollywood actor, Shahrukh Khan. The system does not see him as a talented actor but in terms of patches of dark colors, and one of the similar images retrieved is that of a horse. If we look at them closely, we find that these two pictures are perceptually similar. Even the leaning posture of the actor perceptually resonates with the neck and the head of the horse.
Query 5: Rose
Result: Face
In this example the query was the picture of a rose. An interesting result among the retrieved images is that of the face of a woman. A close examination reveals that the bright petals of the rose correspond to the face of the woman, and the green leaves surrounding the rose are like the dark hair of the woman. An interesting thing about this example is that comparing a beautiful face to a rose is considered a very conventional metaphor, which is based on abstract similarity in terms of beauty. Our experiment, however, provides a perceptual grounding for this metaphor.
Query 6: Cat
Result: Car
In this example, the image of a cat yielded a car. Again, the perceptual similarities between the two images should be obvious. Though the two concepts seem semantically unrelated, the perceptual metaphor suggests possible ways to bring them together: for example a car that is as agile, powerful and quiet as a cat. Such associations have obvious use for advertising and public relations.
Query 7: Cycle
Result: Horse
In this example our query was a bicycle, and one of the perceptually similar images retrieved by the system was that of two horses. The legs of the two horses can be perceived as the wheels of the bicycle. This perceptual association, again, has obvious potential for advertising.
Query 8: Fruit
Result: Rising Sun
In this last example we queried with the image of an apple. One of the interesting results obtained was the image of the rising sun.
5. Conclusion and future research

We demonstrated in this paper, by showing a number of examples, that an image-based retrieval system is quite effective in generating perceptual metaphors, that is, metaphors that juxtapose conceptually different objects or situations based on perceptual similarities. Reflecting on these juxtapositions, and trying to make sense of them conceptually, often leads to creative insights. (See also Fauconnier and Turner 2002; Gineste, Indurkhya and Scart 2000; Hofstadter et al. 1995; Hunter and Indurkhya 1998; Indurkhya 1992; 1997.) In this sense, our system acts as a creativity-support system. In future, we plan to do a study in which the pairs of images generated by the system are presented to people to find out how they interpret them, as a considerable amount of the reader’s creativity is involved in this process (Indurkhya 2007). Then we would like to model this interpretation process, and build a computational system to generate the interpretations autonomously. A system like this will have obvious applications in advertising and other domains where offbeat and creative ideas are constantly needed.

References

[1] Al-Hawamdeh, S., Ooi, B. C., Price, R., Tng, T. H., Ang, Y. H. and Hui, L. (1991). Nearest neighbour searching in a picture archive system. International Conference on Multimedia Information Systems ’91 (Jurong, Singapore). McGraw-Hill, New York, NY, pp. 17–34.
[2] Bertino, E., Rabitti, F. and Gibbs, S. (1988). Query processing in a multimedia document system. ACM Trans. Inf. Syst. Volume 6, Number 1, New York.
[3] Datta, R., Joshi, D., Li, J. and Wang, J.Z. (2008). Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys 40:1.
[4] de Bono, E. (1975). New Think: The use of lateral thinking in the generation of new ideas. Basic Books, New York.
[5] Fauconnier, G., and Turner, M. (2002). The way we think: Conceptual blending and the mind’s hidden complexities. Basic Books, New York.
[6] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D. and Yanker, P. (1995). Query by Image and Video Content: The QBIC System. IEEE Computer 28:9, pp. 23–32.
[7] Gineste, M.-D., Indurkhya, B., and Scart, V. (2000). Emergence of features in metaphor comprehension. Metaphor and Symbol 15, No. 3, pp. 117–135.
[8] Gordon, W.J.J. (1961). Synectics: The development of creative capacity. Harper & Row, New York.
[9] Harry, H. (1992). On the role of machines and human persons in the art of the future. Pose 8 (September 1992), pp. 30–35. Reprinted in P. Harmsen, E. Koppelman-Valk and M. Vredeling (eds.) Kunst en Technologie: The Beauty and the Beast? Eburon, Delft (1992), pp. 67–73.
[10] Harry, H. (1997). On the mechanism of human facial expression as a medium for interactive art. Catalogue Ars Electronica, September 1997, Linz, Austria.
[11] Hofstadter, D., and The Fluid Analogies Research Group (1995). Fluid concepts and creative analogies. Basic Books, New York.
[12] Hunter, D. and Indurkhya, B. (1998). ‘Don’t think, but look’: A Gestalt interactionist approach to legal thinking. In K. Holyoak, D. Gentner, and B. Kokinov (eds.) Advances in Analogy Research. New Bulgarian University, Sofia, Bulgaria, pp. 345–353.
[13] Indurkhya, B. (1992). Metaphor and cognition. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[14] Indurkhya, B. (1997). On modeling creativity in legal reasoning. Proceedings of the Sixth International Conference on AI and Law, June 30 – July 3, 1997, Melbourne, Australia, pp. 180–189.
[15] Indurkhya, B. (1998). Computers and Creativity. Unpublished manuscript. Based on the keynote speech “On Modeling Mechanisms of Creativity,” delivered at Mind II: Computational Models of Creative Cognition, Dublin, Ireland, September 15-17, 1997. URL: http://www.iiit.ac.in/~bipin/
[16] Indurkhya, B. (2006). Emergent representations, interaction theory, and the cognitive force of metaphor. New Ideas in Psychology, Volume 24, Issue 2, pp. 133–162.
[17] Indurkhya, B. (2007). Creativity in Interpreting Poetic Metaphors. In T. Kusumi (ed.) New Directions in Metaphor Research, Hitsuji Shobo, Tokyo, pp. 483–501.
[18] Koestler, A. (1964). The act of creation. Hutchinsons, London.
[19] Martínez, J.M. (2002). MPEG-7: Overview of MPEG-7 description tools: Part 2. IEEE MultiMedia, 9(3):83–93.
[20] Rodari, G. (1996). The grammar of fantasy. Teachers & Writers Collaborative, New York.
[21] Schön, D.A. (1963). Displacement of concepts. Humanities Press, New York.
[22] Tandon, P., Nigam, P., Pudi, V. and Jawahar, C.V. (2008). FISH: A Practical System for Fast Interactive Image Search in Huge Databases. Proceedings of 7th ACM International Conference on Image and Video Retrieval (CIVR '08), July 7-9, 2008, Niagara Falls, Canada.