Image Wave -A study on image synchronization-

Ryotaro Suzuki
ATR Media Integration & Communications Research Laboratories
Seika-cho Soraku-gun Kyoto-fu 619-0238 Japan TEL : +81 774 95 1463

[email protected]

Yuichi Iwadate
ATR Media Integration & Communications Research Laboratories
Seika-cho Soraku-gun Kyoto-fu 619-0238 Japan TEL : +81 774 95 1460

[email protected]

Michihiko Minoh
Center for Information and Multimedia Studies, Kyoto University
Yoshidahonmachi Sakyo-ku Kyoto-shi Kyoto-fu 606-8501 Japan TEL : +81 75 753 5995

[email protected]

ABSTRACT
“Image Wave” is a new study paradigm based on a hypothesis that can be summarized as follows.
1) Images that exist in time, such as movies, have their own rhythms, and those rhythms are formed as synthesized waves, each of which has its own frequency, phase and fluctuation.
2) Images in the mind are themselves synthesized waves.
According to this hypothesis, our study searches for a new way of synchronizing multimedia components based on the internal rhythm information of the components.

Keywords
Image, rhythm, synchronization, counterpoint

1. INTRODUCTION
Since 1997 we have been engaged in the study of “Multimedia Montage” [1][2], which synthesizes multimedia components, including video and sound clips, based on a counterpoint structure. In the movie production experiments of this study, we observed conspicuous synchronization among the elements to be synthesized. The first synchronization observation was made at the beginning of the Multimedia Montage study. When the author was watching a video of the famous classic silent movie “Dr. Caligari”, he felt that something was missing and started to listen to a music CD by the Japanese pop music group “Pizzicato Five” in parallel. In doing so, the movie and the music started to synchronize gradually, and finally, during a climax scene and a certain song (“The sound of music”), they were in perfect harmony. The author was so surprised that he tried hard to replay and record this mysterious synchronization.

Synchronization was not necessarily the primary topic of Multimedia Montage in its beginnings. However, since we started to make experimental counterpoint movies of Multimedia Montage, we have encountered various cases of synchronization. At the final stage of this study, we decided to concentrate on this synchronization problem and consequently started a new study named “Image Wave”.

2. THE “IMAGE WAVE” CONCEPT

2.1 Diachronic Synchronicity
C. G. Jung, the famous psychoanalyst, calls a significant coincidence among events “synchronicity” [3]. Our observation indicates that such synchronization can happen not only among present events but also among past events from distant times, and that it can be replayed. In addition, a synchronization is significant to a person only because that person recognizes it. Therefore, synchronicity is not a purely objective phenomenon but a mutual phenomenon between outer-world events and a human as the observer of these events. In our Image Wave study, we call this synchronicity among temporally distant past (or possibly future) events “diachronic synchronicity”.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM Multimedia ’99 (Part 2) 10/99 Orlando, FL, USA © 1999 ACM 1-58113-239-5/99/0010...$5.00

2.2 Counterpoint Documentary
The concept of counterpoint consists of the synchronization among autonomous independent elements [1][2]. Composers of counterpoint musical pieces carefully prepare melodies suitable for counterpoint synchronization. On the other hand, many creators have long seemed to recognize that even past records can be synchronized. A good example of diachronic synchronicity using the counterpoint method can be found in radio documentary works such as “The Idea of North” by the musician Glenn Gould [4]. In these works, he combined monologue elements, which he had recorded independently, so that they harmonize like a conversation using the counterpoint method. Such works are generally called “counterpoint documentaries”.

2.3 Image Wave
There must be some mechanism that generates this curious phenomenon. The most natural explanation would be that each recorded image (including sound) has its own rhythm that is able to synchronize with other rhythms, and that a human being has the ability to recognize this rhythmical synchronization by performing synchronization in his or her own brain. In other words, images are waves both in the outer world and in the human mind. We summarized this hypothesis as follows and named it (and the new study paradigm based on the hypothesis) “Image Wave”.
1) Images that exist in time, such as movies, have their own rhythms, and these rhythms are formed as synthesized waves, each of which has its own frequency, phase, and fluctuation.
2) Images in the mind are themselves synthesized waves.

Work Title                        Synchronization Elements
(Caligari)                        hand motion vs. sound
Dance Canonica [1]                body motion vs. body motion
                                  body motion vs. sound
INVENTION -Happy+Sad- (Fig. 2)    body position/motion vs. body position/motion
                                  (body motion vs. sound)
Kazoku Game Game (Fig. 3)         hand motion vs. hand motion

Fig. 1 Types of synchronization elements

A very similar idea about what an “image” is can be found in the neurophysiologist Karl H. Pribram’s “Holonomic Brain Theory” [5][6]. His theory indicates that the image recognition and imagination processes in the human brain correspond to wave data analysis/synthesis, much like the Fourier transform. His research is influenced by N. Wiener’s cybernetics [7] and D. Gabor’s spatial frequency research; the result of the latter is known as Gabor filters [8]. These studies are expected to give various useful hints to our Image Wave study.
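As a rough illustration of the hypothesis, the following sketch synthesizes a wave from components that each have their own frequency, amplitude and phase, and then recovers those components by Fourier analysis. It is only a toy model under stated assumptions: the function names, the sample rate and the two example components are hypothetical, the "fluctuation" of each component is not modeled, and nothing here is taken from the Image Wave system itself.

```python
import cmath
import math


def synthesize(components, n_samples, sample_rate):
    """Sum cosine components given as (frequency_hz, amplitude, phase_rad)."""
    return [
        sum(a * math.cos(2 * math.pi * f * i / sample_rate + p)
            for f, a, p in components)
        for i in range(n_samples)
    ]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform; adequate for short signals."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal))
        for k in range(n)
    ]


def dominant_components(signal, sample_rate, count):
    """Return the `count` strongest (frequency_hz, amplitude, phase_rad)."""
    n = len(signal)
    spectrum = dft(signal)
    # Only positive-frequency bins below Nyquist, excluding the DC bin.
    bins = sorted(range(1, n // 2), key=lambda k: abs(spectrum[k]),
                  reverse=True)[:count]
    return sorted(
        (k * sample_rate / n, 2 * abs(spectrum[k]) / n,
         cmath.phase(spectrum[k]))
        for k in bins
    )


# Hypothetical "image wave": a slow 2 Hz rhythm plus a weaker 5 Hz rhythm.
wave = synthesize([(2.0, 1.0, 0.0), (5.0, 0.5, math.pi / 2)],
                  n_samples=64, sample_rate=32)
recovered = dominant_components(wave, sample_rate=32, count=2)
```

Because both example frequencies fall exactly on analysis bins, `recovered` matches the synthesized components up to floating-point error; real rhythm data would generally not be this clean.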

3. RHYTHMIC SYNCHRONIZATION OF IMAGES
The actual goal of our Image Wave study is to synchronize multimedia components based on the internal rhythm information of these components. The target multimedia components are video clips (as sequences of digital image data) including sound. To achieve this goal, it is necessary to determine what rhythm information to extract, how to extract that information from video clips, and how to synchronize the video clips using the rhythm information.

Fig. 2 INVENTION -Happy+Sad-

3.1 Rhythmic Elements of Images
We analyzed the above-mentioned synchronization of “Caligari” and found that the major factor in the synchronization existed between hand motion and sound. Different kinds of synchronization were observed in our counterpoint movie experiments. The actual main elements of the synchronization in each work were as follows (Fig. 1).

Fig. 3 Kazoku Game Game


As shown in Fig. 1, the actual elements of the synchronization were hand motion, body position/motion, and sound. The synchronization existed among combinations of these three elements. This finding seems very natural from the viewpoint of cognitive science and psychology. Therefore, one way to achieve systematic synchronization is to control these elements directly. Such direct control is especially suitable for CG systems. However, our target system is based on recorded video images, and it would be too difficult to extract the above information from recorded video images. In addition, there would be no guarantee that the synchronization elements would be limited to these three.

Our Multimedia Montage study owes much to the montage theory of the late Russian film director Eisenstein [9]. If we assume that each movie element (such as camera angle, shot composition, etc.) corresponds to a synchronization element, the theory gives a more general classification of the elements as follows. (The word “rhythm” is treated with a narrower meaning in the following classification than in our study.)
1) Metric Montage : shot length
2) Rhythmic Montage : subject motion & camera work
3) Tone Montage : color / brightness / contrast
4) Overtone Montage : combination of these elements
5) Intelligent Overtone Montage : symbols / semiotics / semantics
6) Vertical Montage : sound (against motion picture)

3.2 Rhythm Information Extraction
Among these six rhythm elements based on the montage theory, we mainly process elements 1) to 4). They can be physically extracted from a sequence of digital image data as follows to form image wave data whose values vary from frame to frame.

1) Shot length
A sequence of shot lengths can be extracted semi-automatically based on the average luminance difference of the pixels between neighboring frames. We obtained the following results from a shot composition analysis of a scene from Michael Jackson’s “Thriller”.
a) The average shot length, excluding the introduction part, is exactly the same as the measure of the music (Fig. 4).
b) The value of the average luminance difference changes a lot on the shot borders (Fig. 5).

Fig. 4 Thriller shot length
Fig. 5 Thriller luminance variance

2) Subject motion & camera work
This element includes hand motion and body motion. Total motion detection can be done by the optical flow method. The detected motion includes camera work information, which can be separated out. In the case of motion detection of a certain part, the template matching method can be utilized, though the conditions would be limited as mentioned above. (The optical flow method, too, has its own constraints.)

3) Color / brightness / contrast
Color is an independent topic that is not included in our study so far. As far as the luminance distribution is concerned, spatial frequency analysis [5][6][10][11][12] is an effective method. The temporal changes of the whole spatial frequency power spectrum data can be reduced by summing the values up for each frequency band. The following are the results of our analysis of two “Odessa” scenes (a calm port scene and the famous staircase scene) from Eisenstein’s “Battleship Potemkin” (Fig. 6, Fig. 7).
a) There is not much difference in the power spectrum between high frequency bands and low frequency bands, except when soft focus is used.
b) There is a correlation between the power spectrum and camera angle. The spatial frequency power spectrum is higher in close-up shots, as clearly shown in the latter part of Fig. 7.
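The shot-length extraction described under 1) Shot length can be sketched as follows. This is a minimal illustration, not the tool used in the study: it assumes that frames are already decoded into 2-D lists of luminance values, that the sequence is non-empty, and that a fixed threshold on the average luminance difference marks a shot border. The paper describes the extraction only as semi-automatic, so the threshold and all function names are hypothetical.

```python
def mean_abs_luminance_diff(frame_a, frame_b):
    """Average absolute luminance difference between two equal-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def luminance_difference_series(frames):
    """One value per pair of neighboring frames (the raw 'image wave')."""
    return [mean_abs_luminance_diff(prev, cur)
            for prev, cur in zip(frames, frames[1:])]


def shot_lengths(frames, threshold):
    """Split the sequence wherever the difference exceeds `threshold`."""
    boundaries = [0]
    for i, diff in enumerate(luminance_difference_series(frames), start=1):
        if diff > threshold:
            boundaries.append(i)  # frame i starts a new shot
    boundaries.append(len(frames))
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Toy 2x2 "video": three dark frames, two bright frames, four mid-gray frames.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
gray = [[50, 50], [50, 50]]
frames = [dark] * 3 + [bright] * 2 + [gray] * 4
lengths = shot_lengths(frames, threshold=30)  # [3, 2, 4]
```

In practice a person would probably still review the detected borders, since fast motion or flashes can also produce large frame-to-frame differences.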

Fig. 6 Odessa Port Spatial Frequency

Fig. 7 Odessa Staircase Spatial Frequency
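The band-wise reduction of the spatial frequency power spectrum can be illustrated with a small sketch. It assumes a frame given as a 2-D list of luminance values, uses a naive 2-D discrete Fourier transform (practical only for very small frames), skips the DC term, and groups coefficients into equal-width radial bands. The paper does not state how its bands are defined, so the band layout and the function names are assumptions made for illustration.

```python
import cmath
import math


def power_spectrum_2d(frame):
    """Naive 2-D DFT power spectrum of a small luminance frame."""
    h, w = len(frame), len(frame[0])
    spectrum = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    acc += frame[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc) ** 2)
        spectrum.append(row)
    return spectrum


def band_powers(frame, n_bands):
    """Sum spectral power into `n_bands` radial bands, low to high."""
    h, w = len(frame), len(frame[0])
    spectrum = power_spectrum_2d(frame)
    max_radius = math.hypot(h // 2, w // 2)
    bands = [0.0] * n_bands
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean luminance)
            # Fold each index onto its distance from zero frequency.
            radius = math.hypot(min(u, h - u), min(v, w - v))
            index = min(int(n_bands * radius / max_radius), n_bands - 1)
            bands[index] += spectrum[u][v]
    return bands


# Two hypothetical 8x8 frames: a smooth gradient-like pattern and fine stripes.
smooth = [[128 + 100 * math.cos(2 * math.pi * x / 8) for x in range(8)]
          for _ in range(8)]
stripes = [[128 + 100 * math.cos(math.pi * x) for x in range(8)]
           for _ in range(8)]
smooth_bands = band_powers(smooth, n_bands=2)
stripe_bands = band_powers(stripes, n_bands=2)
```

Applying `band_powers` to every frame of a clip would give one time series per band, which is one possible reading of the reduction described above.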

4) Combination
The most basic temporal image variance data is the average luminance difference of pixels between neighboring frames. The data includes both motion and luminance distribution data. The average can be calculated in several ways. Figs. 8 and 9 represent the average squared luminance difference of pixels between neighboring frames of the same movie scenes as in Figs. 6 and 7. Very similar patterns are seen in Figs. 6 and 7.

Fig. 8 Odessa Port Luminance Variance

Fig. 9 Odessa Staircase Luminance Variance

The resulting image wave data are to be decomposed into simpler wave elements by using autocorrelation, short-time Fourier analysis, wavelets, MEM, and other time-frequency analysis methods.

3.3 Image Synchronization
At the present stage, we are planning to use the following three methods for movie image synchronization.

1) Shot Synchronization
This is the most basic synchronization method, based on the shot length, and is often used as a montage technique. We developed a simple tool named “frame mixer”, which merges two video clips with given frame rates.

2) Frequency Synchronization
The results of the rhythm information extraction give actual frequency information along the time axis. Frequency synchronization applies affine transformations (shifting, scaling, or reversing) to the multimedia components to synchronize their frequencies and phases based on the counterpoint method.

3) Counterpoint Synchronization
An important characteristic of counterpoint music is that not all parts of the whole melody line are equally synchronized; rather, just some key points (which correspond to the meaning of the word “counterpoint”) control the synchronization. In our observation, we have noticed that the synchronization can be felt well even when it happens only a few times in a whole series. Counterpoint synchronization applies affine transformations to multimedia components to synchronize their key points as much as possible.

4. CONCLUSION AND FUTURE WORK
We have just begun this study. Our present main issues are as follows.
1) To develop a new method for rhythm information analysis that is more suitable for human emotions than present time-frequency analysis methods.
2) To develop a prototype system that actually synchronizes video clips based on the rhythm information in the clips.

5. REFERENCES
[1] Suzuki, R., Inoue, S.: Dance Canonica, The 6th ACM International Multimedia Conference -Art Demos-Technical Demos-Poster Papers-, p.37 (1998)
[2] Suzuki, R., Iwadate, Y.: Multimedia Montage -Counterpoint synthesis of movies-, IEEE International Conference on Multimedia Computing and Systems ’99, Vol. 1, pp.433-438 (1999)
[3] Peat, F.D.: Synchronicity, Bantam Books Inc. (1987)
[4] Page, T., Ed.: The Glenn Gould Reader, Vintage Books (1990)
[5] Pribram, K.H.: Languages of the Brain, Prentice-Hall Inc. (1971)
[6] Pribram, K.H.: Brain and Perception, Lawrence Erlbaum Associates Inc. (1991)
[7] Wiener, N.: Cybernetics 2nd edition, The MIT Press (1961)
[8] Gabor, D.: Theory of communication, J. IEE (London) 93, pp.429-457 (1946)
[9] Eisenstein, S.M.: The Works of Sergei Eisenstein, Kinema-Junposha Inc. (1980-1993)
[10] De Valois, R.L., De Valois, K.K.: Spatial Vision, Oxford University Press (1990)
[11] Rosenfeld, A.: Digital Picture Processing, Academic Press Inc. (1976)
[12] Pentland, A.P.: From 2-D Images to 3-D Models, Science of Vision, Springer-Verlag (1990)