Exploration of Large Image Collections Using Virtual Reality Devices

Robert van Liere and Wim de Leeuw

Center for Mathematics and Computer Science, CWI

Abstract

An image browser for the exploration of image collections is described. The approach taken is to utilize VR input devices to develop more intuitive interaction metaphors that allow users to navigate through large collections of images. The browser presents query results as a strip of images, and interaction with the strip is realized by interpreting the user's head movements. The browser is used as a front end to a visual information retrieval system.

CR Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.6 [Computer Graphics]: Methodology and Techniques

Additional Keywords: Virtual Reality, Information Visualization

1 Introduction

Visual information retrieval systems allow images to be retrieved from data repositories subject to a user-defined query. Queries are based on image properties (such as color, texture, and object shape) or on keywords. One of the key problems is that it is hard to articulate a query that represents the type of image one is searching for. Although querying techniques continue to improve, there is no ultimate solution in sight. Because of this, we assume that as data repositories grow larger, the results of these queries will also grow larger. This paper focuses on the presentation of, and interaction with, such large query results.

Predominant user interfaces to image retrieval systems present image collections as a set of pages (see for example www.corbis.com). Each page contains a matrix of images in thumbnail format. Users can browse through the set of pages by requesting the next/previous page, or enlarge an image by clicking on it. We believe that this way of searching query results can be improved. In particular, novel interaction metaphors can improve the browsing, navigation, and manipulation of large collections of images.

The approach taken in this paper is to utilize VR input devices to enhance user performance in the exploration of image query results. The motivation for this approach is that VR interfaces can be more intuitive when searching large image collections. Our perceptual assumption is that humans are quick to grasp a relevant subset of images on which to focus attention for further exploration. This perceptual aspect is used to circumvent the limitations of current image retrieval systems, namely poor syntactic/semantic indexing and small, unmanageable screens. However, VR devices are also limited in how well they can simulate the inspection of real photographs; examples of these limitations are low resolution and limited space (a single screen). In the design of a system, these limitations must be taken into account.

We discuss our prototype system, based on head tracking and 3D input devices, for exploring large image collections. Typically, a few thousand images with varying resolutions will be explored; a typical image is several megabytes of data. The rest of this paper is organized as follows: after we briefly discuss related work, we motivate some aspects of our approach and discuss the implemented prototype. In the section on future directions, we present some possible generalizations of these ideas to other areas of information visualization.

Author address: Center for Mathematics and Computer Science CWI, P.O. Box 94097, 1090 GB Amsterdam, the Netherlands. E-mail: [email protected]

2 Related work

User interfaces to visual information retrieval systems have recently gained much attention [1, 2]. Effective access to visual information is obtained by allowing the user to switch back and forth between browsing and querying by content. Research is underway into new ways of representing the content of visual archives and the paths followed during a retrieval session. In retrieving visual information, high-level semantic concepts are often used together with perceptual features in a query.

Using immersive, stereoscopic displays for interacting with large information spaces is relatively new. Advancements in key technologies, including graphics processing hardware, stereo graphic displays, and six-degrees-of-freedom position trackers, have made many of the necessary technologies available to the research community, allowing new interaction paradigms to be developed and evaluated [3, 4].

Hix, Templeman, and Jacob have evaluated the pre-screen projection technique [5], which allows a user to pan and zoom through a scene simply by moving her head relative to the screen. Pre-screen projection is based on real-world visual perception, namely that a person's view changes as the head moves. Although the authors dealt only with the portrayal of textual information, they found that pre-screen projection can provide a natural, useful means of human-computer interaction.

Gibson's work on direct perception and ecological optics suggests that relationships between head movements and visual input disclose new information about the environment and the user's place in it [6]. For example, people expect objects to appear larger as they move towards them. Making proper use of such perceptual cues can improve the effectiveness of presenting large information spaces.

3 The StripBrowser prototype

To test the described ideas, a prototype image browser called StripBrowser was developed.

3.1 Presentation

The underlying metaphor for presentation is that of a filmstrip: a strip of cells, with each cell containing an image. For viewing, only a (possibly small) section of any filmstrip is drawn. A user drags a strip to view another part of it, much like a cinematographer inspects a filmstrip. One or more filmstrips are presented to the user at once; the strips are scaled so that they all fit within the screen. Figure 1 is a snapshot of the interface, consisting of three filmstrips. At the bottom of the screen is a control panel consisting of a number of data sliders, some simple operation buttons (discussed in the next section), a trash can, and an exit button.

Figure 1: Screen shot of the application. Three image strips and the control panel (below).


3.2 Interaction

The basic idea is to interpret the user's head movements to navigate through the strips. The movements of the user's head are translated into navigation commands in an intuitive way. For example, if the user moves her head towards the screen for a closer inspection of an image, an enlarged version of the image is put on the screen. The user drags the filmstrip by turning her head in the lateral direction (see Figure 2). When the user looks to the right of the screen (angle α in the figure) the filmstrip moves to the left, and vice versa. If the user looks at the center of the screen the strip does not move. To prevent small movements of the strip due to tracker noise, the strip does not move for angles close to zero. The velocity of the filmstrip as a function of the head angle is drawn in Figure 3. The speed is proportional to the angle between the line of sight and the normal to the screen. In this way, the image the user looks at always moves towards the center of the screen. This way of navigating gives the user the possibility to browse images at a given speed by looking at the strip at a certain angle: the larger the angle, the faster the images pass by. If a particular image catches the user's attention, she will automatically keep it in sight. The relation between viewing angle and velocity ensures that the image strip stops when the viewing angle is zero. The user's line of sight is also used to determine the appropriate filmstrip: if a user wants to drag a different filmstrip, she simply looks at that strip. Head movements are also used to perform zooming. Zooming is realized by moving the head towards the screen; this head movement enlarges the image that the user is inspecting. Moving the head away from the screen resets the size of the image. We use the metaphor of a human moving her head closer to an object because this is the natural way to inspect an object in detail.
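To make the mapping concrete, the following C++ sketch shows the angle-to-velocity relation described above (dead zone, proportional speed, motion towards the center). The names and values (StripVelocity, kDeadZone, kMaxAngle, kMaxSpeed) are illustrative assumptions; the paper specifies only the shape of the relation.

```cpp
#include <cmath>

namespace stripbrowser {

constexpr float kDeadZone = 2.0f;   // degrees; suppresses tracker noise (assumed value)
constexpr float kMaxAngle = 45.0f;  // degrees; angle of fastest scrolling (assumed value)
constexpr float kMaxSpeed = 10.0f;  // images per second (assumed value)

// Map the signed horizontal line-of-sight angle (degrees) to a strip
// velocity. Looking right of center (positive angle) moves the strip
// left (negative velocity), so the watched image drifts to the center.
float StripVelocity(float headAngleDeg) {
    if (std::fabs(headAngleDeg) < kDeadZone)
        return 0.0f;  // dead zone: small head jitter does not move the strip
    float clamped = std::fmax(-kMaxAngle, std::fmin(kMaxAngle, headAngleDeg));
    return -kMaxSpeed * (clamped / kMaxAngle);  // speed proportional to the angle
}

}  // namespace stripbrowser
```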

Figure 2: The angle α of the user's head (line of sight) relative to the screen normal determines the velocity of the film strip.

This method of navigation is possible because the screen area is large compared to the distance between the user and the screen. The user uses not only eye movements but also head movements to look at different parts of the screen. If the screen were small compared to the distance to the viewer, the technique could still be used, but eye tracking would be needed to obtain a sufficiently accurate line of sight. Although the interaction is steered by head movement, we have not noticed that fatigue is a problem, because the system reacts to head movements that would normally be made anyway while working with a large viewing-angle screen. This is especially true for strip movement and strip selection. Zooming, by moving the head closer to the screen, does require a deliberate action by the user.

Figure 3: The relation between the horizontal line-of-sight angle and the speed of the currently selected strip.

Figure 4: Caching of images in texture memory is based on the visible images and the movement of the strips. The top strip moves from left to right; the image textures to the left of the screen are assigned a high priority.

As an alternative, a button on the wand can also be used to zoom the image the user is currently looking at. A clipboard facility is also provided, as a separate filmstrip: users select an image with the 3D wand and drag it onto the clipboard. Finally, the ordering of images in a filmstrip can be altered using sorting algorithms. The menu presents a number of sorting options based on the dominant color of the images: images can be sorted along a rainbow scale or from light to dark. Sorting operations help the user categorize the images; by sorting the images by a certain criterion, the user can direct her interest to a part of the collection.
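As an illustration, the following C++ sketch shows how the two sorting options might be implemented. The ImageCell record and its precomputed dominantHue and luminance fields are hypothetical; the paper does not describe how the dominant color is extracted.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-image record; not taken from the original system.
struct ImageCell {
    int id;
    float dominantHue;  // dominant color hue in [0, 360), precomputed
    float luminance;    // average brightness in [0, 1], precomputed
};

// Reorder a filmstrip along a rainbow scale (by hue) or from light to
// dark (by luminance), as the sorting menu described above might do.
void SortStrip(std::vector<ImageCell>& strip, bool rainbow) {
    if (rainbow) {
        std::sort(strip.begin(), strip.end(),
                  [](const ImageCell& a, const ImageCell& b) {
                      return a.dominantHue < b.dominantHue;
                  });
    } else {
        std::sort(strip.begin(), strip.end(),
                  [](const ImageCell& a, const ImageCell& b) {
                      return a.luminance > b.luminance;  // lightest first
                  });
    }
}
```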

3.3 Rendering performance

High and predictable rendering performance is essential for the described way of browsing. The VR literature states that a frame rate of ten frames per second is a minimum requirement [7]. In a single session several hundred images might have to be browsed, and each uncompressed image occupies several hundred kilobytes or even a few megabytes, so the demand for interactivity places high demands on the system. To obtain high rendering performance, the images are loaded into texture memory and mapped onto a polygon when drawn. A priority-based texture caching system has been developed that downloads and manages images in texture memory, so that images about to be displayed reside in texture memory. The texture caching system is implemented by tracking the visible section of each filmstrip and the direction in which it is being dragged. Each image texture is assigned a priority based on where it will be on screen in the next few frames (see Figure 4). If an image is not in the cache, it is loaded, replacing the texture with the lowest priority. Image textures outside the screen area are assigned a lower priority than image textures moving into the screen area.

The VR equipment consists of a large back-projected display with a head tracker and a 3D joystick device. The graphics engine is a high-end SGI Onyx2 iR system with 64 MByte of texture memory. This setup can support browsing of approximately a thousand 512x512 images at interactive speeds. The capacity of main memory is the limiting factor for browsing additional images; by lowering the maximum resolution at which the images are stored, the number of images can be increased. Currently the system is used in monoscopic mode: stereo viewing of the images would not provide extra information to the user in this application and is therefore disabled.
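The following C++ sketch illustrates one way the priority-based texture cache could work. The class name, the priority refresh, and the linear eviction scan are assumptions; the paper states only that each texture gets a priority derived from its predicted screen position and that the lowest-priority texture is replaced on a miss.

```cpp
#include <cstddef>
#include <unordered_map>

// A minimal sketch of a priority-based texture cache, under the
// assumptions stated above. Actual texture upload/free calls are
// omitted and marked in comments.
class TextureCache {
public:
    explicit TextureCache(std::size_t capacity) : capacity_(capacity) {}

    // Called each frame for every image that is (or will soon be) visible.
    // Higher priority = more likely to be on screen in the next frames.
    void Request(int imageId, int priority) {
        auto it = cache_.find(imageId);
        if (it != cache_.end()) {           // already resident: refresh priority
            it->second = priority;
            return;
        }
        if (cache_.size() >= capacity_) {   // full: evict lowest priority
            auto victim = cache_.begin();
            for (auto jt = cache_.begin(); jt != cache_.end(); ++jt)
                if (jt->second < victim->second) victim = jt;
            cache_.erase(victim);           // would also free its texture slot
        }
        cache_[imageId] = priority;         // would upload the texels here
    }

private:
    std::size_t capacity_;
    std::unordered_map<int, int> cache_;    // imageId -> priority
};
```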

4 Future Directions

The following issues will be addressed in the future:

- User evaluation. Although the initial reactions to the system are positive, we plan to perform user evaluations of head-tracked navigation through large collections of images. In particular, issues related to fatigue and to the relation between head rotation and strip velocity will be evaluated.

- Alternative presentation. Currently, we render images at their original aspect ratio. We plan to render images with a distortion factor, resulting in a fish-eye view of the filmstrip; such a presentation aids the browsing task since the number of rendered images is increased (a possible distortion function is sketched after this list). Currently the query result is presented as a linear list of images. Other ways of presenting query results are possible by using additional image properties to organize the presentation; for example, images can be ordered such that clusters of similar images are easily recognizable. Stereoscopic viewing might prove very useful for this type of presentation.

- More general data types. We have developed ways for head-tracked navigation of linear lists. We believe that this metaphor can be extended to navigate other data types, such as matrices and graphs. Generalized methods will be studied to allow a user to navigate and zoom through these data types. Finally, generalized methods to cut and paste elements will become available.
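As a concrete, hypothetical example of the planned fish-eye presentation, the following C++ sketch shows one possible distortion function. The formula and the falloff constant are our assumptions; the paper does not define the distortion factor.

```cpp
#include <cmath>

constexpr float kFalloff = 0.5f;  // assumed falloff rate of the distortion

// Returns the display scale of the image at strip position `i` when the
// user's attention is at position `focus`. The focused image keeps full
// size; images farther along the strip shrink, so more of them fit on
// screen at once.
float FishEyeScale(float i, float focus) {
    return 1.0f / (1.0f + kFalloff * std::fabs(i - focus));
}
```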

5 Conclusion

In this paper we described new metaphors for browsing a large collection of images. Image collections are presented as filmstrips. Navigation and zooming are very natural and require little of the user's attention; they are driven by head gestures. The chosen filmstrip metaphors (continuous dragging, zooming by looking closely) are more intuitive than those of a photo album (discretely clicking a next-page button). In addition, multiple input actions may be performed simultaneously, because head-tracked navigation frees the wand for other functions such as selection. Although this is still work in progress, we believe that the underlying ideas (using virtual reality devices to navigate and interact with large information spaces) are valid. We have given an example of how VR devices can be used on lists of images, and we believe these metaphors can be adapted to navigate through more general types of information spaces.

References

[1] S. Card, J.D. Mackinlay, and B. Shneiderman, editors. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers, 1999.
[2] A. del Bimbo. Visual Information Retrieval. Morgan Kaufmann Publishers, 1999.
[3] M. Deering. High resolution virtual reality. In E.E. Catmull, editor, Computer Graphics (SIGGRAPH '92 Proceedings), volume 26, pages 195–202, 1992.
[4] C. Ware, K. Arthur, and K.S. Booth. Fish tank virtual reality. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel, and T. White, editors, INTERCHI '93 Conference Proceedings, pages 37–42, 1993.
[5] D. Hix, J.N. Templeman, and R.J.K. Jacob. Pre-screen projection: from concept to testing of a new interaction technique. In CHI '95 Conference Proceedings, 1995.
[6] J.J. Gibson. The Ecological Approach to Visual Perception. Houghton Mifflin, Boston, 1979.
[7] S. Bryson. Approaches to the successful design and implementation of VR applications. In R.A. Earnshaw, J.A. Vince, and H. Jones, editors, Virtual Reality Applications, pages 3–15. Academic Press, 1995.
