Improving Photo Searching Interfaces for Small-screen Mobile Computers

Dynal Patel, Gary Marsden
Computer Science Department, University of Cape Town, South Africa
+27 21 650 2663
{dpatel,gaz}@cs.uct.ac.za

Matt Jones
Computer Science Department, University of Swansea, Wales
[email protected]

Steve Jones
Computer Science Department, University of Waikato, Private Bag 3105, Hamilton, New Zealand
[email protected]
ABSTRACT
In this paper, we conduct a thorough investigation of how people search their photo collections on PDAs for events (a set of photographs relating to a particular well-defined event), singles (individual photographs) and properties (a set of photographs with a common theme). We describe a prototype system that allowed us to expose many issues that must be considered when designing photo searching interfaces. We discuss each of these issues and make recommendations where applicable. Our major observation is that several different methods are used to locate photographs. In light of this, we conclude by discussing how photo searching interfaces might embody or support such an approach.
Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces
General Terms
Design, Experimentation, Human Factors

Keywords
Photo Searching, PDAs, Interface Techniques
1. INTRODUCTION
Mobile devices such as the Sony PSP and Nokia N91 have large storage capacities, multiple networking capabilities and high resolution displays. Together, these provide all the necessary ingredients for a truly mobile photo collection. Despite this innovative hardware, substantial breakthroughs on the software front have been less forthcoming. The challenge has been in designing interface techniques that can truly leverage this powerful hardware despite the reduced form-factor. In order to address this challenge it is necessary to take a step back and look at some of the challenges surrounding photo searching.
Firstly, the types of search requests are fairly diverse. According to Rodden and Wood [13], people are most likely to look for: an event (a set of photos relating to a particular well-defined event); a single (an individual photograph); and a property (a set of photographs with a common theme). What is more, these search requests are likely to require data at the primitive level (colors, textures, composition and shapes), the generic level (types of objects and activities), the specific level (names of people, places, landmarks and events) and the abstract level (moods or feelings evoked). The challenge is in providing users with the necessary tools to formulate and conduct these queries.
With personal photography, people are often not willing to put in the effort required to annotate each photograph, as it is a time consuming process [14]. This means conventional text based searches cannot be built into the interface, as there is no consistent and meaningful text metadata from which to create indexes. As a result, photo searching interfaces rely on extracting and indexing other low level image content, such as pixel or EXIF data, that is more readily available. The obvious problem is that it becomes increasingly difficult to specify higher level or more abstract search requirements.
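To make the kind of metadata involved concrete, the sketch below shows one way a temporal index might be built from EXIF data. It is a minimal illustration assuming the Pillow imaging library; the helper names are ours, not the method of any system discussed here.

```python
# A minimal sketch (not any prototype's code) of indexing photographs by
# EXIF capture time: low level metadata that is available without any
# manual annotation. Assumes the Pillow library.
from datetime import datetime
from pathlib import Path
from typing import Optional

from PIL import Image

EXIF_DATETIME = 306  # standard EXIF "DateTime" tag

def capture_time(photo: Path) -> Optional[datetime]:
    """Return the camera-recorded capture time, if present."""
    raw = Image.open(photo).getexif().get(EXIF_DATETIME)
    return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S") if raw else None

def build_temporal_index(folder: Path) -> list:
    """Pair each photo with its capture time, most recent first."""
    pairs = [(capture_time(p), p) for p in folder.rglob("*.jpg")]
    return sorted([(t, p) for t, p in pairs if t is not None], reverse=True)
```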
Kindberg et al. [9] provide a useful taxonomy of reasons for image capture. The first dimension delineates whether subjects captured images for affective versus functional reasons; the second delineates social versus individual intentions. Frohlich et al. [6] provide another taxonomy that organises photoware according to a time/place matrix, illustrating the type of photo sharing technology that would be appropriate in varying temporal and spatial locations. Most importantly, these taxonomies show the diverse scenarios under which photo searching might occur.

Beyond purchasing a camera, the cost of taking a photograph is almost nothing [10]. As a result, people are able to amass large photo collections in a short period. As sales of camera-phones (not to mention personal media devices) far outstrip those of personal computers, most users will ultimately keep their digital photo collections on mobile devices. Given that photography is not a solitary activity, people are also likely to accumulate photographs from friends, family and colleagues who co-experience events or activities. The challenge, then, is aggregating, organizing and structuring this explosion of data so that it can be made searchable in the most efficient way on devices with varying capabilities (screen size, storage, networking, processing power and input mechanisms).

When designing photo searching interfaces it is essential that the issues above are addressed, namely: searching without metadata; supporting a variety of social activities; and searching on a variety of levels. To this end, we developed a prototype system to see how these issues manifest on mobile devices. The main goal is to provide a deeper understanding of the design space and to provide insights into viable solutions. In the next section we review a number of photo searching interfaces that have been proposed. We then outline our main research goals, describe the two experiments that we conducted, and present our results. Lastly, we discuss our findings before drawing conclusions.
2. RELATED LITERATURE
2.1 Research Approaches
A common strategy used in many commercial photo management tools is to rely on browsing as opposed to searching [1][2]. Instead of relying on conventional text based searches, these tools exploit the fact that the human brain is able to process visual information rapidly, presenting images in a flat, scrollable thumbnail grid. This technique works particularly well on high resolution screens, where a large number of thumbnails can be displayed. However, it is less effective on PDAs, where lower resolution screens allow only 6-9 images to be shown at any one time.

Some researchers have tried to port other desktop photo browsing strategies onto PDAs. For example, Pocket PhotoMesa [8] uses treemaps to display thumbnails in the most screen-efficient way. Unfortunately, this novel scheme does not scale to large photo collections, as only a few photographs can be displayed on the screen at a time. Another example is Apple's browser for the photo iPod [2]. This uses RSVP (Rapid Serial Visual Presentation) to present images within each "roll": images are displayed rapidly, one after the other, on the full screen of the device. Research [5] has shown, however, that RSVP techniques are no faster than flat, scrollable thumbnail grid browsers. A common problem with "rolls" or playlists is the boundary problem, where one cannot easily navigate to an event in the next roll without first moving out of the current roll and then selecting the next one; that is, the display of images halts when a folder's images are exhausted rather than continuing to the (temporally) next folder. Besides the additional navigational effort required, people also have problems comparing boundary events and photographs.

Another common strategy is to create automatic time-based and event-based clusters [7][12]. Both of these solutions are susceptible to the boundary problem. While time-based clusters may provide a good overview, they overlook the fact that events are the most natural way of thinking about photographs [13], and an event's photographs can also be split across clusters. Automatic event-based clusters are problematic in that the definition of an event is very subjective: it is unlikely that any algorithm that automatically extracts event-based clusters will match what people perceive to be an event. Any mismatch is likely to complicate navigation, as users might be unsure about which event cluster contains the target photograph.
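To illustrate the general idea (this is our own simplified sketch, not the algorithm of [7] or [12], which adapt their thresholds to the collection), time-based clustering amounts to starting a new candidate event whenever the gap between consecutive capture times exceeds some threshold:

```python
# Simplified sketch of gap-based temporal clustering. Real systems such as
# PhotoTOC [12] use adaptive thresholds; the fixed six-hour gap here is
# purely illustrative.
from datetime import datetime, timedelta

def cluster_by_time(times, gap=timedelta(hours=6)):
    """Group capture times into candidate 'events'."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)      # close enough: same event
        else:
            clusters.append([t])        # large gap: start a new event
    return clusters
```

The subjectivity and boundary problems described above apply directly to any such scheme: no single gap value matches every user's notion of an event.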
Other researchers [14] have looked at clustering photographs according to colour. Although such a scheme may be adequate for a request at the primitive level, it is not sufficient for requests made at other levels. Furthermore, feature-based extraction and identification has yet to be solved convincingly by AI research [14].

Lastly, some researchers [3] have looked at browsing photographs according to the location at which they were captured. Unfortunately, these techniques have not been evaluated against traditional photo searching interfaces. While location information provides useful context, it may not be adequate in all situations, and devices capable of capturing GPS data are not yet widespread. Although one could use cell information on mobile phones to provide location data, this technique cannot distinguish events that occur in the same cell or location.

2.2 Placing our Research
Photo searching research on mobile devices is still in its infancy. One reason for this is that devices capable of archiving large numbers of photographs have only just begun to emerge on the market. The bulk of research has focussed on understanding the range of activities that have been enabled by current technology, such as camera phones. While such methodologies are essential in shaping future technologies, they are not the only vehicle one might use when thinking about future technologies.

In this paper we use a less conventional approach: we consider how a current activity (i.e. photo searching) might be supported on future devices with a high quality camera and large storage capacity. We realise that with such an approach it is hard to know all the possible scenarios one must support. Furthermore, each scenario is likely to require a unique photo searching interface. To this end, we choose to be consistent with previous work by investigating how to design photo searching interfaces that support classic activities such as storytelling or reminiscing, where users are required to "dig" into their photo collection to find appropriate pictures. In other words, we choose not to support activities such as joking or teasing, where the photographs supporting the activity are likely to be the most recent ones, usually captured during the activity itself. Although these activities may be more frequent, we feel that in terms of photo searching the problem is not as challenging.

Preliminary research in this area has shown that our scenario is a believable one. Indeed, research [10] has shown that photographs are used for a variety of activities, including storytelling and reminiscing. Similarly to Rodden's research, Kindberg et al. [9] have observed that people would like to capture and carry singles (individual pictures that contain some value) and properties (digital flipbooks of favourite images). As camera phones continue to replace digital cameras as the primary capture device, people are likely to capture and store events such as birthdays or weddings as they have previously done with print and digital photography. Researchers have also observed that when the utility of an image lies in the sociability around it, as opposed to the picture itself, the image is likely to be deleted with time. In other words, photographs supporting these new activities around photography are unlikely to have the same lifetime as photographs archived for storytelling and reminiscing.

The current model employed by many manufacturers has been to shift the photo organisation activity from the device to the desktop computer and then synchronise the photo collection. As a result, it is likely that the photo organisation and management techniques currently employed on desktop computers will be imposed on (or even carried over to) the mobile phone. Fortunately, this means that current research on how people manage and organise their photo collections on desktop computers is still relevant in the mobile space.
2.3 Photo Searching Process
When performing activities such as storytelling and reminiscing, people look for events, singles and properties. Unfortunately, no previous photo searching interface has been designed with these three most common photo searching tasks in mind [13]. When designing photo searching interfaces, it is vital to first understand the processes people go through when locating events, singles and properties.
According to Rodden and Wood [13], locating print photographs from a particular event is relatively easy whether or not the collection is organised, as long as the photographs have at least been kept together. They found that it was much easier to locate events in digital photo collections because event folders could be kept in chronological order. When locating events, they also found that it was much easier to remember when an event occurred relative to another event than to recall its absolute date and time. Rodden and Wood found property searching tasks to be tedious for both print and digital photo collections: neither the chronological ordering nor the classification by event helped much. The search process involved repeatedly trying to think of a matching picture, and then looking for it. Exhaustive searches throughout the entire collection were conducted only in exceptional circumstances. Clearly, users need software support.
2.4 Scroll & Zooming Interaction Techniques
In a previous study [11] we presented two new techniques for browsing photo collections, designed specifically to support observed search behavior and to overcome the inadequacies of conventional scrolling and zooming controls. With the AutoZoom technique, scrolling and zooming are interdependently controlled via a single user action: as the scroll rate increases, the zoom level drops automatically. With the GestureZoom technique, zooming and scrolling are independently controlled via a single user action: vertical cursor movement controls scrolling and horizontal movement controls the zoom level. In both interfaces, photographs are presented in a vertical list that is a single image wide, with a chronological ordering placing the most recent images at the top of the viewport. The focus of our previously published study was on the efficiency, accuracy and satisfaction of these techniques in locating events, singles and properties. In a 72-subject usability experiment, we found that the new techniques were at least as good as a thumbnail grid browser. The new techniques were significantly faster than the conventional approach in locating small events or when seeking images containing small detail, and significantly more accurate in locating properties. Participants also reported lower levels of physical and cognitive effort and frustration with the new techniques in comparison to the thumbnail grid browser. These results were particularly encouraging as none of the subjects had prior experience with such systems. Of course, efficiency and accuracy are only part of the story. This paper therefore builds on our previous research as a platform to enable us to investigate broader interface design issues around photo searching.
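The two control couplings can be summarised as simple mappings. The sketch below is our own idealisation with invented constants; the prototypes' actual response curves were tuned interactively.

```python
# Idealised sketch of the two control couplings described above; the
# constants are invented, not the prototypes' tuned values.
MAX_ZOOM, MIN_ZOOM = 1.0, 0.1   # 1.0 = full-size thumbnail

def autozoom_level(scroll_speed, max_speed):
    """AutoZoom: zoom depends on scroll rate -- the faster the user
    scrolls, the further the view zooms out."""
    s = min(abs(scroll_speed), max_speed) / max_speed
    return MAX_ZOOM - (MAX_ZOOM - MIN_ZOOM) * s

def gesturezoom_level(horizontal_drag, half_width):
    """GestureZoom: zoom is set independently by horizontal displacement
    from the viewport centre; vertical movement controls scrolling only."""
    d = min(abs(horizontal_drag), half_width) / half_width
    return MAX_ZOOM - (MAX_ZOOM - MIN_ZOOM) * d
```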
3. AIMS
The main goal of this work was to gain insight into how to design photo searching interfaces for PDAs. To gain a deeper understanding of the matter, our more specific goals were:
• To conduct a thorough investigation of how people search their collections for events, singles and properties on desktop computers and PDAs.
• To investigate the usefulness of a calendar interface and temporal organisation when searching photo collections on PDAs.
• To investigate the appropriateness of the scroll and zoom interaction techniques in supporting photo searching tasks on PDAs.
Figure 1: Screenshots of the augmented AutoZoom interaction technique
4. PROTOTYPE SYSTEM
To investigate our goals, we developed the AutoZoom (see Figure 1) and GestureZoom (see Figure 2) interaction techniques further. We ported both techniques to the Symbian and Windows Mobile platforms and improved our algorithms so that there are no longer any restrictions on the amount of memory required or the maximum number of photographs.
We also removed the acceleration function from the AutoZoom technique. Previously this was applied when the minimum zoom threshold level was reached, to allow users to scroll quickly through the photo collection; it resulted in visual blurring. We were interested in having one technique control visual blurring automatically (i.e. AutoZoom), while the other allowed users to control it manually (i.e. GestureZoom). This also allowed us to see how effective these techniques are at minimizing the dizziness or motion sickness that has been reported with continuous scroll and zoom interfaces [4].
4.1 How it Works
For those not familiar with the techniques, we briefly recap how the two systems work. AutoZoom is shown in Figure 1 (1), which depicts the startup screen. As the scroll speed increases (see Figure 1 (2)), the photos become smaller and smaller, zooming out to give an overview and to reduce the effects of visual blur. Scrolling at different rates allows users to view photographs at different zoom levels, as seen in Figure 1 (3 & 4).
Previously, users could increase the zoom level instantly by dragging back towards the stylus-down position. This caused problems when trying to move back up to locate an image that had just scrolled past: as the zoom level changed dynamically in response to each small movement, acquiring a target became difficult, and users had to resort to making small, deliberate movements. To overcome this issue, we decided to lock the zoom level when the stylus is dragged back towards the stylus-down position. Figure 1 (3) illustrates this situation; the white arrows indicate how far the user needs to drag before the zoom is altered. The zoom level can be increased by releasing the stylus from the screen, upon which the view automatically animates in. The animation can be stopped by tapping and holding the stylus on the screen, providing a mechanism to view photographs at any desired zoom level. Users can also scroll at this zoom level, as long as the drag distance is not greater than the length of the white arrows (see Figure 1 (4)).
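A hypothetical event handler for this locking behaviour is sketched below. The event names and the threshold constant are ours, intended only to make the interaction logic explicit; the prototype itself is stylus-driven Symbian/Windows Mobile code.

```python
# Sketch of the zoom-locking behaviour described above; names and the
# threshold are illustrative.
class AutoZoomView:
    LOCK_THRESHOLD = 40  # pixels; shown to the user as the white arrows

    def __init__(self):
        self.zoom_locked = False
        self.animating = False

    def on_drag(self, offset, moving_toward_pen_down):
        if moving_toward_pen_down:
            # Dragging back towards the stylus-down position locks the
            # zoom, so a just-passed image can be re-acquired without the
            # view rescaling under the stylus.
            self.zoom_locked = True
        if self.zoom_locked and abs(offset) > self.LOCK_THRESHOLD:
            self.zoom_locked = False   # dragged past the white arrows
        self.scroll(offset)            # scroll either way; zoom may be frozen

    def on_pen_up(self):
        self.animating = True          # view animates back to full zoom

    def on_tap_and_hold(self):
        self.animating = False         # freeze at the current zoom level

    def scroll(self, offset):
        pass                           # placeholder for the real view update
```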
We also decided to remove the scroll bars from the initial prototype. The primary reason was that this allowed us to investigate the utility of the scroll and zoom techniques, as users would have no option but to use them. It also allowed us to investigate how our calendar overview is used, as we replaced the scroll bar with a linear calendar widget that shows the current position in the photo collection. This allowed us to investigate the importance of temporal and spatial navigation.
Figure 2: GestureZoom

Figure 1 (5) illustrates the interaction when the edge of the collection is reached: the zoom is locked and the user can only scroll back upwards. Figure 1 (6) shows the overview when zooming out. Figure 2 shows the modified GestureZoom interaction technique. In the GestureZoom interface, vertical scrolling operations control scroll speed and direction, as with the AutoZoom interface, but do not control the image size (zoom level). Zoom level is controlled by horizontal movement of the pointing device away from the horizontal centre of the viewport towards the right-hand or left-hand side of the display; the image size is inversely proportional to the horizontal drag distance. The horizontal white arrows indicate how far one needs to drag before the zoom is unlocked. Apart from these subtle changes, the interface behaves in the same manner as the AutoZoom interface.
Lastly, we also incorporated metadata gleaned from the file structure in which photo collections were stored on disc. We did so by using the information label at the top of the screen to show the path of the photograph whose bounds intersected the stylus-down position (see Figure 1). We were able to incorporate photo collection structures by presenting flattened hierarchical folder structures as a path. As there is a limit to the number of characters that can be shown on the screen, we decided to show as many characters as possible, from the top-level folders down to deeper levels. The photograph path was updated when entering event folders. By including pre-organized photo collection structures we hoped to see when they were useful and also to gain some insight into how they could be exposed.
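The label rule can be expressed compactly. The sketch below is an illustrative reimplementation, not the prototype's code, and the example folder names are made up.

```python
# Sketch of the information-label rule: flatten the folder hierarchy into
# a path and keep as many characters as fit, favouring top-level folders.
def flatten_path(folders, max_chars):
    label = ""
    for folder in folders:                 # top-level folders first
        candidate = label + "/" + folder if label else folder
        if len(candidate) > max_chars:
            return candidate[:max_chars - 1] + "…"  # deeper levels truncated
        label = candidate
    return label

# e.g. flatten_path(["2005", "Holidays", "Cape Town", "Table Mountain"], 25)
# returns "2005/Holidays/Cape Town/…"
```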
4.2 Interface Modifications
We made a number of interface modifications to allow us to investigate our goals. Firstly, we decided to lock the zoom level when the height of the photo strip is less than, or equal to, the height of the screen. This differs from our previous system, where the zoom level could not be reduced beyond a user-specified minimum size threshold. This was done to allow us to investigate the utility of the images when zoomed out. It also allowed us to provide users with a temporal overview of their collection. In Figure 1 (6), new months are shown by alternating the background color between dark and light gray; years are shown by alternating between black and blue.
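The banding can be derived directly from each photograph's timestamp. A minimal sketch follows, with placeholder colour values and parity-based alternation as our own simplification:

```python
# Sketch of the calendar banding: month bands alternate between dark and
# light gray, year bands between black and blue. Colour values and the
# parity rule are illustrative.
from datetime import datetime

DARK_GRAY, LIGHT_GRAY = "#404040", "#A0A0A0"
BLACK, BLUE = "#000000", "#000080"

def month_band(taken: datetime) -> str:
    return DARK_GRAY if taken.month % 2 else LIGHT_GRAY

def year_band(taken: datetime) -> str:
    return BLACK if taken.year % 2 else BLUE
```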
5. SUBJECTS
We recruited 12 participants for the study: five were female and seven were male. Nine of the participants were postgraduate students, while three were university lecturers. All participants had an undergraduate degree with one or more majors in computer science, mathematics, statistics, engineering, physiology or English. While this sample does not represent the population as a whole, these subjects, all early adopters, had digital cameras, computers and large collections of photographs, which was essential in testing the utility of the interface. Photo collections were given wholesale, without any editing or pruning. Our 12 participants had varying numbers of photographs: the mean number was 1489.08 (s.d. 1224.15), the minimum 419 and the maximum 4317.
5.1 Method
We randomly split our participants into two groups. One group was required to use the AutoZoom interface, while the other used the GestureZoom interface. We began by conducting a conceptual model extraction, the aim of which was to ensure participants were familiar with all the functionality before beginning the main experiment. Following this, participants were requested to scroll through their entire photo collection to familiarize themselves with the format of their photographs. They were then given 10 minutes to get comfortable with the technique. During this period they were encouraged to think of photographs and look for them.

For the main experiment participants were required to complete 27 tasks. Consistent with our previous research, we asked them to locate events and singles at short and long navigation distances. Events could be small (3 or fewer photos) or large (more than 3 photos). A photograph feature could also be small or large: a small feature was one that was 1/8th or less of the total image size (e.g. a small child in a forest scene), while a large feature was one taking up more than 1/8th of the image (e.g. a skyscraper). The main goal of the experiment was to investigate the photo searching process for each of these factor combinations. In doing so we looked at the use of temporal and spatial aids. We also looked at how the photo collection structures were used. Lastly, we investigated the utility of the AutoZoom and GestureZoom interfaces in performing these photo searching tasks.

Subjects were encouraged to think aloud when performing tasks and were allowed to take breaks when needed. Following the experiment, we conducted an informal, semi-structured interview. All sessions were video recorded and later transcribed.

6. MATERIALS
The AutoZoom and GestureZoom interfaces were deployed on an HP iPaq PocketPC 4100 series device. A video camera was used to record all interaction on the device, and the audio was captured using a microphone. Both feeds were combined to generate a video for each participant.

7. RESULTS
7.1 Events
When initially considering the position of an event, participants do not always think of the immediately surrounding events as a navigational aid. Sometimes events are positioned relative to "close" landmark events. On encountering the landmark event, they would then move more cautiously towards the target event, paying close attention to the event names and photographs. This information would be used to provide an indication of the target location's displacement and direction. We found that when approaching the target location, participants would hover at a "comfort" zoom level. At this zoom level participants felt that they were able to get a sufficient overview to distinguish events and still pick up sufficient detail to make out what is in each photo. When more detail was required users would zoom in, but would then usually return to the comfort zoom level.

The process of finding photographs was similar in both AutoZoom and GestureZoom for small and large events at short or long navigational distances. The process depended rather more on how well the event was remembered.

7.2 Singles
When locating an individual photograph, participants try to associate it with an event. The first step is then to locate this event using the methods discussed above. Once the event has been located, the next step is to locate the target photograph within it.

7.3 Properties
We found that two different strategies were used to locate properties: a structured approach and a more sporadic approach. With the structured approach, participants would systematically work through the entire photo collection from newest to oldest. With the sporadic approach, participants jumped from one likely event folder to the next.

7.4 Pre-organized photo collection structures
The pre-organized photo collection structures were central to locating events. Event folder names were used to locate events in all photo searching task types. Participants preferred to use their own event folder names to distinguish events, as opposed to scanning the photographs. Participants stated that they would have liked the ability to navigate through photo collections using these structures.

7.5 Temporal Photo Organization
We found that our participants tend to think of their photographs in terms of events. In light of this, we found that our calendar interface performed three major functions. Firstly, as each event has a unique temporal location, it provided a way of distinguishing events. Secondly, it provided a way of arranging events in the order in which they occurred. Thirdly, it provided a quick way of jumping to a temporal location. However, it did not provide a way of selecting a time period: when participants could not pinpoint the exact temporal location of an event, they would think of the most likely time period, and they felt it was necessary to support this form of navigation. It was also evident that temporal navigation was important when event names could not be remembered.

7.6 Thumbnail Sizes and Quality
As mentioned above, we found that users would find a comfort zoom level at which to hover over the photographs. This level was different for every person. Participants stated that they liked the fact that they could easily find a zoom level appropriate for the task. It was hard to put a value on the minimum or maximum zoom level, because it was largely dependent on the level of detail (or data) required to complete a search task.
As a memory optimization, high quality thumbnails were replaced with lower quality thumbnails at high scroll speeds. With AutoZoom, the scroll speed was inversely proportional to the zoom level, so the higher the scroll speed the more pictures would be displayed on the screen. The use of lower resolution thumbnails ensured that memory usage was minimized; lower resolution thumbnails could also be read more quickly than higher resolution ones. These optimizations were necessary to ensure smooth and fluid zooming and scrolling. All participants noticed that higher resolution thumbnails were swapped for lower resolution ones. Surprisingly, all participants stated that this did not interfere with their ability to search for events, singles or properties. The reason is that at low scroll speeds high quality thumbnails were loaded, so users could view as much detail as they wished, while at high scroll speeds participants were not really looking at the pictures: they were moving too fast to make out what they were, and their primary goal was to move quickly to the target location. Interestingly, participants were more forgiving because the device was a PDA.
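The optimization reduces to selecting a pre-computed thumbnail resolution from the current scroll speed, much as texture mip-mapping does. The sketch below shows the idea with invented speed thresholds and sizes.

```python
# Sketch of speed-dependent thumbnail swapping: coarser pre-computed
# thumbnails are used as scroll speed rises, bounding memory use and
# decode time. The thresholds and sizes are invented for illustration.
THUMB_LEVELS = [(0.0, 256), (1.5, 128), (3.0, 64)]  # (min screens/sec, pixels)

def thumbnail_size(scroll_speed):
    """Pick the coarsest thumbnail whose speed threshold has been reached."""
    size = THUMB_LEVELS[0][1]
    for min_speed, pixels in THUMB_LEVELS:
        if abs(scroll_speed) >= min_speed:
            size = pixels
    return size
```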
8. DISCUSSION
8.1 Events
From Rodden and Wood's study to our follow-up study, it is evident that locating events is the most important photo searching task. Other forms of searching are also based on the event: locating singles involves first locating the encapsulating event; likewise, locating properties involves locating the most likely events and going through each of these events to locate matching photographs. Therefore, when designing photo searching interfaces the main goal should be to provide rapid access to events. This task can be made easier by using the event folder names; an obvious solution to providing rapid access to events simply involves providing rapid access to these event folders. If an event name can be remembered accurately, the photo searching interface should provide instantaneous access to it using keyword or letter searches. If an event can be placed accurately in time, then the photo searching interface should provide rapid navigation to this temporal location. Both searching mechanisms should be integrated seamlessly, allowing the user to swap between the two as needed. We also found that the position of the target event was continuously re-evaluated and revalidated as more information became available. To this end, photo searching interfaces should support this continual refinement and also allow people to backtrack if necessary.

8.2 Singles
Providing fast access to individual photographs primarily involves providing fast access to the encompassing event. It is equally important, however, to provide the support required to "pick out" target photographs once the event folder has been found.

To this end, we found that it is essential for photographs to be ordered temporally. This serves a dual function, in that it allows users to discern the displacement and direction of other photographs. We also found that it is important to provide an overview of the event, to allow photographs with distinct features to be picked out quickly and efficiently. For photographs with less distinct features, it is vital for the interface to allow users to quickly sample the images and then quickly return to the overview so that another potential location can be searched. As the amount of data needed to sample an image may vary with each search request, it is essential for users to be able to rapidly zoom in or out to obtain the level of detail required. Both AutoZoom and GestureZoom meet this criterion in a highly effective way.

8.3 Properties
As with singles, providing fast access to properties primarily involves providing fast access to events, together with the support needed to locate matching photographs within them. However, it is also necessary to support the two approaches we found were used to locate properties, namely the structured approach and the sporadic approach. With the structured approach, the jumps from one likely event to the next are usually small; with the sporadic approach, the distance between events is much more variable. The challenge is in designing a control widget that provides the correct level of granularity for each approach. Also, when jumping between events, it is essential that the interface "snaps" to the beginning of the event. Photo searching interfaces should support the entire process of selecting, reviewing, filtering and grouping photographs. The interface should also support navigation between these groups and the original event folders, allowing users to review the current grouping and flip back to an appropriate event folder: for each selected photograph, users should be able to flip back to the original event to get some context.

For the sporadic approach, it is important to maintain a search history of which events have been visited, and how recently. This prevents users from accidentally visiting the same location twice, and also provides an indication of which areas of the collection are yet to be explored.
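One way to realise the snapping and search-history recommendations is sketched below; the class, names and data structures are hypothetical, intended only to make the recommendations concrete.

```python
# Sketch of the two recommendations above: jumps between events "snap" to
# the first photograph of an event, and visited events are remembered so a
# sporadic property search need not revisit the same place.
import time
from bisect import bisect_right

class EventNavigator:
    def __init__(self, event_starts):
        self.event_starts = sorted(event_starts)  # first photo index of each event
        self.visited = {}                          # event start -> last visit time

    def snap(self, photo_index):
        """Snap an arbitrary landing position to the start of its event."""
        i = max(bisect_right(self.event_starts, photo_index) - 1, 0)
        start = self.event_starts[i]
        self.visited[start] = time.time()          # record in the search history
        return start

    def unvisited(self):
        """Events not yet explored during this property search."""
        return [s for s in self.event_starts if s not in self.visited]
```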
8.4 Pre-organized photo collection structures
Perhaps the most significant, yet obvious, finding is that people organize their photographs into events. This is supported by the fact that all of our participants organized their photographs into event folders. This finding is particularly interesting given that many previous studies have noted that people are not willing to annotate or put effort into organizing their photo collections. In fact, not only are events given meaningful names, relationships between events are often encoded by organizing an event into a main event and sub-events. We also found overwhelming evidence of property folders: arbitrary photo groupings, working directories, export groupings, picture groupings and event groupings. The diverse nature of these property folders clearly indicates the range of different tasks that people are likely to perform. It also tells us that people organize their photo collections much more than previous literature suggests. Therefore, it is essential that photo searching interfaces provide mechanisms to exploit this information and organization.

As stated above, we found that events were organized hierarchically into main events and sub-events. Main events were given more generic names that served as an overview for sub-events; sub-events were given more descriptive names to allow users to distinguish them. There were far fewer main events than sub-events, making it easier to remember main events and their temporal position in relation to one another. Selecting one of these main events allowed users to filter out irrelevant information. Photo browsing interfaces should seek to exploit these hard-coded structures and should make the transitions as fluid as possible.

We found that some of the property folders, such as picture groupings or export groupings, were created as a result of a photo searching task. By creating a property folder, participants ensured that subsequent similar requests could be processed quickly.
8.5 Temporal Photo Organization
Temporal navigation is particularly important when event names cannot be remembered. It becomes increasingly important for large collections, where it is more difficult to recall event names accurately: while it is easy to remember recent events, it becomes increasingly difficult to recall older ones. When events are arranged temporally, users can discern when events occur in relation to one another (e.g. before or after) and also get a sense of how far one event might be from another. These cues are useful for indicating the direction in which to travel, and also give a sense of the speed and distance that can be covered before needing to sample the data to re-orientate temporally again.

The time dimension is particularly useful as most events occur sequentially in time, which means that at any point in time there will usually be only one event. The same cannot be said of the location dimension, for example, as it is possible to make multiple visits to a single location. The temporal dimension thus provides an absolute dimension on which events can always be distinguished. Photo searching interfaces should support this form of navigation, as the data for this dimension is readily available for digital photographs.

Rodden and Wood [13] found that it was easier for people to remember when an event occurred relative to another than its absolute date and time. Although our study confirms this result, we found that remembering where an event occurred in relation to another was not always easy; in fact, when events were poorly remembered, it would sometimes require a great deal of mental effort. Alternative strategies would sometimes be deployed. For example, if there was a large temporal distance to the target, it was much easier for participants to think of the most likely time period and move to that location than to discern how each event along the way related temporally to the target event. Better still, if the spatial location in the list was remembered, participants found it far easier to simply move to that location, as this required minimal mental processing. However, when searching close to the target location, the temporal relationships between events became increasingly important, as it became more difficult to think of more precise time periods or spatial locations. Photo searching interfaces must look at ways of supporting this interplay between searching strategies.

If temporal navigation is to be supported in a photo searching interface, it is critical that events and photographs are placed in the order in which they were captured, with the correct date and time. Another problem with organizing photographs temporally is that property folders are broken up and scattered across the photo collection, which can result in duplicate copies of images. It does not make sense to include property folders in a temporal arrangement, as their temporal relationship to event folders is not useful. However, within each property folder, the temporal arrangement of events and photographs is important for photo searching. Property folders should be viewed as shortcuts to special collections of photographs, or even as playlists if we use the iPhoto metaphor.

8.6 Linear arrangement
While a linear arrangement is less efficient at using screen space, its simplicity makes it easier to map onto time. A linear structure makes it easier to read events, as events occur in sequence. It also makes it easier to map spatial locations, as each photograph, or event, has a specific location. With a thumbnail grid browser it is possible that an event is split across a row, or that photographs from more than one event share a row. By including zooming capability, a linear arrangement can also provide an overview of an event, giving it all the best features of a thumbnail browser. It also makes it easier to support the type of navigation that is required to locate photographs: moving backwards and forwards through the collection, and zooming in and out to compare photographs.

Compared to a hierarchical browser, a linear arrangement does away with having to click in and out of folders and allows users to navigate seamlessly between folders, by flattening the hierarchical structure. The hierarchical organization can still be superimposed on this arrangement. The white space on either side of the photo strip can also be used to provide further navigation aids. Furthermore, a linear arrangement makes it easier to support multiple input devices, as navigation is constrained to a single dimension.

For these reasons, we advocate the use of a linear photo arrangement on PDAs.

8.7 Thumbnail Sizes and Quality
In any photo searching interface, users should be able to view and compare photographs at any zoom level. It is difficult to predict how much detail is required for each photo searching task; indeed, different photo searching tasks require varying levels of detail, so users should be given the capability to adjust the zoom level as required to complete the task. In our experiments, participants agreed that photographs are not really useful when they are minutely small. However, determining what the minimum cap should be is very difficult: even when a picture is 2 or 3 pixels wide, colors can still be used to distinguish events with contrasting colors. Similarly, determining a limit for the maximum size is also difficult: for very similar photographs, a lot of detail might be required to select the better picture. What is essential is that users are able to select a suitable zoom level and are able to easily compare and contrast one or more photographs at any zoom level. As a general rule of thumb, thumbnails can be replaced by lower resolution versions as long as the level of detail remains sufficient for the photo searching task. For example, with our interface, when participants were scrolling slowly, high resolution thumbnails were required, whereas when they were scrolling quickly, lower resolution thumbnails were adequate, as their primary goal was to move quickly towards the target location. This finding is particularly important for rapid movement and memory optimizations.

8.8 Scroll and Zoom Interaction Techniques
Thumbnail grid browsers work well on desktop computers as they make use of the human visual system's ability to process hundreds of images quickly. They do not work as well on a PDA, where they are unable to fully exploit the capabilities of the human visual system: in comparison to desktop computers, only a few images can be displayed on a PDA's screen at a given time. It is worth exploiting the human visual system given that about half the cerebral cortex is dedicated to image processing [15]. The challenge for photo searching interfaces on PDAs is how to make full use of this inborn ability to process images quickly; in other words, how to design photo searching interfaces that allow users to skim over thousands of photographs in a matter of seconds, skipping over irrelevant ones and providing instantaneous access to the ones they want. In doing so, photo searching interfaces should ensure that images are presented to the user at an optimal rate at any level of detail, so that images are processed as quickly as possible, as required by the search task. What is more, the photo searching interface should provide simple and intuitive controls that can be used to adjust these values.

Given these requirements, scrollable or paged thumbnail grid browsers are inadequate, as users cannot easily adjust the level of detail. It is also difficult to maintain the optimal scroll or page rate at any level of detail: these values are not static, but need to be adjusted dynamically as required by the photo searching task.

Our previous studies have shown that the AutoZoom and GestureZoom techniques are better suited to these photo searching tasks. The AutoZoom technique allows users to maintain the optimal scroll rate at any level of detail. The GestureZoom technique provides sensitive controls that can be used to dynamically adjust the level of detail or the scroll rate. However, the ideal solution would seem to be one that combines the merits of both techniques; this would minimize the effects of dizziness and fatigue and also provide users with maximum control. Both techniques provide very fine control with a high level of granularity, which makes them suitable for navigating small distances; they are not as effective when moving quickly over large distances. As they stand, both techniques, and any combination thereof, are not adequate for supporting all photo searching tasks. Returning to our requirements above, their strength is in helping users locate target photographs once a location close to the target has been found. They still need to be integrated with other techniques that allow users to rapidly skim over irrelevant photographs.
9. CONCLUSION
In this paper we have exposed many issues that need to be considered when designing photo searching interfaces. Perhaps the most significant finding is the use of pre-organized photo collection structures in providing context for photo searching. We found that event folder labels encode the context that is important to the user at search time. Therefore, photo searching interfaces should exploit and expose this context.

We found that locating events is central to all photo searching tasks. This task is made easier by the fact that people organize their photographs into event folders. Locating an event depends on how well the folder name is remembered. If the name can be remembered accurately, letter and keyword searches should be used to provide instantaneous access. If the date is remembered, then a calendar interface should provide rapid access to the temporal location. When events cannot be remembered well, people are likely to think in terms of time frames. If the spatial location is known, people prefer to browse spatially rather than temporally, as it requires less cognitive effort.

Finding events alone is insufficient. It is also necessary to provide the support needed to locate, select, compare, filter and group photographs. In addition, it is essential to provide control widgets with varying levels of granularity, to reduce the effort of navigating varying distances and to avoid making repetitive tasks tedious. While searches are being conducted, it is vital to keep track of which events, singles and properties were accessed most recently and most frequently. For repetitive photo searching tasks, it is essential that hardwired "one click" access is supported to provide instant access; this should be possible both manually and automatically, using the tracking information.

In this paper we have also advocated the use of several different methods to locate photographs, for example spatial and temporal navigation as well as letter and keyword searches. We found evidence that multiple methods are sometimes used for a single photo searching task. These methods should be combined seamlessly, so that people can easily flip between different interfaces as required by the task. It is of vital importance that the context for each method is maintained.

10. REFERENCES
[1] ACDSee. http://www.acdsystems.com/english/products/acdsee/acdseenode.htm. 2001.
[2] Apple iPhoto. http://www.apple.com/iphoto. 2002.
[3] Aris, A., Gemmell, J., and Lueder, R. Exploiting Location and Time for Photo Search and Storytelling in MyLifeBits. Microsoft Research Technical Report MSR-TR-2004-12.
[4] Cockburn, A., and Savage, J. Comparing Speed-Dependent Automatic Zooming with Traditional Scroll, Pan, and Zoom Methods. In People and Computers XVII: BCS Conference on HCI, Bath, England, 2003, 87-102.
[5] Derthick, M. Interfaces for Palmtop Image Search. In Proc. of the Joint ACM/IEEE Conference on Digital Libraries (2002), 340-341.
[6] Frohlich, D., Kuchinsky, A., Pering, C., Don, A., and Ariss, S. Requirements for Photoware. In Proc. CSCW 2002, New Orleans, ACM Press (2002), 166-175.
[7] Graham, A., Garcia-Molina, H., Paepcke, A., and Winograd, T. Time as Essence for Photo Browsing through Personal Digital Collections. In Proc. JCDL, ACM Press, 2000.
[8] Khella, A., and Bederson, B.B. Pocket PhotoMesa: A Zooming Image Browser for PDAs. In Proc. of Mobile and Ubiquitous Multimedia (MUM 2004), ACM Press, 19-24.
[9] Kindberg, T., Spasojevic, M., Fleck, R., and Sellen, A. The Ubiquitous Camera: An In-Depth Study of Camera Phone Use. IEEE Pervasive Computing, 4(2), 42-50, 2005.
[10] Koskinen, I., Kurvinen, E., and Lehtonen, T. The Mobile Image. IT Press, 2002.
[11] Patel, D., Marsden, G., Jones, S., and Jones, M. An Evaluation of Image Browsing Schemes for Small Screen Devices. In Proc. of the 6th International Symposium on Mobile Devices and Services (Mobile HCI 2004), Springer LNCS, 2004.
[12] Platt, J.C., Czerwinski, M., and Field, B. PhotoTOC: Automatic Clustering for Browsing Personal Photographs. Microsoft Research Technical Report MSR-TR-2002, 2002.
[13] Rodden, K., and Wood, K. How Do People Manage Their Digital Photographs? In Proc. of Human Factors in Computing Systems (CHI 2003), ACM Press, 409-416.
[14] Rodden, K. Evaluating Similarity-Based Visualisations as Interfaces for Image Browsing. Technical Report UCAM-CL-TR-543, ISSN 1476-2986, University of Cambridge, 2002.
[15] Zeki, S. A Vision of the Brain. Blackwell Scientific Publications, Oxford, 1993.