Location and Recognition of Flashlight Projections for Visual Interfaces

Jonathan Green, Tony Pridmore, Steve Benford and Ahmed Ghali
School of Computer Science & IT, The University of Nottingham, UK
{jzg,tpp,sdb,ahg}@cs.nott.ac.uk

Abstract

We consider the location and recognition of flashlight projections given the image sequence supplied by a fixed camera monitoring a physical surface. A quotient method extracts a description of the flashlight projection that is independent of the reflectance of the illuminated surface. The information recovered is used to recognise individual flashlights and trigger audiovisual events in response to users' actions. A demonstration application, an interactive poster, is described and directions for future work identified.
1. Introduction

Flashlights are interesting devices upon which to build human-computer interaction technologies. They are cheap, robust and fun. They are available to and understood by a sizeable user base: most people are familiar with flashlights and can use them to search for, select and illuminate objects and features of interest in the physical environment. Flashlight-based interfaces should therefore be of interest to, and usable by, large sections of the public. Indeed, earlier trials, in which over one hundred members of the public used flashlights to trigger ghostly voices while exploring a series of underground caves at a museum, reported an enthusiastic response [1]. Despite their simplicity, flashlights have the potential to provide surprisingly rich interfaces. The area(s) illuminated and the motion of the flashlight beam across the physical world provide valuable information regarding the user's interests and intentions. The projection of a flashlight beam onto a physical surface varies considerably with the physical structure, position, and orientation of the flashlight. This raises the possibility of both recognising individual devices and recovering their pose and other properties from images of flashlight projections. Flashlight-based interfaces have a wide variety of potential applications. Flashlights are available in many shapes, sizes, weights and mountings. Tightly focused, hand-held pencil flashlights can be used in the detailed examination of small features and objects. In contrast,
floor-mounted searchlights can be used to illuminate large sections of, e.g., buildings. Flashlights are particularly appropriate to situations where visitors explore dark places such as the caves, tunnels, cellars and dungeons that can be found in museums, theme parks and other visitor attractions. They are also suited to interacting with technology outdoors at night. Stronger flashlights can be used in more brightly lit situations, e.g. when interacting with projected graphical displays. In larger spaces it is natural for several flashlights to be used simultaneously. This provides interesting opportunities for group interaction, but may lead to separate projections intersecting and interfering with each other. In what follows we consider the location and recognition of flashlight projections given the image sequence supplied by a fixed camera monitoring a physical surface. The information recovered is used to trigger audiovisual events in response to users' actions. Related work and earlier versions of the flashlight interaction system are discussed in Section 2. Sections 3 and 4 describe the detection and recognition of flashlight projections respectively, while Section 5 presents a demonstration application: an interactive poster. Finally, directions for future work are identified and conclusions drawn in Section 6.
2. Related Work

Vision-based interface technologies have attracted increasing interest in recent years. Image analysis techniques have been used to track facial features [2], recognise bodily gestures and movement [3], recover the placement of physical objects on a variety of surfaces [4] and determine the movement of specially tagged objects and clothing [5]. They have also been used to detect and track light sources pointed at interactive surfaces, though to date attention has focused on special-purpose devices. Olsen et al. [6] and Davis et al. [7] use laser pointers to manipulate graphical objects on large shared displays. Laser pointers are attractive for several reasons. Their projections are well-localised, highly distinctive spots of light that are comparatively easy to detect and track. Laser pointers allow fine pointing and manipulation of objects
and seem a highly appropriate technology for meeting rooms, lecture halls and similar environments. While flashlight projections are unlikely to provide positional information as accurately as laser pointers, they do have a key advantage: the image of a flashlight projection contains information that can be used at the interface. The initial flashlight interface [1] ensured that the flashlight projection was brighter than its surroundings and so could be located by thresholding image intensity. A two-frame tracking algorithm was used to allow young children interacting with an immersive tent-like interface to drag graphical balloons across the sky in a 3D virtual world. A subsequent implementation used image differencing to locate moving flashlight projections, allowing visitors to trigger ghostly voices by pointing the beams at predefined physical targets while exploring a series of underground caves [8,9]. The remainder of the paper describes the current flashlight system. This employs a quotient method to obtain a description of the flashlight projection that is independent of the reflectance of the illuminated surface and goes on to recognise and respond differently to individual flashlights.
eF / eB − 1 = IT / IA
(4)
Equation (4) therefore describes a measure of flashlight illumination that is scaled only by the background illumination field. If this remains constant, as it can be expected to do in many applications, we can use eF/eB − 1 to both detect and characterise flashlight projections.
3. Locating Flashlight Projections

Given an image of a surface of unknown and non-uniform reflectance lit simultaneously by fixed background illumination and one or more user-controlled flashlight beams, our goal is to extract a description of the incident light contributed by the flashlight and use that description to locate flashlight projections. A background image is first captured in which the surface is lit solely by background illumination; no flashlight projections are in view. The light emitted from the viewed surface is then given by
eB = IA R
(1)
where IA is the background light field and R captures the local orientation and reflectance properties of the surface. If the camera is sufficiently far from the viewed surface, image intensity is proportional to eB. A foreground image is then acquired in which a flashlight is also projected onto the viewed surface, giving
eF = ( IA + IT ) R
(2)
Assuming once again that image intensity is proportional to eF, standard background image subtraction gives
eF − eB = IT R
(3)
The usual difference image is therefore a function of IT, but is scaled by surface reflectance. However, dividing each foreground image pixel by the corresponding background image pixel and rearranging gives equation (4), in which the reflectance term R cancels.
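Written out term by term from equations (1) and (2), the division and rearrangement are:

```latex
\frac{e_F}{e_B} \;=\; \frac{(I_A + I_T)\,R}{I_A\,R} \;=\; 1 + \frac{I_T}{I_A}
\qquad\Longrightarrow\qquad
\frac{e_F}{e_B} - 1 \;=\; \frac{I_T}{I_A}
```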
Figure 1. Locating flashlights with a quotient operator. a. Patterned surface under background illumination. b. Two flashlights projected onto the lower half of the surface, in addition to the background illumination. c. eF/eB − 1 computed from the images of a and b.
Figure 1a and b show background and foreground images respectively of a heavily patterned piece of fabric mounted on a planar board and viewed via a standard, domestic quality webcam. Figure 1c shows the output of the quotient operator, scaled for presentation purposes.
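The quotient operator itself is a per-pixel computation. The following is an illustrative sketch, assuming the background and foreground frames arrive as NumPy grey-scale arrays; the function name and the small epsilon guard against division by zero are our additions, not part of the original system:

```python
import numpy as np

def quotient_image(foreground, background, eps=1e-6):
    """Reflectance-independent flashlight response eF/eB - 1.

    Where only background illumination is present the result is ~0;
    inside a flashlight projection it rises towards IT/IA.
    """
    f = foreground.astype(np.float64)
    b = background.astype(np.float64)
    return f / (b + eps) - 1.0
```

Because the surface reflectance R cancels in the division, the same flashlight produces the same response whether it falls on a dark or a bright part of the pattern.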
Figure 2. The quotient operator vs background subtraction. See text for details.
Figure 3 shows the result of thresholding the image of Figure 1c.
4. Recognising Flashlights

Supra-threshold responses are clustered, and feature vectors are created and used to recognise individual flashlights (Figure 4). At present each feature vector contains the values of a twenty-bin histogram of the raw response of the quotient operator, computed over one cluster. The current recognition engine is the K Nearest Neighbours algorithm. On initialisation the operator is asked to sweep each flashlight slowly and systematically over the target area. Regularly spaced training images are acquired and a feature vector extracted from each. These vectors are labelled with the identifier of the flashlight that created them. At run time a feature vector is extracted from the current image and the K nearest neighbours of that vector in the training set (using Euclidean distance in feature space as the metric) are recovered. The new feature vector is assigned the same label as the majority of its K nearest neighbours. In the current implementation K is a parameter, but is usually 20.
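The feature extraction and nearest-neighbour steps above can be sketched as follows. The twenty-bin histogram and the majority vote follow the text; the histogram's value range, the normalisation for cluster size, and all names are illustrative assumptions:

```python
import numpy as np

def histogram_feature(responses, bins=20, value_range=(0.0, 4.0)):
    """Twenty-bin histogram of raw quotient responses within one cluster.

    value_range is an assumed clipping range for eF/eB - 1; normalising
    by the count makes the feature independent of cluster size.
    """
    h, _ = np.histogram(responses, bins=bins, range=value_range)
    return h / max(h.sum(), 1)

def knn_label(feature, train_features, train_labels, k=20):
    """Assign the majority label among the k nearest training vectors."""
    d = np.linalg.norm(train_features - feature, axis=1)  # Euclidean distance
    nearest = np.asarray(train_labels)[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```

At run time each supra-threshold cluster would yield one feature via `histogram_feature`, which `knn_label` then matches against the vectors gathered during the initial sweep.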
Figure 2 illustrates the quotient operator's independence of reflectance changes. In Figure 2a, background subtraction is used to locate a single flashlight projected onto the bottom left of the surface of Figure 1; the underlying pattern is clearly visible. Figure 2b shows the application of the quotient operator to the same data. In Figure 2c the quotient operator is used to locate the same flashlight projected onto a uniform surface. Note the similarity between b and c, and the lack of a visible surface pattern in b.

Figure 3. Supra-threshold regions of Figure 1c.

Figure 4. Recognition of the flashlights in Figure 1.

Initial experiments with the system showed that, as long as the flashlights chosen produced visibly different projections, recognition was successful in approximately 99% of cases. As the system runs at 25 fps, however, this still generates an erroneous identifier approximately once per minute. To counteract this, the two-frame tracker of [1] is employed, not to extract motion information, but to allow temporal smoothing of the output of the recognition engine.
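The temporal smoothing of the recognition output can be sketched as a sliding majority vote over recent per-frame identifications; the window length (one second of frames at 25 fps) is our assumption, not a figure from the paper:

```python
from collections import Counter, deque

class LabelSmoother:
    """Majority vote over the last `window` per-frame identifications.

    A single erroneous identifier is outvoted by the surrounding
    correct frames, suppressing the roughly once-per-minute errors.
    """
    def __init__(self, window=25):  # assumed: ~1 second at 25 fps
        self.history = deque(maxlen=window)

    def update(self, label):
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]
```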
5. An Interactive Poster
Flashlight projections are detected by thresholding eF/eB − 1. To avoid previous reliance [1,8,9] on user-defined threshold values, Rosin's [10] adaptive unimodal thresholding algorithm is currently employed.
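Threshold selection in the spirit of Rosin's unimodal method can be sketched as follows: the threshold is placed at the histogram bin lying furthest from the straight line joining the histogram's peak to its last non-empty bin. This is a simplified reading of [10], not the system's actual implementation:

```python
import numpy as np

def unimodal_threshold(hist):
    """Corner-based threshold for a unimodal histogram (after Rosin [10]).

    Draws a line from the peak (p, hist[p]) to the last non-empty bin
    (q, hist[q]) and returns the bin maximising perpendicular distance
    to that line.
    """
    p = int(np.argmax(hist))            # dominant peak
    q = int(np.nonzero(hist)[0][-1])    # last non-empty bin
    if q <= p:
        return p
    x1, y1, x2, y2 = p, hist[p], q, hist[q]
    xs = np.arange(p, q + 1)
    # perpendicular distance from each histogram point to the peak-to-tail line
    num = np.abs((y2 - y1) * xs - (x2 - x1) * hist[p:q + 1] + x2 * y1 - y2 * x1)
    return int(xs[np.argmax(num / np.hypot(y2 - y1, x2 - x1))])
```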
Figure 5 shows one possible application of a flashlight interface, developed as a testbed for the current system. A wall poster depicting the planets in our solar system was created by a group of primary school children. Each of the
planets was painted, cut out and stuck to a large (planar) black panel. The interactive poster was mounted on the wall and viewed by a ceiling-mounted webcam. Following [8,9], a rectangular image area around each of the planets was selected and became a target. Directing a flashlight beam onto a target causes an associated audio clip to be played. Two flashlights were employed, one intended to give a response designed for children and one for adults. Shining the children's flashlight onto a planet triggers a short description of that planet recorded by the children. Shining the adults' flashlight at the same target triggers a piece of related music: the various movements of Holst's The Planets suite appear, along with, e.g., the Beatles' "Here Comes the Sun". Figure 5a shows a user directing the two flashlights towards the poster; their projections are clearly visible in Figure 5b.
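The target-triggering logic described above can be sketched as a lookup from a detected projection centroid and a recognised flashlight identifier to an audio clip. The target rectangle, clip filenames and flashlight identifiers here are hypothetical placeholders, not values from the deployed poster:

```python
# Hypothetical target table: rect = (x0, y0, x1, y1) in image coordinates,
# with one clip per recognised flashlight identity.
TARGETS = {
    "mars": ((40, 60, 120, 140),
             {"child": "mars_kids.wav", "adult": "mars_holst.mp3"}),
}

def triggered_clip(centroid, flashlight_id, targets=TARGETS):
    """Return the clip for the target containing `centroid`, if any."""
    cx, cy = centroid
    for name, (rect, clips) in targets.items():
        x0, y0, x1, y1 = rect
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return clips.get(flashlight_id)
    return None  # beam is not on any target
```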
Figure 5. A user (a) directing the children's and adults' flashlights at (b) the Interactive Poster.

6. Conclusions and Future Work

Flashlights have considerable potential as interface devices. Methods have been developed which allow extraction of descriptions of flashlight projections that are independent of the reflectance of the underlying physical surface. Those descriptions have been used to locate and recognise individual flashlights and support a multi-user interface. That system is currently undergoing more formal evaluation. Future extensions will include an enhanced feature vector fed into a more powerful pattern recognition engine. Other potential improvements include extending the current grey-scale operator to colour and extracting other properties of the flashlight. Throughout this work the developing system will periodically be deployed and publicly tested in a museum setting, both to evaluate its potential as an individual interface technology and to examine ways in which it can be integrated into larger user experiences.

Acknowledgements

The work reported here was supported by the Engineering and Physical Sciences Research Council and the EQUATOR IRC.

7. References

[1] Green, G., Schnadelbach, H., Koleva, B., Benford, S., Pridmore, T., Medina, K., Harris, E., and Smith, H., Camping in the Digital Wilderness: Tents and Flashlights as Interfaces to Virtual Worlds, Proceedings of CHI 2002.
[2] Gorodnichy, D.O., Malik, S., and Roth, G., Nouse: Use your Nose as a Mouse - a New Technology for Hands-free Games and Interfaces, 15th Int. Conf. on Vision Interfaces, Calgary, Canada, May 2002.
[3] Bradski, G. and Davis, J., Motion Segmentation and Pose Recognition with Motion History Gradients, Machine Vision and Applications, Vol. 13, No. 3, pp. 174-184, 2002.
[4] Rekimoto, J. and Matushita, N., Perceptual Surfaces: Towards a Human and Object Sensitive Interactive Display, Workshop on Perceptual User Interfaces (PUI'97), 1997.
[5] Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K., and Tachibana, K., Virtual Object Manipulation on a Table-Top AR Environment, Proceedings of ISAR 2000, Oct 5th-6th, 2000.
[6] Olsen, D. and Nielsen, T., Laser Pointer Interaction, Proc. CHI 2001, pp. 17-22, 2001.
[7] Davis, J. and Chen, X., LumiPoint: Multi-user Location-based Interaction on Large Tiled Displays, Displays, Vol. 23, No. 5, Elsevier Science, 2002.
[8] Ghali, G., Benford, S., Bayomi, S., Green, G., and Pridmore, T., Visually-tracked Flashlights as Interaction Devices, Proceedings of Interact 2003.
[9] Ghali, G., Bayomi, S., Green, G., Pridmore, T., and Benford, S., Vision-mediated Interaction with the Nottingham Caves, Proceedings SPIE Conf. on Computational Imaging, 2003.
[10] Rosin, P.L., Unimodal Thresholding, Pattern Recognition, Vol. 34, No. 11, pp. 2083-2096, 2001.