Sceptre - An Infrared Laser Tracking System for Virtual Environments

Christian Wienss, Igor Nikitin, Gernot Goebbels, Klaus Troche, Martin Göbel
fleXilution GmbH, an IC:IDO company, Gottfried-Hagen-Str. 60, D-51105 Köln
[email protected]

Lialia Nikitina
Mathematisches Institut, Weyertal 86-90, D-50931 Köln
[email protected]

Stefan Müller
Universität Koblenz-Landau, Universitätsstrasse 1, D-56070 Koblenz
[email protected]
ABSTRACT
In this paper a 3D tracking system for Virtual Environments is presented which utilizes infrared (IR) laser technology. Invisible laser patterns are projected from the user(s) onto the screen via the input device Sceptre or the accompanying headtracking device. IR-sensitive cameras placed near the projectors in a backprojection setup recognize the patterns; in this way, the position and orientation of the input devices are reconstructed. The infrared laser is invisible to the human eye and therefore does not disturb the immersion.
Categories and Subject Descriptors I.3.1 [Hardware Architecture]: Input devices; I.3.6 [Methodology and Techniques]: Interaction techniques; B.4.2 [Input/Output Devices]: Channels and controllers
General Terms Algorithms, Human Factors
Keywords IR-Laser, Tracking, 3D-Reconstruction, Laser pattern
1. INTRODUCTION
In Virtual Reality (VR) projection systems like caves or powerwalls, knowledge of the position and orientation of the user is indispensable for immersion. For example, the camera in the virtual environment has to move and orient in the same way as the user's head; this enhances the user's feeling of being and acting in the virtual world. In most cases, such installations are equipped with optical or electro-magnetic tracking systems. Both have advantages like precision and low latency, but also disadvantages like limited range, cables and tracking field distortion (electro-magnetic) or the fact that the observing cameras have to be inside the cave (optical tracking). For very large 5-sided caves (7 m x 3 m x 3 m) or even closed 6-sided caves, the usage of a cable-based system is uncomfortable, and the high-precision range of the available products is limited. Optical tracking cameras inside the cave, and therefore in the field of view of the user, would disturb the immersion in the virtual environment, because they do not belong to the scene. The idea is to develop a tracking system which is cable-free, lightweight to wear and does not require additional hardware inside the cave.
2. RELATED WORK
Laser pointing devices have been used in several approaches in which the laser pointer replaces the functionality of the mouse. The common technique works with one or several cameras placed near the projectors to recognize the laser spots. Some approaches work with a red laser [1, 2, 3, 5, 6, 8, 9], which evokes the following problems:
• Calibration: The camera and the projector have to be arranged and calibrated for correspondence between the visible red laser point and the cursor.
• Latency: If a cursor is displayed on the projection screen, the movement of the laser pointer is straighter than the movement of the cursor, and the cursor lags behind.
• Visibility: If no cursor is displayed, the user can have difficulty recognizing the red laser point on red displayed objects.
• Tremor: The natural, unavoidable movement of the hand is visible on the screen, which makes selection or double-clicking on small items difficult.
• Light conditions: Under very bright light conditions like direct sunlight, the recognition of the laser point by the camera is difficult.
• Beamer hotspot(s): With fast black/white cameras, the reflections of the projection lamps in backprojection systems appear similar to laser spots and may be confused with them.
Other systems use infrared laser techniques [4, 7], which solve some of the above mentioned problems. Since the laser spot is invisible to the user, a cursor must be displayed by the system. The user is unaware of calibration errors, since the cursor does not need to correspond exactly with the laser spot. Furthermore, the latency is not as apparent as with a visible laser. Tremor effects can also be reduced, because the position of the cursor can be smoothed over the real laser positions. The difficulties with light conditions and the beamer hotspot(s) remain. The selection actions are performed differently in the presented approaches. They can be split into two main branches: the first transmits the selection information via other channels like unfocused infrared or radio; the second uses the laser itself, communicating the clicking information through interruptions of the beam or variations of the projected pattern [7]. In this approach of Matveyev and Göbel, three spots are emitted to the screen. The user is able to change the angle between one of the spots and the other two fixed beams. Thus the system can recognize two clicking operations, depending on the increase or decrease of the angle. Since the triangle formed by the three points is not isosceles, a rotation of the selected objects around the z-axis can be performed. By calibrating the stylus to a certain distance, a measurement of the distance to the screen becomes possible. Considering the presented approaches, the decision was made to use an infrared laser, because a visible laser spot pattern would disturb the user's immersion. Additionally, the authors decided to stick to one communication channel and transmit button clicks via laser interruption. Changing the pattern topology for button click transmission as in [7] would be too complex and costly.
3. HARDWARE
Figure 1: Schematic hardware setup with the Sceptre, one headtracking device mounted on INFITEC glasses, two projectors and one camera. For illustration, the laser beams and spots are pictured in red, though in reality they are invisible infrared.
Figure 2: The prototypes of the Sceptre tracking device and the headtracking device mounted on INFITEC glasses. The spectacle lenses are taken out for mono projections, though the tracking system is designed for stereo systems as well.
The main approach in this paper is to use the laser spots for tracking the device, not only for pointing at the screen. The idea is to project an individual pattern for each input device, separate the patterns in the computational process and reconstruct the position and orientation of the devices. In a backprojection system, the camera is positioned on the same side of the screen as the projector (Fig. 1). The monochrome camera provides a standard personal computer with 29 interlaced pictures per second, which are recorded by a framegrabber. The personal computer recognizes the points and reconstructs the positions of the input devices. The obtained data can, for example, emulate a Polhemus Fastrack device via an RS232 connection (see Fig. 8).
3.1 Sceptre
The Sceptre input device is a handheld stick which projects an individual IR laser pattern onto the projection wall (Fig. 3). The following components need further explanation:
• Laser diode: 3 mW, 780 nm IR, class IIIb.
Figure 3: The handheld IR laser tracking input device Sceptre with individual pattern mask.
• Microcontroller: Programmable power controller for multiple current interruptions. The interruptions are recognized by the computational process and interpreted as different button clicks (a decoding sketch follows this list).
• On/off switch: A clearly discernible visible indicator of whether the invisible laser is on or off.
• Security distance: For eye safety at small laser beam splitting angles, a hard case protection releases the laser only after a safe spacing between the beams has been reached. Thus only small, harmless portions of the laser can enter the human pupil at the same time.
• Splitting grating: The laser is divided into 49 subrays of equivalent energy. The angle is optimized for the average working distance, depending on the VR hardware setup.
• Pattern mask: The individual mask absorbs all beams except the ones forming the individual pattern.
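The paper does not specify how the interruptions are decoded on the computer side; purely as a hypothetical illustration, a decoder could count how often the pattern disappears and reappears within a short time window and map the count to a button:

```python
import time

def decode_clicks(pattern_visible, window_s=0.5, poll_hz=60.0):
    """Hypothetical click decoder: counts laser interruptions (falling
    edges of pattern visibility) within a time window; the count is
    interpreted as the button id (1 interruption -> button 1, etc.).

    pattern_visible is a callable reporting whether the device's pattern
    is currently detected on the screen (assumed interface)."""
    interruptions, was_visible = 0, True
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        visible = pattern_visible()
        if was_visible and not visible:
            interruptions += 1  # falling edge: beam switched off
        was_visible = visible
        time.sleep(1.0 / poll_hz)
    return interruptions or None  # None: no click in this window
```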
3.2 Headtracking
Figure 4: Headtracking device with separated power and laser components for weight balance mounted on 3D glasses.
Fig. 4 shows how the headtracking device is assembled. The spacing between the pattern mask and the front end of the 3D glasses serves as an additional security distance.
3.3 Security
The utilized laser is a 780 nm IR laser with 3 mW output power. The laser is classified as class IIIb according to DIN EN 60825-1, which is specified as hazardous; this laser class ranges from 5 mW to 500 mW. Although only a 3 mW laser is used, it is not classified as the generally harmless class IIIa, because the natural eyelid blink reflex does not react to invisible infrared laser light. Therefore the laser energy is split by a grating (Fig. 3 and 4) into 49 evenly distributed beams, of which only 7 pass the individual pattern mask. The security distance is computed depending on the splitting angle: at this distance, with the correct pattern mask, no more than one subray can enter the pupil (diameter 1.9 mm to 3.6 mm) at a time. Thus the energy that can enter the eye is reduced to about 0.06 mW (3 mW / 49 ≈ 0.06 mW), which is nearly class 1M (e.g. supermarket barcode scanners) and harmless.
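As a rough illustration of how such a security distance can be computed, the following sketch estimates the distance at which two adjacent subrays are separated by more than the worst-case pupil diameter. The small-angle model and the 0.5° splitting angle are illustrative assumptions, not the authors' actual parameters:

```python
import math

def security_distance(split_angle_deg, pupil_diameter_mm):
    """Distance (in meters) at which two adjacent subrays are separated
    by at least the pupil diameter, so at most one subray can enter
    the eye. Small-angle geometry: separation = distance * tan(angle)."""
    return (pupil_diameter_mm / 1000.0) / math.tan(math.radians(split_angle_deg))

# Worst-case pupil diameter of 3.6 mm from the paper, hypothetical
# 0.5 degree angle between adjacent subrays:
print(security_distance(0.5, 3.6))  # ~0.41 m
```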
4. ALGORITHMS
Software was developed which recognizes the laser spots on the screen, assigns them to patterns and reconstructs the 3D positions of the input devices. The tracking procedure is divided into three steps: 1. Calibration, 2. Recognition and 3. Reconstruction.
4.1 Calibration
The first step of the calibration is taking a picture of the empty screen with the projectors switched on. This reference image is needed because broken pixels and the infrared component of the projector reflections could otherwise be mistaken for laser spots. The Sceptre is placed at a known distance from the screen (e.g. H = 1.5 m) and oriented perpendicularly to it. In the projected pattern, the positions of the points (x_i, y_i), i = 1, 2, ..., are measured in the coordinate system of the screen. The z-axis is perpendicular to the screen and directed towards the user. The origin can be anywhere on the screen; the resulting position of the Sceptre will be returned in the same coordinate system. The screen positions should be measured in meters, i.e. the conversion from pixels to meters should already have been performed. The resulting calibration positions are written to the list v0[0] = vec3(x_0, y_0, H), v0[1] = vec3(x_1, y_1, H), ... The list is then processed by the recognition procedure recognizeFFr(v0), described in detail in Section 4.2, which rearranges the list into a certain standard order. Next, the calibration procedure calibrate(v0) subtracts the x- and y-positions of the central point v0[0] from the x- and y-positions of all points and divides the obtained vectors by the distance H. The pattern data thus becomes independent of the position of the origin and of the distance to the screen, and represents the structure of the pattern itself. The pattern is additionally flipped horizontally; this is necessary because the directions of the Sceptre and of the z-axis are opposite during the calibration. The resulting data can be saved to a file for further use without repeating the calibration procedure.
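As a minimal sketch, the calibration step can be written as follows. The function name calibrate follows the paper; the (N, 3) array layout and the choice of the x-axis for the horizontal flip are assumptions:

```python
import numpy as np

def calibrate(v0, H=1.5):
    """Turn measured calibration points (x_i, y_i, H), already brought into
    standard order by the recognition procedure with v0[0] the central
    point, into calibrated pattern vectors v_i^0 = (v_ix^0, v_iy^0, 1)."""
    pattern = np.asarray(v0, dtype=float).copy()
    # make the pattern independent of the origin: subtract the central point
    pattern[:, :2] -= pattern[0, :2]
    # make it independent of the screen distance: divide by H
    pattern[:, :2] /= H
    # flip horizontally: the Sceptre direction and the z-axis are
    # opposite during calibration
    pattern[:, 0] *= -1.0
    # homogeneous third component used by the reconstruction (equation 1)
    pattern[:, 2] = 1.0
    return pattern
```

The returned array can be saved to a file and reused without repeating the calibration.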
4.2 Recognition

Figure 5: Patterns projected by the Sceptre and the headtracking device. The two patterns cannot be mixed up, because they do not match under any angle or rotation.

The first step is to recognize the projected patterns on the screen. The algorithm recognizes the overlapping F and Fr patterns (Fig. 5 and 6) for N = 14 points. It is also applicable for 7 ≤ N < 14 points; in this case it tries to recognize at least one of the two patterns. The algorithm starts with the recognition of straight lines in the pattern. For this purpose, it lists all triples of points (at most $C_{14}^3 = 364$ variants), finds the largest side of the triangle formed by each triple and marks the opposite vertex as the "midpoint". Then all triangles (i.e. line candidates) are sorted with respect to the following wellness criterion:

w = 2 * (area of triangle) / (largest side)^2 + 0.01 * ((medium side) / (smallest side) - 1)

The first term measures the straightness of the line; the division by the largest side squared makes the criterion dimensionless. The second term penalizes the cases where the midpoint is close to one end, i.e. where the smallest side vanishes. The coefficients were found empirically. From the sorted list, the first 20 lines are passed on for further processing.
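The criterion translates directly into code. The following sketch (numpy assumed; the epsilon guard against coincident points is an addition for robustness, not part of the paper) computes the wellness of all point triples and keeps the 20 best line candidates:

```python
from itertools import combinations
import numpy as np

def line_wellness(p, q, r):
    """Wellness of a point triple as a line candidate; lower is straighter.
    The vertex opposite the largest side plays the role of the midpoint."""
    sides = sorted([np.linalg.norm(q - r),
                    np.linalg.norm(p - r),
                    np.linalg.norm(p - q)])
    smallest, medium, largest = sides
    # triangle area via the 2D cross product of two edge vectors
    area = 0.5 * abs((q - p)[0] * (r - p)[1] - (q - p)[1] * (r - p)[0])
    return (2.0 * area / largest**2
            + 0.01 * (medium / max(smallest, 1e-12) - 1.0))

def find_lines(points, keep=20):
    """Sort all point triples by wellness, keep the best `keep` candidates."""
    triples = [(line_wellness(points[i], points[j], points[k]), (i, j, k))
               for i, j, k in combinations(range(len(points)), 3)]
    triples.sort(key=lambda t: t[0])
    return triples[:keep]
```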
Figure 6: The F and Fr patterns are recognized even if they are overlapping. x and y are given in scaled screen coordinates.
Then the lines forming the topological F and Fr patterns are selected. For this purpose, two lines possessing a common end are found (a corner), and the line closing the two open ends of the corner is detected (a triangle). Then the line which connects two midpoints in the triangle, attached to one line by an endpoint and to the other by its midpoint, is searched for. At this point an F or Fr pattern is recognized, which allows all points in the pattern to be identified according to the standard assignment shown in Fig. 5. F and Fr patterns are then distinguished by their orientation and placed in separate lists of pattern candidates. For each pattern, the wellness is estimated as the sum of the wellness values of all lines in the pattern. The algorithm considers all pairs of F and Fr which do not have common points and tries to select the pair with the best sum of wellness values; a sketch of this selection is given below. If this fails, it tries to select the best F or Fr separately. If this fails too, it returns 0.
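A sketch of this final selection, with the candidate representation (wellness value plus set of point indices) as an assumed data structure:

```python
def select_patterns(f_candidates, fr_candidates):
    """Pick the disjoint (F, Fr) pair with the best (lowest) total wellness.

    Each candidate is a (wellness, point_index_set) tuple. Falls back to
    the best single pattern; returns 0 if nothing was recognized."""
    best_pair, best_w = None, float("inf")
    for wf, pf in f_candidates:
        for wr, pr in fr_candidates:
            # patterns must not share points on the screen
            if pf.isdisjoint(pr) and wf + wr < best_w:
                best_pair, best_w = (pf, pr), wf + wr
    if best_pair is not None:
        return best_pair
    singles = f_candidates + fr_candidates
    if singles:
        return min(singles, key=lambda c: c[0])[1]  # best single F or Fr
    return 0
```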
4.3 Reconstruction
For the reconstruction of the 3D positions of the input devices, we start with the equation

$r_0 + R v_i^0 t_i = v_i \qquad (1)$

where $r_0$ is the position of the Sceptre and $R$ is the 3x3 rotation matrix describing its orientation. The third column $R_{*3}$ is the direction of the Sceptre; the first two columns $R_{*1}, R_{*2}$ are two normal vectors to this direction. $v_i^0 = (v_{ix}^0, v_{iy}^0, 1)$ are the calibrated pattern vectors (Fig. 5), $t_i > 0$ are the ray coefficients and $v_i = (v_{ix}, v_{iy}, 0)$ are the positions of the points on the screen. The index $i$ runs from 0, the center of the pattern, to $N - 1$. At this stage of the tracking mechanism $N = 7$, since the F and Fr patterns have already been recognized. Equation (1) is multiplied from the left by $R^T$, using the orthonormality condition $R^T R = 1$:

$v_i^0 t_i = R^T v_i + \tilde{r}_0, \quad \text{where } \tilde{r}_0 = -R^T r_0. \qquad (2)$

Then it is multiplied on the left by two vectors $e_{1,2}^i$ orthogonal to $v_i^0$, i.e. $e_{1,2}^i v_i^0 = 0$:

$e_{1,2}^i R^T v_i + e_{1,2}^i \tilde{r}_0 = 0. \qquad (3)$

These vectors can be explicitly constructed as

$e_1^i = (1, 0, -v_{ix}^0), \quad e_2^i = (0, 1, -v_{iy}^0). \qquad (4)$

This is a homogeneous linear system for the unknowns $R^T, \tilde{r}_0$; i.e. if $(R^T, \tilde{r}_0)$ is a solution, $\lambda(R^T, \tilde{r}_0)$ is also a solution for arbitrary $\lambda$. To fix this scaling degree of freedom, a gauge condition is temporarily imposed: $\tilde{r}_{0z} = 1$; the proper scaling is applied after the unique solution has been found. Writing the term $e_{1,2}^i R^T v_i$ in matrix form,

$e_{1,2}^i R^T v_i = \begin{pmatrix} e_{1,2}^{ix} & e_{1,2}^{iy} & e_{1,2}^{iz} \end{pmatrix} \begin{pmatrix} R_{11} & R_{21} & * \\ R_{12} & R_{22} & * \\ R_{13} & R_{23} & * \end{pmatrix} \begin{pmatrix} v_{ix} \\ v_{iy} \\ 0 \end{pmatrix}, \qquad (5)$

we see that the $R_{3*}$ elements drop out of the equation. Expanding the matrix product, we obtain the system

$\begin{pmatrix} & & & \cdots & & & & \\ e_{1,2}^{ix} v_{ix} & e_{1,2}^{iy} v_{ix} & e_{1,2}^{iz} v_{ix} & e_{1,2}^{ix} v_{iy} & e_{1,2}^{iy} v_{iy} & e_{1,2}^{iz} v_{iy} & e_{1,2}^{ix} & e_{1,2}^{iy} \\ & & & \cdots & & & & \end{pmatrix} \begin{pmatrix} R_{11} \\ R_{12} \\ R_{13} \\ R_{21} \\ R_{22} \\ R_{23} \\ \tilde{r}_{0x} \\ \tilde{r}_{0y} \end{pmatrix} = \begin{pmatrix} \cdots \\ -e_{1,2}^{iz} \\ \cdots \end{pmatrix} \qquad (6)$

which is symbolically written as

$M x = a. \qquad (7)$

Here the matrix $M$ has size $2N \times 8$ and represents $2N$ equations, one per vector $e_{1,2}^i$, for the 8 unknowns $(R_{1*}, R_{2*}, \tilde{r}_{0x,y})$. The right hand side of the system comes from the $\tilde{r}_{0z} = 1$ contribution. The system is clearly overdetermined and is solved by means of a general least squares fit, see Numerical Recipes (NR, www.nr.com) Chapter 15.4. The idea is to find the $x$ which minimizes the total residual $\chi^2 = \sum_i (Mx - a)_i^2$; setting the gradient $\nabla \chi^2 = 2 M^T (Mx - a)$ to zero leads to another linear system

$C x = b \qquad (8)$

with $C = M^T M$ and $b = M^T a$. The matrix $C$ is an 8x8 square, symmetric, positive definite matrix. The system is solved with the Cholesky algorithm, see NR Chapter 2.9. Though NR Chapter 15.4 advises the use of the SVD algorithm, the decision was made to use the Cholesky algorithm, because it is much faster than SVD and never failed during the tests which were performed. The residual $\chi^2$ is returned to the user as an error estimator. The solution vector $s = (x, \tilde{r}_{0z})$ is rescaled, $s \to \lambda s$, so that the normalization condition

$R_{1*}^2 = R_{11}^2 + R_{12}^2 + R_{13}^2 = 1 \qquad (9)$

is satisfied. This is one of the orthonormality conditions for $R$; for the two others, $R_{2*}^2 - 1 = 0$ and $R_{1*} R_{2*} = 0$, the residual is computed and returned to the user as a second error estimator. In addition, a sign correction is performed. For this purpose equation (2) is evaluated for $i = 0$ and its z-component is taken:

$t_0 = (R^T v_0 + \tilde{r}_0)_z > 0. \qquad (10)$

If this condition is violated, the solution vector is taken with the opposite sign: $s \to -s$. After that, Gram-Schmidt orthonormalization is applied to $R$ to ensure its exact orthonormality:

$R_{2*} := R_{2*} - (R_{2*} R_{1*}) R_{1*}, \quad R_{2*} := R_{2*} / |R_{2*}|, \quad R_{3*} := R_{1*} \times R_{2*}. \qquad (11)$

Then the vector $r_0 = -R \tilde{r}_0$ is computed and the result is returned in the homogeneous matrix form

$\begin{pmatrix} R & r_0 \\ 0 & 1 \end{pmatrix}. \qquad (12)$
Note: the algorithm is applicable for N ≥ 4, when the system in equation (6) is well determined. For non-planar screens a similar algorithm can be written; since in this case the $R_{3*}$ terms must also be taken into account, the system is well determined for N ≥ 6.
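To make the pipeline above concrete, here is a schematic numpy reimplementation of equations (4)-(11) for the planar-screen case. It is a sketch, not the authors' code: the original solves the normal equations with a hand-written Cholesky routine (NR Chapter 2.9), which is replaced here by numpy's Cholesky factorization, and the array layout is an assumption:

```python
import numpy as np

def reconstruct(v0, v):
    """Reconstruct the device pose from calibrated pattern vectors
    v0 (N x 3, rows (v0x, v0y, 1)) and screen points v (N x 3,
    rows (vx, vy, 0)). Returns rotation R, position r0 and residual chi2."""
    N = len(v0)
    M = np.zeros((2 * N, 8))
    a = np.zeros(2 * N)
    for i in range(N):
        # the two vectors orthogonal to v0[i], equation (4)
        for k, e in enumerate(([1.0, 0.0, -v0[i, 0]],
                               [0.0, 1.0, -v0[i, 1]])):
            e = np.asarray(e)
            row = 2 * i + k
            M[row, 0:3] = e * v[i, 0]  # coefficients of R11, R12, R13
            M[row, 3:6] = e * v[i, 1]  # coefficients of R21, R22, R23
            M[row, 6:8] = e[0:2]       # coefficients of r~0x, r~0y
            a[row] = -e[2]             # right hand side from gauge r~0z = 1
    # normal equations C x = b solved via Cholesky, equations (7)-(8)
    C, b = M.T @ M, M.T @ a
    L = np.linalg.cholesky(C)
    x = np.linalg.solve(L.T, np.linalg.solve(L, b))
    chi2 = float(np.sum((M @ x - a) ** 2))  # first error estimator
    # rescale the solution so that |R_1*| = 1, equation (9)
    s = np.append(x, 1.0) / np.linalg.norm(x[0:3])
    R1, R2, r0t = s[0:3], s[3:6], s[6:9]
    # sign correction: ray coefficient t0 must be positive, equation (10)
    if R1[2] * v[0, 0] + R2[2] * v[0, 1] + r0t[2] <= 0:
        R1, R2, r0t = -R1, -R2, -r0t
    # Gram-Schmidt orthonormalization, equation (11)
    R2 = R2 - (R2 @ R1) * R1
    R2 = R2 / np.linalg.norm(R2)
    R = np.vstack([R1, R2, np.cross(R1, R2)])
    r0 = -R @ r0t
    return R, r0, chi2
```

For input in general position with N ≥ 4, the matrix C is positive definite and the Cholesky factorization exists, mirroring the paper's observation that it never failed in the performed tests.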
5. RESULTS
5.1 Precision
The utilized system is a 180 x 130 cm mono backprojection screen, monitored by one infrared-sensitive camera. The camera resolution is 768 x 494 pixels, which means that 1 pixel covers 2.34 x 2.63 mm (180 cm / 768 ≈ 2.34 mm and 130 cm / 494 ≈ 2.63 mm). The precision tests were performed with a very cheap laser diode (about 23 €) with a direct battery connection; using a potentiometer-regulated laser increases the precision but is more expensive. In Fig. 7 (a), the relation between the relative error and the distance to the screen is displayed. It is apparent that the best values are achieved between 100 and 200 cm. At smaller distances, the relation between the point sizes and the pattern size becomes unfavorable, because small errors in the computation of the center of a laser spot result in large errors in the reconstruction. If the Sceptre is located very near the projection screen (≤ 20 cm), the laser spots begin to merge. In the distance range between 200 and 400 cm the reconstruction is still good, but with increasing distance the points begin to vanish due to the small output energy. In Fig. 7 (b), the relation between the camera resolution and the resulting precision is shown. The best performance-cost ratio is achieved at 0.18 cm per pixel.
Figure 7: Precision statistics. (a) The relation between the interaction distance and the relative reconstruction error; the testing system was initially calibrated to 150 cm. (b) The relation between the pixel size and the relative reconstruction error.
5.2 Delay
The system is able to reconstruct the positions of the Sceptre and the headtracking device 19 times per second (tps), including smoothing of the 3D positions and the usage of a position prediction algorithm. This can be increased to 29 tps in interlaced camera mode, with a small reduction of the precision. In one computation step executed on a 3 GHz P4, the grabbing takes 45.361 ms (non-interlaced), the point-finding algorithm takes 6.672 ms (non-interlaced), the recognition procedure (Section 4.2) takes 0.246 ms and the reconstruction algorithm (Section 4.3) takes 0.072 ms; the total of about 52.4 ms per step corresponds to the rate of 19 tps.
5.3 Usability
As pictured in Fig. 8 (a) and (b), the presented system works together with Virtual Design 2. To achieve this, a driver was written which emulates the data format and transmission protocol of a Polhemus Fastrack with one stylus (position, orientation and one button) and one transmitter (position and orientation).

Figure 8: The input devices controlling a scene in Virtual Design 2. The positions and orientations of the input devices are transmitted to the program by emulating a Polhemus Fastrack device. (a) The Sceptre is controlling the virtual hand. (b) The camera is connected to the headtracking device; the Sceptre is controlling the virtual hand.
7. FUTURE WORK
The presented results can be improved by the following enhancements:
• By the use of other frame-grabbing techniques (e.g. FireWire), the grabbing procedure can be significantly accelerated.
• The current recognition algorithm has problems if the pattern is partially projected onto several screens. The additional projection of the same pattern at some other angle (e.g. upwards) or the usage of patterns with more redundant points can solve this problem.
• The calibration algorithm for several cameras is not yet implemented.
• For the case that the projection screen is not planar but curved (e.g. i-Cone), the reconstruction algorithm has to be reformulated.
• Computation and compensation of the distortion of the camera lens have to be integrated for wide-angle lenses.
• The point detection enhancements described in [1] can be included.
8. ACKNOWLEDGEMENTS
We would like to thank the following companies:
• vrcom GmbH for providing us with the testing license for Virtual Design 2 and for their support.
• Viscon GmbH for hardware support and funding.
• mabotic Robotics & Automation for assembling the prototypes and for their help.
6. CONCLUSION
In this paper, a 3D tracking technique is presented and prototypically implemented. The input device Sceptre and the similarly working headtracking device project infrared laser spot patterns onto the screen. These patterns are recognized and the 3D positions of the devices are reconstructed. The input devices are very light and comfortable to wear. No additional hardware like cables or cameras inside the field of view of the user is needed. The error (see Fig. 7 (a)) is completely caused by hardware inaccuracies and is not a result of the reconstruction algorithm; with generic values, the algorithm is precise to 12 decimal places. The main factors responsible for the error are the camera resolution, laser energy fluctuations and the diffraction of the laser beams by the pattern mask. As pointed out in Section 5.2, most of the computation time is consumed by the picture grabbing procedure and the point-finding algorithm. The bare recognition of the pattern and the reconstruction of the 3D position can be performed about 3400 times per second.
9. REFERENCES
[1] B. A. Ahlborn, D. Thompson, O. Kreylos, B. Hamann, and O. G. Staadt. A practical system for laser pointer interaction on tiled displays. In Proceedings of ACM Virtual Reality Software and Technology 2005, pages 106-109. ACM Press, 2005.
[2] X. Bi, Y. Shi, X. Chen, and P. Xiang. uPen: laser-based, personalized, multi-user interaction on large displays. In H. Zhang, T.-S. Chua, R. Steinmetz, M. S. Kankanhalli, and L. Wilcox, editors, ACM Multimedia, pages 1049-1050. ACM, 2005.
[3] D. Cavens, F. Vogt, S. Fels, and M. Meitner. Interacting with the big screen: pointers to ponder. In Proceedings of ACM CHI 2002 Conference on Human Factors in Computing Systems, volume 2 of Interactive Posters, pages 678-679, 2002.
[4] K. Cheng and K. Pulo. Direct interaction with large-scale display systems using infrared laser tracking devices. In CRPITS '24: Proceedings of the Australian Symposium on Information Visualisation, pages 67-74, 2003.
[5] J. Davis and X. Chen. LumiPoint: Multi-user laser-based interaction on large tiled displays. Displays, 23(5), 2002.
[6] D. Laberge, J.-F. Lapointe, and E. M. Petriu. An auto-calibrated laser-pointing interface for large screen displays. In DS-RT, pages 190-194, 2003.
[7] S. V. Matveyev and M. Göbel. The optical tweezers: multiple-point interaction technique. In VRST, pages 184-187, 2003.
[8] D. R. Olsen and S. T. Nielsen. Laser pointer interaction. In CHI, pages 17-22, 2001.
[9] M. Wissen, M. A. Wischy, and J. Ziegler. Realisierung einer laserbasierten Interaktionstechnik für Projektionswände. In ERCIM News, No. 46, pages 31-32, 2001.