A 2D-3D Integrated Interface for Mobile Robot Control Using Omnidirectional Images and 3D Geometric Models

Kensaku Saitoh∗
Graduate School of Information Science and Technology, Osaka University

Takashi Machida†, Kiyoshi Kiyokawa‡, Haruo Takemura§
Cybermedia Center and Graduate School of Information Science and Technology, Osaka University

∗e-mail: [email protected]  †e-mail: [email protected]  ‡e-mail: [email protected]  §e-mail: [email protected]
ABSTRACT

This paper proposes a novel visualization and interaction technique for remote surveillance using both 2D and 3D scene data acquired by a mobile robot equipped with an omnidirectional camera and an omnidirectional laser range sensor. In a normal situation, telepresence with an egocentric view is provided using high-resolution omnidirectional live video on a hemispherical screen. Once depth information of the remote environment has been acquired, additional 3D information, such as the passable area and the roughness of the terrain, can be overlaid onto the 2D video image in the manner of video see-through augmented reality. Several functions for interacting with the 3D environment through the 2D live video are provided, such as path-drawing and path-preview. The path-drawing function allows the operator to plan the robot's path by simply specifying 3D points on the path on screen. The path-preview function provides a realistic image sequence seen from the planned path using a texture-mapped 3D geometric model in the manner of virtualized reality. In addition, a miniaturized 3D model can be overlaid on the screen to provide an exocentric view, a common technique in virtual reality. In this way, our technique allows an operator to recognize the remote place and navigate the robot intuitively by seamlessly using a variety of mixed reality techniques on the spectrum of Milgram's real-virtual continuum.

CR Categories: H.5.2 [Information Systems]: Information Interfaces and Presentation—User Interface; I.3.7 [Computing Methodologies]: Computer Graphics—Three-Dimensional Graphics and Realism; I.4.8 [Computing Methodologies]: Image Processing—Scene Analysis

Keywords: remote robot control, omnidirectional image, 3D geometric model
1 INTRODUCTION
A remote-controlled mobile robot system is useful for a variety of remote surveillance tasks at unknown places, such as disaster investigation and planetary exploration. Providing sufficient information about the remote environment to the operator is a key factor in such a mobile robot control system. For example, a video camera provides an egocentric view of the remote scene, and a Global Positioning System (GPS) receiver and a gyro sensor provide the position and orientation of the robot. There have been many studies to improve the operator's sense of telepresence and to provide rich information about the remote environment.
Figure 1: Screenshot of the proposed interface (labeled elements: panorama image, background 3D geometric model, perspective image, miniaturized 3D geometric model, virtual object).
For example, Nagahara et al. [1] used an omnidirectional camera and proposed a nonlinear field-of-view (FOV) transformation method to present a super-wide FOV beyond the display's visual angle. They showed that the peripheral visual information improves the efficiency of remote robot operation. However, it is often difficult to understand the remote situation from a 2D image even when a wide FOV is provided. Some studies have employed a range sensor to acquire and visualize 3D geometric information of the remote environment. 3D information is useful for perceiving accurate geographical situations, such as the distance to an obstacle and the 3D structure of the environment. Keskinpala et al. [2] designed a PDA-based interface with three screen modes; one mode provides a camera image only, another provides a top view with scanned range data, and the other provides range data overlaid on the camera image. Although helpful, this interface requires frequent mode switching. Sugimoto et al. [3] proposed a technique to provide a virtual exocentric view by rendering a wireframe robot properly overlaid on past images. However, this technique does not allow an arbitrary viewpoint. Ricks et al. [4] used a 3D model of the remote scene rendered from a tethered perspective above and behind the robot. However, the 2D video images and the 3D model are not integrated, but merely shown together.

This paper proposes a novel visualization and interaction technique for remote surveillance that integrates 2D and 3D omnidirectional scene data (see Figure 1). Our technique allows an operator to recognize the remote place and navigate the robot intuitively by seamlessly using a variety of mixed reality techniques on the spectrum of Milgram's real-virtual continuum [5]. Normally, an egocentric view using high-resolution omnidirectional live video is presented on a hemispherical display in the manner of telepresence. Additional 3D information, such as the passable area and the roughness of the terrain, can be overlaid onto the live video in the manner of video see-through augmented reality (AR). The path-drawing function allows the operator to plan the robot's path by simply specifying points on screen.
Figure 2: Overview of the proposed system (the robot system with the range sensor, camera, and control PCs communicates with the operation PC and dome display over a wireless LAN).

Figure 3: Robot system.
The path-preview function provides a realistic image sequence seen from the planned path using a texture-mapped 3D model in the manner of virtualized reality. In addition, a miniaturized 3D model is overlaid on screen, providing an exocentric view as a world-in-miniature (WIM), a common technique in virtual reality [6].
2 SYSTEM OVERVIEW
Figure 2 illustrates an overview of the proposed system, and Figure 3 shows the mobile robot. A custom-made omnidirectional camera (HOV: Hyper Omni Vision) [7], consisting of a hyperboloidal mirror and a high-resolution camera (Point Grey Research Scorpion), and a laser range sensor (SICK LMS200) mounted on a turn stage (Chuo Precision Industrial ARS-136-HP and QT-CD1) are installed on the robot. The omnidirectional image (800×600 pixels at 15 Hz, see Figure 4 (left)) is appropriately unwarped into a panorama image or a standard perspective view in real time. Omnidirectional range data, on the other hand, are acquired by horizontally rotating the line range sensor. The omnidirectional range data are converted into a 3D point cloud and then into a polygonal mesh model (see Figure 4 (right)). Because it takes about 18 seconds to measure a full omnidirectional range scan, 3D geometric models are not built in real time, but sporadically at discrete places whenever the operator issues the command.

An electric wheelchair (WACOGIKEN Emu-S) works as a sensor dolly, carrying the above two devices and two laptop PCs (Toshiba Libretto U100). The wheelchair has two driving wheels at the front and two sub wheels at the back. The remote operator can control the wheelchair by sending commands over RS-232C. The wheelchair estimates its own position and orientation in two steps. First, an initial estimate is made from the internal sensors, i.e., odometry and an orientation sensor (InterSense InertiaCube2). Second, whenever a new omnidirectional range scan is acquired, the self-tracking accuracy is improved by ICP registration [8] between the new data and the old ones. In our experience, the self-tracking error is about 1% of the moving distance in indoor environments and less than 10% in outdoor environments.

The operator views the remote environment on a hemispherical dome display system (Matsushita Electric Works CyberDome, see Figure 5 (left)) [9], with a total delay of about 400 ms over a wireless LAN (IEEE 802.11g, 54 Mbps). The CyberDome provides a wide FOV of 140 by 90 degrees from a standard sitting position. The operator sends commands to the robot using a joystick and a throttle (see Figure 5 (right)).
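The two-step self-tracking above can be sketched in a few lines. The following is a minimal illustration, not the system's actual code; it assumes the Open3D library is available, that the odometry-predicted pose is given as a 4×4 matrix, and that the point clouds are plain N×3 arrays.

```python
import numpy as np
import open3d as o3d

def refine_pose_with_icp(new_points, old_points, odometry_pose,
                         max_corr_dist=0.2):
    """Refine the odometry-predicted robot pose (4x4 matrix) by ICP
    registration between the new scan and previously acquired scans."""
    # Wrap raw Nx3 NumPy arrays as Open3D point clouds.
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(new_points))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(old_points))

    # Use the odometry estimate as the initial transformation, then let
    # point-to-point ICP correct the residual drift.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, odometry_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # refined 4x4 robot pose
```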
Figure 4: An omnidirectional image (left) and a 3D model (right).
3 2D-3D INTEGRATED INTERFACE
This section describes the set of interaction techniques in the proposed system. In our techniques, 3D scene data are integrated into the 2D live images in the manner of omnidirectional video see-through AR. Note that calibration between the omnidirectional camera and the range sensor is required to integrate the 3D information into the 2D image. For this purpose, we calculate a 3D transformation matrix by the least-squares method, using a set of feature points sampled in advance from an omnidirectional image and the corresponding 3D model.

3.1 Visualization

Figure 1 shows a screenshot of the remote control system. The dome screen is divided into three areas. A perspective image area shows a 140-degree FOV front image at its actual angular size. Since depth information is known, virtual objects such as a 3D pointer can be correctly overlaid onto the perspective image. Figure 6 shows the passable area and the roughness of the terrain overlaid onto the live video; both were computed automatically by spatial frequency analysis of the 3D data. A panoramic image area at the top shows the remaining rear and side views at a smaller visual angle. These two areas together provide an omnidirectional live video image. Thirdly, a texture-mapped miniaturized 3D model (WIM) can optionally be overlaid on these images. As the robot moves, the WIM is automatically translated and rotated so that the CG counterpart of the robot stays in the middle, heading upward. These three types of images provide egocentric and exocentric views of the remote environment at the same time, helping the operator perceive the situation intuitively without switching screen modes.

3.2 Interaction

In the following, the set of interactions provided by the system is described.
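The paper does not detail the camera model used in this calibration. As a simplified illustration of the least-squares idea, the sketch below estimates a 3×4 projection matrix from sampled 3D-2D point correspondences with the standard direct linear transformation (DLT); a real implementation for the hyperboloidal omnidirectional camera would also have to account for the mirror geometry.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P (up to scale) from N >= 6
    corresponding 3D model points and 2D image points by homogeneous
    least squares (DLT)."""
    assert len(points_3d) == len(points_2d) >= 6
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # Each correspondence yields two linear equations in the
        # twelve unknown entries of P.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows)
    # The least-squares solution is the right singular vector of A
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)
```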
Figure 5: Operation system; a dome screen (left) and a control device with a joystick and throttle (right).
Figure 9: Path drawing.
Figure 6: Analytic overlay.

3.2.3 Range Sensor Control
When the operator presses a dedicated button on the control device, the turn table on the robot starts to rotate and an omnidirectional range scan is transferred to the operator. While the range data are being acquired, the operator cannot move the robot. After the data transmission has finished, a new 3D model is built and shown on the screen.
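Converting the rotating line scans into a 3D point cloud is a simple polar-to-Cartesian computation. The sketch below is illustrative only; it assumes each LMS200 sweep is given as in-plane beam angles and ranges together with the turntable angle at which it was taken, and it ignores sensor mounting offsets.

```python
import numpy as np

def scans_to_point_cloud(scans):
    """Convert rotating line-range scans into a 3D point cloud.

    scans: iterable of (turntable_angle, beam_angles, ranges), where
    turntable_angle is the stage rotation (rad), beam_angles are the
    in-plane beam angles of one sweep (rad), and ranges are the
    measured distances (m).
    """
    points = []
    for turntable_angle, beam_angles, ranges in scans:
        # Points in the vertical scanning plane of the line sensor.
        x_plane = np.asarray(ranges) * np.cos(beam_angles)
        z_plane = np.asarray(ranges) * np.sin(beam_angles)
        # Rotate the scanning plane about the vertical axis by the
        # current turntable angle to place the points in 3D.
        c, s = np.cos(turntable_angle), np.sin(turntable_angle)
        points.append(np.column_stack([c * x_plane, s * x_plane, z_plane]))
    return np.vstack(points)
```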
Figure 7: Display control.
3.2.4 Path Drawing

Path drawing is an intuitive function for planning a robot path on screen [10]. First, the operator selects 2D points on screen, using a cone-shaped green cursor, where he/she wants the robot to move through. Since depth information is known, the selected points have 3D coordinates, yielding a 3D B-spline curve. The cursor turns red when the position is too low or too high for the robot to pass through. A blue cone with an ID number is shown at each selected point, and these blue cones (selected points) can be moved or deleted freely. The color of the curve indicates its curvature. The positions of the selected points in the robot's coordinate system are displayed on screen in real time. Figure 9 shows screenshots of path drawing.
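Fitting the smooth curve through the selected 3D points can be done with an off-the-shelf B-spline routine. The following sketch uses SciPy's parametric spline functions; the function name and sampling density are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_path(waypoints, num_samples=200):
    """Fit a B-spline through the selected 3D waypoints and return
    densely sampled path points for preview and execution."""
    waypoints = np.asarray(waypoints, dtype=float)  # shape (N, 3)
    # splprep expects one array per coordinate; s=0 interpolates the
    # waypoints exactly, and the degree is cubic when enough points exist.
    tck, _ = splprep([waypoints[:, 0], waypoints[:, 1], waypoints[:, 2]],
                     s=0, k=min(3, len(waypoints) - 1))
    u = np.linspace(0.0, 1.0, num_samples)
    x, y, z = splev(u, tck)
    return np.column_stack([x, y, z])
```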
Figure 8: Robot control.
3.2.1 Display Control

The perspective and panoramic images, as well as the WIM model, can be rotated horizontally together using the rudder of the joystick (see Figure 7 (middle)). The WIM model can also be scaled and rotated vertically with the joystick buttons (see Figure 7 (right)). Four semi-transparent blue triangles are displayed in the middle of the screen while the operator controls the images. These operations have no effect on the actual robot's orientation; that is, the operator can check the surrounding situation without turning the robot.

3.2.2 Robot Control

The operator can move the robot by tilting the joystick in one of four directions; the robot then moves forward or backward, or turns to the left or to the right, accordingly. The moving velocity is controlled by the throttle. Four semi-transparent yellow triangles are displayed in the middle of the screen while the operator controls the robot, and the current moving direction is indicated by a large highlighted triangle (see Figure 8).
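The exact command protocol of the wheelchair is not described in the paper; the sketch below shows one plausible way to map the four joystick directions and the throttle value to a velocity command, with the command string format being purely hypothetical.

```python
def joystick_to_command(direction, throttle, max_speed=1.0, max_turn=0.5):
    """Map a discrete joystick direction and a throttle value in [0, 1]
    to a (linear velocity, angular velocity) command for the wheelchair.

    direction: one of 'forward', 'backward', 'left', 'right', 'stop'.
    """
    v = max_speed * throttle
    w = max_turn * throttle
    mapping = {
        'forward':  ( v, 0.0),
        'backward': (-v, 0.0),
        'left':     (0.0,  w),
        'right':    (0.0, -w),
        'stop':     (0.0, 0.0),
    }
    linear, angular = mapping[direction]
    # A hypothetical ASCII command for the RS-232C link; the actual
    # protocol of the Emu-S wheelchair is not described in the paper.
    return f"VEL {linear:.2f} {angular:.2f}\r\n"
```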
3.2.5 Path Preview

Path preview is a function for examining a realistic virtual view from the planned path. When path preview starts, the live video shown in the perspective image area gradually becomes transparent, and a texture-mapped 3D model rendered from the same viewpoint gradually appears. In this way, the real view is seamlessly changed into a virtual view so as to maintain the operator's sense of immersion. After this transition, the viewpoint and the viewing direction are moved along the planned path, either manually or automatically, until the operator quits the preview mode. If the operator is satisfied with the preview result, he/she can command the robot to actually follow the path. The robot then controls its driving wheels automatically to move along the planned path and stop at its end. Figure 10 shows some screenshots of previewed images (left column) and real images (right column); the previewed images approximate the real images well.
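The transition at the start of path preview amounts to a time-based crossfade between the live perspective image and the rendered model view. A minimal sketch of such blending, assuming both frames are already available as equally sized image arrays:

```python
import numpy as np

def crossfade(live_image, rendered_image, t, duration=2.0):
    """Blend the live video frame into the rendered virtual view.

    t is the time in seconds since path preview started; at t = 0 only
    the live image is shown, and after `duration` seconds only the
    texture-mapped 3D model view remains.
    """
    alpha = np.clip(t / duration, 0.0, 1.0)
    blended = (1.0 - alpha) * live_image.astype(float) \
              + alpha * rendered_image.astype(float)
    return blended.astype(live_image.dtype)
```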
4 EXPERIMENT
We investigated the usefulness of the panoramic image and the WIM in an experiment using a virtual scene representing a remote environment. Nine subjects in their early twenties performed two tasks under four visualization conditions. The first task (operation task) was to move the robot from a start point to a goal as quickly as possible. The second task (search task) was to collect as many items in the environment as possible within a limited time. The four conditions consisted of the combinations of with and without the panoramic image and the WIM. Each subject performed the two tasks under all four conditions (eight trials in total) in a randomized order. After the experiment, the subjects rated the operability and searchability of each visualization condition on a scale of 1 (worst) to 5 (best).

Figure 11 shows the average questionnaire scores. A one-way ANOVA confirmed that the panoramic image improved searchability but not operability. In the operation task, the rear and side images were rarely used because moving forward was mostly sufficient to complete the task. In the search task, on the other hand, rear and side information was essential for efficient searching and for backing out of dead ends. As a drawback of the panoramic image, however, some subjects found it difficult to estimate the distance to obstacles in the rear and side directions. A one-way ANOVA also confirmed that the WIM improved both operability and searchability. The 3D map was clearly useful in both tasks for grasping the robot's position, the distance to obstacles, and the robot's progress through the environment. These results show that our visualization technique is more useful than traditional visualization techniques that present forward images only, for both robot operation and information gathering.
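Such a one-way ANOVA can be computed with a standard statistics routine. The sketch below uses SciPy; the rating arrays are hypothetical placeholders, not the actual experimental data.

```python
from scipy.stats import f_oneway

# Hypothetical searchability ratings (1-5); placeholders only. Each list
# pools the two conditions that share the same panorama setting.
without_panorama = [2, 3, 2, 3, 3, 4, 3, 4, 3]
with_panorama    = [4, 4, 5, 4, 4, 5, 4, 5, 4]

# One-way ANOVA on the panorama factor (with two groups this is
# equivalent to an independent t-test).
f_stat, p_value = f_oneway(without_panorama, with_panorama)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```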
Figure 11: Subjective evaluation (average operability and searchability ratings on a 1–5 scale for the four conditions: without/with panorama × without/with WIM).
5 CONCLUSION
In this paper, we have proposed a 2D-3D integrated interface for mobile robot control. Omnidirectional images provide real-time visual information about the remote environment, while a miniaturized 3D geometric model intuitively shows the robot's position and orientation and the 3D structure of the remote environment. With depth information of the remote environment, a variety of 3D visualization and interaction techniques become available on the 2D live video image in the manner of omnidirectional video see-through AR. Future work includes improving the 3D model quality and conducting rigorous user studies to validate the effectiveness of the proposed remote control system.
Figure 10: Preview (left column) and real view (right column).
REFERENCES
[1] H. Nagahara, Y. Yagi, and M. Yachida. Super Wide View Teleoperation System. In Proc. IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems (MFI 2003), pp. 149–154, 2003.
[2] H. K. Keskinpala, J. A. Adams, and K. Kawamura. PDA-Based Human-Robotic Interface. In Proc. IEEE Int. Conf. on Systems, Man and Cybernetics (SMC), 2003.
[3] M. Sugimoto, G. Kagotani, H. Nii, N. Shiroma, M. Inami, and F. Matsuno. Time Follower's Vision: A Teleoperation Interface with Past Images. IEEE Computer Graphics and Applications, Vol. 25, No. 1, pp. 54–63, 2005.
[4] B. Ricks, C. W. Nielsen, and M. A. Goodrich. Ecological Displays for Robot Interaction: A New Perspective. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2004.
[5] P. Milgram and F. Kishino. A Taxonomy of Mixed Reality Visual Displays. IEICE Trans. on Information and Systems, E77-D(12), pp. 1321–1329, 1994.
[6] R. Stoakley, M. J. Conway, and R. Pausch. Virtual Reality on a WIM: Interactive Worlds in Miniature. In Proc. SIGCHI '95, pp. 265–272, 1995.
[7] K. Yamazawa, Y. Yagi, and M. Yachida. Omnidirectional Image Sensor - Hyper Omni Vision -. In Proc. Int. Conf. on Automation Technology, 1994.
[8] P. J. Besl and N. D. McKay. A Method for Registration of 3-D Shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 239–256, 1992.
[9] N. Shibano, P. V. Hareesh, M. Kashiwagi, K. Sawada, and H. Takemura. Development of VR Experiencing System with Hemi-Spherical Immersive Projection Display. In Proc. Int. Display Research Conf./Int. Display Workshops (Asia Display/IDW), 2001.
[10] T. Igarashi, R. Kadobayashi, K. Mase, and H. Tanaka. Path Drawing for 3D Walkthrough. In Proc. 11th Annual ACM Symposium on User Interface Software and Technology (UIST '98), pp. 173–174, 1998.