A Case Study of 3D Stereoscopic vs. 2D Monoscopic Tele-reality in Real-time Dexterous Teleoperation

Wai-keung Fung, Wang-tai Lo and Yun-hui Liu
Networked Sensors and Robotics Laboratory, Department of Automation and Computer-Aided Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong
{wkfung,wtlo,yhliu}@acae.cuhk.edu.hk

Ning Xi
Robotics and Automation Laboratory, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
[email protected]

This research is partially supported by the Hong Kong RGC Grant CUHK4127/01E and the National Science Foundation of China Grants N_CUHK404/01 and 60131160741.

Abstract— This paper reports a case study of using a single 3D stereoscopic visual feed for real-time teleoperation of dexterous tasks. Traditional teleoperation systems provide the remote operator with real-time monoscopic video of multiple views of the robot workspace. However, it is difficult for the operator to control the remote robot in dexterous tasks while watching several video feeds at once, and operators usually find multiple 2D feeds confusing, especially in tasks that require accurate positioning and orienting of the robot end-effector. In this paper, we propose to provide a single real-time 3D stereoscopic visual feed, so that the remote operator perceives the remote robot workspace with a sense of depth. This sense of depth allows the operator to position and orient the robot end-effector accurately and with confidence. Experiments have been conducted that reveal the advantage of a single real-time 3D stereoscopic video feed over multiple monoscopic video feeds in real-time teleoperation.

Index Terms— Teleoperation, 3D Stereoscopy, Passive Stereo, Tele-reality

I. Introduction

Teleoperation and networked robotics have been among the fastest developing areas in robotics, control and automation for nearly a decade [1][2]. Since the 1980s, the rapid development of computer networks and information technology has steadily broken down the geographical constraint that operators or controllers must be situated in the vicinity of the robots. In a teleoperation system, operators and robots are connected via various kinds of communication channels, most notably the Internet, chosen for its ubiquity, low cost and high bandwidth [3][4][5], together with World Wide Web technology [6]. One of the biggest challenges in teleoperation research, however, is overcoming the performance penalties, including instability, desynchronization and loss of efficiency, caused by poor network quality. In particular, the random time delay inherent in Internet communication is well known as one of the prominent causes of performance degradation in teleoperation systems. Because of time-varying network load and changing routing paths, the transmission time of data packets over the Internet from one point to another is random and cannot be predicted accurately. The effects of Internet delay on system stability and synchronization are further amplified when the routing path between the operator's and the robot's sites is long.

To remedy this deterioration in performance, a non-time-based motion reference has been introduced into the control loop of Internet-based teleoperation systems, so that system stability is insensitive to the closed-loop time delay caused by the network and the sensor feedback in the system is synchronized [7]. Several efficiency-improving strategies have also been proposed for teleoperation under poor network quality. In the control aspect, a controller-gain adaptation scheme places the gains within an efficiency region of the gain plane according to the network delay, using quadratic programming, so that fast system response and small overshoot are achieved [8]. In the planning aspect, a command negotiator has been proposed to regulate delayed operator commands that conflict with the current robot state [9]. In the resource-allocation aspect, a task-driven dynamic bandwidth allocation scheme provides the maximum available bandwidth to the various data streams during teleoperation [10]. Although these methods help maintain system performance and efficiency under poor network quality, the remote operator sometimes still fails to perform dexterous manipulation tasks in real time, especially tasks that require accurate and swift positioning and orienting of the robot end-effector, such as pick-and-place and obstacle avoidance.
Tele-reality technology [11], which achieves virtual presence in real scenes, brings the remote, real scene (the robot workspace) to the local operator site in real-time teleoperation systems. In conventional teleoperation systems, multiple real-time monoscopic video feeds of the remote robot workspace are provided so that the operator can position the robot end-effector accurately. However, operators usually have difficulty controlling the remote robot accurately and dexterously while watching several video feeds simultaneously, as the streamed feeds may convey conflicting perceptions. To eliminate the confusing interpretation of the remote workspace derived from multiple real-time feeds, we propose to provide a single real-time 3D stereoscopic video feed, so that the operator forms a single, unambiguous 3D interpretation of the remote robot workspace. Section II examines the characteristics of monoscopic and stereoscopic visual feedback and compares their influence on the performance of real-time dexterous teleoperation. Section III describes the working principle of 3D stereoscopy, together with the techniques and equipment for generating and displaying the 3D stereoscopic feed. Experimental studies are reported in Section IV, and Section V concludes the paper.

II. Monoscopic and Stereoscopic Visual Feedback

Teleoperating robots for dexterous tasks has been attempted in recent years [3]. The dexterity requirements of such tasks are so high that they are difficult to accomplish even for a local operator; in particular, high accuracy and swiftness in positioning and orienting the robot end-effector are demanded. To allow the remote operator to position and orient the end-effector accurately and swiftly, multiple real-time monoscopic (2D) video feeds showing the robot workspace are transmitted to the operator, presenting different perspectives of the remote workspace at the same time so as to facilitate positioning and orienting of the end-effector.

Usually, three real-time video streams, showing the front, side and top views of the remote robot workspace, are sent to the operator. Figure 1 depicts a typical arrangement of multiple 2D video feeds in a real-time teleoperation experiment. The remote operator infers, or guesses, the 3D relations among the objects and the robot in the workspace from 2D images alone, which capture only projections of the 3D workspace under particular perspectives. In other words, part of the 3D information that is important for judging position is lost. For example, suppose we look at objects A and B in an image, where A is occluded by B. Based on experience and perhaps other visual cues in the image, our brain concludes that A is behind B, but not how far behind. Multiple 2D feeds are transmitted so that the operator may recover the missing information from views of the workspace taken from other perspectives. In practice, however, multiple feeds sometimes do not help much in positioning the end-effector during real-time dexterous teleoperation. In some cases the 2D feeds convey conflicting perceptions of the 3D spatial relationships among objects in the workspace, because it is difficult to provide exactly mutually orthogonal (front, side and top) views from which the operator can position and orient the end-effector accurately in 3D space for dexterous tasks such as obstacle avoidance and pick-and-place. Judging the relative positions of the end-effector and the objects in the workspace is especially hard in the "depth" direction, i.e. the direction normal to the screen.
Fig. 1. Multiple 2D Visual Feedbacks (Front, Top, and Side Views).
Fig. 2. Simulated Effect of 3D Stereoscopic Visual Feedback.
To overcome the aforementioned disadvantages of multiple monoscopic feeds in conventional teleoperation systems, we propose to provide a single real-time 3D stereoscopic visual feed, so that the operator directly perceives the 3D structure of the robot workspace, including the relative positions of the objects (obstacles) and the robot end-effector. Since it is difficult to demonstrate the effect of a 3D stereoscopic image with ordinary printing techniques, a simulated effect is shown in Figure 2. As the figure suggests, objects displayed stereoscopically may be perceived to lie in front of or behind the screen; their 3D positions can therefore be distinguished clearly not only in the dimensions parallel to the screen, but also in the dimension normal to it. In other words, depth is perceived directly from the stereoscopic feed instead of being inferred from images taken from other perspectives, as with multiple monoscopic feeds. The operator can now confidently position and orient the end-effector accurately and swiftly without colliding with obstacles, because the 3D positions of all objects in the workspace are "seen" directly; no inference is required. A single 3D stereoscopic feed eliminates the confusion a human operator experiences when perceiving the remote robot site through multiple 2D views, so dexterous tasks can be completed successfully in real time and the overall efficiency of the system improves. In addition, it decreases the bandwidth needed for real-time video feedback, since only one image sequence is transmitted to the operator instead of multiple streams; multi-operator, multi-robot teleoperation systems [12][13] benefit particularly in this respect.

III. 3D Stereoscopic Visual Feedback

This section briefly describes the fundamental principle of 3D stereoscopy and how to produce and display 3D stereoscopic images or video at low cost in an ordinary laboratory.

A. Principle of 3D Stereoscopy

3D stereoscopy is the re-creation of the illusion of depth by viewing two slightly different perspectives of a scene with the two eyes. Human eyes are separated on average by 6 cm, measured between the centers of the left and right eyeballs. When we look at an object, two slightly different images of it are projected independently onto the left and right retinas. The parallax between the left and right retinal images is due to the different viewing perspectives of the two eyes caused by this separation; Figure 3 illustrates the concept. It is well known that depth information can be recovered from a stereo pair of images: geometrically, the binocular parallax between corresponding image points in a stereo pair is inversely proportional to the depth of the corresponding object point in the real world, and our brain processes the left and right images to infer the 3D structure of the scene we look at. Based on this principle, we can "artificially" induce 3D stereoscopic perception by presenting any stereo image pair to the eyes: the brain performs the same depth-extraction process and infers the 3D structure of the scene captured by the stereo pair, as shown in Figure 3.
Fig. 3. Depth Illusion by Stereopsis.
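The relation between parallax and depth can be made concrete. For an idealized parallel stereo rig with focal length f (in pixels), baseline b, and disparity d between corresponding image points, depth is Z = f·b/d. A minimal sketch; the focal length and disparities below are illustrative values, not the parameters of our rig:

```python
# Depth from binocular disparity for an idealized parallel stereo rig:
# Z = f * b / d, so nearer points produce larger disparities.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """f_px: focal length in pixels; baseline_m: camera separation in
    meters; disparity_px: horizontal shift of a point between the left
    and right images. Returns depth in meters."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / disparity_px

# Illustrative: 800 px focal length and a 6 cm baseline, the average
# human eye separation adopted by the stereo camera system.
near = depth_from_disparity(800, 0.06, 96)  # large disparity -> 0.5 m
far = depth_from_disparity(800, 0.06, 24)   # small disparity -> 2.0 m
```

The inverse proportionality between disparity and depth is what the brain exploits when it fuses the stereo pair into a single 3D percept.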
B. Production

Real-time 3D stereoscopic visual feedback for teleoperation systems is produced by exploiting the human depth-perception capability described above. The required equipment comprises a stereo camera system and a PC with two frame grabbers installed.
Figure 4 shows the stereo camera system used for capturing real-time video of the robot workspace. It comprises two CCD cameras mounted on a pan-tilt stand and separated by 6 cm, the average human eyeball separation, so that the two cameras mimic human eyes for depth perception. The two frame grabbers capture live image sequences from the stereo cameras, ready for transmission to the remote operator. The higher the resolution of the captured images, the better the quality of the 3D stereoscopic feed. Since the network bandwidth available for real-time teleoperation is usually limited, raw video streams would in most cases flood the communication channel and introduce large response latency and jerky responses. To save bandwidth, the stereoscopic feed is therefore preprocessed and encoded before being sent to the remote operator. Each stereoscopic frame combines information from the corresponding left and right frames captured by the stereo camera system: we interleave the two frames into a single frame, sacrificing part of the resolution (image quality) of the stereo pair. Specifically, the stereo frame takes only the odd rows of the left frame and only the even rows of the right frame. The interleaved stereo frames are then encoded to MPEG-2 using the libavcodec library of the FFmpeg multimedia system [14], and each encoded frame is sent to the remote operator over the network. Figure 4 illustrates the production of the real-time 3D stereoscopic visual feed at the robot side.

C. Display

Figure 5 illustrates the flow of displaying the real-time 3D stereoscopic visual feed at the operator side of the teleoperation system.
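The row interleaving performed at the robot side, and the corresponding separation and linear interpolation performed at the operator side, can be sketched as follows. This is a minimal NumPy illustration on single-channel frames of even height (the actual system operates on full color video frames and encodes them with libavcodec; the function names are ours, for illustration only):

```python
import numpy as np

def interleave(left, right):
    """Robot side: pack one stereo frame from alternating rows,
    taking rows 0, 2, 4, ... from the left frame and rows
    1, 3, 5, ... from the right frame."""
    stereo = np.empty_like(left)
    stereo[0::2] = left[0::2]
    stereo[1::2] = right[1::2]
    return stereo

def separate_and_interpolate(stereo):
    """Operator side: split the stereo frame back into left and right
    frames, then fill each missing row by averaging its neighbours
    (plain linear interpolation; border rows reuse their one neighbour)."""
    h = stereo.shape[0]
    left = stereo.astype(float).copy()
    right = stereo.astype(float).copy()
    for r in range(1, h, 2):                  # rows missing from the left
        above = left[r - 1]
        below = left[r + 1] if r + 1 < h else left[r - 1]
        left[r] = (above + below) / 2
    for r in range(0, h, 2):                  # rows missing from the right
        above = right[r - 1] if r > 0 else right[r + 1]
        below = right[r + 1] if r + 1 < h else right[r - 1]
        right[r] = (above + below) / 2
    return left, right
```

Round-tripping a stereo pair through these two functions recovers each frame up to the interpolation error in the discarded rows, which is the quality trade-off made to halve the video bandwidth.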
At the operator side, the received 3D stereoscopic MPEG-2 stream is first decoded using the libavcodec library of the FFmpeg multimedia system [14]. Each decoded frame is then separated back into left and right frames, as shown in the "Stereo Frame Separation" block of Figure 5. Since each stereo frame was built from the odd rows of the left frame and the even rows of the right frame, the even rows are missing from the separated left frame and the odd rows from the separated right frame. This loss of image quality is the price paid for freeing network bandwidth for the other data streams of a real-time teleoperation system, such as operator commands and force feedback. Linear interpolation is used to recover the missing rows of the separated left and right frames, as depicted in the "Intra-frame Interpolation" block of Figure 5. The recovered left and right frames are then passed to the stereo projection system. A passive stereo projection system is employed in our study for its high performance-to-cost ratio.

Fig. 4. Production of Real-time 3D Stereoscopic Visual Feedback at the Robot Side.

Fig. 5. Display of Real-time 3D Stereoscopic Visual Feedback at the Operator Side.

Our passive stereo projection system consists of two high-lumen (1500 ANSI lumen) DLP™ projectors and a graphics workstation with dual VGA output (as shown in Figure 5). We have developed display software that sends the left and right frames independently to the two projectors. To direct the left image to the operator's left eye and the right image to the right eye, we exploit light polarization to separate the two projected views: linear polarizers are mounted in front of the projector lenses, aligned 90° out of phase, and the images are projected onto a non-depolarizing screen (as shown in Figure 5) so that the polarization of the reflected light is preserved. The operator wears a pair of polarizing glasses (as shown in Figure 5), aligned with the polarizers mounted on the projectors, to perceive the real-time 3D stereoscopic feed captured by the stereo camera system at the robot side (refer to Section III-B).

IV. Experimental Studies

Our experimental setup consists of a five-fingered robotic hand and arm system, comprising five 3-DOF direct-drive fingers, five 6-DOF force/torque sensors mounted at the fingertips, and a Mitsubishi PA-10 7-DOF robotic arm, for 22 DOF in total. The multi-fingered hand system can be
Fig. 6. Block Diagram of Teleoperation System.
controlled in real time by the remote operator via the Internet using a force-feedback joystick. Figure 6 depicts the block diagram of our teleoperation system with real-time 3D stereoscopic visual feedback. The task the operator is requested to perform is a pick-and-place task with obstacle avoidance; Figure 7 shows the robot workspace. The operator teleoperates the multi-fingered hand system to first grasp the white cylinder out of its stand, then pass it around the three obstacle balls, and finally insert it back into the cylinder stand, as shown in Figure 7. It is worth pointing out that the depth perceived in a 3D stereoscopic feed is not equal to, but proportional to, the actual depth. Careful calibration of the stereo camera separation, the distance between the cameras and the objects in the workspace, the observer's position and viewing direction relative to the screen, and so on would be necessary for the perceived and actual depths to coincide; moreover, different observers perceive different depths even under identical conditions. Extensive studies have investigated how perceived depth in stereoscopic viewing depends psychologically on such factors [16][17], including for moving objects [18].

Fig. 7. The requested pick-and-place task. Task states are marked with the same labels as in Figure 10.

The inner and outer diameters of the white cylinder are 4.2 cm and 8 cm respectively. The diameters of the obstacle balls are 9 cm and 10 cm, with a separation of 10 cm, leaving only 2 cm of tolerance for the white cylinder to pass around the obstacle balls without colliding with them. The obstacle balls rest on, but are not fixed to, an 11 cm tall stand, so they fall easily if the operator teleoperates the robot hand (including the cables hanging from the robot fingers) to touch them even gently. The distance between the white cylinder stand and the center obstacle ball is 30 cm. We conducted several real-time trials of the requested task with multiple monoscopic feeds and with a single 3D stereoscopic feed. Figures 8 and 9 depict the experimental setups for the multiple monoscopic and the single stereoscopic cases respectively. In the monoscopic case, three views (front, top and side) under mutually orthogonal perspectives are fed back to the operator in real time; in the stereoscopic case, only the front view is provided. With multiple monoscopic feeds, the operator succeeded in accomplishing the pick-and-place task with obstacle avoidance only in the last of seven trials, knocking down the obstacle balls in all six previous attempts. In the successful trial, the operator took 237 seconds to complete the task, spending the majority of the time re-positioning the robot hand to avoid collision with the obstacle balls and to grasp the white cylinder from, and insert it back into, the stand. In contrast, with the single 3D stereoscopic feed the operator accomplished the requested task in all three trials, with an average completion time of 94 seconds: a saving of more than 60%. Snapshots taken during task execution are shown in Figure 10, with the task state labels defined in Figure 7. Interested readers may refer to [15] for the whole process of one of the successful trials.
Fig. 8. Setup for 3 Views for Monoscopic Visual Feedbacks.
Fig. 9. Setup for Single 3D Stereoscopic Visual Feedback.
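The reported time saving follows directly from the completion times (237 s for the single successful monoscopic trial, 94 s on average for the stereoscopic trials):

```python
# Task completion times reported in Section IV.
mono_time = 237    # seconds, the only successful monoscopic trial (1 of 7)
stereo_time = 94   # seconds, average over the three stereoscopic trials

saving = (mono_time - stereo_time) / mono_time
print(f"time saving = {saving:.1%}")  # time saving = 60.3%
```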
V. Concluding Remarks

We have demonstrated the usefulness of a single 3D stereoscopic visual feed in real-time dexterous teleoperation; its advantages over multiple monoscopic feeds have also been discussed and verified experimentally. In the reported experiments, the remote operator failed the requested pick-and-place task in six of seven trials when using multiple monoscopic (2D) feeds: the teleoperated multi-fingered hand system could not avoid the obstacles because the multiple views confused the operator's perception of the 3D position of the robot end-effector. With the 3D stereoscopic feed, by contrast, the operator accomplished the requested task swiftly and without any collision with obstacles, because the 3D spatial relationship between the end-effector and the remote workspace is perceived directly from the stereoscopic feed during teleoperation. In addition, network bandwidth is conserved, since only one video stream, instead of several, needs to be sent to the remote operator; this is especially beneficial for real-time teleoperation systems with multiple robots and multiple operators.

References

[1] C. C. M. Meng, P. X. Liu, and M. Rao, "E-Service Robot in Home Healthcare," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2000, pp. 832–837.
Fig. 10. Snapshots taken during the teleoperation experiment: (a) t = 6 s, (b) t = 21 s, (c) t = 31 s, (d) t = 33 s, (e) t = 40 s, (f) t = 53 s, (g) t = 60 s, (h) t = 87 s.
[2] A. Kheddar, P. Coiffet, T. Kotoku, and K. Tanie, "Multi-robots Teleoperation – Analysis and Prognosis," in Proceedings of the 6th IEEE International Workshop on Robot and Human Communication, Sept. 1997, pp. 166–171.
[3] M. R. Stein, "Interactive Internet Artistry," IEEE Robotics and Automation Magazine, vol. 7, no. 2, pp. 28–32, June 2000.
[4] K. Goldberg, S. Gentner, C. Sutter, and J. Wiegley, "The Mercury Project: A Feasibility Study for Internet Robots," IEEE Robotics and Automation Magazine, vol. 7, no. 1, pp. 35–40, Mar. 2000.
[5] W. R. Hamel and P. Murray, "Observations Concerning Internet-based Teleoperations for Hazardous Environments," in Proceedings of the IEEE International Conference on Robotics and Automation, 2001, pp. 638–643.
[6] D. Schulz, W. Burgard, D. Fox, S. Thrun, and A. B. Cremers, "Web Interfaces for Mobile Robots in Public Places," IEEE Robotics and Automation Magazine, vol. 7, no. 1, pp. 48–56, 2000.
[7] N. Xi and T. J. Tarn, "Stability Analysis of Non-time Referenced Internet-based Telerobotic Systems," Robotics and Autonomous Systems, vol. 32, pp. 173–178, 2000.
[8] W. K. Fung, N. Xi, W. T. Lo, and Y. H. Liu, "QoS based Control of Teleoperation via Internet," in Proceedings of the 15th IFAC World Congress on Automatic Control, 2002.
[9] W. K. Fung, N. Xi, W. T. Lo, and Y. H. Liu, "Improving Efficiency of Internet based Teleoperation using Network QoS," in Proceedings of 2002 International Conference on Robotics and Automation, vol. 3, 2002, pp. 2707–2712.
[10] W. K. Fung, N. Xi, W. T. Lo, B. H. Song, Y. Sun, and Y. H. Liu, "Task Driven Dynamic QoS based Bandwidth Allocation for Real-time Teleoperation via the Internet," in Proceedings of 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 2, Oct. 2003, pp. 1094–1099.
[11] R. Szeliski, "Image Mosaicing for Tele-reality Applications," in IEEE Workshop on Applications of Computer Vision (WACV'94). IEEE Computer Society Press, Dec. 1994, pp. 44–53.
[12] W. T. Lo, Y. H. Liu, I. Elhajj, N. Xi, Y. Wang, and T. Fukuda, "Cooperative Teleoperation of a Multi-robot System with Force Reflection via Internet," IEEE/ASME Transactions on Mechatronics, 2004, to be published.
[13] I. Elhajj, J. Tan, N. Xi, W. K. Fung, Y. H. Liu, T. Kaga, Y. Hasegawa, and T. Fukuda, "Multi-Site Internet-Based Cooperative Control of Robotic Operations," in Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS'2000, vol. 2, 2000, pp. 826–831.
[14] "FFmpeg." [Online]. Available: http://ffmpeg.sourceforge.net/
[15] W. T. Lo, W. K. Fung, Y. H. Liu, K. C. Hui, N. Xi, and Y. C. Wang, "Real-time Teleoperation via the Internet with 3D Stereoscopic Video Feedback," in Proceedings of 2004 IEEE International Conference on Robotics and Automation, Apr. 2004, video.
[16] T. H. Kusada, in Proc. SPIE, vol. 1666, 1992, p. 476.
[17] I. Yuyama and M. Okui, "Stereoscopic HDTV," in Three-Dimensional Television, Video, and Display Technologies, B. Javidi and F. Okano, Eds. Springer-Verlag, 2002, pp. 3–34.
[18] S. Shioiri, A. Morinaga, and H. Yaguchi, "Depth Perception of Moving Objects," in Three-Dimensional Television, Video, and Display Technologies, B. Javidi and F. Okano, Eds. Springer-Verlag, 2002, pp. 397–427.
operative Teleoperation of a Multi-robot System with Force Reflection via Internet,” IEEE/ASME Transactions on Mechatronics, 2004, to be published. I. Elhajj, J. Tan, N. Xi, W. K. Fung, Y. H. Liu, T. Kaga, Y. Hasegawa, and T. Fukuda, “Multi-Site Internet-Based Cooperative Control of Robotic Operations,” in Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS’2000, vol. 2, 2000, pp. 826–831. “FFmpeg.” [Online]. Available: http://ffmpeg.sourceforge.net/ W. T. Lo, W. K. Fung, Y. H. Liu, K. C. Hui, N. Xi, and Y. C. Wang, “Real-time Teleoperation via the Internet with 3D Stereoscopic Video Feedback,” in Proceedings of 2004 IEEE International Conference on Robotics and Automation, Apr. 2004, video T. H. Kusada, in Proc. SPIE, vol. 1666, 1992, p. 476. I. Yuyama and M. Okui, “Stereoscopic HDTV,” in Three-Dimensional Television, Video, and Display Technologies, B. Javidi and F. Okano, Eds. Springer-Verlag, 2002, pp. 3–34. S. Shioiri, A. Morinaga, and H. Yaguchi, “Depth Perception of Moving Objects,” in Three-Dimensional Television, Video, and Display Technologies, B. Javidi and F. Okano, Eds. Springer-Verlag, 2002, pp. 397–427.