Viewing and Reviewing How Humanoids Sensed, Planned and Behaved with Mixed Reality Technology
Kazuhiko Kobayashi*, Koichi Nishiwaki†, Shinji Uchiyama*, Hiroyuki Yamamoto*, and Satoshi Kagami†
Email: {kobayashi.kazuhiko, uchiyama.shinji, yamamoto.hiroyuki125}@canon.co.jp
Email: {k.nishiwaki, s.kagami}@aist.go.jp
Abstract— How can we see how humanoids sensed, planned, and behaved in the actual environment? This paper proposes a tool for viewing and reviewing the humanoid's internal representation in 3-D space. In this tool, the representation is treated as data streams that are distributed and stored to log servers in real time. For viewing and reviewing in 3-D space, the data is converted from a numerical representation into a 3-D graphical one. The graphical data is rendered and displayed on a video see-through head-mounted display using Mixed Reality technology. Since the information is overlaid at the places related to the physical objects in 3-D space, it is easy for humanoid developers to perceive multi-sensory information on the actual humanoid. This feature enables efficient development and debugging. This paper describes an implementation of the system with a full-size humanoid, HRP-2, and shows some experimental examples.
I. INTRODUCTION A development platform for humanoids recently became available [1]. This platform enables developers to implement and test each component, such as sensing, perception, planning, and action, independently and to integrate them into a humanoid system. Furthermore, the components can be analyzed in the real environment. Although the platform is useful, it is difficult to debug a humanoid in real space, because the humanoid is not only complicated but also dangerous for developers. Debugging the system in virtual space rather than in the real environment can reduce this problem. For this purpose, dynamics simulators that do not require a real humanoid are now popular [2]. It is, however, impossible to fully reproduce the real humanoid and the real environment in a virtual simulator, because modeling the changing real world precisely is too expensive and difficult. Checking and modifying the simulated results on the actual robot therefore remains indispensable for humanoid development. * Human Machine Perception Lab, Canon Inc., 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146-8501, Japan † Digital Human Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-41-6, Aomi, Koto-ku, Tokyo 135-0064, Japan
Although these tools are utilized, the current difficulties in debugging an actual humanoid can be summarized as follows: 1) binding data to the physical states of the humanoid is a complicated task; 2) extracting problematic data from a huge logged data set is time consuming; and 3) obtaining an overview of such a huge data set is difficult. Mixed Reality (MR) technology offers a solution to these problems. MR technology has been studied in many research projects [3], and a large number of applications have been discussed [4,5]. By using MR technology, it is possible to overlay what the humanoid robot perceives, plans, and controls onto the real environment around it and onto its body. Based on MR technology, Chestnutt et al. overlaid planned footsteps onto camera images that captured the actual environment [6]. Biped foot placements generated by the footstep planner are shown on the physical ground surface in the captured image. Since the positions of obstacles such as tables or chairs are tracked by a motion capture system, the footstep planner recomputes adaptive footsteps dynamically in order to avoid the obstacles. Developers can observe the changes of the footsteps on the image captured from the real environment. In principle, any type of sensed and planned data can be projected onto the related physical position. This feature is very useful for humanoid debugging, since we can perceive the data intuitively. The system proposed in [6] can only show information based on the current humanoid status and sensory data; in other words, it provides an on-line viewing function. On the other hand, off-line reviewing is also important for humanoid debugging. For example, developers may want to see the internal status of the robot after it collapses. Since the internal data is updated at more than 1 kHz, it is impossible to check the data on-line. In such a situation, an off-line reviewing function allows us to step through the data and to find what went wrong more easily. In this paper, we introduce an MR debugging system that supports both on-line viewing and off-line reviewing. The system makes it possible to see both the current status and the recorded status in the same view, so that developers can compare them.
In Section II, we summarize related work. Our approach and its implementation with a full-size humanoid, HRP-2, are given in Sections III and IV, respectively. We then show some application examples in Section V. Discussion and conclusions are given in Section VI. II. RELATED WORKS In the field of MR, several applications for operating a robot arm using MR technologies have been proposed. For instance, Klinker et al. proposed a system that instructs how to control a LEGO-based robot arm by overlaying a virtual arm [7]. This application shows the potential of MR technologies for improving human-friendliness in robot operations. However, their approach to constructing an MR application is too simple to be applied directly to an integrated, complex, autonomous robot such as a humanoid. Recently, an approach that uses a laser projector to indicate industrial robot trajectories has been proposed [8]. In this approach, the trajectories of the robot hand tip are displayed onto an actual workpiece by a laser projector placed on the ceiling. This allows the trajectories to be understood and confirmed more easily than with conventional methods. However, there is a severe limitation in that the system can render graphics only onto physical surfaces. In any case, we note that an industrial robot is significantly simpler than a humanoid, which must move around autonomously. As deeply related work, Chestnutt et al. proposed a kind of MR system [6]. As described in Section I, their system provides only on-line viewing, whereas our system additionally provides off-line reviewing. Moreover, their system has the following issue. The type of display used for MR viewing is very important. For debugging a humanoid, a developer must keep watching the behavior of the humanoid carefully. In the system described in [6], however, an ordinary large flat display is used to show the footstep results, so the developer has to look away from the humanoid. In contrast, our system allows the developer to keep watching the humanoid. III. APPROACH In debugging a humanoid, developers want to observe differences between the variables in the internal models they designed and the physical properties of the actual environment. Such differences cause unexpected behavior of the humanoid even when the behavior models were well designed in simulation, and it is difficult to find the cause of the unexpected behavior in an integrated complex system. Therefore, for efficient debugging, presenting graphical representations registered to the actual environment with Mixed Reality supports the developers. In our approach, the humanoid's internal representation is displayed at the physically meaningful places, which provides an intuitive grasp of the multiple, multi-dimensional variables that constitute the internal representation of the humanoid's behavior.
Fig. 1. Mixed Reality image composition for a video see-through HMD. (A camera pose tracker supplies the camera pose, which becomes the model view matrix for CG rendering of the models in camera coordinates; the rendered CG is pixel-composited with the captured camera image to produce the Mixed Reality image.)
One of the functions is that sensory measurements of the humanoid are shown at the place of the actual sensor during behavior in the actual environment, from an arbitrary viewpoint of the developer. The 3-D graphics describing the humanoid's representation are drawn on the image captured from the operator's viewpoint, which is related to the pose of an actual camera installed in a head-mounted display (HMD). Moreover, we provide a system for viewing and reviewing how the humanoid sensed, planned, and behaved. We describe the methods of our approach below. A. Video see-through HMD The developer must keep watching the humanoid during debugging for safety reasons, so separating the display from the scene is undesirable in such a critical operation. Therefore, superimposing information about the humanoid's representation on the actual scene is one solution for providing a useful debugging environment, and an HMD is suitable for this purpose. In MR applications there are two kinds of HMD: the optical see-through type and the video see-through type. In our approach, the graphical representations of the humanoid are numerous and are formed from 3-D graphics; using a video see-through head-mounted display (VST-HMD), observers can clearly distinguish the representations overlaid on the actual scene. Fig. 1 shows a deployment diagram for displaying MR composite images on the VST-HMD. The VST-HMD consists of cameras and display panels in front of the operator's eyes for displaying the composite images. The camera captures the actual scene. The pose of the camera at the moment the frame was captured is used as the model view matrix, i.e., the viewpoint and viewing direction for rendering the 3-D graphics. Tracking the camera pose is a central topic in current MR research, and we use a camera pose tracking method for this purpose. The coordinates of the models are then transformed by the camera pose, rendering hardware draws the 3-D graphics of the models over the captured image as a background, and finally the MR image is shown on the display panels of the VST-HMD.
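To make the data flow of Fig. 1 concrete, the sketch below projects CG model points with the tracked camera pose (the model view matrix) and composites them over the captured frame. It is a minimal illustration assuming a pinhole camera model and a 4x4 world-to-camera pose from the tracker; it is not the rendering pipeline of our implementation, which rasterizes full geometry on graphics hardware.

```python
import numpy as np

def compose_mr_frame(captured_bgr, world_to_camera, K, model_points_world):
    """Overlay 3-D model points on a captured frame (illustrative sketch only).

    captured_bgr       : HxWx3 uint8 image from the HMD camera
    world_to_camera    : 4x4 pose of the world in camera coordinates
                         (the "model view matrix" of Fig. 1)
    K                  : 3x3 pinhole intrinsic matrix of that camera
    model_points_world : Nx3 array of CG model vertices in world coordinates
    """
    h, w = captured_bgr.shape[:2]
    pts3 = np.asarray(model_points_world, dtype=float)
    # Homogeneous world points -> camera coordinates.
    pts = np.hstack([pts3, np.ones((len(pts3), 1))])
    cam = (world_to_camera @ pts.T).T[:, :3]
    # Keep only points in front of the camera.
    cam = cam[cam[:, 2] > 1e-6]
    # Perspective projection with the intrinsics.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    # "Pixel composite": draw the projected points over the captured image.
    out = captured_bgr.copy()
    for u, v in uv.astype(int):
        if 0 <= u < w and 0 <= v < h:
            out[v, u] = (0, 255, 0)   # mark CG pixels in green
    return out
```

A real renderer draws full meshes with depth testing, but the data flow, camera pose to model view matrix, projection, and composite, is the same as in the figure.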
B. Graphical representation of the humanoid's internal state In the previous section, we described MR visualization with the VST-HMD. The 3-D graphics representing the humanoid's internal processes are an important component of the debugging. For instance, when an arrow is used to present a 3-DOF sensory measurement, the graphical properties of the arrow, such as position, orientation, and length, can be assigned to the measurement. Although the color and line width of the graphic could also be used, these properties are hard to read in a transparent view. In our approach, a 6-DOF force sensor measurement is represented by two arrows: one represents the force measurement along the x-, y-, and z-axes, and the other represents the torques about these axes. Other results of the internal processes, such as footstep planning, are also given 3-D graphical representations so that the developer can recognize the variables. The 3-D graphical representations are updated dynamically on notification from the servers that handle the variables of the humanoid's internal processes. In addition, it is useful to select which 3-D graphical representations to view, since the number of contents is large and developers want to change the focus of the debugging target. A scene graph tree, implemented with OpenInventor, is useful for showing the graphical representations and controlling their properties. We provide a management function that synchronizes the properties of the transformation nodes in the tree with the sensory information from the actual behavior. It allows developers to see the interactive behavior of the humanoid in response to their stimulation in real time. C. Measuring the actual behavior and the HMD in the same coordinates The position given by the humanoid's odometry is unstable because of biped walking. Therefore, the attitude needs to be measured externally for objective observation. The pose of the HMD, the device used for this objective observation, is also measured in the same coordinate system as the humanoid's behavior. To realize this, we use an optical motion tracking system, which tracks marker objects with multiple cameras by stereo. If the transformation between the markers and the physical center of the humanoid is calibrated in advance, the poses of the other body components of the humanoid can be obtained from the joint angles. D. Logging the humanoid's behavior for reviewing In simulation of the humanoid's behavior, it is practical to reproduce the behavior from prepared virtual data. Conversely, in the actual environment it is difficult to reproduce behavior that occurred under unexpected conditions. Therefore, developers need logs of the behavior in the actual environment in order to review it. The walking control process is too fast to follow in real time, so reviewing the motion with a slow or backward replay function is convenient for the developer in such cases. The developer also wants to see past behavior in order to compare it with the current one, and the past representation should be shown in the same frame for comparison.
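As a concrete illustration of the glyph mapping described in Section III-B, the following sketch converts one 6-DOF force/torque sample into the parameters of two arrow glyphs attached to a transform node. The node structure only mimics the OpenInventor scene graph used in the actual system, and the scale constants are arbitrary values chosen for display.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class ArrowGlyph:
    """Graphical parameters assigned to one 3-DOF measurement."""
    origin: np.ndarray       # where the arrow starts (sensor position in world)
    direction: np.ndarray    # unit vector of the measurement, in world coordinates
    length: float            # magnitude, mapped to the arrow length

@dataclass
class SensorNode:
    """A transform node whose child glyphs follow a 6-DOF force/torque stream."""
    sensor_to_world: np.ndarray          # 4x4 pose of the sensor (from joint angles)
    force_scale: float = 0.01            # metres per newton (arbitrary display scale)
    torque_scale: float = 0.05           # metres per newton-metre (arbitrary)
    glyphs: list = field(default_factory=list)

    def update(self, force_xyz, torque_xyz):
        """Called whenever a server notifies a new sample of the internal variables."""
        origin = self.sensor_to_world[:3, 3]
        rot = self.sensor_to_world[:3, :3]
        self.glyphs = []
        for vec, scale in ((np.asarray(force_xyz, float), self.force_scale),
                           (np.asarray(torque_xyz, float), self.torque_scale)):
            mag = float(np.linalg.norm(vec))
            direction = rot @ (vec / mag) if mag > 1e-9 else np.zeros(3)
            self.glyphs.append(ArrowGlyph(origin, direction, mag * scale))
```

The stream receiver would call update() on such a node each time a new sample is notified, and the rendering unit then draws the resulting glyphs at the sensor's physical position.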
Fig. 2. Video see-through HMD: VH-2002 (Canon) with markers.
Fig. 3. Coordinates with respect to Mixed Reality visualization (world and motion-capture coordinates; HMD marker pose and left/right camera poses of the video see-through HMD; chest marker pose, chest rigid pose, torso pose, laser sensor pose, neck angle, wrist pose, and ankle pose of the humanoid robot).
We propose a distributed log system to satisfy these demands. The log system consists of simple distributed log modules, each related to an internal process of the humanoid. The output of each process is treated as a data stream, and the streams are stored and restored without interfering with the humanoid's current behavior. Therefore, the proposed system allows us to view and review the humanoid's behavior at any time on demand. IV. IMPLEMENTATION In this section, based on the component descriptions in Section III, we describe the proposed system. A. Display device: VST-HMD We use Canon's COASTAR-type video see-through HMD, VH-2002 [9], shown in Fig. 2, as one of the key devices of this system. The VST-HMD has two cameras and two display panels for stereo viewing; the resolution of both the cameras and the panels is VGA. Since the optical axis of each camera lens and the center of the corresponding display panel are aligned in a straight configuration, we can see the scene naturally even though we see it through video. As shown in the figure, several retroreflective markers of the motion capture system are placed on top of the HMD in order to measure its pose. The local transformation from the origin of the marker coordinates to the origin of the camera coordinates is calibrated in advance. B. Transforms between the humanoid and the HMD It is necessary to obtain the poses of the humanoid and of the HMD's camera accurately in the actual environment.
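As described in Section III-C, both poses are obtained by composing motion-capture measurements with offsets calibrated in advance and, for the humanoid's links, with forward kinematics. A minimal sketch of that composition follows; the constant and function names are ours for illustration, and the identity matrices stand in for the actual calibration results.

```python
import numpy as np

def pose_in_world(marker_pose_world, marker_to_body):
    """Compose a tracked marker pose with its calibrated marker-to-body offset.

    marker_pose_world : 4x4 pose of the marker set reported by motion capture
    marker_to_body    : 4x4 offset calibrated in advance (constant)
    returns           : 4x4 pose of the body frame in world coordinates
    """
    return marker_pose_world @ marker_to_body

# Calibrated constants (measured once, then fixed); identities are placeholders.
HMD_MARKERS_TO_CAMERA = np.eye(4)
CHEST_MARKERS_TO_CHEST = np.eye(4)

def camera_pose_world(hmd_markers_world):
    """HMD camera pose in world coordinates; its inverse is the model view matrix."""
    return pose_in_world(hmd_markers_world, HMD_MARKERS_TO_CAMERA)

def link_pose_world(chest_markers_world, chest_to_link_from_joints):
    """Chest pose from motion capture, then forward kinematics to any other link."""
    chest_world = pose_in_world(chest_markers_world, CHEST_MARKERS_TO_CHEST)
    return chest_world @ chest_to_link_from_joints
```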
Fig. 4. The distributed log system for viewing and reviewing of the humanoid's behaviors. (Perception, planning, and control processes on the humanoid feed socket servers through FIFOs; distributed log modules, each consisting of a log server unit and stream storage units, connect to the socket servers and to viewer modules over TCP/IP; each viewer module contains a stream receiver, a 3-D graphics converter, and a rendering unit; a reference time controller, operated through a user interface, drives the replay.)
There are many local transformations between the humanoid and the HMD, as shown in Fig. 3. The markers of the HMD were described in the previous section; the markers of the humanoid are placed on its chest. Once the local transformation between the chest markers and the chest body is calibrated, the attitude constructed from the physical joints of the humanoid can be described in world coordinates using it. When preparing the visualization process that renders the measurements of the force sensors, the set of local transformations shown in the figure is used so that the measurements are rendered correctly from the viewpoint. C. Distributed log system The humanoid's processes, such as perception, planning, and control, run at various update frequencies: highly frequent control such as the generation of ZMP trajectories, and less frequent but computationally heavy tasks such as recognition of the surroundings by stereo vision [1]. On the other hand, the display frequency is usually 60 or 30 Hz. Managing these various process frequencies is too complicated for a simple logging method such as a printout. Accordingly, in the proposed system, the variables reported by the humanoid's internal processes are treated as data streams, as shown in Fig. 4. The streams are established among the processes, the distributed log modules, and the viewer modules over remote systems using the TCP/IP network protocol. Since the internal processes run on the CPUs of the humanoid, each process sends its own internal variables to a respective socket server via a FIFO (named pipe). Each socket server forwards the variables to the connected clients whenever updated variables arrive. The distributed log module is the central function of the log system; it behaves like a network caching proxy server and consists of a log server unit and a stream storage unit. The log server unit has three kinds of socket port. One port is connected to the socket server of the humanoid to obtain the variables of the internal process.
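A stripped-down version of such a per-process socket server might look like the sketch below. It assumes newline-delimited samples written to a named pipe by the internal process and simply broadcasts each sample to every connected client (log module or viewer); framing, reconnection, and error handling are omitted, so this is illustrative rather than the server used on the robot.

```python
import os
import select
import socket

def serve_fifo(fifo_path="/tmp/control_vars", port=5001):
    """Read variable updates from a named pipe and broadcast them to TCP clients."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    # Blocks until the internal process opens the pipe for writing.
    fifo = open(fifo_path, "rb", buffering=0)

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", port))
    server.listen()
    clients = []

    while True:
        readable, _, _ = select.select([fifo, server] + clients, [], [])
        for src in readable:
            if src is server:                      # a log module or viewer connects
                conn, _ = server.accept()
                clients.append(conn)
            elif src is fifo:                      # updated variables from the process
                data = fifo.readline()
                if not data:
                    continue
                for c in clients[:]:
                    try:
                        c.sendall(data)            # forward to every connected client
                    except OSError:
                        clients.remove(c)
            else:                                  # a client closed its connection
                if not src.recv(1):
                    clients.remove(src)
                    src.close()
```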
When updated variables are sent by the socket server, the log server unit passes them to the storage unit whenever possible. The stream storage unit has a dual access method: it can pick up past data for reference while new data is being added, without interference. The data is serialized to files by the storage unit while the behavior is interrupted. The control socket port of the log server unit waits for a reference time packet from a reference time controller. The reference time packet is the timestamp of the data that corresponds to the observer's current view. If a connection from the viewer module has been established, the data requested by the reference time controller is picked up from the storage unit by the log server unit, which then forwards it to the viewer module immediately. The reference time controller is operated through a user interface such as a jog dial, which allows slow or reverse reviewing by controlling the sequence of reference time packets. Additional log modules can be added on demand, and the modules can be controlled by multiple reference time controllers. Since all modules accept multiple client connections, the system is extensible and flexible. D. Viewer module The viewer module consists of a stream receiver, a 3-D graphics converter, and a rendering unit. The stream receiver works under the 3-D graphics converter and holds the current values sent by the log module or the socket server. The graphics converter presents the held values as 3-D graphics: at each update of the rendering unit, when the current Mixed Reality image is redrawn, the converter prepares the 3-D graphical data before rendering. In order to adjust the attitude and the state of the sensory measurements, the converter also calculates the current poses from the motion capture measurements. The rendering unit has a scene graph tree of the humanoid's models, and visual effects such as shadow rendering and environment texturing are used for realistic visualization.
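The dual access behaviour of the stream storage unit, appending live samples while serving past samples for a requested reference time, can be sketched as follows. The timestamped-sample format and the bisect-based lookup are our own simplifications of the actual storage unit.

```python
import bisect
from threading import Lock

class StreamStorage:
    """Append-only store of (timestamp, sample) pairs with lookup by reference time."""

    def __init__(self):
        self._times, self._samples = [], []
        self._lock = Lock()

    def append(self, timestamp, sample):
        """Called by the log server unit whenever the socket server sends an update."""
        with self._lock:
            self._times.append(timestamp)
            self._samples.append(sample)

    def at(self, reference_time):
        """Return the newest sample not later than the requested reference time."""
        with self._lock:
            i = bisect.bisect_right(self._times, reference_time) - 1
            return self._samples[i] if i >= 0 else None
```

Slow or reverse replay then amounts to the reference time controller stepping the reference time forward or backward, for example from the jog dial, and the log server unit forwarding the returned sample to the viewer module.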
Fig. 5. The experimental environment. HRP-2, a humanoid robot, and cameras of an optical motion capture system placed on the ceiling.
The viewer module can also connect to plural log modules and to the socket servers of the humanoid. For comparison visualization, with such plural server connections, the system can show the graphical representations of the humanoid's internal state for several behaviors, in real time and/or from the past.
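For this comparison visualization, the stream receiver only needs to hold the most recent sample from each connected source, whether a live socket server or a replaying log module, so that the rendering unit can draw both representations in one frame. The following is a minimal sketch, assuming a JSON-per-line wire format that is not the actual protocol.

```python
import json
import socket
import threading

class StreamReceiver:
    """Keeps the latest decoded sample per source for the 3-D graphics converter."""

    def __init__(self):
        self.latest = {}                 # source name -> most recent sample (dict)

    def subscribe(self, name, host, port):
        """Connect to a socket server or log module and keep latest[name] updated."""
        def run():
            with socket.create_connection((host, port)) as conn:
                buf = b""
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    buf += chunk
                    while b"\n" in buf:
                        line, buf = buf.split(b"\n", 1)
                        self.latest[name] = json.loads(line)
        threading.Thread(target=run, daemon=True).start()

# e.g. one live connection and one replayed connection, drawn in the same frame
# (host names and ports are placeholders):
# receiver = StreamReceiver()
# receiver.subscribe("live",   "hrp2-socket-server", 5001)
# receiver.subscribe("replay", "log-module-host",    6001)
```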
Fig. 6. Overlay of the laser range sensor measurements. The triangular shape extending from the head represents the measurements.
V. APPLICATION EXAMPLES This section describes the current system implemented for a full-size humanoid, HRP-2 [10], and the experiments carried out to show the applicability and effectiveness of the system. The height of the HRP-2 is 1.54 m in its default standing posture, its total weight is about 58 kg, and it has a total of 38 joints. Two CPU modules, one for the vision processes and the other for the control processes, are built into its chest, and these processes run in a multithreaded form under Linux. Wi-Fi systems are installed for communication with other PCs. The experimental environment, composed mainly of the HRP-2 and a motion capture system, is shown in Fig. 5. The motion capture system, manufactured by Motion Analysis Corporation [11], is used with ten infrared cameras placed around the area and provides a measurement volume of about 5 m x 5 m x 2 m. The measurement results are provided via TCP/IP over Ethernet and updated at 120 Hz. One PC is prepared for the distributed log module that connects to the humanoid's socket server, and another PC equipped with a high-performance graphics card (GeForce 7800, NVIDIA) is prepared for the visualization module. Although the HMD requires two VGA inputs for stereo display, the visualization module runs at over 25 frames per second. Below, we show examples of viewing the humanoid's representation overlaid on the actual scene. A. Visualization of laser range sensor measurements The HRP-2 has a laser range sensor (URG1, Hokuyo) in its head. The sensor has a range of 4 m with a reported accuracy of 10 mm, and scans a single plane through 240 degrees (769 range measurements). Fig. 6 shows the visualization of the sensor measurements overlaid on the actual scene. The 6-DOF pose of the sensor is linked to the head.
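The overlay in Fig. 6 is obtained by converting each of the 769 range readings into a 3-D point using the sensor pose derived from the head link. A rough sketch of that conversion, assuming the scan plane is the sensor's local x-y plane and the 240-degree fan is centred on its x-axis:

```python
import numpy as np

def scan_to_world_points(ranges_m, sensor_to_world,
                         fov_deg=240.0, max_range_m=4.0):
    """Convert one planar scan (e.g. 769 readings over 240 deg) into world points.

    ranges_m        : 1-D array of range readings in metres
    sensor_to_world : 4x4 pose of the laser sensor, obtained from the head link pose
    """
    n = len(ranges_m)
    angles = np.deg2rad(np.linspace(-fov_deg / 2.0, fov_deg / 2.0, n))
    r = np.asarray(ranges_m, dtype=float)
    valid = (r > 0.0) & (r <= max_range_m)        # drop dropouts / out-of-range hits
    # Points in the sensor frame: scan plane assumed to be the local x-y plane.
    pts = np.stack([r * np.cos(angles), r * np.sin(angles),
                    np.zeros(n), np.ones(n)], axis=1)[valid]
    return (sensor_to_world @ pts.T).T[:, :3]     # homogeneous transform to world
```

Rendering these points, or a fan of translucent triangles between consecutive points and the sensor origin, yields the shape overlaid in the figure.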
Fig. 7. An MR image visualizing force sensory information. The arrows represent the measurements of the force sensor while external force is applied to the left wrist of the humanoid by an operator.
Fig. 8. Footstep planning result overlaid on the actual environment. The latticed blocks are virtual obstacles. The array of rectangles extending from the humanoid's feet represents the planned footprints avoiding the obstacles.
In Fig. 6, the transparent triangular shape extending from the head represents the measurements of the sensor. The change in the shape of the triangle is caused by the stairs placed in front of the humanoid. B. Visualization of force sensory information The motion controller requires the measurements of the humanoid's force sensors in order to realize physical interaction during walking [12]. The measurements are overlaid at the physical positions of the sensors, as shown in Fig. 7. The length of the arrow represents the force measurement while an operator applies force to the humanoid's left hand. The ball at the hip is the estimated concentrated mass position of the humanoid. The measurements of the ankle force sensors are also visible. C. Footstep planning An MR image of a footstep planner result based on [6] is shown in Fig. 8.
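Conceptually, the planner output behind Fig. 8 is a sequence of footstep poses on the ground plane. The following sketch, with an assumed field layout and illustrative sole dimensions, converts such poses into rectangle corners for overlay; it is not the planner interface of [6].

```python
import numpy as np

# Approximate sole size used only for drawing the footprint rectangles.
FOOT_LENGTH, FOOT_WIDTH = 0.24, 0.13             # metres (illustrative values)

def footprint_corners(x, y, yaw):
    """Corner points (on the ground plane, z = 0) of one planned footstep."""
    half_l, half_w = FOOT_LENGTH / 2.0, FOOT_WIDTH / 2.0
    local = np.array([[ half_l,  half_w], [ half_l, -half_w],
                      [-half_l, -half_w], [-half_l,  half_w]])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    xy = local @ rot.T + np.array([x, y])
    return np.hstack([xy, np.zeros((4, 1))])      # append z = 0 for rendering

# e.g. overlay every step returned by the planner
# (each step assumed to be (x, y, yaw) in the world frame):
# for step in planner_result:
#     corners = footprint_corners(*step)          # feed these to the rendering unit
```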
Fig. 9. MR visualization images of viewing and reviewing of the humanoid's walking. In each figure, the humanoid on the right is the actual humanoid, and the one on the left is the representation replayed by the distributed log module. Arrows from the feet represent the 6-DOF measurements of the force sensors at the ankles. The sphere and vertical arrow at the hip represent the center of gravity of the humanoid and the ZMP vector.
In the figure, the latticed blocks serving as virtual obstacles are placed on the ground. The footstep planner computes footprints from the current position to the observer while avoiding the obstacles, and the array of rectangles represents the planned footprints. In this example, we can see the interactive results when the goal position changes. Since the footstep planner also allows the obstacles to be tracked from the physical objects, interactive debugging of the footstep planner is possible. D. Viewing and reviewing of humanoid walking Captured images of viewing and reviewing of the humanoid's walking are shown in Fig. 9. In each MR image, the left humanoid is the actual humanoid, and the right humanoid is the graphical representation recorded by the distributed log module. The walking pattern generator in [13] is used. Arrows from the feet represent the force and torque measurements of the 6-DOF force sensors at the ankles. We can see both the current behavior and the past one simultaneously, and thus compare the walking behavior. VI. CONCLUSIONS AND DISCUSSION In this paper, we have presented a development system for a humanoid. The system, based on Mixed Reality technology, allows us to view and review how the internal processes of the humanoid sensed, perceived, planned, and behaved. The features of the system, on-line viewing and off-line reviewing, are useful for understanding intuitively how each module in the humanoid affects the total system and how it is affected by the humanoid's surrounding environment. The effectiveness of the system has not been proved quantitatively through practical development: because enough development cases cannot be collected and the total system is expensive, it is difficult to prove the effectiveness for general cases. We are currently extending the system so that a general robot can be used as part of the development platform. Beyond development, the technology implemented in the system also seems useful for general Human-Robot Interaction (HRI). By wearing an HMD, anyone can see what and how robots think. This feature, as communication equipment, can be a potential alternative to other approaches such as controlling the facial expression of humanoids. The current system uses visual information only; with other sensory channels, it would be possible to further enhance the developer's perception. A tangible interface is a good candidate for extending the system toward bi-directional communication.
As described, we do not yet have concrete proof of how effective the system is for humanoid development; we will examine its effectiveness through practical humanoid development. REFERENCES [1] K. Nishiwaki, J. Kuffner, S. Kagami, M. Inaba, and H. Inoue. The experimental humanoid robot H7: A research platform for autonomous behaviour. Philosophical Trans. of the Royal Soc. A, 365(1850), pp. 79–107, 2007. [2] J. Kuffner, S. Kagami, M. Inaba, and H. Inoue. Graphical simulation and high-level control of humanoid robots. In Proc. of IROS'00, pp. 1943–1948, 2000. [3] R. Azuma. A survey of Augmented Reality. Presence: Teleoperators and Virtual Environments, 6(4), pp. 355–385, 1997. [4] H. Tamura et al. Mixed reality: future dreams seen at the border between real and virtual worlds. IEEE Computer Graphics and Applications, 21(6), pp. 64–70, 2001. [5] H. Yamamoto. Case studies of producing mixed reality worlds. In Proc. of IEEE SMC'99, 6, pp. 42–47, 1999. [6] J. Chestnutt, P. Michel, K. Nishiwaki, M. Stilman, S. Kagami, and J. Kuffner. Using real-time motion capture for humanoid planning and algorithm visualization. In Video Proc. of the 2006 IEEE Int. Conf. on Robotics and Automation (ICRA'06), Video-220002, 2006. [7] G. Klinker, H. Najafi, T. Sielhorst, F. Sturm, F. Echtler, M. Isik, W. Wein, and C. Trubswetter. FixIt: An approach towards assisting workers in diagnosing machine malfunctions. In Proc. of the Int. Workshop on Design and Eng. of Mixed Reality Systems (MIXER'04), 2004. [8] M. F. Zaeh and W. Vogl. Interactive laser-projection for programming industrial robots. In Proc. of the Int. Symp. on Mixed and Augmented Reality (ISMAR'06), pp. 46–55, 2006. [9] A. Takagi, S. Yamazaki, Y. Saito, and N. Taniguchi. Development of a stereo video see-through HMD for AR systems. In Proc. of ISAR 2000, pp. 68–80, 2000. [10] K. Kaneko, F. Kanehiro, S. Kajita, H. Hirukawa, T. Kawasaki, M. Hirata, K. Akachi, and T. Isozumi. Humanoid robot HRP-2. In Proc. of the 2004 IEEE Int. Conf. on Robotics and Automation (ICRA'04), 2, pp. 1082–1090, 2004. [11] Motion Analysis - http://www.motionanalysis.com/ [12] K. Nishiwaki, W. Yoon, and S. Kagami. Motion control system that realized physical interaction between robot's hands and environment during walk. In Proc. of Humanoids'06, pp. 542–547, 2006. [13] K. Nishiwaki, S. Kagami, Y. Kuniyoshi, M. Inaba, and H. Inoue. Online generation of humanoid walking motion based on a fast generation method of motion pattern that follows desired ZMP. In Proc. of the Int. Conf. on Intelligent Robots and Systems (IROS'02), pp. 2684–2689, 2002.