SCIS&ISIS2006 @ Tokyo, Japan (September 20-24, 2006)
Advanced Soft Remote Control System in Human-friendliness

Seung-Eun Yang1, Jun-Hyeong Do2, Hyoyoung Jang3, Jin-Woo Jung4, and Zeungnam Bien5
Department of Electrical Engineering and Computer Science, KAIST1, 2, 3, 5
Human-friendly Welfare Robot System Engineering Research Center, KAIST4
373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea
{seyang1, jhdo2, hyjang3, jinwoo4}@ctrsys.kaist.ac.kr, [email protected]

Abstract – This paper describes a human-friendly interface based on hand gestures. Using two USB pan-tilt cameras, the system enables the user to operate home appliances with hand pointing and ten hand motions, regardless of the user's position. The main contribution of the system lies in its user-friendly initialization methods, which include interaction between the user and the system. The user can initialize or modify the map describing the locations of the home appliances to be controlled simply by hand gesture. An embedded feedback mechanism enhances overall performance by correcting the user's mis-pointing.
Keywords: 3D position recognition, Hand pointing command, Arbitrary camera location
I. INTRODUCTION

Recent advances in computer science have extended into ubiquitous, pervasive, and convergence computing, where user-friendly interfaces are required for effective and efficient operation of devices. Gesture is one of the most intuitive means for such interfaces. Gesture-based interfaces have two advantages over interfaces based on speech recognition: a voice-operated system requires a sensitive microphone near the user, and gesture is more convenient for expressing spatial information. Nowadays, many home appliances are controlled by remote controllers. However, this is cumbersome because the user must keep the remote controllers close at hand and select the appropriate one every time he/she wants to control an appliance. For elderly or disabled people, these inconveniences are more serious. Recently, studies on gesture recognition for controlling home appliances have been attempted, but few of them have focused on hand recognition for controlling multiple home appliances in an unstructured environment [1, 10-13]. We previously developed the 'Soft Remote Control System' to control multiple home appliances by hand gesture using three CCD cameras attached to the ceiling [1-3]. In this paper, as an extension of the previous Soft Remote Control System, we propose an improved human-friendly gesture-based interface system focusing on the following two issues: (1) by using two USB pan-tilt cameras, production and installation costs are reduced, and (2) newly adopted user-friendly initialization methods enhance usability.

The remainder of this paper is organized as follows: Section 2 introduces the overall configuration of the advanced Soft Remocon System, Section 3 details the initialization method for home appliance locations, Section 4 presents experimental results with discussion, and Section 5 summarizes the study and concludes.

II. ADVANCED SOFT REMOTE CONTROL SYSTEM
Figure 1. Conventional soft remote control system
Figure 2. Advanced soft remote control system
Figure 1 shows the conventional soft remote control system, which uses three CCD cameras attached to the ceiling [1, 9]. Because the position of each camera is fixed, the range of user positions from which commands can be issued is rather limited even though the cameras have pan/tilt capability. In contrast, the advanced system shown in Figure 2 allows the two USB cameras to be placed at arbitrary positions, so the user can set them up wherever he/she wants in the residence. Using a pattern attached to the side of one camera, the advanced system builds a relative 3D coordinate frame, which makes it possible to calculate the relative positions of home appliances for arbitrary camera locations. For user friendliness, the system uses a simple hand pointing gesture to calculate the 3D position of each home appliance. Because pointing gestures vary from person to person, we define the pointing command as stretching the hand toward an object so that the center of the object is concealed by the hand. When a user wants to control a certain home appliance, he/she points at it with the hand to select it. But the user cannot point at it perfectly every time. If the user fails to point at
the appliance, the system finds the nearest appliance and informs the user of the direction and distance from the pointed position to the position of that appliance. This feedback enables the user to select the proper appliance, so errors in recognizing the user's pointing direction are reduced. To control each home appliance, ten different hand gestures are used, as shown in Figure 3. With the assumption that a command hand motion is performed at high speed and that linking motions occur before and after the command, the system determines the start and end points of the commanding hand motion. For the recognition of hand command motions, a hierarchical classifier based on HMMs is adopted [9].

Figure 3. Command gesture using hand motions: (a) 1-dimensional motion; (b) 2-dimensional motion
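As an illustration of this speed-based segmentation idea, the following Python sketch marks contiguous high-speed runs of a hand trajectory as candidate command motions. The threshold and minimum-length values are illustrative assumptions, not the parameters of the paper; in the actual system, the resulting segments would then be passed to the HMM-based hierarchical classifier of [9].

```python
import numpy as np

def segment_command_motion(positions, timestamps, speed_thresh=0.4, min_frames=5):
    """Find start/end indices of candidate command motions in a hand trajectory.

    positions:  (N, 3) array of 3D hand positions [m]
    timestamps: (N,) array of capture times [s]
    speed_thresh [m/s] and min_frames are illustrative values, not the paper's.
    """
    positions = np.asarray(positions, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Frame-to-frame hand speed.
    dt = np.diff(timestamps)
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    fast = speeds > speed_thresh  # True where the motion is "high speed"
    segments, start = [], None
    for i, is_fast in enumerate(fast):
        if is_fast and start is None:
            start = i                       # command motion begins
        elif not is_fast and start is not None:
            if i - start >= min_frames:     # ignore brief jitters
                segments.append((start, i))
            start = None
    if start is not None and len(fast) - start >= min_frames:
        segments.append((start, len(fast)))
    return segments  # each segment is fed to the gesture classifier
```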
III. USER-FRIENDLY INITIALIZATION

A. Overall structure of initialization

Figures 4 and 5 show the flow charts of home appliance position storing and recognition, respectively. For face/hand detection and tracking, we adopt a dynamic cascade structure using multimodal cues, which was used in the former system [6]. The system first automatically recognizes, by panning the reference camera, a specific pattern attached to the side of the other camera. It then calculates the distance and angle between the two cameras from the result of the pattern analysis. With this information, it can calculate the 3D positions of the user's face and hand. To store the 3D position of a home appliance the user wants to control, he/she executes the hand pointing command at two different places. While the user is pointing, the system calculates the directional vector from the user's face to the hand. The cross point of the two extended directional vectors (or, when they do not intersect, the center of their common perpendicular) is stored as the 3D position of the pointed appliance.

Figure 4. Flow chart of home appliance position storing

Figure 5. Flow chart of home appliance position recognition

B. Recognition of the pattern

The information we need in order to formulate the relative axis from the pattern is the distance and angle between the two cameras. Because the field of view of a USB camera is limited, the pattern should be small enough. To obtain the needed information in an arbitrary environment, we utilize a pattern consisting of red colored circles, as shown in Figure 9.
Figure 6. Procedure of pattern recognition
Figure 6 shows the pattern recognition procedure. Since RGB components are sensitive to lighting conditions, we convert the RGB image to YCrCb and discard the Y component, which contains the luminance information. After splitting into Cr (red color information) and Cb (blue color information), we detect red blobs by thresholding. During this procedure, morphological closing is applied to remove fragments of the red blobs. To remove other red blobs that do not belong to the pattern, we exploit the characteristic distances between the red circles of the pattern. As shown in Figure 7, this easily discriminates the nine circles of the pattern from the other two red blobs.

Figure 7. Distance plot between centers of red areas
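A minimal sketch of this blob-extraction step, assuming OpenCV; the Cr threshold and the closing kernel size are illustrative values, and the distance-based discrimination of Figure 7 would operate on the returned blob centers.

```python
import cv2
import numpy as np

def detect_red_blobs(bgr_image, cr_thresh=150, kernel_size=5):
    """Extract candidate red circles of the calibration pattern.

    cr_thresh and kernel_size are illustrative assumptions.
    Returns the binary mask and the blob centers in image coordinates.
    """
    # Convert to YCrCb and discard the luminance (Y) channel, which
    # makes the detection less sensitive to lighting conditions.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    _, cr, _ = cv2.split(ycrcb)
    # Threshold the red-difference (Cr) channel.
    _, mask = cv2.threshold(cr, cr_thresh, 255, cv2.THRESH_BINARY)
    # Morphological closing removes fragments inside the red blobs.
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Blob centers, for the inter-circle distance check (cf. Figure 7).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] > 0:
            centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return mask, centers
```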
C. Calculation of the distance and angle between two cameras

After recognizing the pattern, we must calculate the distance and angle between the two cameras. For this, we measured the focal length of each camera using the GML MatLab Camera Calibration Toolbox and MATLAB (MathWorks, USA) [8].

Figure 8. Distance calculation from camera to pattern

Figure 8 shows how this distance is calculated. Using the focal length, we can calculate the lengths a1 and b1; b2 is the distance between circles on the pattern, which is already known. The distance from the reference camera to the pattern then follows by proportionality.
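In pinhole-camera terms, this proportionality reduces to Z = f * B / b for a feature of known physical size B that spans b pixels in an image taken with focal length f (in pixels). A one-line sketch with illustrative numbers (not the paper's values):

```python
def distance_to_pattern(focal_length_px, image_size_px, real_size_m):
    """Range estimate by similar triangles (pinhole model).

    A feature of known physical size (here, the known spacing of the
    pattern's circles) that spans image_size_px pixels lies at distance
    Z = f * B / b. Variable names are illustrative, not the paper's.
    """
    return focal_length_px * real_size_m / image_size_px

# e.g. f = 800 px, circles 12 px apart in the image, 0.03 m on the pattern:
# distance_to_pattern(800, 12.0, 0.03) -> 2.0 m
```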
Figure 9. Angle calculation using pattern

We can form a triangle using the two columns of circles on the pattern and the center point of the reference camera, as shown in Figure 9. The distance from the camera to each column of the pattern can be calculated by the proportionality discussed for Figure 8. Knowing the three side lengths of the triangle, we can calculate the inside angle using the law of cosines.
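A sketch of this step; the distances a and b would come from the proportionality above, c is the known spacing between the two columns on the pattern, and the numbers in the comment are illustrative:

```python
import math

def angle_between_columns(a, b, c):
    """Inside angle at the camera, by the law of cosines.

    a, b: distances from the reference camera to the two circle columns
    c:    known spacing between the columns on the pattern
    """
    # Clamp against small numeric overshoot from noisy distance estimates.
    cos_angle = max(-1.0, min(1.0, (a * a + b * b - c * c) / (2.0 * a * b)))
    return math.acos(cos_angle)

# e.g. a = b = 2.0 m, c = 0.09 m:
# angle_between_columns(2.0, 2.0, 0.09) -> ~0.045 rad (about 2.6 degrees)
```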
D. Calculation of 3D position

To calculate the 3D position of a home appliance, we must first know the 3D positions of the user's face and hand.

Figure 10. Calculation of 3D position

After detecting the user's face/hand, we find the two lines s1 and s2 that pass from the center of each camera, CM,1 and CM,2, through the user's face/hand positions p1 and p2 in each camera image, respectively, as shown in Figure 10 [7]. We then obtain the 3D position as the midpoint of the line segment PM,1PM,2 that is perpendicular to both s1 and s2. However, if the two cameras are located too close together, the error increases because the stereo images provide limited information; for this system, we therefore assume a minimum distance of 0.5 m between the cameras.
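The midpoint construction can be written directly from the two viewing lines. The following sketch solves for the closest points on s1 and s2 and returns the midpoint of PM,1PM,2; variable names follow Figure 10, but the code is a reconstruction under standard stereo-geometry assumptions, not the paper's implementation.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """3D point as the midpoint of the common perpendicular of two lines.

    c1, c2: camera centers (CM,1 and CM,2 in Figure 10)
    d1, d2: direction vectors of the viewing lines s1 and s2
    Returns the midpoint of the segment PM,1 PM,2 joining the closest
    points on the two (generally skew) lines.
    """
    c1, d1 = np.asarray(c1, float), np.asarray(d1, float)
    c2, d2 = np.asarray(c2, float), np.asarray(d2, float)
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b          # ~0 when the lines are near parallel
    if abs(denom) < 1e-9:
        raise ValueError("viewing lines are (nearly) parallel")
    t = (b * e - c * d) / denom    # parameter of closest point on s1
    s = (a * e - b * d) / denom    # parameter of closest point on s2
    p1 = c1 + t * d1               # PM,1
    p2 = c2 + s * d2               # PM,2
    return 0.5 * (p1 + p2)
```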
E. Storage of home appliance 3D position

Figure 11. Storing 3D position of home appliance by user

After calculating the 3D positions of the user's face and hand by the method discussed in Section D, we obtain a directional vector that starts at the user's face and points toward the hand. To calculate the 3D position of a specific object, the user points at the object from two different positions, as shown in Figure 11. While the user moves from position 1 to position 2, the two USB cameras track the user using their pan-tilt function. To keep the 3D axes consistent in this case, we define a transformation matrix for the panning and tilting angles.

F. Home appliance selection through hand pointing command and feedback for wrong pointing command

To select a home appliance, the user performs the hand pointing command. The selection procedure is shown in Figure 12.
by user’s view point. Every Y axis is parallel in the figure and we already know the directional vector from center of reference axis to stored home appliance position. So, the only information for axis transformation is the angle between axis_a and axis_d. The directional vector DIR is not always perpendicular to Plane C. But the imaginary plane DCF (Plane D) is always perpendicular both Plane B and Plane C. Because X_b and X_a are parallel, we only have to know the value of Ɵ to formulate translation matrix between axis_a and axis_d by kinematics. Through calculating the angle 90o+Ɵ between unit directional vector u(1,0,0) and Plane D, it’s possible to define translation matrix. Using the translation matrix, we can define user dependant axis from reference axis.
Figure 12. Storing 3D position of home appliance by
To recognize which home appliance is pointed at, we extend the directional vector that starts at the user's face and ends at the hand. The extension rate is determined by the distance from the user's face to each home appliance. Figure 12 shows the case in which the positions of three home appliances are stored. First, the system calculates the distance between the user and each appliance, and then computes a candidate position by extending the directional vector by each extension rate. If a candidate position falls within the selection range, shown as three boxes in Figure 12, the corresponding device is selected; the figure shows the case in which the VCR is selected. However, image data are very sensitive to the environment, and the hand pointing command cannot be performed precisely every time, so false recognition is inevitable. The most frequent situation is that nothing is selected although the user is pointing at a specific appliance. To reduce this problem and accommodate user interaction, we provide feedback: the system tells the user which stored appliance is nearest to the pointed 3D position, together with the direction and distance needed to select that appliance properly.
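A sketch of this selection-and-feedback logic under stated assumptions: appliance positions are given as a dictionary, and the selection range is approximated by a spherical radius rather than the boxes of Figure 12; the radius value is illustrative.

```python
import numpy as np

def select_appliance(face, hand, appliances, select_radius=0.25):
    """Select an appliance by extending the face-to-hand direction.

    appliances: dict of name -> stored 3D position
    select_radius approximates the 'selection range' of Figure 12
    (the value is an illustrative assumption).
    Returns (selected_name, feedback); feedback reports the nearest
    appliance and the offset from the pointed position to it.
    """
    face, hand = np.asarray(face, float), np.asarray(hand, float)
    u = (hand - face) / np.linalg.norm(hand - face)  # pointing direction
    best_name, best_err, best_offset = None, np.inf, None
    for name, pos in appliances.items():
        pos = np.asarray(pos, float)
        r = np.linalg.norm(pos - face)     # extension rate for this device
        candidate = face + r * u           # pointed position at that range
        err = np.linalg.norm(candidate - pos)
        if err < best_err:
            best_name, best_err, best_offset = name, err, pos - candidate
    if best_err <= select_radius:
        return best_name, None             # selected; no feedback needed
    # Nothing selected: feed back nearest appliance, direction, distance.
    return None, {"nearest": best_name,
                  "direction": best_offset / best_err,
                  "distance": best_err}
```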
However, the stored positional information must be transformed to correspond to the user, because the viewpoint changes according to the user's position and viewing direction. Figure 13 shows the procedure of axis transformation, where axis_a is determined by the reference camera and axis_d is determined by the user's viewpoint. Every Y axis in the figure is parallel, and we already know the directional vector from the center of the reference axis to the stored home appliance position, so the only information needed for the axis transformation is the angle between axis_a and axis_d. The directional vector DIR is not always perpendicular to Plane C, but the imaginary plane DCF (Plane D) is always perpendicular to both Plane B and Plane C. Because X_b and X_a are parallel, we only have to know the value of θ to formulate the transformation matrix between axis_a and axis_d by kinematics. By calculating the angle 90° + θ between the unit directional vector u(1,0,0) and Plane D, we can define the transformation matrix and thus derive the user-dependent axis from the reference axis.
Figure 13. Axis conversion from reference to user's view dependent axis
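Since all Y axes are parallel, the reference-to-user transformation reduces, up to the shift of origin, to a rotation by θ about the Y axis. A minimal sketch of that final kinematic step, assuming θ has already been obtained as described above (the full system also folds in the pan/tilt tracking angles, which are omitted here):

```python
import numpy as np

def to_user_axis(p_ref, user_origin, theta):
    """Express a stored appliance position in the user-dependent axis.

    Following Figure 13, axis_a and axis_d share parallel Y axes, so
    they differ by a rotation of theta about Y plus an origin shift.
    This is a sketch of that transform, not the paper's exact matrix.
    """
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
    return rot_y @ (np.asarray(p_ref, float) - np.asarray(user_origin, float))
```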
IV. EXPERIMENTAL RESULTS

A. 3D position of home appliance

The stored 3D position of each home appliance is crucial for the operation of the entire system: if the stored position is wrong, the user will fail to select the proper appliance no matter how many times he/she points at it. To evaluate the accuracy of the proposed system, we carried out the following experiments.

Figure 14. Measured position and result position from the system

Figure 14 shows the results of storing three different home appliance positions (the unit of each axis is cm). For each appliance, the point inside the rectangle is the measured point and the other four points, enclosed in a circle, are the experimental results. We can see that the result positions cluster around each measured point.

Figure 15. 3D positions from two different subjects
Figure 15 shows the results from two different subjects; the results of one subject are distinguished by rectangles. We can see variation between the subjects' results, especially in cases A and C. Because people cannot point in exactly the same way every time, such variation is inevitable, and the results show that each person has his/her own pointing tendency. Additional experiments showed that the results are also very sensitive to the environment, especially lighting conditions; cluster C in Figure 15 shows such a case (one point lies exceptionally far from the others).

B. Recognition of home appliance position and feedback

Because each user has his/her own pointing tendency, it is more meaningful to analyze the recognition of home appliance positions against positions stored by the same subject. Table 1 shows the recognition rate: the user pointed five times at each appliance (Appliance A to Appliance C), and ten such experiments were executed.

Table 1. Success rate of home appliance selection

              Appliance A   Appliance B   Appliance C
Success rate  96% (48/50)   84% (42/50)   88% (44/50)
Table 2 shows the results for the feedback. To evaluate its accuracy, we checked whether the nearest appliance was correctly identified when no appliance was selected, and, when the nearest appliance was identified correctly, we observed whether the feedback direction was correct. Each experiment was done separately.

Table 2. The accuracy of feedback

              Nearest appliance detection   Feedback direction
Success rate  89% (89/100)                  83% (83/100)
V. CONCLUDING REMARKS
In this paper, we proposed a method that calculates and recognizes the 3D positions of home appliances solely by means of the hand pointing command. This performs the indispensable initialization task for the soft remote control system, which can control various functions of home appliances through natural hand gestures. The method allows arbitrary camera positions, so the range of user positions for commanding is expanded. Moreover, it provides feedback when the user's pointing command is ambiguous. These features not only enhance the overall performance of the system but also provide a user-friendly interface, so the method is also applicable to other interfaces between humans and intelligent systems. In further studies, we will focus on enhancing the robustness of the system through user-adaptation technology and by devising a new pattern.

ACKNOWLEDGEMENT

This work was fully supported by the SRC/ERC program of MOST/KOSEF (Grant #R11-1999-008).

REFERENCES

[1] J.-H. Do, J.-B. Kim, K.-H. Park, W.-C. Bang, and Z. Z. Bien, "Soft Remote Control System using Hand Pointing Gesture," International Journal of Human-friendly Welfare Robotic Systems, vol. 3, no. 1, pp. 27-30, March 2002.
[2] J.-W. Jung, J.-H. Do, Y.-M. Kim, K.-S. Suh, D.-J. Kim, and Z. Bien, "Advanced robotic residence for the elderly/the handicapped: realization and user evaluation," Proc. of the 9th Int. Conf. on Rehabilitation Robotics, pp. 492-495, 2005.
[3] D. H. Stefanov, Z. Z. Bien, and W.-C. Bang, "The Smart House for Older Persons and Persons With Physical Disabilities: Structure, Technology Arrangements, and Perspectives," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 12, no. 2, June 2004.
[4] Z. Z. Bien, K.-H. Park, J.-W. Jung, and J.-H. Do, "Intention Reading is Essential in Human-friendly Interfaces for the Elderly and the Handicapped," IEEE Transactions on Industrial Electronics, vol. 52, no. 6, pp. 1500-1505, 2005.
[5] H. Jang, J.-H. Do, J. Jung, K.-H. Park, and Z. Z. Bien, "View-invariant Hand-posture Recognition Method for Soft-Remocon System," Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), Alberta, Canada, pp. 295-300, September 28-October 2, 2004.
[6] J.-H. Do and Z. Bien, "A Dynamic Cascade Structure Using Multimodal Cues for Fast and Robust Face Detection in Videos," Pattern Recognition Letters, submitted, 2005.
[7] M. Kohler, "Vision Based Remote Control in Intelligent Home Environments," Proceedings of 3D Image Analysis and Synthesis, Erlangen, Germany, pp. 147-154, November 18-19, 1996.
[8] V. Vezhnevets, GML MatLab Camera Calibration Toolbox, Graphics & Media Lab, Dept. of Computer Science, Moscow State University, November 1, 2005.
[9] J.-H. Do, H. Jang, S. H. Jung, J. Jung, and Z. Bien, "Soft Remote Control System in the Intelligent Sweet Home," Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Canada, pp. 2193-2198, August 2-6, 2005.
[10] N. Jojic, B. Brumitt, et al., "Detection and Estimation of Pointing Gestures in Dense Disparity Maps," Proc. of the 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 468-475, 2000.
[11] S. Sato and S. Sakane, "A Human-Robot Interface Using an Interactive Hand Pointer that Projects a Mark in the Real Work Space," Proc. of the 2000 IEEE ICRA, pp. 589-595, April 2000.
[12] C. Colombo, A. Del Bimbo, and A. Valli, "Visual Capture and Understanding of Hand Pointing Actions in a 3-D Environment," IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 4, pp. 677-686, August 2003.
[13] K. Irie, N. Wakamura, and K. Umeda, "Construction of an Intelligent Room Based on Gesture Recognition," Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 193-198, 2004.