Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics December 7-11, 2011, Phuket, Thailand

Target Object Identification and Localization in Mobile Manipulations Yong Jiang, Ning Xi, Member, IEEE, Qin Zhang, and Yunyi Jia 

Abstract—Enabling a mobile manipulator to autonomously identify and locate a target object in an unknown environment is a very challenging problem. In this paper, a multi-sensor fusion method based on a camera and a laser range finder (LRF) for mobile manipulations is proposed. Although the camera can acquire rich perceptual information, the image processing is complex and easily affected by changes in ambient light; moreover, the camera cannot directly provide the depth information of the environment. Since the LRF can directly measure 3D coordinates and is stable against ambient light, while the camera can acquire color information, the two sensors are combined to exploit their respective advantages, obtaining more accurate measurements and simplifying information processing. To overlay the camera image with the measurement points of the LRF pitching scan and to reconstruct a 3D image that includes the depth-of-field information, the model and the calibration of the system are established. Based on the combination of the color features extracted from the color image and the shape and size features extracted from the 3D depth-of-field image, target object identification and localization are implemented autonomously. In order to extract the shape and size features, a triangular facet normal vector clustering (TFNVC) algorithm is introduced. The effectiveness of the proposed method and algorithm is validated by experimental testing and analysis carried out on a mobile manipulator platform.

Manuscript received July 21, 2011. Yong Jiang is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]). He is also with the State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016 China (e-mail: [email protected]). Ning Xi is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]). Qin Zhang is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA. She is also with Mechatronic Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China (e-mail: [email protected]). Yunyi Jia is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]).

I. INTRODUCTION

A great advantage of a mobile manipulator is its flexible operational workspace, in contrast with the limited workspace of a fixed manipulator. This advantage endows the mobile manipulator with the ability to perform manipulation tasks in a larger workspace and makes it more maneuverable and transportable. Therefore, the mobile manipulator has applications in many fields, such as rescue operations, assembly tasks, aerospace, service environments, and nuclear power stations.


When performing a dangerous task, such as explosive disposal, many actions need to be carried out autonomously by the mobile manipulator without human involvement. This is because, firstly, the working environment is hazardous and usually unstructured or unknown in advance; secondly, the environment often changes dynamically, and walking people or moving objects make it more complex; and thirdly, the operating target in the workspace may be uncertain or easily confused with the cluttered scene. Therefore, the robot has to decide where to go, whether an object is the target or not, how to navigate to reach it, how and where to grasp it, how to hold it, and so on. In particular, the capacity for autonomous target object identification and localization is a very important function for improving the performance of mobile manipulations, and it is also a very challenging problem for such a system. To address it, various sensors are mounted on the mobile manipulator.

In [1], a sensor system for an outdoor service robot, OSR-02, is developed. This sensor system consists of a laser range finder (LRF) and two cameras. A camera image processing method is employed for target object detection and tracking, and a plane segment finder method is presented to extract the data of the target object from the LRF measurements. In [2], a mobile manipulator that can navigate autonomously and transport jobs and tools in a manufacturing environment is developed. The camera mounted on the manipulator identifies jobs for pick-and-place operations, using a variation of a correlation-based adaptive predictive search method. In [3], a multi-robot system composed of a mobile vehicle and a manipulator is developed. The system is equipped with exteroceptive sensors such as sonar and an LRF, a color camera is mounted on the end-effector of the manipulator to locate the objects to operate with, and an ultrasound range finder near the camera is used to obtain distance measurements. In [4], the mobile manipulator ALACRANE, an assistant for exploration and rescue missions, is presented. The system is equipped with CCD and thermal cameras and an LRF; the CCD and thermal cameras are used together to determine possible targets, while the LRF provides the main functions in navigation mode.

In the above research, the methods used for target object identification and localization are mainly based on camera image processing. Although the camera can acquire a large amount of information, such as color and shape, it cannot directly obtain the 3D data of the environment.

Moreover, camera image processing is complex and easily affected by changes in ambient light, so target object identification and localization by camera alone may not be robust. In [5], a mobile manipulator for home service is described. Its sensor system consists of an RFID system, a stereo camera, and an LRF. To recognize various objects and obtain more information, a multi-sensor fusion method for the RFID and the stereo camera is proposed; by referring to the chair model retrieved from the database according to the tag ID, the detection of the handling point on the chair is addressed based on the RFID and the LRF. In [6], a behavior selection system for human-robot interaction of the mobile manipulator El-E is presented. This system has a laser-pointer interface: when a user illuminates a location in the environment with an off-the-shelf green laser pointer, the interface detects the spot and estimates its 3D coordinates with a stereo camera. For the same platform, a perception mode that produces images of the spatial distribution of received signal strength indication for each tagged object is proposed in [7].

From the above approaches, it can be seen that, with the help of auxiliary constraints in the environment, target object identification and localization can be realized relatively easily for special applications, such as the RFID-based method for home service or the laser-pointer interface for human-robot interaction. However, similar methods based on auxiliary constraints may not be feasible for more general applications. Since the LRF can directly obtain 3D coordinates of the environment and is stable against ambient light, while the camera can acquire color information, combining the two sensors to exploit their respective advantages is a practical way to obtain more accurate measurements and to simplify information processing for autonomous target object identification and localization. To do so, the following problems should be addressed: (a) how to build the system model and calibrate the multi-sensor unit; (b) how to make good use of the information coming from the two distinct sensors; and (c) how to accurately extract the object features from the large amount of measurement data.

In this paper, a multi-sensor fusion method based on a camera and an LRF for mobile manipulations is proposed. To overlay the camera image with the measurement points of the LRF pitching scan and to reconstruct a 3D color image that includes the depth-of-field information, the model and the calibration of the system are built. Then, based on the combination of the color features extracted from the color image and the shape and size features extracted from the 3D depth-of-field image, autonomous target object identification and localization are implemented. In order to extract the shape and size features of the object, a triangular facet normal vector clustering (TFNVC) algorithm is also introduced.


II. MULTI-SENSOR SYSTEM FOR MOBILE MANIPULATOR

A mobile manipulator system is usually composed of a mobile base, a manipulator, and a sensor unit including a camera and an LRF. There is no commercial 3D LRF available that is suitable for mobile manipulation applications. A common solution to make 3D scanning feasible is to use a standard 2D LRF and a mechanical actuator to reach the third dimension, such as the pitching, rolling, and yawing scans in [8]. In this research, the pitching scan is adopted, using a Sick LMS111 2D LRF and a Schunk PW-70 pan-tilt actuator. The configuration of the multi-sensor system for the mobile manipulator is shown in Fig. 1.

Fig. 1. Configuration of the multi-sensor system.
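To illustrate how a pitching scan of a 2D LRF yields 3D measurement points, the following Python sketch converts one scan line taken at a given pitch angle into 3D coordinates. It is a simplified illustration rather than the system's actual software: the function and variable names are chosen here, the pitch axis is assumed to coincide with the LRF's y axis, and the mounting offsets calibrated in the next paragraphs are ignored.

```python
import numpy as np

def scan_to_points(ranges, scan_angles, pitch_angle):
    """Convert one 2D LRF scan, taken at a given pitch angle, into 3D points.

    ranges:      (N,) range readings rho_i [m]
    scan_angles: (N,) scan angles alpha_i [rad]
    pitch_angle: pitch of the pan-tilt unit for this scan [rad]

    Simplifying assumptions (not from the paper): the scan lies in the
    LRF x-y plane and the pitch axis coincides with the LRF y axis; the
    mounting offsets L_o, phi_o, psi_o calibrated below are ignored.
    """
    ranges = np.asarray(ranges, dtype=float)
    scan_angles = np.asarray(scan_angles, dtype=float)

    # Points of the (un-pitched) scan plane.
    pts = np.stack([ranges * np.cos(scan_angles),
                    ranges * np.sin(scan_angles),
                    np.zeros_like(ranges)], axis=1)

    # Rotate the scan plane about the y axis by the pitch angle.
    c, s = np.cos(pitch_angle), np.sin(pitch_angle)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return pts @ R.T

# A 3D depth-of-field image is the union of such scans over a pitch sweep:
# cloud = np.vstack([scan_to_points(r, a, th) for r, a, th in sweep])
```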

Because of the unknown offset between the pitching axis and the LRF origin, and the initial pose error, which is difficult to measure directly, a calibration between the pan-tilt actuator and the LRF needs to be carried out to increase the measurement accuracy of the system. Here, the calibration algorithm introduced in [9] is adopted. Its principle is shown in Fig. 2.

Fig. 2. Calibration principle.

$$f(L_o,\varphi_o,\psi_o,d,\beta)=L_o\cos\beta\,\cos(\theta_i+\psi_o+\varphi_o)+\rho_i\sin(\alpha_i+\beta)\,\cos(\theta_i+\psi_o)-d \qquad (1)$$

Eq. (1) is the target function, where $L_o$ represents the length between the pitching axis $O$ and the real origin $O_l$ of the LRF, $\varphi_o$ the angle between the $^l y$ axis of the LRF rectangular coordinate frame $\Sigma_l=\{\,^l x,\,^l y,\,^l z\}$ and the line $OO_l$, $\psi_o$ the initial pose error of the pitching angle, $\rho_i$ the ranging data of the LRF, $\theta_i$ the nominal pitching angle, $\alpha_i$ the scanning angle of the LRF, $\beta$ the angle between the $^l x$ axis of $\Sigma_l$ and the surface of the calibration board, and $d$ the distance between the axis $O$ and the surface of the calibration board. The calibration parameters $L_o$, $\varphi_o$, and $\psi_o$ can be obtained by minimizing Eq. (1) with the Levenberg-Marquardt algorithm in [10].
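As a concrete illustration of this minimization, the sketch below fits the three calibration parameters with the Levenberg-Marquardt solver in SciPy. It assumes the reconstructed form of Eq. (1) above (whose sign conventions may need adapting to the actual setup) and uses hypothetical variable names for the measured data; it is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def calib_residuals(params, rho, alpha, theta, beta, d):
    """Residual of the target function of Eq. (1) for each measurement.

    params = (L_o, phi_o, psi_o).  rho, alpha, theta are arrays holding the
    LRF range, scan angle, and nominal pitch angle of the points that hit
    the calibration board; beta and d describe the board.  The signs follow
    the reconstruction of Eq. (1) above and may need adapting.
    """
    L_o, phi_o, psi_o = params
    return (L_o * np.cos(beta) * np.cos(theta + psi_o + phi_o)
            + rho * np.sin(alpha + beta) * np.cos(theta + psi_o)
            - d)

# Hypothetical usage with measured arrays rho, alpha, theta and constants beta, d:
# sol = least_squares(calib_residuals, x0=[0.1, 0.0, 0.0],
#                     args=(rho, alpha, theta, beta, d), method="lm")
# L_o, phi_o, psi_o = sol.x
```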

The Camera Calibration Toolbox for Matlab developed in [11] is used to estimate the intrinsic parameters of the camera. Since the measurements of the camera and the LRF cannot be directly related to each other, realizing the fusion of the two sensors requires a precise model of the homogeneous coordinate transformation matrix between their associated coordinate frames. In this research, the extrinsic calibration method described in [12] is used.

III. MULTI-SENSOR FUSION METHOD

A. Multi-sensor Fusion Method

The flowchart of multi-sensor fusion for target object identification and localization is shown in Fig. 3.

Fig. 3. Flowchart of multi-sensor fusion.
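For readers without access to the figure, the control flow of Fig. 3 can be summarized by the short Python skeleton below. The individual steps are passed in as callables because their concrete realizations (TFNVC segmentation, Sobel edge detection, feature extraction and matching) are only described in the following subsections; all names here are illustrative, not part of the original system.

```python
from typing import Any, Callable, Optional, Sequence

def fuse_identify_locate(depth_image: Any, color_image: Any, feature_db: Any,
                         segment: Callable, detect_edges: Callable,
                         shape_matches: Callable, extract_size: Callable,
                         extract_color: Callable, size_color_match: Callable,
                         locate: Callable) -> Optional[Any]:
    """Skeleton of the fusion pipeline of Fig. 3."""
    clusters: Sequence = segment(depth_image)            # TFNVC point-cloud segmentation
    candidates = [c for c in clusters
                  if shape_matches(detect_edges(c), feature_db)]
    if not candidates:
        return None                                      # no candidate area: stop

    for area in candidates:
        size = extract_size(area)                        # width, height, ... from LRF data
        color = extract_color(color_image, area)         # histogram, moments, ... from image
        if size_color_match(size, color, feature_db):
            return locate(area)                          # 3D position from LRF ranging data
    return None                                          # no candidate passed the final check
```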

This multi-sensor fusion method is divided into three stages: information preprocessing, candidate selection, and identification and localization.

1) Information Preprocessing: Aiming at the same scene, obtain two images: a 3D depth-of-field image by the pitching scan of the 2D LRF, and a color image from the camera. Then, based on the calibration of the two sensors, reconstruct a 3D color image that includes the depth-of-field information by overlaying the color image with the measurement points of the LRF. In addition, the feature vector of the target object to be identified and located is saved in the database as known information.

2) Candidate Selection:


Segment the measurement point cloud of the 3D depth-of-field image based on the triangular facet normal vector clustering (TFNVC) algorithm, which is presented below. For each cluster, use the gradient-based algorithm with the Sobel operator to detect its edges. Then, by matching the shape with the feature vector in the database, pick out the candidate areas in the 3D depth-of-field image. If one or more are found, perform the next stage; otherwise, go to the end.

3) Identification and Localization: For each candidate area in the 3D depth-of-field image, use the homogeneous coordinate transform model of the system and a geometric method to extract its size features in the real scene, such as width and height. Meanwhile, according to the calibration of the sensor unit, find the corresponding area of the candidate in the color image and extract its color features, such as the color histogram and color moments. Then, by matching the size and color features with the feature vector in the database, identify the target object. If successful, locate it based on the ranging data of the LRF; otherwise, go to the end.

The above method has the following characteristics:
--The fusion of the color and the 3D depth-of-field information is the basis of the method.
--The color, shape, and size features are all used for object identification.
--The shape and size features are both extracted from the 3D depth-of-field image. In this way, the influence of the ambient light can be avoided.
--By picking out the candidate areas based on shape matching, the computational burden of the color feature extraction can be reduced.

B. Segmentation and Edge Detection of the 3D Depth-of-field Image

The segmentation of the measurement point cloud in the 3D depth-of-field image is critical for edge detection and size feature extraction. In this research, the triangular facet normal vector clustering (TFNVC) algorithm is proposed. It is divided into the following three steps.

1) Triangular Mesh Generation: The 3D depth-of-field image is composed of a huge number of measurement points. By connecting each three adjacent points into a triangular plane, a reconstructed triangular mesh image is generated. As shown in Fig. 4, each triangular plane is called a triangular facet.


Fig. 4. Triangular mesh and facet.

2) Normal Vector Calculation: Let $^r p_{ij}=(\,^r x_{ij},\,^r y_{ij},\,^r z_{ij})^T\in\Sigma_r$ be the points in the reconstructed triangular mesh image. A triangular facet in the image is denoted as $\triangle(\,^r p_{ij},\,^r p_{i+1,j},\,^r p_{i,j+1})$. Then, its normal vector $\mathbf{n}_{ij}$ is given by

$$\mathbf{n}_{ij}=\frac{\mathbf{a}_{ij}\times\mathbf{b}_{ij}}{\|\mathbf{a}_{ij}\times\mathbf{b}_{ij}\|}=(n_{ij}^x,\,n_{ij}^y,\,n_{ij}^z)^T \qquad (2)$$

$$\mathbf{a}_{ij}=\,^r p_{i+1,j}-\,^r p_{ij} \qquad (3)$$

$$\mathbf{b}_{ij}=\,^r p_{i,j+1}-\,^r p_{ij} \qquad (4)$$
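A vectorized computation of these facet normals over an organized grid of scan points might look like the following sketch (one facet per grid cell, written here in NumPy for illustration; the array layout is an assumption, not a detail given in the paper).

```python
import numpy as np

def facet_normals(points):
    """Unit normal n_ij of each triangular facet, as in Eqs. (2)-(4).

    points: (H, W, 3) array of measurement points r_p[i, j] of the
    reconstructed triangular mesh image (one facet per grid cell here,
    for brevity).  Returns an (H-1, W-1, 3) array of unit normals.
    """
    p = np.asarray(points, dtype=float)
    a = p[1:, :-1, :] - p[:-1, :-1, :]      # a_ij = p_{i+1, j} - p_{i, j}
    b = p[:-1, 1:, :] - p[:-1, :-1, :]      # b_ij = p_{i, j+1} - p_{i, j}
    n = np.cross(a, b)                      # a_ij x b_ij
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-12)      # guard against degenerate facets
```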

From Eqs. (2), (3), and (4), all the normal vectors of the triangular facets can be calculated in this step.

3) Ranging Point Clustering: For two adjacent triangular facets, a difference degree between their normal vectors $\mathbf{n}_{ij}$ and $\mathbf{n}_{mn}$ is defined by the cosine of the angle between them, which can be written as

$$dif(\mathbf{n}_{ij},\mathbf{n}_{mn})=\cos\langle\mathbf{n}_{ij},\mathbf{n}_{mn}\rangle=\frac{(\mathbf{n}_{ij},\mathbf{n}_{mn})}{\|\mathbf{n}_{ij}\|\cdot\|\mathbf{n}_{mn}\|} \qquad (5)$$

Eq. (5) can be regarded as a classifier. If this degree is not less than a set threshold value $\lambda$ ($\lambda\in[0,1]$), the measurement points of the two facets belong to the same cluster. Through the above three steps, the measurement points in the 3D depth-of-field image can be segmented into different clusters.

In order to detect the edges of a cluster, the pixel gradient algorithm based on the Sobel operator is used, which can be represented as

$$G_x=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix}*A,\qquad G_y=\begin{bmatrix}1&2&1\\0&0&0\\-1&-2&-1\end{bmatrix}*A,\qquad G=\left(G_x^{\,2}+G_y^{\,2}\right)^{\frac{1}{2}} \qquad (6)$$

where $G_x$ is the gradient value in the X-direction, $G_y$ is the gradient value in the Y-direction, $G$ is the pixel gradient value, $A$ is the 3D depth-of-field image, and $*$ represents 2D convolution.
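A possible realization of this clustering step and of the Sobel-based edge detection is sketched below: a simple flood fill over the facet grid using the cosine test of Eq. (5), and SciPy's Sobel filters for Eq. (6). The paper does not specify its implementation, so the data layout and the 4-connected neighbourhood used here are assumptions.

```python
import numpy as np
from scipy import ndimage

def cluster_facets(normals, lam=0.98):
    """Group neighbouring facets whose normals satisfy the test of Eq. (5).

    normals: (H, W, 3) unit facet normals; lam: threshold lambda in [0, 1].
    Returns an (H, W) integer label image (one label per cluster), built by
    flood fill over 4-connected grid neighbours.
    """
    h, w, _ = normals.shape
    labels = -np.ones((h, w), dtype=int)
    current = 0
    for i in range(h):
        for j in range(w):
            if labels[i, j] >= 0:
                continue
            labels[i, j] = current
            stack = [(i, j)]
            while stack:
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    v, u = y + dy, x + dx
                    if (0 <= v < h and 0 <= u < w and labels[v, u] < 0
                            and np.dot(normals[y, x], normals[v, u]) >= lam):
                        labels[v, u] = current
                        stack.append((v, u))
            current += 1
    return labels

def depth_edges(depth_image):
    """Sobel gradient magnitude G of the depth image A, as in Eq. (6)."""
    gx = ndimage.sobel(depth_image, axis=1)   # X-direction gradient G_x
    gy = ndimage.sobel(depth_image, axis=0)   # Y-direction gradient G_y
    return np.hypot(gx, gy)
```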

IV. EXPERIMENTAL TESTING AND ANALYSIS

To validate the proposed method and algorithm, experiments have been carried out on the mobile manipulator platform.

A. Experimental Platform

The mobile manipulator platform is composed of a four-wheel mobile base, a Schunk modular manipulator, and a sensor unit, as shown in Fig. 5. The sensor unit consists of an LRF, a CCD camera, and a pan-tilt actuator. The Sick LMS111 is an outdoor electro-optical laser measurement system that electro-sensitively scans the perimeter of its surroundings in a plane with the aid of a laser beam. The infrared laser beam is radially emitted from the center part of the LRF. Its maximum measurement distance is 20 m, its scanning range is up to 270°, and its angular resolution is 0.25° or 0.5°. The CCD color camera captures images in 24-bit RGB color at rates up to 30 fps, with a resolution of 640×480 pixels. The Schunk PW-70 servo-electric pan-tilt actuator is a rotary module with two rotating axes for precise positioning; the rotation range of its pitch axis is ±120°, and that of its yaw axis is 360°.

Fig. 5. Mobile manipulator platform.

The calibration results of the sensor unit on the mobile manipulator are as follows.
--The parameters between the LRF and the pan-tilt actuator are: $L_o=147.8$ mm, $\varphi_o=2.4°$, $\psi_o=3.1°$.
--The intrinsic parameters of the camera are given by

$$^{px}u_{ij}=835.24\,\frac{^c x_{ij}}{^c z_{ij}}+284.96 \qquad (7)$$

$$^{px}v_{ij}=830.22\,\frac{^c y_{ij}}{^c z_{ij}}+215.98 \qquad (8)$$

where $^{px}p_{ij}=(\,^{px}u_{ij},\,^{px}v_{ij})^T\in\Sigma_{px}$ are the points in the image pixel coordinate frame and $^c p_{ij}=(\,^c x_{ij},\,^c y_{ij},\,^c z_{ij})^T\in\Sigma_c$ the points in the camera coordinate frame.
--The homogeneous coordinate transform matrix between the camera and the LRF is given by

$$^l_c T=\begin{bmatrix}999.2&38.1&10.8&27.1\\38.0&999.2&14.8&96.6\\11.3&14.4&999.8&21.9\\0&0&0&1\end{bmatrix}$$
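To make the role of these parameters concrete, the sketch below projects LRF points into the image with the pinhole model of Eqs. (7)-(8) and picks up the pixel color for each point, which is essentially how a reconstructed 3D color image can be assembled. The transform argument and all function names are illustrative; in particular, the 4x4 matrix passed in must map LRF coordinates into the camera frame, which may require inverting and rescaling the printed $^l_c T$ depending on its convention and units.

```python
import numpy as np

# Intrinsic parameters from Eqs. (7) and (8).
FX, FY, U0, V0 = 835.24, 830.22, 284.96, 215.98

def project_to_pixels(points_cam):
    """Pinhole projection of camera-frame points (x, y, z) to pixels (u, v)."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([FX * x / z + U0, FY * y / z + V0], axis=1)

def colorize_cloud(points_lrf, image, T_cam_from_lrf):
    """Attach an RGB color to each LRF point by overlaying it on the image.

    T_cam_from_lrf: 4x4 homogeneous transform mapping LRF coordinates into
    the camera frame (to be derived from the extrinsic calibration above).
    Points behind the camera or outside the image keep a zero color.
    """
    n = points_lrf.shape[0]
    homo = np.hstack([points_lrf, np.ones((n, 1))])
    pts_cam = (homo @ T_cam_from_lrf.T)[:, :3]

    uv = np.round(project_to_pixels(pts_cam)).astype(int)
    h, w = image.shape[:2]
    valid = ((pts_cam[:, 2] > 0)
             & (uv[:, 0] >= 0) & (uv[:, 0] < w)
             & (uv[:, 1] >= 0) & (uv[:, 1] < h))

    colors = np.zeros((n, 3), dtype=image.dtype)
    colors[valid] = image[uv[valid, 1], uv[valid, 0]]
    return colors
```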

B. 3D Image Reconstruction

Fig. 6 shows the testing results of the 3D color image reconstruction.


Fig. 6. 3D reconstructed color images.

The depth-of-field information in each image contains a mass of ranging data obtained by the pitching scan of the LRF. The objects in these images, such as the paint can, the calibration board, and the tool box, show not only their color features but also their depth relative to the mobile manipulator: the closer an object is to the robot, the redder its points are rendered, and the farther away it is, the bluer they are. Therefore, unlike ordinary 2D images, these are reconstructed 3D color images that include the depth-of-field information.
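One simple way to produce such a near-red/far-blue rendering is a linear blend over the normalized range, as in the small sketch below; the actual color scale used for Fig. 6 is not specified in the paper, so this is only an illustration.

```python
import numpy as np

def depth_to_rgb(depth):
    """Render ranges as colors: near points red, far points blue."""
    d = np.asarray(depth, dtype=float)
    t = (d - d.min()) / max(float(d.max() - d.min()), 1e-9)   # normalize to [0, 1]
    r = np.round(255 * (1.0 - t)).astype(np.uint8)            # nearer -> redder
    b = np.round(255 * t).astype(np.uint8)                    # farther -> bluer
    g = np.zeros_like(r)
    return np.stack([r, g, b], axis=-1)
```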

C. Segmentation and Edge Detection

Fig. 7 shows the testing results of segmentation and edge detection on the 3D depth-of-field image. It can be seen that, when using the TFNVC algorithm, the threshold value $\lambda$ has a direct effect on the segmentation of the LRF measurement points. For flat surfaces, this value should be close to 1; for non-flat surfaces, it should be reduced appropriately in order to obtain good segmentations.

Fig. 7. Segmentation and edge detection ($\lambda=0.98$, $\lambda=0.94$, and $\lambda=0.90$).

For the same scene, the gradient-based method is applied for edge detection to the 3D depth-of-field image and to the color image, respectively. The comparison is shown in Fig. 8. It can be seen that the accuracy of the former (Fig. 8 (c) and (d)) is not as good as that of the latter (Fig. 8 (e) and (f)), and only the rough shape can be detected using the 3D depth-of-field information. However, the former is more robust than the latter, because the depth-of-field information is not easily influenced by the ambient light or the surface texture.

Fig. 8. Comparison of edge detection.

D. Target Object Identification and Localization

Fig. 9 shows the testing results of target object identification and localization based on the proposed multi-sensor fusion method. It can be seen that, firstly, the candidate areas in the reconstructed 3D color image are picked out by shape matching (Fig. 9 (b)). Then, for each candidate area, the size and color features are extracted from the 3D depth-of-field image and the color image, respectively (Fig. 9 (c)). Next, the target object is identified by using the size and color features. Finally, the target object localization is performed by using the LRF ranging data (Fig. 9 (d)).

Fig. 9. Testing results of target object identification and localization.

Table I contrasts the size features extracted from the 3D depth-of-field image with the actual measured data; the two are clearly very close. Table II shows the matching results of the color and size features between the feature vector in the database and the information extracted from the reconstructed 3D color image.

TABLE I
CONTRAST BETWEEN EXTRACTED AND ACTUAL MEASURED SIZE FEATURES

Object (candidate area) | Extracted width (mm) | Extracted height (mm) | Actual width (mm) | Actual height (mm)
A | 75 | 205 | 76 | 206
B | 77 | 207 | 77 | 208
C | 76 | 207 | 77 | 207

TABLE II
MATCHING RESULTS OF COLOR AND SIZE FEATURES

Object (candidate area) | Color features | Size features
A | Yes | Yes
B | No | Yes
C | No | Yes

The above experimental testing and analysis show that: (1) the proposed multi-sensor fusion method and the TFNVC algorithm are feasible and efficient for realizing autonomous target object identification and localization in mobile manipulations; (2) since the color image is fused with the 3D depth-of-field information from the LRF pitching scan, the method based on the reconstructed 3D color image is more robust than one using camera information only; and (3) the method does not need extra auxiliary constraints in the environment, so it is more universal across applications.

V. CONCLUSION

Aiming at autonomous target object identification and localization in mobile manipulations, a multi-sensor fusion method based on a camera and an LRF is proposed in this paper. By overlaying the measurement points of the LRF pitching scan onto the camera image, a reconstructed 3D image that includes the depth-of-field information is generated. Then, based on the combination of the color features extracted from the color image and the shape and size features extracted from the 3D depth-of-field image, target object identification and localization are achieved. In order to extract the shape and size features of the object, the triangular facet normal vector clustering (TFNVC) algorithm is introduced. The experimental testing and analysis carried out on the mobile manipulator platform validate the effectiveness of the proposed method and algorithm. Further developments of this work will be devoted to dynamic path planning and obstacle avoidance for mobile manipulations based on the multi-sensor fusion method.

REFERENCES

[1] T. Nishida, Y. Takemura, Y. Fuchikawa, S. Kurogi, S. Ito, M. Obata, N. Hiratsuka, H. Miyagawa, Y. Watanabe, T. Suehiro, Y. Kawamura, and F. Ohkawa, “Development of a sensor system for outdoor service robot,” in Proc. SICE-ICASE International Joint Conference, Busan, 2006, pp. 2687-2691.
[2] S. Datta, R. Ray, and D. Banerji, “Development of autonomous mobile robot with manipulator for manufacturing environment,” International Journal of Advanced Manufacturing Technology, vol. 38, pp. 536-542, 2008.
[3] A. Gasparri, “Pose recovery for a mobile manipulator using a particle filter,” in Proc. 14th Mediterranean Conference on Control and Automation, Ancona, 2006, pp. 1-6.
[4] A. Garcia-Cerezo, A. Mandow, J. L. Martinez, J. Gomez-de-Gabriel, J. Morales, A. Cruz, A. Reina, and J. Seron, “Development of ALACRANE: A mobile robotic assistance for exploration and rescue missions,” in Proc. IEEE International Workshop on Safety, Security and Rescue Robotics, Rome, 2007, pp. 1-6.
[5] M. Yamashiro, Z. Xie, H. Yamaguchi, A. Ming, and M. Shimojo, “Home service by a mobile manipulator system - mobile manipulation of chairs,” in Proc. IEEE International Conference on Automation and Logistics, Shenyang, 2009, pp. 295-300.
[6] H. Nguyen, A. Jain, C. Anderson, and C. C. Kemp, “A clickable world: behavior selection through pointing and context for mobile manipulation,” in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, 2008, pp. 787-793.
[7] T. Deyle, H. Nguyen, M. Reynolds, and C. C. Kemp, “RF vision: RFID receive signal strength indicator (RSSI) images for sensor fusion and mobile manipulation,” in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, 2009, pp. 5553-5560.
[8] O. Wulf and B. Wagner. (2003). Fast 3D scanning methods for laser measurement systems. Available: http://www.rts.uni-hannover.de/images/5/5f/Wulf03-CSCS.pdf
[9] Z. Xiang and E. Wu, “Design and calibration of a fast 3D scanning LADAR,” in Proc. IEEE International Conference on Mechatronics and Automation, Luoyang, 2006, pp. 211-215.
[10] M. I. A. Lourakis. (2005). A brief description of the Levenberg-Marquardt algorithm implemented by levmar. Available: http://www.ics.forth.gr/~lourakis/levmar/levmar.pdf
[11] J.-Y. Bouguet. (2010). Camera Calibration Toolbox for Matlab. Available: http://www.vision.caltech.edu/bouguetj/calib_doc
[12] Q. Zhang and R. Pless, “Extrinsic calibration of a camera and laser range finder (improves camera calibration),” in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, 2004, pp. 2301-2306.
