International Journal of Advanced Robotic Systems
ARTICLE
A Robust and Efficient Algorithm for Tool Recognition and Localization for Space Station Robot
Regular Paper
Lingbo Cheng1*, Zhihong Jiang1, Hui Li1 and Qiang Huang1 1 School of Mechatronic Engineering, Beijing Institute of Technology, Beijing, China * Corresponding author(s) E-mail:
[email protected] Received 29 July 2014; Accepted 5 November 2014 DOI: 10.5772/59861 © 2014 The Author(s). Licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

This paper studies a robust target recognition and localization method for a maintenance robot in a space station. Its main goal is to cope with the target affine transformations caused by microgravity, with the strong reflection and refraction of sunlight and lamplight in the cabin, and with the occlusion of the target by other objects. In this method, the Affine Scale Invariant Feature Transform (Affine-SIFT, or ASIFT) algorithm is adopted to extract a sufficient number of fully affine-invariant local feature points, and stable matching points are selected from them for target recognition by the Random Sample Consensus (RANSAC) algorithm. Then, in order to localize the target, an effective and appropriate 3D grasping scope of the target is defined, and the grasping precision is determined and evaluated with the affine transformation parameters estimated in this paper. Finally, the RANSAC threshold is optimized to enhance the accuracy and efficiency of target recognition and localization, and the ranges of illumination, viewing distance and viewpoint angle within which the robot obtains effective image data are evaluated by the Root-Mean-Square Error (RMSE). An experimental system that simulates the illumination environment in a space station is established. Extensive experiments have been carried out, and the experimental results show both the validity of the proposed definition of the grasping scope and the feasibility of the proposed recognition and localization method.

Keywords: Space station, maintenance robot, target recognition and localization, ASIFT & RANSAC, illumination simulation, parameter estimation
1. Introduction

Astronauts use video and other instrumentation to monitor a space station cabin. Cabin maintenance and operations involve various movements, such as touching, plugging and pressing [1], [2]. However, the harsh environment of space prevents astronauts from spending long periods of time in the space station. This challenge is further compounded by the numerous operational tasks that must take place in narrow spaces and in a microgravity environment. Currently, technological limitations render long-term operational efficiency on a space station unfeasible. Therefore, there is an urgent need to develop a maintenance robot capable of providing assistance to (or replacing) astronauts for operational tasks such as maintenance and other general spacecraft operations. Although this task presents significant technical challenges, it also offers the possibility of increased astronaut safety as well as economic benefits [3], [4].

Space station cabin maintenance robots require target tool recognition and localization abilities to complete maintenance tasks and operations [5]. The National Aeronautics and Space Administration (NASA) has researched space robots for several years and achieved some advances, such as Robonaut [6] and Robonaut 2 [7], [8]. The humanoid robot is designed to use the same tools as space-walking astronauts. Robonaut 2's integrated mechatronic design results in a more compact and robust distributed control system with a fraction of the wiring of the original Robonaut. In addition to the improvement of force control and sensing, the vision module has the potential to develop further. Its most important problem is that the large metal surfaces of the space cabin may cause strong reflection and refraction of light, so that the images collected by the robot visualization system exhibit uneven illumination [9]. Specifically, the change in illumination that occurs near the porthole could cause serious interference to the recognition and localization systems (Figure 1). Furthermore, recognizable objects may undergo affine transformation because of the effect of microgravity [10], which can cause the target recognition task to fail and the accuracy rate to decline. Therefore, it is crucial to develop a target recognition and localization method capable of overcoming illumination change and target affine transformation.
Figure 1. Illumination environment in the space station cabin. The left image shows the strong reflection and refraction of light, which is caused by the metal surface in the cabin. The right image shows sunlight near the porthole.
Target recognition is the foundation of target localization; feature extraction is the main component of target recognition. In order to address the specific environmental challenges in a space station, feature extraction should be invariant to changes in both illumination and image affine transformation. Slater and Healey [11] propose a method based on local colour invariants for target recognition. The method is unaffected by target location and position, but its effectiveness is low under interference from identically coloured objects. Wu and Wei [12] present an invariant texture method based on a rotating sample and the Hidden Markov Model (HMM). This method is invariant to rotation and grey-level transformation, but it is confined to recognizing similar regions in independent locations. Gevers and Smeulders [13] propose a method
that combines colour and shape in order to extract features that are invariant to geometry and illumination. Although this method is robust and highly efficient for target recognition, target discernment decreases with changes in the moment-invariant features. Hence, existing research indicates that feature extraction methods based on global variables cannot adequately address the problem of target occlusion.

Vision-based target localization for a space station robot is essential to maintenance robot grasping. Generally, an infrared target recognition and localization system is used for a free-floating space robot [14], [15]. However, this method is sensitive to external disturbances, such as noise and complex backgrounds. Meanwhile, camera-based pose estimation has been studied. A closed-form analytical solution to the problem of three-dimensional (3D) estimation of circular feature location is presented in reference [16]. This method considers 3D estimation and position estimation when the radius is unknown, but it is confined by the fact that circles must exist on the surface of the target. R. T. Fomena proposes a method which is concerned with the use of a spherical projection model for visual servoing from three points [17]. Although this method is robust to errors in the points' range, it is not ideal for tool localization because the points themselves are difficult to pinpoint. Based on the spherical projection model, another method is proposed that uses three independent translation parameters and invariance to rotational motion to estimate pose [18]. Due to the limitations of a single camera, such as a limited field of view, it is not suitable for space station tool localization. In F. Janabi-Sharifi's paper [19], an algorithm named the Iterative Adaptive Extended Kalman Filter (IAEKF) is proposed by integrating mechanisms for noise adaptation and iterative measurement linearization. This method can estimate the position and orientation of an object in real time based on a robot vision system, but non-linearities are usually the main reason for the failure of pose filtering strategies, especially during abrupt motions and occlusions.

Based on local interest point feature extraction, a method called the Scale Invariant Feature Transform (SIFT) is proposed by D. Lowe [20], [21], and has wide application in object tracking and localization [22], [23]. The extracted features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortions, 3D viewpoint changes, occlusions and illumination changes. However, the overall number of matching points is greatly reduced and the number of incorrect matching points increases when the viewpoint angle of the target grows larger. Based on the SIFT model, Ke and Sukthankar present a method that utilizes principal component analysis, called Principal Component Analysis-Scale Invariant Feature Transform (PCA-SIFT) [24]. This method effectively reduces the dimensions of the SIFT feature descriptor. Subsequently, Abdel-Hakim and Farag propose a Colour Scale Invariant Feature Transform (CSIFT) algorithm for colour images [25], and Bay proposes the Speeded Up Robust Features (SURF) model [26]. Although these methods reduce the computational complexity and time consumption to some extent, both the number of matching points and the accuracy are lower than for SIFT. In order to solve the problem of fewer matching points under large affine distortion, Yu and Morel present a fully affine-invariant algorithm named Affine-SIFT (ASIFT) that treats the two leftover parameters: the angles defining the camera axis orientation [27], [28]. This algorithm is efficiently implemented by using the difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation. The key point descriptor is represented relative to the orientation, and this enables invariance to image rotation and noise by assigning a consistent orientation to each key point based on local image properties. Moreover, the descriptor becomes invariant to affine changes in illumination by creating orientation histograms and modifying the feature vector. Furthermore, the method handles transition tilts of 36 and higher [28]; it also obtains a large number of matches (compared to other algorithms) by simulating all views using the angles that define the camera axis orientation. The ASIFT algorithm has now been used in remote sensing image stitching, image registration and other image-related operations [29], [30]. However, this method has rarely been used for target localization.

In addition, because mismatched points inevitably occur under the ASIFT algorithm, the Random Sample Consensus (RANSAC) algorithm [31] (proposed by Fischler and Bolles) is used in this paper to eliminate them. Then, the affine transformation model is established and the affine transformation parameters are obtained. Finally, the three-dimensional (3D) coordinates of the grasping point and range of the target tool are calculated using the binocular vision system. In summary, we argue that the presented method is an appropriate approach to achieve a general, robust and versatile vision system for all of the target tools used in the space station cabin.

This paper is organized as follows. Section 2 describes the space station cabin robot system and the target recognition and localization system scheme. Section 3 establishes the affine transformation model and obtains the correct matches based on the ASIFT & RANSAC algorithm. Section 4 introduces the definition of the grasping key points and grasping scope, as well as the target localization method using the estimated parameters and a binocular vision system. Section 5 uses the proposed method to conduct a
series of experiments under different illuminations, affine transformations and occlusions, and finally contrasts the accuracy and processing speed of three methods. Section 6 summarizes the main contributions and also presents areas that merit further research.

2. System description

2.1 Robot system architecture

Implementation of a grasping command for a given object is a fundamental task for a space station cabin maintenance robot when assisting (or replacing) astronauts with cabin operations. The key to grasping is the accurate recognition and localization of the target. Common repair tools are chosen as targets for recognition and localization. A target recognition and localization system for a space station cabin maintenance robot is built to test target recognition (Figure 2). The humanoid space robot consists of a head with a binocular vision system, two arms, a trunk and 16 degrees of freedom (DOFs).

A binocular stereovision system is built into the head of the robot. In order to ensure wide coverage of the vision system, the vision platform of the space robot head is designed with two DOFs. A Permanent Magnet Synchronous Motor (PMSM) and a harmonic reducer are used to drive the DOFs, ensuring that the cameras move accurately and smoothly to the desired targets. The positions of the target tools are calculated using the binocular stereo vision system of the robot.
Figure 2. Humanoid space robot
2.2 Target recognition and localization system scheme

Figure 3 shows the target recognition and localization system scheme based on the humanoid space robot. First, the appointed target is recognized: the local interest point feature extraction algorithm is used to obtain the matches between the reference image and the experimental image, and the mismatched points are eliminated. Then, the affine transformation model is established, and the transformation parameters can be estimated using the stable matching points. Second, we define the key points and grasping scope of the reference image, and calculate the 2D
coordinates of the transformed key points based on the transformation parameters. On the basis of the parallax principle of the binocular vision system, the 3D coordinates of the target key points and the grasping range are obtained. Target identification and localization are completed under these two mechanisms. The whole process is implemented through the image processors DSP and FPGA.
Figure 3. Flowchart for target recognition and localization
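To make the flow in Figure 3 concrete, the listing below strings the main stages together in Python. It is a minimal sketch, not the authors' DSP/FPGA implementation: asift_matches, ransac_affine and triangulate are placeholder names for the routines sketched in sections 3 and 4, and M_L, M_R are assumed to be known 3 x 4 projection matrices of the calibrated cameras.

import numpy as np

# Hypothetical helpers sketched later in sections 3 and 4:
#   asift_matches(reference, image)      -> matched point arrays, shapes (N, 2), (N, 2)
#   ransac_affine(ref_pts, col_pts, th)  -> 3x3 affine matrix H, boolean inlier mask
#   triangulate(M_L, M_R, uv_L, uv_R)    -> 3D point in the left-camera frame

def locate_target(reference, left_image, right_image, key_points_ref, M_L, M_R):
    """Recognize the target in both views and localize its grasping point in 3D."""
    key_2d = {}
    for cam, image in (("L", left_image), ("R", right_image)):
        # 1. Feature extraction and matching between reference and collected image.
        ref_pts, col_pts = asift_matches(reference, image)
        # 2. Affine transformation model: estimate parameters, reject mismatches.
        H, _ = ransac_affine(ref_pts, col_pts, th=4.0)
        # 3. 2D coordinates of the key points in the collected (transformed) image.
        k_h = np.hstack([key_points_ref, np.ones((len(key_points_ref), 1))])
        key_2d[cam] = (k_h @ H.T)[:, :2]
    # 4. Parallax principle of the binocular system: 3D coordinates of the key points.
    k1 = triangulate(M_L, M_R, key_2d["L"][0], key_2d["R"][0])
    k2 = triangulate(M_L, M_R, key_2d["L"][1], key_2d["R"][1])
    return (k1 + k2) / 2.0, k1, k2           # grasping point O' taken as the midpoint

The midpoint of the two triangulated key points plays the role of the grasping point O' defined in section 4.1.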
3. Estimation of affine transformation parameters
In this section, an asymmetrical and a symmetrical target tool are selected, and the space station cabin illumination environmental simulation system is established. Based on the targets and this system, the key points of the target and the affine transformation parameters, which are used to recognize and localize the target, are obtained using the ASIFT & RANSAC algorithm.

3.1 Target and illumination environment simulation

The maintenance tools used in the space station can generally be divided into asymmetrical ones and symmetrical ones (Figure 4). In order to prove that the presented algorithm is applicable to both types of tool, a common wrench has been chosen as the asymmetrical sample target and a common pair of pliers is used as the symmetrical sample target.
Figure 5 shows the simulated space station cabin lighting conditions. These are created using an illumination environmental simulation system that includes both a lamplight system and sunlight irradiation. Because the metal surfaces in the space station cabin might cause strong reflection and refraction of light, we place a metal operating desk, a metal box and a metal plate around the robot to simulate the actual environment.
Figure 5. Illumination simulation system
The illumination conditions are quantitatively analysed by controlling the number of light sources and the sunlight irradiation at noon in the experimental environment. There are three illumination conditions based on the lamplight system, classified as strong, medium and weak light sources. Non-sunlight and sunlight irradiation are controlled by opening and closing the curtains. Altogether, there are six levels corresponding to the different illumination conditions (Table 1). For a quantitative description of the light, the illumination intensities of the captured images at the six illumination levels are measured. Under the non-lamplight condition, the indoor illumination intensity on a sunny day is 256.73 lux, measured by an SSTES-1339R illuminometer, and it decreases to 9.14 lux without sunlight irradiation. Controlling both sunlight irradiation and lamplight illumination at the same time, the specific illumination intensities of the six illumination levels are measured and shown in Table 1.
Figure 4. Maintenance tools used in the space station
Sunlight irradiation    Lamplight    Classification    Illuminance / lux
Non-sunlight            Weak         1                 58.98
Non-sunlight            Medium       2                 123.11
Non-sunlight            Strong       3                 175.26
Sunlight                Weak         4                 301.43
Sunlight                Medium       5                 355.61
Sunlight                Strong       6                 410.87

Table 1. Illumination conditions
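For the experiments that follow, it can be convenient to keep Table 1 in a small lookup structure; the sketch below is one possible encoding, and the name is ours rather than part of the original system.

# Illumination levels of Table 1: level -> (sunlight, lamplight, illuminance in lux).
ILLUMINATION_LEVELS = {
    1: ("non-sunlight", "weak",   58.98),
    2: ("non-sunlight", "medium", 123.11),
    3: ("non-sunlight", "strong", 175.26),
    4: ("sunlight",     "weak",   301.43),
    5: ("sunlight",     "medium", 355.61),
    6: ("sunlight",     "strong", 410.87),
}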
3.2 Image capture

During the target recognition process, the captured images are divided into two parts: the reference image and the experimental image (Figure 6). For easy evaluation of the accuracy of the affine transformation parameters estimated with the present method, each of the transformations is determined by a known matrix H. As such, the relative position between the target tool and the camera should be the same; that is, if the position of the cameras has been fixed, the target images should be captured at the same position.

Figure 6 shows the reference images, which are captured from the front view when the target tools are vertical and under the third illumination level (non-sunlight and strong lamplight conditions). The reference images are stored in the library, enabling the matching points to be extracted during the test. The experimental images under different environmental conditions will be provided in section 5.

All of the captured images are saved as 800 x 600 pixels, for the following reasons. On the one hand, this size is small enough to accelerate the image processing speed; on the other hand, it is not so small as to miss numerous vital details of the target. Because a binocular visual system includes two cameras, in order to avoid confusion as to whether the images are captured by the left camera or the right one, from now on all of the presented images are captured by the left camera.

Figure 6. Reference images stored in the library. Image (a) is the reference image of the wrench. Image (b) is the reference image of the pliers.

3.3 Feature extraction and matching

3.3.1 ASIFT

ASIFT is an affine-invariant feature extraction and matching method based on SIFT. It is not only fully invariant with respect to zoom, rotation and translation, but also treats the angles defining the camera axis orientation.

In 2004, Lowe proposed the SIFT algorithm. The major stages used to generate a set of image features include scale-space extreme detection, key point localization, orientation assignment and key point description. This enables the generation of a 128-dimensional feature vector for each key point. The Best-Bin-First algorithm is then used to obtain the minimum Euclidean distance of the invariant descriptor vector between the reference image and the collected image. The best candidate match for each key point is then selected from these images. However, mismatching and a decrease in the number of matching points occur when the target undergoes significant affine distortions.

The ASIFT algorithm fundamentally enables the simulation of all views to achieve full affine invariance based on the angles defining the camera axis orientation. The projection transformation of the image is simplified to the affine transformation model. The transform matrix has the decomposition shown by equation (1):

A = \lambda \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}    (1)

where λ is the scale factor; ψ is the rotation angle of the camera around its optical axis; t is the absolute tilt of the camera, determined by the latitude angle θ of the optical axis, t = 1/cos θ; and ϕ is the longitude angle of the optical axis. Figure 7 shows a camera motion interpretation of equation (1).

Figure 7. Affine transformation model

A series of affine transformation parameters is obtained by sampling the longitude angle, ϕ, and the latitude angle, θ, of the camera optical axis. The obtained transformation parameters are then used to transform the reference images and the target images. This step simulates the original images distorted under any affine transformation. Next, the feature vectors of the transformed images are extracted using the SIFT algorithm. Figure 8 shows the algorithm flowchart of ASIFT.

Figure 8. Algorithm flowchart of ASIFT
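The listing below illustrates the view-simulation idea behind equation (1) using OpenCV's SIFT implementation: each simulated view applies a rotation by the longitude angle ϕ followed by an anisotropic tilt t = 1/cos θ, SIFT matches are computed between the simulated views, and the matched key points are mapped back to the original image frames. It is a simplified stand-in for the full ASIFT sampling scheme of [27], [28] (the tilt and angle grid here is much coarser), and all function names are illustrative.

import cv2
import numpy as np

def simulate_views(image, tilts=(1.0, 2.0, 4.0), phi_step_deg=72):
    """Yield affine-warped copies of `image` plus the 2x3 warp that produced them."""
    h, w = image.shape[:2]
    for t in tilts:
        phis = [0] if t == 1.0 else range(0, 180, phi_step_deg)
        for phi in phis:
            # Rotation by the longitude angle phi about the image centre.
            R = cv2.getRotationMatrix2D((w / 2, h / 2), phi, 1.0)
            rotated = cv2.warpAffine(image, R, (w, h))
            # Tilt: compress the x axis by 1/t (latitude angle theta, t = 1/cos(theta)).
            tilted = cv2.resize(rotated, None, fx=1.0 / t, fy=1.0,
                                interpolation=cv2.INTER_AREA)
            T = np.diag([1.0 / t, 1.0, 1.0])[:2, :] @ np.vstack([R, [0.0, 0.0, 1.0]])
            yield tilted, T

def back_project(pt, T):
    """Map a key point detected in a simulated view back to the original image."""
    T_inv = cv2.invertAffineTransform(T)
    return (T_inv @ np.array([pt[0], pt[1], 1.0])).tolist()

def asift_matches(reference, collected, ratio=0.75):
    """SIFT matching between simulated views of both images (simplified ASIFT)."""
    sift = cv2.SIFT_create()                 # expects 8-bit grayscale or BGR images
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    def describe(image):
        views = []
        for view, T in simulate_views(image):
            kps, desc = sift.detectAndCompute(view, None)
            if desc is not None and len(kps) > 1:
                views.append((kps, desc, T))
        return views

    ref_views, col_views = describe(reference), describe(collected)
    ref_pts, col_pts = [], []
    for k1, d1, T1 in ref_views:
        for k2, d2, T2 in col_views:
            for pair in matcher.knnMatch(d1, d2, k=2):
                if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                    m = pair[0]              # Lowe's ratio test passed
                    ref_pts.append(back_project(k1[m.queryIdx].pt, T1))
                    col_pts.append(back_project(k2[m.trainIdx].pt, T2))
    return np.float32(ref_pts), np.float32(col_pts)

In practice the full ASIFT tilt set (sqrt(2), 2, 2*sqrt(2), 4, ...) and a finer rotation step yield many more matches; the coarse grid above only keeps the sketch short.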
3.3.2 RANSAC

The Random Sample Consensus (RANSAC) algorithm is used to eliminate the mismatched feature points. Fischler and Bolles published the RANSAC algorithm at SRI International (SRI) in 1981. First, an objective function is designed. The data are assumed to consist of 'inliers' (data whose distribution is explained by a set of model parameters) and 'outliers' (data that do not fit the model). RANSAC assumes that, given a set of inliers, there exists a procedure that can estimate the parameters of a model that optimally explains or fits these data. This research aims to eliminate mismatched feature points; as such, the transformational matrix is chosen as the objective function. Reference image I (stored in the library) and target image C satisfy the affine transformation relationship, such that:

u = a_1 x + a_2 y + t_1
v = a_3 x + a_4 y + t_2    (2)

where I(x, y) and C(u, v) represent a feature point of reference image I and collected image C, respectively. Equation (2) can be written in matrix form as:

\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & t_1 \\ a_3 & a_4 & t_2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} A & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}    (3)

where H represents the transformational matrix of I(x, y) → C(u, v); A is the affine transformational matrix, which includes scaling, rotation and shearing; and T is the translation vector.

At least three non-collinear matching points are required to calculate the transformational matrix H. The equality relationship is:

\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ 1 & 1 & 1 \end{bmatrix} = H \begin{bmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{bmatrix}    (4)

This equation is solvable when there are more than three matching points. Based on the above analysis, four steps are used to implement the RANSAC algorithm (a minimal sketch follows the list):

1. Three matching points are selected from the matching point set S; the transformational matrix H' is calculated using equation (4);
2. The matching points belonging to reference image I are selected from the remaining elements of set S; equation (3) calculates the transformed results and errors based on matrix H'; the agreement set of matrix H' is established as the set of points whose error value is less than the threshold t_h (in section 5, t_h will be determined by repeated experiments);
3. The size of the agreement set is updated when it is larger than the threshold T, whose initial value is set at 0; the threshold T becomes the new size of the agreement set;
4. K = 1000 iterations are performed by repeating the process above. The agreement set with the most points is then obtained, and based on these points and equation (4), the transformational matrix H is obtained.
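A minimal NumPy sketch of the four steps above, and of the RMSE of equation (5), is given below. It assumes the matched point arrays come from the ASIFT stage as (N, 2) arrays; fit_affine, ransac_affine and rmse are our illustrative names.

import numpy as np

def fit_affine(ref, col):
    """Least-squares solution of equations (3)/(4) for the 3x3 matrix H."""
    A = np.hstack([ref, np.ones((len(ref), 1))])        # rows [x  y  1]
    params, *_ = np.linalg.lstsq(A, col, rcond=None)    # columns [a1 a2 t1], [a3 a4 t2]
    H = np.eye(3)
    H[:2, :] = params.T
    return H

def ransac_affine(ref_pts, col_pts, th=4.0, K=1000, seed=0):
    """Steps 1-4: robust estimate of H from ASIFT matches (arrays of shape (N, 2))."""
    rng = np.random.default_rng(seed)
    N = len(ref_pts)
    ref_h = np.hstack([ref_pts, np.ones((N, 1))])
    best_inliers = np.zeros(N, dtype=bool)
    for _ in range(K):                                   # step 4: K = 1000 iterations
        idx = rng.choice(N, size=3, replace=False)       # step 1: three matches from S
        H = fit_affine(ref_pts[idx], col_pts[idx])
        proj = ref_h @ H[:2, :].T                        # step 2: transform the rest
        err = np.linalg.norm(proj - col_pts, axis=1)
        inliers = err < th                               # agreement set under threshold t_h
        if inliers.sum() > best_inliers.sum():           # step 3: keep the largest set
            best_inliers = inliers
    if best_inliers.sum() < 3:                           # degenerate case: too few inliers
        return np.eye(3), best_inliers
    # Final H is re-estimated from the largest agreement set, as in step 4.
    return fit_affine(ref_pts[best_inliers], col_pts[best_inliers]), best_inliers

def rmse(ref_pts, col_pts, H):
    """Equation (5): root-mean-square error of H over the N matched points."""
    proj = np.hstack([ref_pts, np.ones((len(ref_pts), 1))]) @ H[:2, :].T
    return float(np.sqrt(np.mean(np.sum((proj - col_pts) ** 2, axis=1))))

The pair H, inliers returned by ransac_affine(asift_matches(reference, collected)) is what the experiments in section 5 evaluate for different thresholds t_h.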
3.4 Error analysis

The Root-Mean-Square Error (RMSE) is a measure of how precisely the feature points are matched based on the ASIFT and RANSAC algorithms:

RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left[ (a_1 x_i + a_2 y_i + t_1 - u_i)^2 + (a_3 x_i + a_4 y_i + t_2 - v_i)^2 \right] }    (5)

The statistical error threshold is set at t_h; as such, the requirements are satisfied as long as the RMSE is less than t_h.

4. Target localization method
In the space station cabin, microgravity and the general movement of the station result in target transformation and occlusion, which make target localization much more difficult than on Earth. Furthermore, features of key points and grasping range are difficult to extract under a special illumination environment. This paper proposes a method
to calculate two key points of a transformed target based on the grasping points' definition and the estimated transformation parameters (section 3), and then to obtain the 3D coordinates of the grasping points and three Euler angles based on a binocular vision system.

4.1 Determination of 3D coordinates

For convenience of localization and target tool grasping, the 3D coordinates of a grasping point and the three Euler angles of the target should be calculated. In order to obtain these six parameters, two key points k1(x1, y1), k2(x2, y2) are selected from the target tool. Two coloured circles are added to the reference images (Figure 9). The two circles are segmented based on the HSV colour space, and the contour shape feature is extracted to obtain the centre coordinates. Therefore, the target coordinate system can be defined; it is determined by the two key points k1', k2', with the origin at k2' and the X axis in the direction of vector K2'K1'.
Figure 9. Key points of the reference image. Image (a) is a wrench, image (b) is pliers.
Based on section 3, the transformed matrix H can be estimated when the wrench is transformed from the reference image position to the collected image position. Using the coordinates k1, k2, matrix H and equation (6), the coordinates of the transformed key points k1'(u1, v1), k2'(u2, v2) are obtained:

\begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = H \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad (i = 1, 2)    (6)

The left camera coordinate system is selected as the world coordinate system. The 3D coordinates K1'(X_W, Y_W, Z_W), K2'(X_W, Y_W, Z_W) in the world coordinate system are calculated based on the binocular camera internal and external parameters, M_L, M_R, and equation (7):

Z_W \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = M_i \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix} = A_{1i} A_{2i} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}, \quad (i = L, R)    (7)

where A_{1L}, A_{1R} and A_{2L}, A_{2R} are the matrices of internal and external parameters of the left and right cameras, respectively. The centre of the line K1'K2' is selected as the localization grasping point O'(X_W, Y_W, Z_W).

4.2 Euler angles and grasping scope determination

The Euler angles of the transformed target are determined by vector K2'K1'. For a target tool, both the 3D vector K2K1 of the reference image and the 3D vector K2'K1' of the transformed image can be calculated based on the binocular vision system. Therefore, the rotation between vector K2K1 and K2'K1' can be represented in quaternion form. Then, the three Euler angles ψ, θ, φ, which are the rotation angles around the Z axis, Y axis and X axis, respectively, are obtained by transforming the quaternion.

Considering the size of the target tools, we define an accurate grasping scope: a cylinder. The centre of the cylinder is O(X_W, Y_W, Z_W), its height is 1.2 times the length of line K2K1, and its radius is the same as the radius of the red circles on the target (7.5 mm). This definition means that the grasping motion is accurate as long as the localization results (O'(X_W, Y_W, Z_W) and vector K2'K1') are included in the cylinder. The size of the cylinder is closely related to both the angle of view and the grasping precision of the robot, and it can be redefined for a specific grasping precision.
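The listing below sketches this localization step under stated assumptions: M_L and M_R are the known 3 x 4 projection matrices of the calibrated left and right cameras (the products A_{1i} A_{2i} of equation (7)), the 2D key points in both views come from equation (6), and SciPy's Rotation class handles the quaternion-to-Euler conversion of section 4.2. All helper names are ours, and the cylinder test is a simple approximation of the grasping-scope definition.

import numpy as np
from scipy.spatial.transform import Rotation

def triangulate(M_L, M_R, uv_left, uv_right):
    """Linear triangulation of equation (7): a pixel pair -> (X_W, Y_W, Z_W)."""
    (uL, vL), (uR, vR) = uv_left, uv_right
    A = np.vstack([uL * M_L[2] - M_L[0],
                   vL * M_L[2] - M_L[1],
                   uR * M_R[2] - M_R[0],
                   vR * M_R[2] - M_R[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                      # world frame = left-camera frame

def euler_angles(K2K1_ref, K2K1_new):
    """Rotation taking the reference vector to the transformed one, as Z-Y-X Euler angles."""
    a = K2K1_ref / np.linalg.norm(K2K1_ref)
    b = K2K1_new / np.linalg.norm(K2K1_new)
    axis = np.cross(a, b)
    s, c = np.linalg.norm(axis), float(np.dot(a, b))
    if s < 1e-9:
        return np.zeros(3)                   # vectors already aligned: no rotation
    # The rotation is stored internally as a quaternion, mirroring section 4.2.
    rot = Rotation.from_rotvec(axis / s * np.arctan2(s, c))
    return rot.as_euler('zyx', degrees=True) # psi, theta, phi

def inside_grasping_cylinder(O, K2K1_ref, K1_new, K2_new, radius_mm=7.5):
    """Check that both localized key points lie in the cylinder defined in section 4.2."""
    axis = K2K1_ref / np.linalg.norm(K2K1_ref)
    half_height = 0.6 * np.linalg.norm(K2K1_ref)   # height = 1.2 x |K2K1|
    for P in (np.asarray(K1_new), np.asarray(K2_new)):
        d = P - np.asarray(O)
        along = float(np.dot(d, axis))
        radial = np.linalg.norm(d - along * axis)
        if abs(along) > half_height or radial > radius_mm:
            return False
    return True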
5. Experimental results

To test the feasibility and effectiveness of the above algorithm, three groups of experiments based on the simulation of the special illumination environment are conducted. The experiments include estimation of the transformation parameters and target localization under the following conditions: the same transformation and different illumination; the same illumination and different transformations; and different illumination, transformation and occlusion.

5.1 Recognition results under different illumination levels

In this test, images of each tool are captured under the six different illumination levels (Table 1) with the same transformations, H1 and H2, respectively (Figure 10). In order to eliminate the influence of uncertain exterior factors, 20 images are captured and saved under each light level. Hence, there are 20 sets of images and each set contains six images. In this section, as only illumination is the influencing factor, we thus set matrices H1 and H2 as the actual transformation matrices for all of the captured images of the wrench and the plier, respectively.
The illumination intensity changes from 58.98 lux to 410.87 lux. The number at the bottom left of each image corresponds to the illumination level. For convenience of explanation, we will use this number to refer to the illumination conditions in the following discussion. Figure 10 also shows the coordinate system of each target tool.

Figure 10. Target images of wrench and plier under different illuminations when the transform matrices are H1 and H2, respectively. Images (a) are transformed images of the wrench, and images (b) are transformed images of the plier.

Take one set of images as an example and analyse the determination of the threshold t_h. Figures 11 (a) and (b) show the number of matching points (based on ASIFT and on the combined ASIFT & RANSAC algorithm) between the reference and transformed images under different illuminations. The X axis represents the illuminance for the six illumination levels, and the Y axis shows the number of matching points. Each figure includes six curves corresponding to the threshold t_h, which is varied from 1 to 10. The threshold t_h must be chosen to be large enough to satisfy two conditions: the correct model should be found for the data, and a sufficient number of mutually consistent points should be found to satisfy the needs of the final smoothing procedure. In order to obtain the optimum threshold t_h, to eliminate the wrong matching points as far as possible, and at the same time to make sure that the number of matching points is not too small, t_h = 4 is chosen based on Figures 11 (c) and (d). This is because the number of matching points decreases sharply when t_h is less than 4 and declines slowly when t_h is more than 4. This result also holds for the following experiments.

Figure 11. Number of matching points based on ASIFT and on the combined ASIFT & RANSAC algorithm between the reference and transformed images under different illuminations, where the threshold t_h is valued at 1, 2, 4, 6, 8 and 10. Panels: (a) matching point number of wrench; (b) matching point number of plier; (c) threshold of wrench; (d) threshold of plier.

This experiment takes the image of the plier under non-sunlight and weak lamplight (the first image of Figure 10 (b)) as an example to show the matching point image based on the ASIFT algorithm and on the combined ASIFT & RANSAC algorithm (Figure 12). Figure 12 (a) shows the matching points based on ASIFT, where the white lines connect the wrong matching points and the black lines are the same as in Figure 12 (b), which are the matching points based on the ASIFT & RANSAC algorithm. The white lines show that there are many mismatching points under the first illumination level, and the blue ones show that the matching accuracy of the remaining points after RANSAC screening is satisfactory.

Figure 12. Image of matching points of the plier under the first illumination level. Image (a) is the matching point image based on the ASIFT algorithm. Image (b) is the matching point image based on the combined ASIFT & RANSAC algorithm.

The transformational matrix under the six illumination conditions can be estimated based on the matching points after screening. Comparing the estimated values and the affine transformation models H1, H2, the RMSE of each image is obtained (Figure 13). Figure 13 shows that the values are relatively high under both weak and strong illumination, but all of the RMSE values are less than the threshold t_h, which is 4. These values verify that the method of affine transformation parameter estimation based on the combined ASIFT & RANSAC algorithms is invariant to illumination.

Figure 13. Curve of RMSE values under the six illumination conditions for both wrench and plier (threshold t_h = 4)

The 3D coordinates of the reference vector K2K1 and the transformed vectors K2'K1' can be calculated based on the binocular vision system (Figure 14). In Figure 14, the red solid line is the reference vector K2K1, and the rest of the lines are the transformed vectors K2'K1', where the blue solid line represents the actual transformed vector. Furthermore, Figure 14 also shows the grasping scope of the cylinder. The results show that all of the transformed vectors K2'K1' are contained by the cylinder.

Figure 14. 3D coordinates of the target vector and grasping scope under different illumination conditions, for (a) the target tool wrench (in mm) and (b) the target tool plier (in mm)

Additional experimental processes for all 20 sets of images are repeated, and the localization accuracies achieved are 94.17 % for the wrench and 93.33 % for the plier. At the same time, we also obtain an average frame rate of 26 frames per second. This demonstrates that the 3D localizations for both the symmetric and asymmetric targets are accurate.
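A sketch of the evaluation loop behind Figures 11 and 13 is given below: for each illumination level, matches are extracted, RANSAC is run for each candidate threshold, and the inlier count plus the RMSE of the estimated matrix against the known matrix (H1 or H2) is recorded. It reuses the asift_matches, ransac_affine and rmse helpers sketched in section 3; images_by_level is assumed to map the level numbers of Table 1 to captured images, and all names are illustrative.

import numpy as np

def sweep_threshold(reference, images_by_level, H_true,
                    thresholds=(1, 2, 4, 6, 8, 10)):
    """Inlier counts and RMSE versus the known matrix, per illumination level and t_h."""
    report = {}
    for level, image in images_by_level.items():
        ref_pts, col_pts = asift_matches(reference, image)
        for th in thresholds:
            H_est, inliers = ransac_affine(ref_pts, col_pts, th=th)
            kept = ref_pts[inliers]
            if len(kept) == 0:
                report[(level, th)] = (0, float("nan"))
                continue
            # Compare the estimated model against the known transformation H_true
            # on the retained reference points (cf. equation (5) and Figure 13).
            truth = np.hstack([kept, np.ones((len(kept), 1))]) @ H_true[:2, :].T
            report[(level, th)] = (int(inliers.sum()), rmse(kept, truth, H_est))
    return report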
5.2 Recognition results under affine transformation
Lingbo Cheng, Jiang, Hui Liare and Qiang Huang: In this section, a series ofZhihong transformations set up to test A Robust and Efficient Algorithm for Tool Recognition and Localization for Space Station Robot the effect of a change in target position. For the same reasons as described in section 5.1, 20 sets of images are captured at the same light level and position. The position of the target, including H1、H2、H3、H4 , is changed under
9
5.2 Recognition results under affine transformation
500
In this section, a series of transformations are set up to test the effect of a change in target position. For the same reasons as described in section 5.1, 20 sets of images are captured at the same light level and position. The position of the target, including H 1, H 2, H 3, H 4, is changed under
Wrench-Number based on ASIFT Wrench-Number based on ASIFT&RANSAC Plier-Number based on ASIFT Plier-Number based on ASIFT&RANSAC
Number of matching points
450
the third illumination level (non-sunlight irradiation and strong lamplight).
F limi of sho
400 350
T to c cha ma a wr the ar l di cur
300 250 200 150 100 50 H1
H2 H3 Transformational matrix
H4
(a) Matching point number of target tools 2.5
Wrench-Illumination 3 Plier-Illumination 3
2
RMSE value
(a) Transformed images of wrench
1.5 1 0.5
After screening matching points, the formational matrixesthe areinaccurate shown in Figure 16. The threshold
matrix under different transformation can ttransformational h = 4, based on section 5.1. be estimated, and the RMSE of the estimation values and
the affine transformation model a r ematching obtained (Figure After screening the inaccurate points, 16). the It is obvious that all of the RMSE values are less than transformational matrix under different transformation threshold can be estimated, and the RMSE4.of the estimation values and the affine transformation model are obtained (Figure 16). It is obvious that all of the RMSE values are less than threshold 4.
Figure 15 shows the transformed images of one set of wrench and plier images, respectively, which have varying degrees of rotation, shearing, scaling and translation to simulate the change of the target position. The numbers of matching points obtained by the ASIFT and the combined ASIFT & RANSAC algorithms between the reference and transformed images under the different transformational matrixes are shown in Figure 16; the threshold th = 4 is based on section 5.1.

Figure 15. Transformed images of wrench and plier corresponding to different transformational matrixes under illumination condition c. Image (a) is the wrench and image (b) is the plier; (1), (2), (3) and (4) are the transformed images under matrixes H1, H2, H3 and H4.

Figure 16. Image (a) is the number of matching points based on the ASIFT and ASIFT & RANSAC algorithms under the different transformation matrixes (th = 4); image (b) is the curve of RMSE values under the different transformational matrixes for both wrench and plier.

By repeating the above method for all 20 sets of images, the corresponding RMSE value of each image can be obtained; the accuracies for wrench and plier are 97.5 % and 98.75 %, respectively. The results verify that the proposed method is invariant to affine transformation.

To determine the applicable range of scale, extensive tests were carried out, and the results are shown in Figure 17. Each point is the average RMSE value of the 20 captured images under a certain scale or viewpoint angle. It can be seen that the RMSE values of both wrench and plier are below the threshold (th = 4) when the scale changes from 0.3 to 2 times that of the initial reference image, so we can conclude that the proposed method is applicable to changes in scale from 0.3 to 2.

Figure 17. RMSE values with different scales. Because of the limited view coverage, the target falls outside the image range when the scale exceeds 2, so the maximum scale is 2. The figure also shows that the RMSE value exceeds the threshold th = 4 when the scale equals 0.2.
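As a concrete illustration of how each point on these RMSE curves can be produced, the sketch below maps the two grasp points through the actual and the estimated transformations and compares the result against the threshold th = 4. The 2D homography form, the point values and the variable names are illustrative assumptions, not the exact procedure used in the experiments.

```python
import numpy as np

def rmse(points_actual, points_estimated):
    """Root-mean-square error between corresponding point sets (N x 2 arrays)."""
    diff = np.asarray(points_actual, float) - np.asarray(points_estimated, float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def apply_homography(H, points):
    """Map N x 2 points through a 3 x 3 homography H."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical example: grasp points K1, K2 in the reference image,
# the actual transformation H_true and the estimated one H_est.
K = np.array([[120.0, 80.0], [150.0, 82.0]])
H_true = np.array([[1.10, 0.05, 4.0], [-0.03, 0.95, -2.0], [0.0, 0.0, 1.0]])
H_est = np.array([[1.08, 0.06, 4.5], [-0.02, 0.96, -1.6], [0.0, 0.0, 1.0]])

error = rmse(apply_homography(H_true, K), apply_homography(H_est, K))
accepted = error < 4.0   # threshold th = 4 from section 5.1
print(f"RMSE = {error:.2f}, accepted = {accepted}")
```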
The sensitivity of the estimated transformational matrix to changes in viewpoint angle (both longitude and latitude) is examined in Figure 18. The graph shows the change in RMSE value between the actual transformed matrix and the estimated matrix for different viewpoint angles. The solid lines show the RMSE values of the target wrench, and the dotted lines represent the RMSE values of the target plier. The blue curves (solid and dotted) give the RMSE values under changing longitude with the latitude fixed at zero, the red curves give the values under different latitudes with the longitude fixed at zero, and the green curves show the RMSE values when longitude and latitude change together with the two angles kept equal.

Figure 18. Curve of the changing RMSE value with increasing viewpoint angle. The angle is varied under the three conditions above in order to cover a fully affine distortion.

It can be seen that the RMSE value rises in stages with increasing affine distortion, and some of the final values exceed the threshold when the viewpoint angle reaches 80 degrees. Figure 18 illustrates that all of the RMSE values remain below the threshold th = 4 when the viewpoint angle (both longitude and latitude) changes from 0° to 75°.
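The viewpoint angles in this test can be reproduced by warping the reference image with a simulated camera pose. The sketch below follows the standard ASIFT camera model, in which the longitude is an in-plane rotation and the latitude θ induces a tilt t = 1/cos θ (a compression of one axis by cos θ); it omits the anti-aliasing filtering that a full ASIFT simulation applies before subsampling, and the function name is ours.

```python
import cv2
import numpy as np

def simulate_viewpoint(image, longitude_deg, latitude_deg):
    """Warp an image as if seen from the given longitude/latitude viewpoint."""
    h, w = image.shape[:2]
    phi = np.deg2rad(longitude_deg)
    theta = np.deg2rad(latitude_deg)

    # In-plane rotation R(phi) followed by the tilt diag(1, cos(theta)),
    # i.e., tilt t = 1 / cos(theta) along the vertical axis.
    rot = np.array([[np.cos(phi), -np.sin(phi)],
                    [np.sin(phi),  np.cos(phi)]])
    tilt = np.array([[1.0, 0.0],
                     [0.0, np.cos(theta)]])
    A = tilt @ rot

    # Keep the warped image centred in the output frame.
    centre = np.array([w / 2.0, h / 2.0])
    offset = centre - A @ centre
    M = np.hstack([A, offset.reshape(2, 1)])
    return cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)

# Example: latitude 60 degrees corresponds to tilt t = 2.
# view = simulate_viewpoint(cv2.imread("wrench.png"), 30, 60)
```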
To verify the accuracy of the 3D coordinates of the transformed vectors K2'K1' of the targets in Figure 15, Figure 19 shows the calculated results corresponding to the four matrixes. Because the transformational matrix models for wrench and plier are the same, the 3D coordinates of the vectors are shown together in Figure 19 (a), where matrixes H1, H2, H3 and H4 correspond to the four groups of transformed vectors. As it is difficult to find a single view position that shows all of the transformed vectors clearly, enlarged 2D (X-Y) views are also given in Figure 19 (b), (c), (d) and (e). In these enlarged views, all of the transformed vectors K2'K1' are contained by the cylinder, which shows that the 3D localizations of both the symmetrical and asymmetrical targets are accurate under the four different transformational matrixes.
According to the localization results for all 20 sets of images, the accuracy for the wrench is 96.25 % and for the plier 97.5 %, and the average frame speed was 27 frames per second. The experiment verifies that the proposed method is invariant to affine transformation.
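The acceptance test behind these figures is whether the transformed grasp points stay inside the cylindrical grasping scope defined earlier in the paper. Since the exact scope parameters are not repeated in this section, the following is only a minimal sketch, assuming the cylinder is given by its two axis endpoints and a radius; all numerical values and names are illustrative.

```python
import numpy as np

def point_in_cylinder(p, axis_a, axis_b, radius):
    """Return True if point p lies inside the finite cylinder whose
    axis runs from axis_a to axis_b with the given radius."""
    p, a, b = map(np.asarray, (p, axis_a, axis_b))
    axis = b - a
    length_sq = float(axis @ axis)
    # Projection of (p - a) onto the axis, as a fraction of its length.
    t = float((p - a) @ axis) / length_sq
    if t < 0.0 or t > 1.0:
        return False                      # outside the end caps
    radial = (p - a) - t * axis           # component perpendicular to the axis
    return float(radial @ radial) <= radius ** 2

# Hypothetical check of a transformed grasp vector K2'K1':
k1_t = np.array([12.0, 3.5, 40.0])        # transformed grasp point K1'
k2_t = np.array([15.0, 3.0, 40.5])        # transformed grasp point K2'
axis_a = np.array([10.0, 3.0, 40.0])      # assumed cylinder axis endpoints (mm)
axis_b = np.array([18.0, 3.0, 40.0])
ok = all(point_in_cylinder(p, axis_a, axis_b, radius=5.0) for p in (k1_t, k2_t))
print("grasp localization accepted:", ok)
```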
Figure 19. Coordinates of the target vector and grasping scope under the different transformational matrixes (unit: mm). Panel (a) shows the 3D coordinates; panels (b)-(e) show the enlarged 2D views corresponding to matrixes H1-H4.

5.3 Recognition results under complex conditions

The third experiment addresses recognition under complex conditions, including different illumination, affine transformation and occlusion. Five different types of target occlusion are set, and for each occlusion a set of six differently illuminated images is captured. Five transformational matrixes H are then selected to transform the resulting 30 images of each target, so that a total of 150 images under varying degrees of occlusion, illumination and transformation are obtained. Figure 20 shows one set of the images for each target tool (wrench and plier), corresponding to matrixes H1 and H2, respectively. Other tools, such as scissors, an awl and a circular coil, occlude the target in the collected images.

Based on section 5.1, we chose th = 4 as the threshold. The numbers of matching points obtained by the ASIFT and the combined ASIFT & RANSAC algorithms under complex conditions are shown in Figure 21 (a); the X axis represents the illuminance of the six illumination levels. In this test, not only lighting effects but also different degrees of occlusion are present, so the number of matching points decreases sharply compared with the former two groups. Although the matching points are relatively few, the RMSE value of each transformed image still fulfils the requirement with respect to the threshold th = 4 (Figure 21 (b)).
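The matching and outlier-rejection pipeline is not listed in the paper, so the sketch below only illustrates the general ASIFT & RANSAC idea with OpenCV: affine-invariant keypoints are matched between the reference and test images, and RANSAC estimates the transformation while discarding mismatches. Treating the optimized threshold th = 4 as the RANSAC reprojection threshold is our assumption, and cv2.AffineFeature is only available in recent OpenCV builds (plain SIFT can stand in if it is missing).

```python
import cv2
import numpy as np

def match_and_estimate(ref_img, test_img, ransac_threshold=4.0):
    """Match affine-invariant features and estimate the transformation with RANSAC."""
    # ASIFT-style detector: SIFT evaluated over simulated affine views.
    # cv2.AffineFeature requires OpenCV >= 4.5; fall back to cv2.SIFT_create() if absent.
    detector = cv2.AffineFeature_create(cv2.SIFT_create())
    kp1, des1 = detector.detectAndCompute(ref_img, None)
    kp2, des2 = detector.detectAndCompute(test_img, None)

    # Nearest-neighbour matching with Lowe's ratio test to pre-filter matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        return None, len(good), 0          # not enough points to estimate a matrix

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects remaining mismatches and returns the estimated matrix H.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_threshold)
    inliers = int(mask.sum()) if mask is not None else 0
    return H, len(good), inliers
```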
Figure 20. Images of the target tools corresponding to matrixes H1 and H2 under different illumination and occlusion conditions. (a) Images of target tool wrench; (b) images of target tool plier.

Figure 21. Image (a) is the number of matching points based on the ASIFT and ASIFT & RANSAC algorithms under complex conditions; image (b) is the curve of RMSE values for both wrench and plier (threshold th = 4).

Figure 22 shows the 3D coordinates of the reference vector K2K1 and the transformed vectors K2'K1'. In the enlarged views, all of the transformed vectors K2'K1' are contained by the cylinder, which shows that the 3D localizations of both the symmetrical and asymmetrical targets are accurate under complex conditions.

By calculating and analysing the recognition and localization results of all 150 images of each target tool, we find that the accuracy for the wrench is 89.33 % and for the plier 88.67 %, and the average frame speed was 23 frames per second. The experiment verified that the proposed method is invariant to space illumination, affine transformation and occlusion. Therefore, we conclude that the maintenance robot can use this method to accurately localize and grasp a transformed target under the reflection and refraction of light in the cabin and in a space microgravity environment.

Figure 22. 3D coordinates of the target vector and grasping scope under complex conditions (unit: mm). (a) Target tool wrench; (b) target tool plier.
5.4 Comparison and analysis of experimental results

Based on the above research and experiments, a series of data were calculated. For further comparison with previous methods such as ASIFT and SIFT & RANSAC, additional experiments were carried out, and the results are shown in Table 2. From these results it is obvious that the accuracy and processing speed for the asymmetrical tool (wrench) and the symmetrical tool (plier) show no distinct difference, so in the additional experiments we selected the asymmetrical wrench as the target for both the ASIFT algorithm and the SIFT & RANSAC algorithm.

These experiments demonstrate that the object recognition and localization method combining Affine-SIFT and RANSAC is sufficiently accurate and feasible. Compared with the ASIFT method, ASIFT & RANSAC greatly increases efficiency in both accuracy and speed. Furthermore, as an alternative to SIFT & RANSAC, ASIFT & RANSAC shows significantly improved localization accuracy although its operating speed declines slightly. All the data show that the ASIFT & RANSAC method can robustly and efficiently handle the interference of different illumination, affine transformation and occlusion.
Table 2. Comparison of target recognition results and processing time with different methods

Experimental condition               Method            Accuracy Rate (%)             Frame Speed (frame·s-1)
Illumination (120 images)            ASIFT & RANSAC    Wrench 94.17 / Plier 93.33    26
                                     ASIFT             85.83                         22
                                     SIFT & RANSAC     75                            28
Affine Transformation (80 images)    ASIFT & RANSAC    Wrench 96.25 / Plier 97.5     27
                                     ASIFT             86.25                         25
                                     SIFT & RANSAC     71.25                         30
Complex Conditions (150 images)      ASIFT & RANSAC    Wrench 89.33 / Plier 88.67    23
                                     ASIFT             76.67                         20
                                     SIFT & RANSAC     64                            25
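The paper does not spell out how the table entries are tallied; one straightforward way, assuming each test image is simply marked as correctly localized or not and the whole run is timed, is sketched below. The function and argument names are placeholders.

```python
import time

def evaluate(images, recognise, is_correct):
    """Tally the accuracy rate (%) and average frame speed (frame/s) for one method.

    `images` is a sequence of (image, ground_truth) pairs, `recognise` is the
    recognition/localization routine under test, and `is_correct` decides whether
    a result counts as a successful localization; all three are placeholders.
    """
    correct = 0
    start = time.perf_counter()
    for image, truth in images:
        result = recognise(image)
        if result is not None and is_correct(result, truth):
            correct += 1
    elapsed = time.perf_counter() - start
    accuracy = 100.0 * correct / len(images)
    frame_speed = len(images) / elapsed
    return accuracy, frame_speed
```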
6. Conclusion

This paper proposed a novel method that can significantly increase the accuracy rate of robot target recognition and localization in a space station. The method makes the following contributions:

1. An estimation method for the affine transformation parameters of a robot vision target based on the ASIFT and RANSAC algorithms. This method provides robust matching across a substantial range of affine distortion, changes in 3D viewpoint, added occlusions and changes in illumination in a simulated space station illumination environment.

2. A new definition of target localization and grasping based on the estimated transformation parameters. By combining the target size and the grasping precision of the robot, two grasping points and a cylindrical scope of the tool were defined to obtain the 3D coordinates of the grasping points and to evaluate the accuracy of the positioning results, which helps the robot to accurately localize and grasp tools in a space station.

3. An experimental system simulating the illumination environment of a space station was established, and extensive experiments were implemented under many special conditions, including different illumination, affine transformation and comprehensive factors. By comparing the influence of different RANSAC thresholds on the number of matching points, the optimized threshold th = 4 was obtained, enhancing the accuracy and efficiency of the calculation. In the illumination experiments, six illumination levels were defined, and the results showed that tool recognition and localization are accurate from 58.98 lux to 410.87 lux. In the affine transformation experiment, the limits of scale (0.3~2 times the target) and viewpoint angle (0°~75°) were determined.

4. A set of comparative experimental results was presented based on previous methods (ASIFT and SIFT & RANSAC) and the proposed method (ASIFT & RANSAC). The data verified that the method is robust and efficient.
The robotic visual system should be studied further in order to verify its viability in an actual space station environment, where the recognition would take place. A viable system would also need to cope with uncertainties that were not considered in this paper, such as the recognition and localization of moving targets. The algorithm used in this study focused only on robotic visual recognition and localization, because the aim was to propose a method that is invariant to image affine distortion and illumination change. Future studies should focus on the grasp mode of the space robot end-effector based on visual positioning, including path planning, system optimization, and enhanced human and robot safety.

7. Acknowledgements

The authors wish to express their gratitude to the National High Technology Research Programme of China (Grant
2011AA040202), the Beijing Science Foundation (Grant 4122065), the National Natural Science Foundation of China (Grants 60925014 and 61273348) and the “111 Project” (Grant B08043) for their support of this work.