Visual Servoing in ISAC, a Decentralized Robot System for Feeding the Disabled

M. Bishay, M. E. Cambron, K. Negishi, R. A. Peters II†, and Kazuhiko Kawamura
Dept. of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235

Proceedings of the 1995 IEEE International Symposium on Computer Vision, pp. 335-340, Coral Gables, FL, 21-23 November 1995
On leave from Bridgestone Corporation, Tokyo, Japan.
† Contact at Box 6091-B, Vanderbilt University, [email protected], or (615) 322-7924.
Abstract

ISAC is a distributed robotic feeding system for the physically disabled. This paper describes a visual servoing system which enables ISAC's robot arm to grasp an object. The servo control loop is defined. The image to world coordinate transform is deduced. A three-phase visual servoing algorithm which balances speed with complexity is described. A technique for compensating for network delays is presented. The results of 200 trials are analyzed. A success rate of 91% was achieved.
1 Introduction

The visual servoing algorithm described in this paper was devised for the Intelligent Soft Arm Control (ISAC) robot [1]. ISAC (figure 1), which has been developed for feeding the physically disabled, uses a pneumatically-actuated manipulator called the "Soft Arm" manufactured by Bridgestone [2]. ISAC's task requires the location of objects such as a spoon, a fork, and a bowl on a table. These objects are recognized from imagery captured by a monochrome CCD video camera mounted on a pole along the vertical axis of the arm [3]. The purpose of visual servoing is to lead the gripper to an object so the gripper may grasp it. The robot arm moves in a plane parallel to the table until it is directly over the grasp point of the object, at which time it drops to the table to grasp it. ISAC is a decentralized system. Communication between computational modules takes place through a "blackboard" architecture [4]. Part of this paper addresses the problems introduced into visual servoing by the decentralization.
[Figure 1 block diagram: hardware and software configuration of ISAC. Software modules (Fuzzy Command Interpreter, Macro Action Builder, Task Learning System, User Interface, 3-D Face Tracking, Voice System, Audio Output, Motion Execution, Object Recognition, Reflex Action, Arm Server, Visual Servo Controller, Mobile Robot Server, Remote Console, Emergency Stop) communicate through blackboards (BB); the hardware includes CCD cameras, a parallel controller, a fuzzy processor, a HERO 2000, and the Soft Arm.]
Figure 1: The ISAC feeding system for the disabled.
2 The Image/Robot Jacobian

Visual servoing concerns three principal realms: those of the camera, the robot arm, and the workspace. Each realm has an associated frame of reference with a coordinate system. Positions in the workspace are specified with respect to W = (x, y, z), the world coordinate system. ISAC's table is parallel to the xy-plane. W is offset from the arm base-joint frame by a vector (X_os, Y_os, Z_os). θ_b is the base joint angle. The position of ISAC's gripper relative to W is (x_g, y_g, z_g). The gripper also has degrees of freedom in yaw, pitch, and roll. The angles θ_y, θ_p, and θ_r refer to the gripper's yaw, pitch, and roll angles, respectively, relative to W. The handle of
the object to be grasped, which lies on the table, has a world coordinate orientation, θ_s, around z, measured with respect to the x-axis. The position of the object in world coordinates is specified by the target point (x_s, y_s, z_s). Thus, both z_g and z_s are constants. Positions with respect to the base-joint of the arm are defined by another coordinate system, W_B = (x_B, y_B, z_B). The origin of W_B is offset from the base-joint by a vector, (T_x, T_y, T_z), so that its z_B-axis intersects the pole-mounted camera's focal point. The x_B y_B-plane in W_B is parallel to the xy-plane in W. A sequence of images, I(n), is acquired over time by the camera. Camera/image-plane coordinates for the nth image are specified by (u, v, n). The camera defines another world coordinate system W_C = (x_C, y_C, z_C). The origin of W_C is the focal point of the camera. The z_C-axis coincides with the optical axis of the camera, and the x_C- and y_C-axes are parallel to the u- and v-axes in the image plane. Coordinates u and v are related to x_C and y_C through a perspective projection, u = f x_C / z_C and v = f y_C / z_C. The camera is positioned so that its x_C-axis is approximately parallel to the x_B-axis. Thus, the y_B z_B-plane is (ideally) coincident with the y_C z_C-plane of W_C. In image-plane coordinates, the positions of the gripper and the object are (u_g, v_g, n) and (u_s, v_s, n), respectively. The orientation of the gripper with respect to the image coordinate system is θ_gi, measured with respect to the x-axis. Similarly, θ_si is the orientation of the object handle in the image.
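As a small illustration of the projection just described, the following Python sketch maps a camera-frame point to image coordinates and back onto a known depth plane. The focal length value is a placeholder, not a calibrated parameter of ISAC's camera.

# Sketch of the perspective projection u = f*xC/zC, v = f*yC/zC described above.
# The focal length f (in pixels) is an arbitrary placeholder, not ISAC's calibration.

def project(xC, yC, zC, f=600.0):
    """Map a camera-frame point (xC, yC, zC) to image coordinates (u, v)."""
    if zC <= 0:
        raise ValueError("point must lie in front of the camera (zC > 0)")
    return f * xC / zC, f * yC / zC

def back_project(u, v, zC, f=600.0):
    """Recover the camera-frame point on the plane z = zC that images at (u, v)."""
    return u * zC / f, v * zC / f, zC

if __name__ == "__main__":
    u, v = project(0.10, -0.05, 1.2)    # a point 1.2 m along the optical axis
    print(u, v)
    print(back_project(u, v, 1.2))      # recovers (0.10, -0.05, 1.2)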
[Figure 2 diagram: the camera frame W_C (optical axis z_C, image plane at focal length f, image coordinate v), the base-joint frame W_B (axes Y_B, Z_B, origin O), the upper plane in which the gripper moves and the lower plane (the table), the camera height M, the angles α and β, and the points P = (x_C, y_C, z_C) = (x_B, y_B, z_B) and Q.]

Figure 2: Geometry for determining the Jacobian.

Figure 2 shows the relation between W_B and W_C along the y_B z_B-plane. Consider a point P = (x_C, y_C, z_C) = (x_B, y_B, z_B) in space. Using trigonometry we can derive that

y_B = (M − z_B)(v sin α + f cos α) / (f sin α − v cos α),

hence

Δy_B = [f(M − z_B) / (f sin α − v cos α)²] Δv.

Also,

x_B = x_C = [(M − z_B) / (f sin α − v cos α)] u.

Therefore,

Δx_B = [(M − z_B) / (f sin α − v cos α)] Δu.

If the object is in the periphery of the image sequence (hence relatively far from the optical axis), the gripper is not directly above the object when it appears so in the imagery. For example, in figure 2, if P is the gripper position and Q is the object position, then gripper and object are aligned in the image. However, if the gripper is lowered (along the z_B-axis) to the table, it will miss the object. The alignment error can be thought of as a parallax error, since a different view would show different relative positions of the object and gripper. This error can be ameliorated by adding a position-dependent offset given by

x_os = [z_B cos(α − β) / (f sin α)] u,        (1)

y_os = z_B cot α + z_B v / (f sin²α − v cos α sin α).        (2)

Hence the transform between the image coordinates and the base-joint coordinate frame is

[Δx_B  Δy_B  Δz_B]^T = J [Δu  Δv  δ(Δu Δv)]^T,        (3)

where

J = [ (M − z_B)/(f sin α − v cos α)      0                                      z_B cos(α − β) u/(f sin α)
      0                                  f(M − z_B)/(f sin α − v cos α)²        z_B cot α + z_B v/(f sin²α − v cos α sin α)
      0                                  0                                      −z_B ]        (4)

[Figure 3 block diagram: the servo control loop. Computation node 1 runs the visual tracking module (camera inputs (u_s, v_s) and (u_g, v_g), delay z^(−m)) and the servo controller K_p, producing [Δu Δv]^T. Computation node 2 runs the arm controller and robot arm P(f) (delay z^(−n)), producing the gripper pose [x_g y_g z_g], θ_y; the object pose is [x_s y_s z_s], θ_s. The two nodes are connected by an Ethernet link.]

Figure 3: Servo control loop for robot end-effector for grasping operation.
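A minimal Python sketch of the transform in equations (3) and (4) as reconstructed above follows. The matrix entries mirror that reconstruction, and every numerical parameter (f, M, z_B, α, β) is an illustrative placeholder rather than one of ISAC's actual calibration values.

import math

# Sketch of equations (3)-(4) as reconstructed above: map an image-plane error
# (du, dv) and the alignment indicator delta to a base-joint-frame displacement
# (dxB, dyB, dzB).  All parameters are illustrative placeholders.

def jacobian(u, v, f, M, zB, alpha, beta):
    """3x3 matrix of eq. (4): diagonal 2x2 image Jacobian plus an offset column."""
    den = f * math.sin(alpha) - v * math.cos(alpha)
    J11 = (M - zB) / den                     # dxB/du
    J22 = f * (M - zB) / den ** 2            # dyB/dv
    x_os = zB * math.cos(alpha - beta) * u / (f * math.sin(alpha))     # eq. (1)
    y_os = zB / math.tan(alpha) + zB * v / (
        f * math.sin(alpha) ** 2 - v * math.cos(alpha) * math.sin(alpha))  # eq. (2)
    return [[J11, 0.0, x_os],
            [0.0, J22, y_os],
            [0.0, 0.0, -zB]]  # third row drops the gripper to the table when delta = 1

def image_to_base(du, dv, u, v, f, M, zB, alpha, beta):
    """Equation (3): [dxB dyB dzB]^T = J [du dv delta(du,dv)]^T."""
    delta = 1.0 if (du == 0 and dv == 0) else 0.0   # offset applied only when aligned
    J = jacobian(u, v, f, M, zB, alpha, beta)
    vec = [du, dv, delta]
    return [sum(J[i][k] * vec[k] for k in range(3)) for i in range(3)]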
3 The servo control loop

The feedback control loop shown in figure 3 is a position control loop for ISAC's robot arm. The vision system has a relatively long sampling period with respect to the arm controller due to the complexity of the vision algorithms [5]. The arm and its controller, which is a nonlinear feedback controller, are described
in [6, 7]. The final control output is the desired Cartesian pose for the robot arm determined through camera observations. A significant aspect of the design of this servo control loop is that the arm controller and the visual servo controller run on different computers linked through an Ethernet, which introduces a random time delay. The visual servoing problem is to design a regulator that makes the world coordinate errors e^W_sg = (x_s, y_s, z_s) − (x_g, y_g, z_g) and e^W_θ = θ_s − θ_y both approach zero through observation of the gripper and the object in the imagery. Thus, within the servo control loop, the problem is considered to be 2-dimensional. The world coordinate errors are zeroed by moving the gripper in such a way that the image coordinate errors are zeroed. The visual tracking module tracks the positions of the gripper, (u_g, v_g, n), and the object, (u_s, v_s, n), in the image plane. At a specific point in time, n, the image position error is

[Δu  Δv]^T = [(u_s − u_g)  (v_s − v_g)]^T,        (5)

e^I_sg = sqrt((Δu)² + (Δv)²),        (6)

and the image orientation error is

e^I_θ = θ_si − θ_gi.        (7)

Note that the object position, (u_s, v_s, n), in the imagery is not constant because the camera rotates with the base-joint of the robot arm. The servo controller transforms the image-based error signal, (e^I_sg, e^I_θ), into a set-point, (x_g, y_g, z_g), for the robot arm controller. The transformation is affine (linear plus an offset) and is given by

[Δx  Δy  Δz]^T = R(θ_b) J(u, v) [Δu  Δv  δ(Δu Δv)]^T.        (8)

J(u, v), given by equation (4), is a 3 × 3 matrix whose upper left contains a 2 × 2 diagonal Jacobian matrix and whose third column is an offset vector. The function δ satisfies δ(0) = 1 and δ(a) = 0 for a ≠ 0, so that the offset is computed only when e^I_sg = 0. R(θ_b) is rotation by θ_b, the robot arm base-joint angle. The set-point is given by

[x_g  y_g]^T = [x_g,curr  y_g,curr]^T + k [Δx  Δy]^T,        (9)

where (x_g,curr, y_g,curr) is the current gripper position in world coordinates and k is a constant, 0 < k < 1.
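The set-point computation of equations (5)-(9) can be sketched in Python as follows. The sketch reuses the jacobian() sketch given after section 2; the gain k, the cam parameter dictionary, and the choice of evaluating J at the gripper's image position are assumptions for illustration, not details taken from the paper.

import math

# Sketch of one servo-controller iteration, eqs. (5)-(9).  `cam` is assumed to be
# a dict with keys f, M, zB, alpha, beta for the jacobian() sketch above.

def rotation_z(theta_b):
    c, s = math.cos(theta_b), math.sin(theta_b)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def servo_update(gripper_uv, object_uv, gripper_xy_curr, theta_b, cam, k=0.5):
    """One iteration: image error -> world set-point (xg, yg) and vertical move dz."""
    du = object_uv[0] - gripper_uv[0]                     # eq. (5)
    dv = object_uv[1] - gripper_uv[1]
    e_sg = math.hypot(du, dv)                             # eq. (6)
    delta = 1.0 if e_sg == 0 else 0.0                     # offset gate delta of eq. (8)
    J = jacobian(gripper_uv[0], gripper_uv[1], **cam)     # eq. (4) sketch above
    dworld = matvec(rotation_z(theta_b), matvec(J, [du, dv, delta]))   # eq. (8)
    xg = gripper_xy_curr[0] + k * dworld[0]               # eq. (9)
    yg = gripper_xy_curr[1] + k * dworld[1]
    return (xg, yg, dworld[2]), e_sg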
4 The Visual Servoing Algorithm

The visual servoing algorithm has four phases: an initialization phase and three others dependent on the distance of the gripper from the object.

Initialization: The purposes of initialization are to determine the trajectory that the wrist joint will follow in the image and to estimate the relative initial locations of the gripper and the object handle. Because ISAC's wrist joint is fixed with respect to the base-joint, and because the camera rotates on the base-joint along with the arm, the wrist joint necessarily follows a linear trajectory in the image but the table and its contents rotate across the field of view. ISAC's gripper's initial position is predefined in the image by the physical configuration of the gripper relative to the camera. The image location, (u_s, v_s), of the grasp point on the handle of the object is determined by a binary object recognition algorithm which is an artificial neural network trained with a circular histogram of the object [3]. A simple matched filter can be used to find the image location, (u, v, n)_g, of the gripper in a number of positions. Simple linear regression yields the slope, m, and intercept, C, of the line in the image along which the wrist will move.

Phase 1 tracking algorithm: The algorithm uses a 2 × 1-D tracking procedure (see below) to find the object and uses a 1-D search along the arm's image trajectory to find the gripper. The 1-D gripper tracking algorithm requires that in the image the central axis of the gripper is aligned with the forearm. (By "central axis" we mean the line parallel to the gripper prongs which bisects the space between the prongs and intersects the wrist joint.) The yaw angle θ_y which will cause that to happen is equal to the base-joint angle, θ_b, and must be supplied to the arm controller as part of the move command arguments so that the gripper remains in the proper orientation throughout the move. The base-joint angle, θ_b, can be computed in terms of the world coordinate gripper position, (x_g, y_g, z_g), as follows (see figure 4):

[Figure 4 diagram: the base-joint origin, the offsets X_offset and Y_offset to the world coordinate origin (x_0, y_0), the gripper position (x_g, y_g), and the angles γ, θ, and θ_b.]
Figure 4: Yaw angle adjustment for the end-effector.
γ = tan⁻¹[(y_g − Y_os) / (x_g + X_os)],        (10)

θ = cos⁻¹[Y_os / sqrt((x_g + X_os)² + (y_g − Y_os)²)],        (11)

θ_b = 90° + γ − θ,        (12)

where (X_os, Y_os) are ISAC's fixed offsets from the base-joint
to the world coordinate origin. The phase 1 algorithm searches the 1-D arm trajectory line in the image for the gripper edge. A segment of the line extending from the previous position of the gripper is searched from the end opposite the base of the arm toward the arm until the luminance edge is found. The position of the edge plus an offset is taken as the image position of the gripper, (u_g, v_g).
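The 1-D search just described can be sketched as follows. The line parameterization (slope m and intercept C from the initialization phase), the edge threshold, the fixed offset, and the image representation are illustrative assumptions, not values reported in the paper.

# Sketch of the phase 1 gripper search: walk along the arm's image trajectory
# (v = m*u + C from the initialization phase) toward the arm base until a
# luminance edge is found.  Threshold and offset values are assumptions.

def find_gripper_on_line(image, m, C, u_start, u_end, edge_thresh=40, offset=5):
    """Search the trajectory line from u_start (far end) toward u_end (arm base).

    `image[v][u]` is a grayscale frame; returns the gripper image position (u, v)
    or None if no edge is found.
    """
    step = 1 if u_end > u_start else -1
    prev = None
    for u in range(u_start, u_end, step):
        v = int(round(m * u + C))            # stay on the wrist trajectory line
        lum = image[v][u]
        if prev is not None and abs(lum - prev) > edge_thresh:
            u_g = u + step * offset          # edge position plus the fixed offset
            return u_g, int(round(m * u_g + C))
        prev = lum
    return None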
Figure 5: Typical image configuration in phases 1, 2, and 3 of the tracking algorithm.

The apparent motion of the object in the images is due to the motion of the camera as it rotates with the arm base-joint. To find the object handle in the current image, a neighborhood of the object's previous location is searched. We use a fast, 2 × 1-D algorithm on a binary image to find the handle. That algorithm searches, within specific limits, along a horizontal line for the edges of the object handle. It then starts at the midpoint of the handle edges and searches vertically in both directions for the handle edges. It takes the average of the four points as the object location. The positional difference between the object and the gripper is encoded as (Δu, Δv). It is mapped through equations (8) and (9) to a new position set-point (x_g, y_g, z_g).

Phase 2 tracking algorithm: The purpose of this phase is to measure the spoon orientation and orient the gripper so that its central axis is parallel to the major axis of the object handle (within about 10°). It starts when the gripper is within 60 pixels of the object. When the command is sent to the arm controller to change the gripper yaw angle, θ_y, the entire arm moves
in compliance. This causes the image, and hence the image locations of the gripper and object, to change. Since the gripper is no longer in line with the arm, it will not lie on the 1-D wrist image trajectory and cannot be found with the 1-D algorithm. Thus, phase 2 uses the 2 × 1-D tracking algorithm to find both the object and the gripper. It performs this while the gripper orientation is changing. The center-point, (u_C, v_C), between the prongs of the gripper is tracked by finding the left and right gripper tips, (u_l, v_l) and (u_r, v_r), using the 2 × 1-D tracking algorithm. Then the new center coordinates of the gripper's grasp region are
u_C = u_l + R cos(φ),    v_C = v_l + R sin(φ),

where R is the distance from the gripper center-point to the left prong tip, φ = θ_gi + C, and θ_gi = tan⁻¹[(v_r − v_l) / (u_r − u_l)]. At the end of phase 2, the positions with respect to image coordinates of the object handle and the central point in the gripper's grasp region are known, and the orientation of the gripper matches that of the object handle. The final tracking positions for the gripper and the object available during phase 2 are used as the centers of image models of the gripper and of the object.

Phase 3 tracking algorithm: As the gripper approaches the object to within 60 pixels, the image of the gripper can begin to occlude the image of the object (see figure 5). Therefore, phase 3 uses model matching, which is robust under partial occlusion, to track the gripper and the object handle. Since the gripper is close to the object at this stage, the model matching area is small and the system performance is not degraded. The matching distance at iteration n, ρ(u, v, n), is defined as
ρ(u, v, n) = Σ_{(p,q) ∈ Model} |I_scene(u + p, v + q, n) − I_model(p, q, n − 1)|.        (13)
The best match point (u, v) at time n is given by the minimum of ρ(u, v, n). This technique is used to find both (u_g, v_g) and (u_s, v_s), which are then used to compute (Δu, Δv). When (Δu, Δv) = (0, 0), the parallax offsets are computed and the gripper is moved to the table, where it closes on the object handle (if successful).
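Equation (13) amounts to a sum-of-absolute-differences template match minimized over a small window around the previous location. A minimal sketch follows; the search radius, the zero-based patch indexing, and the assumption that the window stays inside the frame are illustrative choices, not details from the paper.

# Sketch of the phase 3 model matching of eq. (13): sum of absolute differences
# between the stored model patch and the current frame, minimized over a small
# search window around the previous position.

def match_distance(scene, model, u, v):
    """rho(u, v, n) of eq. (13) for a model patch indexed by (p, q)."""
    return sum(abs(scene[v + q][u + p] - model[q][p])
               for q in range(len(model))
               for p in range(len(model[0])))

def best_match(scene, model, u_prev, v_prev, radius=8):
    """Best match point: the (u, v) near the previous location minimizing rho."""
    best, best_uv = None, (u_prev, v_prev)
    for v in range(v_prev - radius, v_prev + radius + 1):
        for u in range(u_prev - radius, u_prev + radius + 1):
            d = match_distance(scene, model, u, v)   # assumes window stays in frame
            if best is None or d < best:
                best, best_uv = d, (u, v)
    return best_uv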
5 Position Set-point Updating with Information Delays

Network delays adversely affect set-point updating. The robot arm controller can be queried for the current
arm position, but this takes 1209 msec on average from sending a wherearm request to receiving the position. Tracking is inhibited during the request and response, and the algorithm loses the gripper and target locations. We have solved the problem as follows. The visual servo controller sends the next set-point to the arm controller when the error in the image space reaches a tolerance, Δd = (1 − k) d(0) + ε d(0), which is determined by the value of ε. In the timing diagram in figure 6, the upper trace shows the timing of the servo controller; the lower shows that of the arm controller. The servo controller's visual tracking module tracks the object in each image, I(n), of the input sequence, but the servo controller sends position set-points only if the image distance error is less than the distance error Δd. That is the case when the arm is in the update range U (see figure 6). Initially the arm is at the home position and the updateflag is set to 1. The servo controller computes a position set-point (x_1, y_1) and sends it to the arm controller. The arm controller receives the set-point after a delay period caused by the network delay ND. After a controller delay C1 (caused by the computation of the motion trajectory from the current position to the set-point position), the arm starts moving toward (x_1, y_1). While moving, the arm is tracked by the visual tracking module, which computes the error distance, e^I_sg, in each image, I(n). So long as the error is larger than Δd, the updateflag is set to 0. That causes the servo controller not to compute a new set-point for the arm controller. When the error distance becomes less than Δd (which is the case in frame j depicted in the timing diagram, figure 6), the visual servo controller computes the next position set-point and sends it to the arm controller. At this time, the arm might still be moving toward the previous set-point, (x_1, y_1), or could already be at that point. So if we adjust the size of the update region such that the arm receives the new set-point precisely when it reaches the previous set-point, then we compensate for ND and C1, and the arm motion is not interrupted. Since the motion is continuous, the elapsed time from start to grip is less than in the interrupt-driven approach. Continuous motion can, therefore, be achieved by choosing ε to satisfy
t_s + t_c + t_sm + ND + C1 + ξ/v_r < ((1 − k) + ε) d(0) / v_r,        (14)

where t_s is the time necessary to grab an image, t_c is the time to compute the error of eqn. (5), t_sm is the time to compute the next set-point, ξ is the image distance already traversed by the arm inside the update range, U, when I(j) is acquired, and v_r is the image velocity of the gripper. Figure 7 is a flow chart of this procedure.

[Figure 6 timing traces: computation node 1 (visual tracking: snap s, compute c, send move sm, and the updateflag uf) and computation node 2 (arm motion: network delay ND, controller delay C1, motion start, move arrived), with the update range U, the error d(0) at the arm home position, and the tolerance (1 − k) d(0) + ε d(0) marked.]

Figure 6: Timing diagram for the anticipation approach.
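The following Python sketch illustrates the update-gating logic described above and a choice of ε satisfying equation (14). The function names, the per-frame gating structure, and all timing values are illustrative assumptions, not the authors' implementation or measurements of ISAC's network.

# Sketch of the anticipation scheme of section 5: a new set-point is computed
# only when the image distance error falls inside the update range
# delta_d = (1 - k)*d0 + eps*d0, so that the next command reaches the arm
# controller roughly as it arrives at the previous set-point.

def choose_eps(ts, tc, tsm, ND, C1, xi, vr, d0, k):
    """Smallest eps satisfying eq. (14) as reconstructed above."""
    return max(0.0, (ts + tc + tsm + ND + C1 + xi / vr) * vr / d0 - (1.0 - k))

def update_gate(e_sg, delta_d, updateflag, send_next_setpoint):
    """Per-frame gating: send a set-point only as the arm enters the update range."""
    if e_sg > delta_d:
        return 0                      # arm still far from its set-point; hold off
    if updateflag == 0:
        send_next_setpoint()          # arm entered the range: send the next set-point
    return 1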
6 Experimental Results

Two sets of 100 trials were run in which an object (a spoon) was placed at random locations and orientations within the workspace of ISAC's robot arm. A trial was considered successful if ISAC was able to both find and grasp the object. The first set of 100 trials (figure 8 (left)) was run without using the parallax-compensating offsets described earlier. 81% of the trials were successful. In 8% of the trials the tracking failed. The remaining 11% of the failures were grasping errors (mostly due to parallax). The data are plotted as radius (the distance on the table from the point of intersection of the camera's optical axis to the object grasp point) versus delta orientation (the angle between the axis of the robot's forearm and the principal axis of the object handle). Successes are plotted with one marker and failures with +'s. The figure illustrates that most of the failures occurred when the delta orientation was larger than 20 degrees. A delta orientation in the negative direction has fewer failures than those in the positive direction because a
[Figure 7 flow chart (set-point updating): n is the image frame number and q is the number of set-point updates. With updateflag = 1, get (Δu(n), Δv(n)) from tracking, compute [x_{q+1} y_{q+1}]^T = [x_q y_q]^T + k R(θ_b) J [Δu(n) Δv(n)]^T, compute the yaw angle, and move to [x_{q+1}, y_{q+1}, yaw]; then keep getting (Δu(n), Δv(n)) and d(n) = sqrt(Δu(n)² + Δv(n)²) each frame (n = n + 1) until the error re-enters the update range, at which point updateflag is set back to 1.]

[Figure 8: scatter plots of delta orientation (degrees) versus radius for the two sets of trials, marking successes and failures.]