2009 Canadian Conference on Computer and Robot Vision

Towards Learning Robotic Reaching and Pointing: An Uncalibrated Visual Servoing Approach

Azad Shademan, Amir-massoud Farahmand, Martin Jägersand

Department of Computing Science, ATH 2-21, University of Alberta, Edmonton, AB, T6G 2E8
{azad,amir,jag}@cs.ualberta.ca

Abstract

It is desirable for a robot to be able to operate in unstructured environments. In this paper, we demonstrate how a robot can learn primitive skills and we show how to augment them. We formalize 2D-decidable (pointing) and 3D-decidable (reaching) skills within an uncalibrated visual servoing framework. Skill decidability is defined in conjunction with an image-based controller that has local asymptotic stability. In addition, we propose sequential composition of primitive skills to combine pointing and reaching skills in order to increase the accuracy of the reaching skill. We use simple primitive tasks such as multi-point alignment and point-to-line alignment. We validate our results with real uncalibrated eye-in-hand experiments on a 4-DOF WAM from Barrett Technology Inc., alongside computer simulations.

1 Introduction

Simple skill development for an uncalibrated hand-eye robot is among the first steps towards autonomy. A hand-eye robot is desired to have hand-eye coordination and to perform simple pointing and reaching skills. In addition, it is desired that the robot be able to compose its skills in order to perform more advanced tasks. For a robot to be able to compose some of its primitive skills, there is a need for a formal (human-defined) hierarchy of skills and a method to augment the acquired primitive skills or compose them into new skills. We propose numerical methods to learn simple primitive skills (such as point-to-point or point-to-line alignments) over a subspace of the visual-motor space and show how these primitive skills can be augmented for a larger subspace. We also propose a framework within which the acquired primitive visual-motor skills can be composed

to acquire more advanced skills. We contribute by presenting a formal hierarchy for the case when the model of the camera/robot is not available (uncalibrated settings). Specifically, we formalize two types of skills: skills that can be decided in the visual space (2D-decidable) and skills that can be decided in the Cartesian world coordinates (3D-decidable). These skills can be composed sequentially to achieve more advanced skills. Our point of view in this work is inspired by biological motor learning; however, we make no claims that our numerical methods relate to animal behavior.

Learning primitive hand-eye coordination skills for a robot resembles skill acquisition in animals [30]. Animals have the ability to adapt to changing circumstances and augment their learned motor repertoire [30, 27]. Motor learning in animals takes different forms and is both intrinsic and extrinsic. Shadmehr and Wise [30] suggest three forms of motor learning for animals: (1) genome-encoded motor programs, including reflexes; (2) skill learning and skill adaptation from a repository of intrinsic motor programs; and (3) learning the type and timing of movements. The ability to compose primitive skills into more sophisticated skills (such as manipulating a new object) is therefore extrinsic for animals. Nevertheless, most current robots cannot build upon their available skills, and a significant amount of hard-coded task specification limits a robot's ability to learn on its own. We study how primitive robotic skills can be augmented and used to perform more advanced skills for an autonomous robot.

2 Visual-Motor Skills

2.1 Problem Setup

Consider a serial-link manipulator with N joints and a camera rigidly attached to the last link (eye-in-hand configuration). Let F : R^N → R^M be the mapping from the configuration q ∈ R^N of the robot to the visual feature vector s ∈ R^M with M visual features. The visual-motor equation of such a robotic hand-eye system is denoted by s = F(q). The time derivative of the visual-motor equation gives the visual-motor Jacobian matrix:

ṡ = ∂s/∂t = (∂F(q)/∂q)(∂q/∂t) = J_u(q) q̇.   (1), (2)

When we have discrete-time sampled data, an estimate of the visual-motor Jacobian can be obtained:

Δs ≈ Ĵ_u(q) Δq.   (3)

The uncalibrated Jacobian plays an important role in the nonlinear control law, as explained in Appendix A. Let x ∈ R^6 denote the 6D Cartesian pose of the robot end-effector. A robotic visual-motor skill involves moving the 6D pose of the end-effector (or any other tool rigidly attached to the body) to a desired final configuration using only the corresponding visual features and an image-based control law. Let the mapping E_x : R^6 → R^6 represent an error of the end-effector pose defined in the Cartesian world frame, such that when E_x(x) = 0, the target pose is reached and the reaching task is complete. Let the mapping E_s : R^M → R^M represent a visual-space error. When E_s(s) = 0, the target visual features are aligned with the observed visual features in the visual space only. In this case, a pointing task is complete, but due to the projective geometry governing the cameras and the associated projective ambiguities, completion of pointing does not imply completion of reaching. Reaching, however, is sufficient for pointing. The error equations E_x(x) = 0 and E_s(s) = 0 can only hold under ideal (noise-free) conditions. In practice, it suffices that the errors be sufficiently small, i.e., ∃ε ≪ 1, |E| < ε, where E is either E_x(·) or E_s(·).
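For concreteness, the discrete estimate in (3) can be obtained by ordinary least squares over a batch of sampled joint and feature displacements. The following Python/NumPy sketch is illustrative only; the function names and the toy 2×2 system are our own assumptions, not the paper's implementation.

```python
import numpy as np

def estimate_jacobian(dq_samples, ds_samples):
    """Least-squares estimate of the visual-motor Jacobian J_u from
    sampled displacements, so that ds ~= J_u @ dq (Eq. 3).

    dq_samples: (P, N) joint displacements
    ds_samples: (P, M) corresponding visual-feature displacements
    """
    dq = np.asarray(dq_samples)   # (P, N)
    ds = np.asarray(ds_samples)   # (P, M)
    # Solve dq @ J^T ~= ds for J^T in the least-squares sense.
    Jt, *_ = np.linalg.lstsq(dq, ds, rcond=None)
    return Jt.T                   # (M, N) Jacobian estimate

# Toy usage: recover a known 2x2 Jacobian from noisy samples.
rng = np.random.default_rng(0)
J_true = np.array([[50.0, -10.0], [5.0, 40.0]])
dq = rng.normal(scale=0.01, size=(100, 2))
ds = dq @ J_true.T + rng.normal(scale=1e-4, size=(100, 2))
print(np.round(estimate_jacobian(dq, ds), 1))
```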

2.2 2D-Decidable Primitive Skills

A skill is said to be 2D-decidable if, for a local image-based controller q̇ = −λ Ĵ_u† E_s(s(t)),

∀ε > 0, ∃t₀ > 0, ∀t ≥ t₀ : |E_s(s(t))| < ε.   (4)

2D-decidable skills are complete when the error in the visual space is regulated to zero, regardless of the Cartesian-space error; this resembles pointing, and a typical pointing skill is desired to be 2D-decidable. For example, consider a pan-tilt camera that is to point at a dynamic object represented by its center of mass. This skill is 2D-decidable because a point-to-point alignment is represented by two visual features (the pixel coordinates) and the pan-tilt system also has two decoupled degrees of freedom.

2.2.1 Single Point-to-point 2D alignment skill

Let point P ∈ R³ be the coordinates of one 3D feature in the Cartesian world frame that maps to the homogeneous image point p ∈ P². In this case, the visual feature s consists of the first two components of the image point p, and s ∈ R². Let the desired image feature point s* ∈ R² be the coordinates of the projection of the target point P* ∈ R³. A point-to-point 2D alignment skill is the combination of the image-based control law in (11) and a Jacobian learning algorithm (see Section 3) that regulates the visual-space error E_s(s(t)) = s(t) − s* to zero over a trajectory. For an eye-to-hand configuration, point P is rigidly attached to the arm and its motion is observed by one or more static cameras. For an eye-in-hand configuration (this paper), the camera and point P are rigidly attached to the arm. At the completion of this skill, the camera center and the points P and P* will be collinear.

2.2.2 Single Point-to-line 2D alignment skill

Let point P ∈ R³ be the coordinates of one 3D feature in the Cartesian world frame that maps to the image feature point p ∈ P², and let line l ∈ P² be the projection of a 3D line onto the projective plane. In this case, the visual feature s is the distance of point p to line l, denoted by their inner product s(t) = p(t) · l(t). A point-to-line 2D alignment skill is the combination of the image-based control law in (11) and a corresponding Jacobian learning algorithm (see Section 3) that regulates the visual-space error E_s(s(t)) = s(t) to zero over a trajectory. For eye-to-hand configurations, point P is rigidly attached to the arm and its motion is observed by one or more static cameras. For eye-in-hand configurations, the camera and point P are rigidly attached to the arm. At the completion of this skill, the camera center, point P, and line l will be coplanar.
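As an illustrative aside, the point-to-line feature s = p · l of Section 2.2.2 is cheap to compute from pixel measurements. The minimal Python/NumPy sketch below is ours; normalizing the line by its first two components (so that the inner product equals the signed point-line distance) is our assumption, not prescribed by the paper.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous image line through two pixel points (cross product)."""
    h1 = np.array([p1[0], p1[1], 1.0])
    h2 = np.array([p2[0], p2[1], 1.0])
    l = np.cross(h1, h2)
    # Normalize so that p . l equals the signed Euclidean point-line distance.
    return l / np.linalg.norm(l[:2])

def point_to_line_feature(p, l):
    """Visual feature s = p . l for a pixel point p and homogeneous line l."""
    return np.array([p[0], p[1], 1.0]) @ l

# The feature is zero exactly when the point lies on the line.
l = line_through((0.0, 0.0), (100.0, 100.0))
print(point_to_line_feature((50.0, 50.0), l))   # ~0: aligned
print(point_to_line_feature((50.0, 60.0), l))   # nonzero error
```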

2.3 3D-Decidable Primitive Skills

A skill is said to be 3D-decidable if, for a local image-based controller q̇ = −λ Ĵ_u† E_s(s(t)),

∀ε > 0, ∃t₀ > 0, ∃δ > 0, ∀t > t₀ : |E_s(s(t))| < δ ⇒ |E_x(x(t))| < ε.   (5)

A typical reaching skill must be 3D-decidable. All 3D-decidable skills are by definition also 2D-decidable. 2D-decidable skills do not guarantee reaching, i.e., after completion of a pointing skill the Cartesian-space error may not be regulated to zero. For a robot to reach a desired configuration, the visual features must contain enough information about the degrees of freedom at the Cartesian end-effector frame. For example, if the coordinates of a single image point (2 dimensions) in one camera are used, such a visual feature alone cannot control the 6D pose of the end-effector; the corresponding Jacobian matrix (see Section 3) would be rank-deficient (a numeric rank check is sketched at the end of this section). In addition, the robot must have sufficient controllable degrees of freedom (degrees of mobility). For example, a pan-tilt camera has only two controllable degrees of freedom, while the Cartesian camera pose has 6 dimensions; hence, it does not have sufficient controllable degrees of freedom for 3D-decidable skills. For the following 3D-decidable skills, a robot with sufficient controllable degrees of freedom is assumed.

2.3.1 Multi points-to-points alignment skill

Let P = (P₁, …, P_K) be an ordered set of K 3D points, where P_i ∈ R³ for i = 1, …, K. When the end-effector has 6D pose x, under a projective transformation the 3D feature points project onto an ordered set p = (p₁, …, p_K), where p_i ∈ P². Therefore, the visual feature s ∈ R^{2K} consists of the coordinates of the p_i's, and M = 2K. Let the desired image feature set s* ∈ R^{2K} be the coordinates of the projection of the desired 3D feature set at the desired end-effector pose x* ∈ R⁶. A multi points-to-points 3D alignment skill is 3D-decidable when the geometry and number of features are selected such that the control and Jacobian estimation allow (5) to hold for E_s(s(t)) = s(t) − s* and E_x(x(t)) = x(t) − x*. Not all feature configurations are 3D-decidable; examples of degenerate feature configurations include a 180° rotation around the view axis. These degenerate configurations are discussed in detail by Chaumette [1].

2.3.2 Multi points-to-lines alignment skill

One- and two-point-to-line alignment skills are only 2D-decidable. Let p = (p₁, …, p_K) be the homogeneous image points and l = (l₁, …, l_K) be the projections of 3D lines onto the projective plane of the camera, where p_i, l_i ∈ P² and K > 2. In this case, the visual feature is s = [p₁ · l₁, …, p_K · l_K] and M = K. A multi points-to-lines 3D alignment skill may be 3D-decidable if (5) holds for E_s(s(t)) = s(t) and E_x(x(t)) = x(t) − x*. It is important to note that the 2D- and 3D-decidable skills are defined with respect to a Jacobian learning algorithm and a local controller.
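Since 3D-decidability requires the stacked visual-motor Jacobian to be full rank, a quick numeric rank check can flag feature sets that cannot be 3D-decidable. The following Python/NumPy sketch is a hypothetical necessary-condition test of our own; it is not the decidability analysis of [15, 9].

```python
import numpy as np

def stack_point_features(points):
    """Stack K pixel points (K, 2) into one feature vector s in R^{2K}."""
    return np.asarray(points).reshape(-1)

def is_rank_sufficient(J_u, dof=6, tol=1e-6):
    """Heuristic necessary check: the stacked visual-motor Jacobian must
    have rank >= the number of Cartesian degrees of freedom to control.
    A rank-deficient J_u rules out 3D-decidability for this feature set."""
    svals = np.linalg.svd(np.atleast_2d(J_u), compute_uv=False)
    return np.sum(svals > tol * svals[0]) >= dof

# One image point (M = 2) can never constrain a 6-DOF pose:
J_single = np.random.default_rng(1).normal(size=(2, 6))
print(is_rank_sufficient(J_single))   # False: rank <= 2
# Four points (M = 8) may suffice if the configuration is non-degenerate:
J_four = np.random.default_rng(2).normal(size=(8, 6))
print(is_rank_sufficient(J_four))     # True for generic configurations
```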

2.4 Analogy to Computational Neurobiology

In this work, we consider robotic hand-eye pointing and reaching. Vision-guided robotic reaching has long been studied in visual servoing [17], but pointing is not traditionally studied. Visual servoing of a robotic arm refers to the motion control of the arm to reach a desired posture from video feedback (for a recent survey of different architectures see [3, 4]). It is desirable for a robot to be able to operate in uncalibrated settings, where little information about the geometry of the robot, camera, and scene is available. The robot skills include very basic pointing and reaching capabilities, which are carried out in the form of the uncalibrated image-based control law in (16); see Appendix A.

Our work here is in many ways analogous to the view taken by computational neurobiology [30]. The control law in (16) is analogous to the genome-coded reflexes and motor programs of an animal, which are evolved across generations. The control law without Jacobian learning is not sufficient for success of a task in uncalibrated settings. This could be viewed as the failure of the innate skills of an animal under changing circumstances [30]. Learning the Jacobian matrix [20] associated with a particular skill (e.g., point alignment) could be seen as analogous to biological skill acquisition, in the sense that the accuracy of Jacobian estimation improves over time with the enrichment of the visual-motor history [11]. This improvement is similar to biological skill augmentation and motor adaptation. Finally, composition of robotic skills and sequential planning and execution of a stack of primitive skills is, in a very general sense, analogous to intelligent selection of the type and timing of skills from the motor repertoire.

2.5 Related Work and Discussion

Decidability of visual tasks for robotic hand-eye coordination has been studied by Hespanha et al. [15, 14] and Dodds [9]. They use a stereo eye-to-hand configuration, but do not discuss the case of a single camera. Their main contribution is to determine what tasks can be performed with a given level of calibration for a stereo rig: calibrated, where the intrinsic camera matrices and the Essential matrix are known (highest level); affine (the Fundamental matrix is known plus the plane at infinity); weakly calibrated (only the Fundamental matrix is known); uncalibrated projective (injective and projective); and injective (lowest level) [9]. The main difference between the works of Hespanha [14] and Dodds [9] is that Dodds focuses on image-based encodings and develops a specification language for task decidability, whereas Hespanha focuses on Cartesian-based encodings (an estimate of the actual camera model is used to reconstruct the feature list). A major result of Hespanha's dissertation is that projective-invariant tasks are decidable on a weakly calibrated stereo rig [15]. Dodds extends a similar result to an uncalibrated (projective) stereo rig [8]. Our definition of 3D-decidable skills (see Section 2.3) is similar to their notion of decidable tasks; however, their definition excludes a control law and is strictly a geometry-induced property. We use the term skill decidability to refer to decidability in the context of visual-motor skills and to distinguish it from the aforementioned task decidability. In addition, we use only one camera, whereas their results are specific to a stereo system.

Other relevant work in the literature is as follows. Morrow and Khosla [23] start from a complete taxonomy of the constrainable degrees of freedom between two rigid bodies and progress to a partial list of vision-based and force-based task primitives. Schnackertz and Grupen [29] present a set of three controllers that serve as a basis for visual servoing tasks; the image error is mapped to the joint space through a robot/camera Jacobian. Their work is related in the sense that they decompose all tasks to be performed with three simple controllers. Finally, Dodds et al. [7] use the primitive trackers implemented in XVision [13] to build a vision hierarchy from elementary tasks to more advanced tasks. In addition, they introduce a manipulation hierarchy for coarse-to-fine tasks. They use a parallel composition of tasks (see Section 4).

3 Visual-Motor Skill Learning

Biological motor learning may refer to skill acquisition [28], motor adaptation, learning of instinctive behaviors, decision making, or all of the above [30]. Our view of robotic visual-motor learning excludes learning of instinctive behaviors, but includes a formalism for skill learning (a loose abstraction for skill acquisition plus motor adaptation) and an optimization-based decision-making strategy. We use the notion of end-effector error minimization as a principle for skill learning, after Todorov and Jordan [32]. The primitive visual-motor skills consist of a collection of simple 2D- and 3D-decidable skills.

Hager [12] has studied calibration-free visual servoing using projective geometry from a stereo pair of cameras. He uses five known points to compute the Essential matrix online. On the other hand, there are other Jacobian estimation techniques that do not require any knowledge of the scene or features [20]. Local methods are based either on secant updates (such as the Broyden update [18]) or on least squares (such as [26, 19]), but most of these methods do not exploit the global history¹ of visual-motor data to increase estimation accuracy. We use the Local Least-Squares (LLS) Jacobian estimation method [11] for learning skills², because of its global exploitation of history and its efficient data structure for the visual-motor data (eliminating closely correlated data). This method provides an estimate of the visual-motor Jacobian in (3) in the vicinity of any previously observed point by looking at the global visual-motor space. The visual information specifies a certain task. For a visual-motor query point d_c = (s_c, q_c), the visual skill learning problem is posed as the following optimization problem:

Ĵ_u(q)|_{q=q_c} = arg min_{J_u} Σ_{k: q_k ∈ B_r(q_c)} (Δs_k − J_u Δq_k)²,   (6)

where B_r(q_c) = {q_p : ‖q_c − q_p‖ < r, p = 1, …, P} is an open ball with radius r centered at q_c, which contains the joint-space neighbors of the query joint vector q_c, Δs_k = s_c − s_k, and Δq_k = q_c − q_k. This method fits the best hyperplane to the visual-motor data around q_c under a global consideration. An illustration of this method is given in Figure 1, where a hyperplane is fitted to 2×1-dimensional data (two degrees of freedom for the joints and one image feature).

Figure 1. Illustration of the Local Least-Squares (LLS) Jacobian estimation method.

Skill augmentation is an important ability of animals. Over time and with experience, animals augment their acquired skills and increase performance [30]. Our adopted learning scheme benefits from utilizing the history of visual-motor data and therefore resembles animal skill augmentation. While Hager [12] used projective invariance in stereo eye-to-hand visual servoing of robots, we use a single camera and present general primitive skills that can be used in either eye-to-hand or eye-in-hand configurations. The LLS method is similar to the work of Schaal et al. [26] and Lapresté et al. [19], although the latter solves the least-squares problem for random perturbations only around the desired pose.

¹ We denote a history of P visual-motor data pairs (s_p, q_p) by the set D = {(s_p, q_p)}_{p=1}^{P}.
² In our computational-learning point of view, numerical estimation of a Jacobian is equivalent to learning the corresponding skill.
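A minimal Python/NumPy sketch of the LLS estimate in (6) is given below, assuming flat arrays of history pairs; the real implementation [11] additionally maintains an efficient data structure that prunes closely correlated data, which this sketch omits. The function and variable names are ours.

```python
import numpy as np

def lls_jacobian(history_q, history_s, q_c, s_c, radius):
    """Local Least-Squares (LLS) Jacobian estimate (Eq. 6), a sketch.

    history_q: (P, N) previously visited joint vectors
    history_s: (P, M) corresponding visual features
    q_c, s_c : query joint vector and feature vector
    radius   : r, radius of the neighborhood ball B_r(q_c)
    """
    history_q = np.asarray(history_q)
    history_s = np.asarray(history_s)
    # Select joint-space neighbors inside the open ball B_r(q_c).
    mask = np.linalg.norm(history_q - q_c, axis=1) < radius
    if mask.sum() < history_q.shape[1]:
        raise ValueError("not enough neighbors to fit a hyperplane")
    dq = q_c - history_q[mask]     # (K, N), each row is Δq_k = q_c − q_k
    ds = s_c - history_s[mask]     # (K, M), each row is Δs_k = s_c − s_k
    # argmin_J Σ_k ||Δs_k − J Δq_k||² via linear least squares.
    Jt, *_ = np.linalg.lstsq(dq, ds, rcond=None)
    return Jt.T                    # (M, N)
```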

4 Composite Visual-Motor Skills

Different primitive skills can be composed either sequentially or in parallel. A common method for task composition is stacking the error measures in a vector [7]. This is the most common practice in visual servoing, because the corresponding visual-motor Jacobian can be decoupled into stacked row-blocks:

[Δs⁽¹⁾; …; Δs⁽ᴷ⁾] ≈ [Ĵ_u⁽¹⁾; …; Ĵ_u⁽ᴷ⁾] Δq,   (7)

where the superscripts denote different skills, each learned separately. In addition to the decoupling property, another advantage of parallel composition is its ease of implementation. We have used parallel composition for multi-point alignment skills. Parallel composition should not be used when skills are competing, because this may lead to instability or to getting stuck in local minima: one skill can cancel the contribution of another skill in the joint control signal; see (11) in Appendix A.

The other approach is sequential primitive skill composition. This can be done in two ways: (1) the redundancy formalism, where one skill has a higher priority and is assigned as the primary task, while the other skills are secondary to it [20, 24]; or (2) a mechanism that selects the sequence of skills based on a local heuristic. Our skill composition is an instance of the second approach. For our experiments (see Section 5) we use an empirical mechanism of 2D pointing followed by 3D aligning.
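The two composition modes can be sketched as follows in Python/NumPy. The `skills` objects with `error()` and `jacobian()` methods are a hypothetical interface of ours, and the fixed switching threshold is a stand-in for the empirical mechanism described above.

```python
import numpy as np

def parallel_step(skills, lam=0.5):
    """Parallel composition (Eq. 7): stack per-skill errors and Jacobians,
    then take one uncalibrated control step, q_dot = -lam * pinv(J_u) @ e."""
    e = np.concatenate([s.error() for s in skills])
    J = np.vstack([s.jacobian() for s in skills])
    return -lam * np.linalg.pinv(J) @ e

def sequential_step(skills, lam=0.5, tol=1e-3):
    """Sequential composition: run the first unfinished skill in the stack
    (e.g., 2D pointing first, then 3D aligning)."""
    for s in skills:
        e = s.error()
        if np.linalg.norm(e) > tol:
            return -lam * np.linalg.pinv(s.jacobian()) @ e
    return np.zeros_like(skills[0].jacobian()[0])  # all skills complete
```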

Figure 2. Barrett WAM. (Left) MATLAB simulation using the Robotics Toolbox [5] and the Epipolar Geometry Toolbox [22]. (Right) Experiment setup with fiducial markers.

Figure 3. Normalized Frobenius norm of the difference between the estimated (LLS) and true Jacobian matrices versus the number of history points. With more points in the history, the quality of estimation improves. Mean values over 200 random points are reported.

5 Experiments

MATLAB simulations and experiments with a real robot are designed to validate our results. Jacobian learning for primitive tasks is first implemented using the Robotics Toolbox [5] and the Epipolar Geometry Toolbox [22]. An uncalibrated eye-to-hand image-based visual servoing system is implemented. The WAM arm is an advanced backdrivable 4-degree-of-freedom arm running on an RTAI-Linux machine. The vision system consists of a Point Grey Grasshopper camera that captures 640 × 480 MONO8 images at 60 Hz (Figure 2). The Visual Servoing Platform (ViSP) [21] is used for visual tracking. Two sets of experiments are presented. First, we validate our primitive skill learning method (LLS) on a single-camera eye-in-hand WAM arm and validate the idea of skill augmentation over experience (a richer history). A second set of experiments is designed to show an arbitrary composite task. The sequence of tasks is determined empirically.

5.1 Experiment 1: Primitive Skill Learning and Augmentation

For the MATLAB experiments, random joint vectors were produced for the robot/camera model. If the resulting points were in the field of view and the pose was kinematically dexterous, the data points were kept in a database. Figure 3 shows the quality of the Jacobian learning from the MATLAB simulations. The quality is measured by the Frobenius norm of the difference between the estimated Jacobian matrix and the true Jacobian matrix. It is seen that with the inclusion of more data points in the history, the quality of the learned skill improves.

For the experiments with the WAM arm, the user moves the arm on random trajectories while visually asserting that none of the tracked features leave the field of view. Some 10,000 intermediate points are recorded in a data structure for initialization. The whole initialization takes less than 5 minutes on average for the user. As the robot moves according to the visual servoing loop, new data points are added to the data structure. This ensures more accurate Jacobian estimates when the robot passes near those points in the future. Once the Jacobian is learned for a specific skill, the skill can be augmented by repetition. Figure 4 shows an experiment for augmentation of a 3D-decidable multi points-to-points skill with the WAM. During the first try (before time t = 40), the skill is completed successfully but at a slow rate. In the second try, the same skill is tested after augmenting the visual-motor data from the first try. The pose error converges to within a couple of millimeters at a much faster rate.

Figure 4. End-effector position error |E_x(x)| [m] during skill augmentation for a 3D-decidable points-to-points skill. The skill test is repeated at time t = 40 with the acquired history. After augmentation, regulation to zero happens at a much faster rate. This experiment suggests that the skill is augmented.

Figure 5. The image-space trajectories during the sequential composite skill in Section 5.2. Note that the trajectory is quasi-linear, which is a desired property for a sequence of two tasks.


5.2 Experiment 2: Composite Skill

The second experiment validates a simple sequential composition scheme. The first skill (learned previously) is a 2D-decidable one-point-to-two-lines skill, which points the arm towards the center of the object. This helps bring the camera close enough to the desired state that the local controller in (11) is asymptotically stable. The second skill is a 3D-decidable points-to-points skill; both skills are learned beforehand. Once the first skill is carried out, the second, 3D-decidable points-to-points skill is activated. Figure 5 shows the quasi-linear image-space trajectories for this composite task. A linear trajectory suggests that the corresponding tasks are learned successfully. The point-to-lines skill first aligns the center point of the four points on the two diagonal lines (dotted line), and then a multi-point alignment is performed. Figure 6 depicts the visual feature error E_s(s) and the end-effector Cartesian error E_x(x), respectively.

Figure 6. (Top) The visual feature error for the sequential tasks (a 2D-decidable task followed by a 3D-decidable task) and (Bottom) the Cartesian end-effector error norm of the same composite task. Once the 2D-decidable skill is complete, the end-effector still has a large error, but it behaves like a first-order decoupled system (10) as soon as the second task is activated. Also see the linear visual-space trajectories in Figure 5.

6 Conclusions

In this paper, we presented a framework for vision-based robotic pointing and reaching in uncalibrated eye-in-hand settings. We introduced our notion of primitive visual-motor skills and showed how these skills can be learned by numerical techniques. In our computational view, visual-motor learning is equivalent to learning the Jacobian matrix for a particular skill. We showed how the Local Least-Squares (LLS) method [11] implicitly incorporates the notion of skill augmentation. In addition, we proposed a sequential composition of primitive skills in order to perform more complex tasks. This part of the work is still under development, but early results are promising.

Acknowledgements

Azad Shademan is supported by an AIF/iCORE ICT scholarship. Amir-massoud Farahmand acknowledges the support of the Alberta Ingenuity Centre for Machine Learning (AICML) at the University of Alberta. The authors would also like to acknowledge financial support from NSERC, CFI, and ASRA, in-kind contributions from Barrett Technology Inc. for funding the robotics lab and the WAM arm, and Simon Leonard for developing the robot control library.

A Image-Based Visual Servoing

A.1 The Classic Approach

The classic eye-in-hand image-based visual servoing problem can be stated as follows. Let vector r denote the extrinsic parameters of the camera, vector s(r) denote the value of the visual features in the image plane (e.g., pixel coordinates or higher-order image moments [2]), and vector s* denote the desired value of the visual features in the target frame. The time variation of s is related to the Euclidean camera velocity screw by a Jacobian matrix [10]:

ṡ = J_i(s(r)) v,   (8)

where J_i(s(r)) is the interaction matrix associated with s, v = [v, ω] is the velocity screw, v is the linear velocity, and ω is the rotational velocity of the camera frame. In calibrated settings, the interaction matrix can be found analytically. A visual task can be defined as an error measure

e(r) = s(r) − s*,

where e should be regulated to zero to complete a task. The time derivative of the error is

ė = (∂e/∂r)(∂r/∂t) = J_i v.   (9)

A classic proportional control scheme may be used to ensure that the system behaves like a first-order decoupled system [10, 4], i.e.,

ė = −λe,   (10)

where λ is a positive real number. Let Ĵ_i† denote an estimate of the pseudo-inverse of the Jacobian matrix J_i. The velocity screw v may then be chosen as

v = −λ Ĵ_i† e.   (11)

The differential equation of the visual error in (9) with the choice of velocity command in (11) gives

ė = −λ J_i Ĵ_i† e.   (12)

To ensure local asymptotic stability, we must have

J_i Ĵ_i† > 0,

which holds if Ĵ_i† is an accurate estimate of J_i†. Note that this control law uses an estimate of the pseudo-inverse of the Jacobian matrix (the interaction matrix [10]), which has an analytic form containing the estimated depths of the feature points. A singular or near-singular interaction matrix may produce erroneous numerical estimates and result in control failure.
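For illustration, the sketch below assembles the classic control law (11) for point features using the standard analytic interaction matrix of a normalized image point at estimated depth Z [10, 17]. This is a generic textbook construction in Python/NumPy, not code from the paper, and all names are ours.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """Interaction matrix of a normalized image point (x, y) at depth Z;
    the standard analytic form from the visual servoing literature [10, 17]."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def classic_control_step(features, desired, depths, lam=0.5):
    """One classic IBVS step (Eq. 11): v = -lam * pinv(J_i) @ e, stacking
    one 2x6 interaction matrix per point; `depths` are estimated Z values."""
    e = (np.asarray(features) - np.asarray(desired)).reshape(-1)
    J = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    return -lam * np.linalg.pinv(J) @ e   # 6-vector velocity screw [v, w]
```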

A.2 The Uncalibrated Approach

In uncalibrated visual servoing, the control law is also defined entirely in the image space, without the need to reconstruct depth or other 3D quantities. One formulation of uncalibrated visual servoing controls the joint velocities rather than the end-effector velocity screw. The kinematic Jacobian of a manipulator relates the velocity screw of the end-effector to the joint velocities q̇:

v = J(q) q̇.   (13)

Assuming that the camera frame coincides with the end-effector frame, (8) and (13) result in

ṡ = J_i v = J_i J(q) q̇.   (14)

Let J_u = J_i J be the total Jacobian, which maps joint velocities to image-feature velocities. For any visual feature, a corresponding Jacobian matrix can be defined that relates the rate of change of the visual feature measurements to the joint velocities:

ṡ = J_u q̇.   (15)

The discrete form of the above equation can be approximated as Δs ≈ Ĵ_u Δq, and a control law can be derived similar to (11):

q̇ = −λ Ĵ_u† e,   (16)

where e is the visual task error.
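A minimal sketch of an uncalibrated servo loop built around (16) is given below, using a secant (Broyden-style) update of Ĵ_u as in [18]. The `observe` and `move` callbacks and all parameter values are hypothetical placeholders, not the paper's implementation (which uses the LLS estimate of Section 3).

```python
import numpy as np

def broyden_update(J, dq, ds, alpha=1.0):
    """Secant (Broyden) update of the uncalibrated Jacobian estimate [18]:
    correct J so that it reproduces the newest observed motion pair."""
    dq = dq.reshape(-1, 1)
    ds = ds.reshape(-1, 1)
    denom = float(dq.T @ dq)
    if denom < 1e-12:
        return J
    return J + alpha * (ds - J @ dq) @ dq.T / denom

def uncalibrated_servo(observe, move, J0, s_star, lam=0.2, tol=1.0, iters=200):
    """Uncalibrated visual servoing loop (Eq. 16): q_dot = -lam * pinv(J_u) @ e.
    `observe` returns the current feature vector; `move` commands a joint step."""
    J, s = J0.copy(), observe()
    for _ in range(iters):
        e = s - s_star
        if np.linalg.norm(e) < tol:
            break                           # pointing/reaching error small enough
        dq = -lam * np.linalg.pinv(J) @ e   # control step
        move(dq)                            # command joint displacement
        s_new = observe()                   # new visual features
        J = broyden_update(J, dq, s_new - s)
        s = s_new
    return J
```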

References

[1] F. Chaumette. Potential problems of stability and convergence in image-based and position-based visual servoing. In D. Kriegman, G. Hager, and A. Morse, editors, The Confluence of Vision and Control, pages 66–78. LNCIS 237, Springer-Verlag, 1998.
[2] F. Chaumette. Image moments: a general and useful set of features for visual servoing. IEEE Trans. on Robotics and Automation, 20(4):713–723, August 2004.
[3] F. Chaumette and S. Hutchinson. Visual servo control, part I: Basic approaches. IEEE Robotics and Automation Magazine, 13(4):82–90, December 2006.
[4] F. Chaumette and S. Hutchinson. Visual servo control, part II: Advanced approaches. IEEE Robotics and Automation Magazine, 14(1):109–118, March 2007.
[5] P. Corke. A robotics toolbox for MATLAB. IEEE Robotics and Automation Magazine, 3(1):24–32, March 1996.
[6] P. I. Corke. Visual Control of Robots: High-Performance Visual Servoing. Research Studies Press, Somerset, UK, 1996.
[7] Z. Dodds, M. Jägersand, G. Hager, and K. Toyama. A hierarchical vision architecture for robotic manipulation tasks. In Int. Conf. on Computer Vision Systems, volume 1542, pages 312–330, 1999.
[8] Z. Dodds, G. D. Hager, A. S. Morse, and J. P. Hespanha. Task specification and monitoring for uncalibrated hand/eye coordination. In Int. Conf. on Robotics and Automation (ICRA), volume 2, pages 1607–1613, May 1999.
[9] Z. B. Dodds. Task Specification Languages for Uncalibrated Visual Servoing. PhD thesis, Yale University, New Haven, CT, May 2000.
[10] B. Espiau, F. Chaumette, and P. Rives. A new approach to visual servoing in robotics. IEEE Trans. Robotics and Automation, 8(3):313–326, June 1992.
[11] A. M. Farahmand, A. Shademan, and M. Jägersand. Global visual-motor estimation for uncalibrated visual servoing. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1969–1974, 2007.
[12] G. D. Hager. Calibration-free visual control using projective invariance. In ICCV, pages 1009–1015, 1995.
[13] G. Hager and K. Toyama. XVision: A portable substrate for real-time vision applications. Computer Vision and Image Understanding, 69(1):23–37, January 1998.
[14] J. Hespanha. Logic-Based Switching Algorithms in Control. PhD thesis, Yale University, New Haven, CT, December 1998.
[15] J. P. Hespanha, Z. Dodds, G. D. Hager, and A. S. Morse. What tasks can be performed with an uncalibrated stereo vision system? International Journal of Computer Vision, 35(1):65–85, November 1999.
[16] K. Hosoda and M. Asada. Versatile visual servoing without knowledge of true Jacobian. In IEEE/RSJ International Conf. Intelligent Robots and Systems (IROS), volume 1, pages 186–193, September 1994.
[17] S. Hutchinson, G. D. Hager, and P. I. Corke. A tutorial on visual servo control. IEEE Trans. Robotics and Automation, 12(5):651–670, October 1996.
[18] M. Jägersand, O. Fuentes, and R. Nelson. Experimental evaluation of uncalibrated visual servoing for precision manipulation. In IEEE International Conf. Robotics and Automation (ICRA), volume 4, pages 2874–2880, April 1997.
[19] J.-T. Lapresté, F. Jurie, M. Dhome, and F. Chaumette. An efficient method to compute the inverse Jacobian matrix in visual servoing. In IEEE Int. Conf. on Robotics and Automation (ICRA'04), volume 1, pages 727–732, New Orleans, LA, April 2004.
[20] N. Mansard, M. Lopes, J. Santos-Victor, and F. Chaumette. Jacobian learning methods for tasks sequencing in visual servoing. In IEEE/RSJ International Conf. Intelligent Robots and Systems, pages 4284–4290, October 2006.
[21] E. Marchand, F. Spindler, and F. Chaumette. ViSP for visual servoing: a generic software platform with a wide class of robot control skills. IEEE Robotics and Automation Magazine, Special Issue on Software Packages for Vision-Based Control of Motion, P. Oh and D. Burschka, editors, 12(4):40–52, December 2005.
[22] G. Mariottini and D. Prattichizzo. EGT: a toolbox for multiple view geometry and visual servoing. IEEE Robotics and Automation Magazine, 12(4), December 2005.
[23] J. Morrow and P. Khosla. Manipulation task primitives for composing robot skills. In Int. Conf. on Robotics and Automation (ICRA), volume 4, pages 3354–3359, April 1997.
[24] N. Mansard and F. Chaumette. Task sequencing for sensor-based control. IEEE Trans. on Robotics, 23(1):60–72, February 2007.
[25] J. A. Piepmeier, G. V. McMurray, and H. Lipkin. Uncalibrated dynamic visual servoing. IEEE Trans. Robotics and Automation, 20(1):143–147, February 2004.
[26] S. Schaal, C. G. Atkeson, and S. Vijayakumar. Real-time robot learning with locally weighted statistical learning. In Proc. International Conference on Robotics and Automation, pages 288–293, April 2000.
[27] R. A. Scheidt, J. B. Dingwell, and F. A. Mussa-Ivaldi. Learning to move amid uncertainty. The Journal of Neurophysiology, 86(2):971–985, August 2001.
[28] R. A. Schmidt and T. D. Lee. Motor Control and Learning: A Behavioral Emphasis. Human Kinetics, Champaign, IL, fourth edition, February 2005.
[29] T. J. Schnackertz and R. A. Grupen. A control basis for visual servoing tasks. In Int. Conf. on Robotics and Automation (ICRA), volume 1, pages 478–483, May 1995.
[30] R. Shadmehr and S. P. Wise. The Computational Neurobiology of Reaching and Pointing. Computational Neuroscience. MIT Press, Cambridge, Massachusetts, 2005.
[31] M. Spong, S. Hutchinson, and M. Vidyasagar. Robot Modeling and Control. John Wiley, USA, 2006.
[32] E. Todorov and M. I. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226–1235, 2002.
