Patrick van der Smagt
Visual robot arm guidance using neural networks
ACADEMIC DISSERTATION

for the degree of doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus Prof. dr P. W. M. de Meijer, to be defended in public before a committee appointed by the college of deans, in the Aula of the University on Friday 24 March 1995, 12:00h

by

Patrick van der Smagt
born in Assen

Promotors: Prof. dr ir F. C. A. Groen, dr ir B. J. A. Kröse
Faculty: Mathematics and Computer Science
This research has been partly sponsored by the Dutch Foundation for Neural Networks. Work presented in chapter 6 has also been sponsored by the Carver Trust Charitable Fund.

CIP-GEGEVENS KONINKLIJKE BIBLIOTHEEK, DEN HAAG
Smagt, Patrick van der
Visual robot arm guidance using neural networks / Patrick van der Smagt. - Amsterdam: Universiteit van Amsterdam, Faculteit Wiskunde en Informatica. - Ill.
Proefschrift Universiteit van Amsterdam. - Met lit. opg., reg.
ISBN 90-74795-23-4
Trefw.: robotica / neurale netwerken / beeldverwerking
Copyright © by Patrick van der Smagt. All rights reserved.
Cover design: Britta Platt and Patrick van der Smagt.
für / voor / for
Britta
Contents

1 Introduction                                          1
  1.1 Adaptive autonomous systems                       1
  1.2 Visually guided robot arms                        3
  1.3 Artificial neural networks                        5
  1.4 Overview                                          7

2 Learning with neural networks                         9
  2.1 Choosing the basis functions                      11
    2.1.1 Types of functions                            12
  2.2 The minimal approximation error                   14
    2.2.1 Model of the approximation error              16
  2.3 Parameter estimation                              17
    2.3.1 First order methods                           18
    2.3.2 Second order methods                          19
  2.4 Results                                           24
    2.4.1 Classification: the XOR problem               25
    2.4.2 Continuous function approximation             26
    2.4.3 Discontinuous function approximation          28
  2.5 Discussion                                        30

3 Principles of hand-eye coordination                   31
  3.1 Robot perception                                  33
    3.1.1 The OSCAR visual system                       34
    3.1.2 Measurement errors                            36
  3.2 Robot Motion                                      40
    3.2.1 OSCAR kinematics                              41
    3.2.2 Dynamics                                      44
    3.2.3 Error in arm positioning                      44
  3.3 System Description                                46
    3.3.1 Structure of the proposed controller          47
    3.3.2 Generating learning samples                   48
    3.3.3 Coupling the robot with the camera            50
  3.4 Conclusions                                       51

4 Eye-in-hand positioning                               53
  4.1 Formal statement of the problem                   55
  4.2 Constructing the controller                       58
    4.2.1 The neural network                            58
    4.2.2 Bins                                          59
    4.2.3 Learning samples                              60
    4.2.4 Implementation of the learning algorithm      62
    4.2.5 Accuracy of the neural network                64
  4.3 Results                                           64
    4.3.1 Simulated robot                               66
    4.3.2 Results with OSCAR                            70
  4.4 Discussion                                        72

5 Visual feedback in motion                             75
  5.1 The trajectory of an eye-in-hand system           77
    5.1.1 The trajectory as a function of time          78
  5.2 The time-independent constraints                  80
  5.3 The time-dependent constraints                    83
    5.3.1 Feedback control of the trajectories          83
    5.3.2 Finding the time-dependent constraints        84
  5.4 Visual measurement of the stopping criteria       88
    5.4.1 Measuring (t)                                 91
  5.5 Controlling the manipulator                       91
    5.5.1 A trajectory in visual domain                 92
    5.5.2 Translating the trajectory to joint domain    93
    5.5.3 Controller structure                          94
    5.5.4 Generating learning samples                   95
    5.5.5 Experimental procedure                        95
    5.5.6 Exploration                                   96
    5.5.7 Finding the optimal network                   96
  5.6 Results                                           99
    5.6.1 Influence of input noise                      99
  5.7 Discussion                                        101

6 Dynamic control of a rubbertuator arm                 103
  6.1 The robot system                                  104
    6.1.1 Kinematic system                              104
    6.1.2 The rubbertuator drive system                 107
    6.1.3 The gripper and its controlling valves        107
  6.2 Dynamics of the SoftArm                           107
    6.2.1 Behaviour of a rubbertuator driven joint      108
    6.2.2 Analysis of the rubbertuator behaviour        109
  6.3 Control of the SoftArm                            114
    6.3.1 System setup                                  114
    6.3.2 Network structure                             115
  6.4 Results                                           115
    6.4.1 A simple trajectory                           115
    6.4.2 A more complex trajectory                     117
  6.5 Discussion                                        117

7 Discussion                                            121

A Orthogonal basis functions                            127

B Proofs for chapter 5                                  129
  B.1 Derivation of the stopping criteria               129
  B.2 Proof of theorem 5.2                              131
  B.3 Least-mean-square minimisation                    132
  B.4 Nonlinear transform of a noisy signal             135
  B.5 Robust parameter estimation methods               136

C The OSCAR Robot System                                139
  C.1 OSCAR mechanical structure                        139
  C.2 Hardware structure                                140
  C.3 Software structure                                140
    C.3.1 OSCAR controller                              140
    C.3.2 Vision software                               141
    C.3.3 Putting the parts together                    141

D OSCAR inverse kinematics                              143

E Simderella                                            147
  E.1 Introduction                                      147
  E.2 Simulator Organisation                            147
    E.2.1 Connel                                        147
    E.2.2 Simmel                                        148
    E.2.3 Bemmel                                        149
  E.3 Availability                                      150
  E.4 Acknowledgements                                  151

Bibliography                                            153
Summary                                                 161
Samenvatting                                            165
Acknowledgements                                        169
Index                                                   171
List of symbols

Below is a list of symbols used in this thesis. Every entry contains a reference to the first use of the symbol. This reference, however, may not always be exact; sometimes it should rather be interpreted as the first definition of that symbol. For vectors, the euler font is used: it renders boldface, non-italic characters.
Symbol: C(), F(), I, N(), R()
\tan\theta_{i+1} = \frac{1}{\cos\theta_i}\,\frac{\|g_{i+1}\|}{\|g_i\|}. \qquad (2.43)

Now, if θ_i approaches π/2, the iteration may take a small step such that (g_{i+1} − g_i) is small and the ratio ‖g_{i+1}‖/‖g_i‖ approaches unity. Then, according to (2.43), θ_{i+1} is also
A method is said to converge linearly if the error E(w_{i+1}) = cE(w_i) with c < 1. Methods which converge with a higher power, i.e., E(w_{i+1}) = c(E(w_i))^m with m > 1, are called super-linear.
Figure 2.5: Slow decrease with conjugate gradient in non-quadratic systems. The hills on the left are very steep, resulting in a large search vector v_i. When the quadratic portion is entered, the new search direction is constructed from the previous direction and the gradient, resulting in a spiralling minimisation.
near π/2 and the effect depicted in figure 2.5 is observed. If, however, the expression for γ_i in (2.38) is replaced by

\gamma_i = \frac{g_{i+1}^T (g_{i+1} - g_i)}{g_i^T g_i} \qquad (2.44)

then

\gamma_i \le \frac{\|g_{i+1}\|\,\|g_{i+1} - g_i\|}{\|g_i\|^2} \qquad (2.45)

or

\tan\theta_{i+1} \le \frac{1}{\cos\theta_i}\,\frac{\|g_{i+1} - g_i\|}{\|g_i\|} \qquad (2.46)

such that θ_{i+1} ≪ θ_i and v_{i+1} is turned towards the steepest descent direction. Notice that this expression for γ_i retains the conjugacy of v_i and v_{i+1}. Conjugate gradient minimisation with (2.44) is known as Polak-Ribière conjugate gradient.

Figure 2.6: Definition of θ_i.
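To make the Polak-Ribière update concrete, here is a minimal sketch of conjugate gradient minimisation with exact line minimisation and the coefficient (2.44). In this sketch g denotes the gradient (so the search direction is v = −g + γv); the use of scipy's minimize_scalar for the line minimisation and the Rosenbrock test function are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def polak_ribiere_cg(E, grad, w, n_iter=100, tol=1e-8):
    """Minimise E(w) by Polak-Ribiere conjugate gradient, cf. eq. (2.44)."""
    g = grad(w)
    v = -g                                   # start along steepest descent
    for _ in range(n_iter):
        # line minimisation along v replaces a fixed learning rate
        alpha = minimize_scalar(lambda a: E(w + a * v)).x
        w = w + alpha * v
        g_new = grad(w)
        if np.linalg.norm(g_new) < tol:
            break
        # Polak-Ribiere coefficient, eq. (2.44)
        gamma = g_new @ (g_new - g) / (g @ g)
        v = -g_new + gamma * v               # new conjugate search direction
        g = g_new
    return w

# usage: minimise the Rosenbrock function as a stand-in test problem
E = lambda w: (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2
grad = lambda w: np.array([-2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2),
                           200 * (w[1] - w[0]**2)])
print(polak_ribiere_cg(E, grad, np.array([-1.0, 1.0])))
```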
Powell's restart procedures. A second improvement involves the restart. It appears that restarting with v_{i+1} = g_{i+1} is inefficient. Instead, a restarting method which does
not abandon the second derivative information is needed. A proposed method is obtained by setting v_i according to (2.37) when a restart is made, and by extending the definition of v_i on normal iterations (Beale, 1972). Let v_k be an arbitrary downhill restarting direction. Supposing E is quadratic, we are looking for a direction v_{i+1} which is a linear combination of v_k and the gradients g_k, g_{k+1}, ..., g_{i+1} such that v_k, v_{k+1}, ... are mutually conjugate. An expression which satisfies these conditions is

v_{i+1} = g_{i+1} + \gamma_i v_i + \gamma_i' v_k. \qquad (2.47)

Again, γ_i is calculated to make v_{i+1} conjugate to v_i, while the extra term provides conjugacy to v_k:

\gamma_i = \frac{g_{i+1}^T (g_{i+1} - g_i)}{v_i^T (g_{i+1} - g_i)}, \qquad \gamma_i' = \frac{g_{i+1}^T (g_{k+1} - g_k)}{v_k^T (g_{k+1} - g_k)}. \qquad (2.48)

To prevent the resulting direction from leading uphill instead of downhill, we require that v_i^T g_i < 0 for i > k. Furthermore, the orthogonality between g_{i-1} and g_i must be guaranteed after a restart, to prevent the approximations from going to a nonzero limit. Powell suggests the following inequality:

|g_{i-1}^T g_i| \le 0.2\,\|g_i\|^2. \qquad (2.49)

Thirdly, the new search direction must go sufficiently downhill. Again following a suggestion by Powell:

-1.2\,\|g_i\|^2 \le v_i^T g_i \le -0.8\,\|g_i\|^2. \qquad (2.50)

If (2.49) or (2.50) is not satisfied, we restart, setting k = i − 1.
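The restart tests (2.49) and (2.50) amount to two inexpensive inner-product checks per iteration. A sketch, assuming g denotes the gradient at the current iterate (so that a downhill direction has v^T g < 0):

```python
import numpy as np

def needs_restart(g_prev, g, v):
    """Powell's restart tests, cf. (2.49) and (2.50).

    g_prev, g: previous and current gradients; v: current search direction.
    Returns True when either condition fails and a restart is required."""
    gg = g @ g
    orthogonality_ok = abs(g_prev @ g) <= 0.2 * gg       # cf. (2.49)
    downhill_ok = -1.2 * gg <= v @ g <= -0.8 * gg        # cf. (2.50)
    return not (orthogonality_ok and downhill_ok)
```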
2.4 Results

Where appropriate, the following methods are compared:

bp: standard error back-propagation with fixed learning rates;
sd: error back-propagation with line minimisation instead of learning rates (i.e., steepest descent minimisation);
dfp: Davidon-Fletcher-Powell quasi-Newton minimisation;
fr: Fletcher-Reeves conjugate gradient minimisation;
cg: conjugate gradient minimisation with Powell restarts.

The number of necessary error function evaluations is counted as the number of iterations. With error back-propagation, this equals the number of search directions traversed. For the other methods, due to line minimisation, every search direction is traversed multiple times, and in each instance the function value as well as the derivative is computed. The methods are compared on the following three problems:
1. classification of the four XOR patterns, to reach an average squared error of less than 0.025 per pattern;

2. approximation of the function sin(x) cos(2x) for 0 ≤ x ≤ 2π, from which 20 samples are uniformly chosen. An average squared error of less than 0.025 per pattern must be reached;

3. approximation of the function tan(x) (with one discontinuity) for 0 ≤ x ≤ π, from which 20 samples are uniformly chosen. An average squared error of less than 0.025 per pattern must be reached.
In all instances, 10,000 trials were run.
2.4.1 Classification: the XOR problem
It has been observed (Fahlman, 1988) that the exclusive-or classification problem is not a representative problem, since it does not encourage but rather penalises generalisation over the learning patterns. However, it is a difficult classification problem, and therefore we use it as a benchmark for the five minimisation methods. For all learning methods, 10,000 trials were run with a network with two hidden units in one layer. For those runs which reached a global minimum, where a global minimum is considered reached when the summed squared error is less than 0.025 per pattern, the number of iterations needed is counted. Table 2.1 shows how often a global minimum is reached. First, standard error back-propagation is tested with a learning rate of 0.1 and a momentum of 0.9. In 91.3% of the trials a global minimum is reached. Figure 2.7 (left) shows the number of iterations needed to reach a summed squared error of less than 0.025 per pattern; the average is located at 332.
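For concreteness, a sketch of the 2-2-1 network and the summed squared error used in this benchmark; the sigmoid activation and the weight packing are ordinary assumptions, not prescribed by the text. The resulting error function can be handed to any of the minimisers compared here, e.g., the Polak-Ribière sketch of section 2.3.

```python
import numpy as np

# the four XOR patterns and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def xor_error(w):
    """Summed squared error of a 2-2-1 network; w packs all 9 weights
    (2x2 input-to-hidden + 2 hidden biases + 2 hidden-to-output + 1 bias)."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = sigmoid(X @ W1 + b1)      # hidden layer, two units
    y = sigmoid(h @ W2 + b2)      # single output unit
    return np.sum((y - t) ** 2)

w0 = np.random.randn(9)           # one random weight initialisation
print(xor_error(w0))              # a global minimum has error < 4 * 0.025
```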
Figure 2.7: Training XOR with bp and sd. The figures show the probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for XOR classification with (left) standard error back-propagation and (right) error back-propagation with line minimisation (steepest descent).
When, at each step, a line minimisation is performed, the system is much more sensitive to local minima: only in 38.0% of all
trials a global minimum was reached. The number of steps needed is shown in figure 2.7 (right). Notice, however, that each minimisation now takes 3 to 5 line-minimisation steps; the average number of minimisation steps is therefore the average number of function evaluations (3661.7) divided by the average number of line-minimisation steps (3 to 5). The quasi-Newton method reaches a global minimum only in 34.1% of the cases, while the average number of learning iterations is 2141.1 (see figure 2.8 (left)).
Figure 2.8: Training XOR with dfp and fr. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for XOR classification with (left) quasi-Newton DFP and (right) Fletcher-Reeves conjugate gradient.
Fletcher-Reeves conjugate gradient reaches the minimum in 81.5% of the cases, but needs, on average, 523.0 iterations; see figure 2.8 (right). Finally, conjugate gradient with Powell restarts is 82.1% successful, and needs only 79.2 iterations (figure 2.9).
Figure 2.9: Training XOR with cg. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for XOR classification with Polak-Ribière conjugate gradient with Powell restarts.
2.4.2 Continuous function approximation
Secondly, the function sin(x) cos(2x) was trained with 20 learning samples over the period 0...2π (see figure 2.10). The network used had 10 hidden units in one layer (Vysniauskas et al., 1992).
      XOR    sin(x) cos(2x)   tan(x)
bp    91.3       15*            0.0
sd    38.0       90             0.0
dfp   34.1       35.7          40.0
fr    81.5       49.5           4.5
cg    82.1      100.0          85.4

Table 2.1: Percentage of runs that lead to a global minimum. A global minimum was considered reached when the summed squared error was less than 0.025 per pattern. The * indicates that adaptive learning rates were used in this case.
Figure 2.10: Learning samples from the function sin(x) cos(2x).
Figure 2.11: Training sin(x) cos(2x) with dfp and fr. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for sin(x) cos(2x) approximation with (left) quasi-Newton DFP and (right) Fletcher-Reeves conjugate gradient.
Figure 2.12: Training sin(x) cos(2x) with cg. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for sin(x) cos(2x) approximation with Polak-Ribière conjugate gradient with Powell restarts.
Figures 2.11 and 2.12 show the number of function evaluations needed to reach a summed squared error of less than 0.025 per pattern. Although dfp is slightly better than cg, the latter always reaches a global minimum, whereas the former only reaches a minimum in 35% of the cases. Steepest descent is not very sensitive to local minima here, but needs, on average, 4·10^6 iterations to reach the minimum. Error back-propagation with fixed stepsizes was never able to find the minimum: when the system was approaching a global minimum, the stepsize was not small enough to continue decreasing the error, so that overshoot resulted. To overcome this problem, adaptive learning rates (Silva & Almeida, 1990) were used. Still the algorithm gave poor results, often reaching local minima and needing over 2 million function evaluations.
2.4.3 Discontinuous function approximation
Finally, the function tan(x) over 0...π, with one discontinuity, was tested with 20 learning samples equally distributed over the input space (see figure 2.13). A network with five hidden units in one hidden layer was used (Vysniauskas et al., 1992).
Figure 2.13: Learning samples from the function tan(x).
Steepest descent and error back-propagation never reached a global minimum. The other methods (see figures 2.14 and 2.15) could solve the problem.
Figure 2.14: Training tan(x) with dfp and fr. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for tan(x) approximation with (left) quasi-Newton DFP and (right) Fletcher-Reeves conjugate gradient.
Figure 2.15: Training tan(x) with cg. Probability density of the number of function evaluations required to reach a global minimum (summed squared error less than 0.025 per pattern) for tan(x) approximation with Polak-Ribière conjugate gradient with Powell restarts.
2.5 Discussion

In this chapter we have seen that a theoretical model of the approximation error of a feed-forward neural network can be found through simulation, such that an optimal number of learning samples and hidden units can be chosen for a particular problem. This method, introduced in (Vysniauskas et al., 1993), determines an asymptotic model of the approximation error as a function of the number of learning samples and hidden units used in the network. The best fit of this model to the approximation error of the feed-forward network can be used to predict the minimal computational resources needed to attain a certain error in the approximation.

Secondly, it has been shown that, although standard error back-propagation is less liable to get stuck in local minima, second order minimisation methods are far superior with respect to learning time, especially in accurately approximating smooth functions. Conjugate gradient algorithms, which can be seen as error back-propagation with momentum where the learning rates are optimally determined, were shown to be a better choice for feed-forward network training. In particular, Polak-Ribière conjugate gradient optimisation with Powell restarts shows promising results for training feed-forward networks, because it takes into account the non-quadratic behaviour of the minimised error function far from the minimum.

Although the problem of local minima is potentially serious, practice shows that the preferred minimisation method hardly suffers from this problem. First, it has been suggested that the problem can be alleviated by adding more dimensions to the search space of the optimisation method (i.e., more parameters (weights) in the neural network, by adding hidden units), thus creating `escape routes' from local minima. Secondly, in order to avoid getting stuck in local minima, some authors suggest hybrid training algorithms (Gorse & Shepherd, 1992): switch to conjugate gradient optimisation near minima, and use error back-propagation with fixed stepsize otherwise. This method, which is also suggested in (Møller, 1990), resembles a technique from numerical analysis known as Levenberg-Marquardt optimisation (Press et al., 1986).

The technique for finding the optimal neural network will be employed for the hand-eye coordination problem discussed in chapter 4. In all cases, the optimal Powell learning strategy will be used.
Chapter 3
Principles of hand-eye coordination

Industrial environments have always been the home and hearth of robots. It may be said that the applicability of robots in industrial environments has its roots in the introduction of conveyor-belt production by Henry Ford in his automobile assembly factories in 1903. However, robots never managed to progress far away from the conveyor belt: their inability to react `intelligently' to changing environments (i.e., to adapt where necessary) prevented their wider applicability.

Adaptation is essential to cope with unforeseen changes. Traditionally, autonomous systems perform poorly in cases where the environment is changing and cannot be described by a rigid model, whereas humans have the ability to generalise from previous experiences. Incorporation of such generalisation into autonomous robot systems could lead to more robust and more widely applicable solutions.

As a start in the search for such solutions, in this chapter we study the robot and sensor systems that are at our disposal and are used in the experiments. In order to facilitate this study, it is customary to split a problem into simpler sub-problems, study and solve those sub-problems separately, and then integrate the separate solutions. The components we have to study are:
- the sensor system (camera, joint sensors, etc., as well as, e.g., the sensor signal preprocessing hardware);
- the sensor interpretation system (e.g., in the simplest case, object recognition);
- the robot navigation system (e.g., planning, inverse kinematics, dynamics: the low-level control);
- the robot and its actuators.

Of all hardware components it is important to measure the accuracy, for the following reasons. First, those measurements are required in order to obtain realistic simulations. Secondly, it is important to know which errors can be coped with by an autonomous adaptive system (in general, systematic errors can, but stochastic errors cannot).

In the above list, the distinction between perception and motion is emphasised. This emphasis is reflected in the two-stage approach taken in many autonomous robot control
implementations: (1) transform a sensor reading into a world-model representation, such that the sensor interpretation is known with respect to a predefined frame common to all parts of the system; (2) transform the world-model representation into a robot motion. In the first stage, the correspondence between the sensor reading and the world space must be determined; in other words, calibration is required. In the second stage (called the inverse kinematics) the desired end-effector position is translated to joint values of the robot, using the known geometry of that robot. Again, this requires calibration. For instance, the images of two cameras can be combined into a 3D representation of the viewed scene, and from this 3D representation the required robot motion can be computed. Such approaches, however, require accurate and stable models of both the robot and its sensors. There is an unpleasant trade-off between the complexity of the models, the accuracy of the system, and the price of the equipment. Furthermore, wear and tear can never be excluded, resulting in expensive maintenance (replacement), model adaptation, or frequent recalibration. In this research, we focus on behaviour-based robotics (Brooks, 1986) rather than planning (Meijer, 1991). In particular, we are interested in the following problem:
place the end-effector of a robot arm directly above a stationary object.

This problem, known as the pick-and-place problem, is very common in robotics but only solved in well-structured settings. In pick-and-place problems, the robot hand positioning problem is constrained in that only the position and not the orientation of the hand is taken into account (i.e., the orientation is always fixed). This suffices for many pick-and-place operations encountered in industry. In a number of cases the rotation of the hand is also of importance, but this rotation can be uncoupled from the 3D positioning problem. Thus the remaining problem has 3 degrees of freedom (DoF).

In order to satisfy the requirement of adaptivity to a changing environment, we assume that no (detailed) models of the visual and robot system are available; furthermore, the visual and/or robot system may be subject to sudden or gradual change. In other words, we assume that operator intervention for robot maintenance or precise model building is not feasible. This is the case in, e.g., space robotics or other tele-operated robots, or when models are simply too complex to make (which is the case for the rubbertuator robot discussed in chapter 6). To attain this goal, the two-stage approach is replaced by a direct translation from visual input to robot output. This translation is done by a self-learning and adaptive controller, which obtains its learning data from the motion of the robot.

In this chapter, all components mentioned above will be discussed. In section 3.1, the visual system that is chosen to tackle the task at hand is described. In section 3.2, the robot system is described. Both of these sections are concluded with an error analysis of the camera and robot system, which can be used in a later study of the accuracy of the control system. Section 3.3 describes the control systems necessary for model-free robot arm guidance. All three sections have the following setup: first a description of the relevant theory, followed by a description of the specific choices and implementations used for this thesis. Finally, a conclusion is given in section 3.4.
3.1 Robot perception

For the system to accomplish the specified task it must know the position of the observed object relative to the robot, in some coordinate system. Consider a robot that at a specific moment has a joint position θ. The position of the end-effector in world coordinates is given by x_r. The robot has to move towards an object located at world position x_o, i.e., assume a θ such that x_r equals x_o. As discussed above, x_o and x_r are not available without an accurate model of the visual system. There are two basic visual setups to obtain the required visual information:

1. External `world-based' camera: both the robot and the target object are observed by cameras situated at fixed and unchanging positions. The visually observed object features (indicated by {f_i}^o) in the image must uniquely determine the position x_o of the target object; otherwise, when the target position is not uniquely known, the robot move required to reach the target cannot be determined from the observation. Also, the visually observed robot features {f_i}^r representing the robot end-effector must uniquely determine its world coordinates x_r, to ensure that the relation between the vision domain and the robot domain is learnable.

2. Internal `robot-based' camera: the target object is observed by cameras which move together with the robot's end-effector. The observed object position (the visual observation) need not uniquely describe the object position in world coordinates. However, the measurements together with the robot position θ must uniquely define the world coordinates x_o if a non-ambiguous motion plan is to be made.

These two setups differ fundamentally in the following way. In the case of world-based vision, the target object, the robot end-effector, and the (positional) relation between the two must be determined. In robot-based vision the target object is observed in a camera coordinate frame relative to the robot end-effector position. Therefore it is not necessary that the position of the robot end-effector be observed, and hence robot-based vision is simpler and more robust.

Robot-based vision has another advantage over world-based vision. The positional precision that can be extracted from a quantised camera image is inversely proportional to the distance between the camera and the observed scene (when the focus of the optical system is fixed). Thus the visual precision increases when the target object is approached; the finite resolution of the camera is no limiting factor. World-based cameras, which do not move with the gripper and yet have to see the whole work space, have to be placed rather far away, typically in the order of 2 m from the robot base for a robot with a typical arm length of 1 m. Thus their precision will be limited, typically to 0.5-1 cm for the application in this chapter.

The perspective transformation that maps 3D points onto a plane (c.q., the image plane) is a many-to-one relationship and is thus not invertible. A single image obtained from one camera does not provide the depth information required to reconstruct the positions of the observed components in the 3D world. Since the controllers developed within the scope of this thesis need such information, additional a priori knowledge is required. Two solutions are considered.
1. Model-based monocular: in this case, a priori knowledge of the observed object is assumed. For instance, when a sufficient set of point features of the object can be observed, and the position of these features on the object is known, the position of the object relative to the camera in world space can be reconstructed. For more detailed information about robust model-based approaches, consult, e.g., (Kanade, 1981; Lamdan & Wolfson, 1988; Gavrila & Groen, 1992). An exemplar method is the following: when the camera is looking down on a scene consisting of a single object with a flat, horizontal surface, the distance is inversely proportional to the square root of the observed area of the object.

2. Correspondence-based binocular: when no a priori knowledge is present, triangulation can be used to measure depth. This can be realised with a stereoscopic system (Ballard & Brown, 1982; Fu et al., 1987). When two cameras, whose image planes are situated at known (relative) positions, observe the 3D scene, the corresponding point features from both images can be used to reconstruct the 3D image. As shown in, e.g., (Ritter et al., 1989), the relative image-plane positions need not be calibrated but can be incorporated in the learning mechanism. When the visual scene becomes increasingly complex, or the number of degrees of freedom for positioning the robot manipulator increases (e.g., not only the position but also the orientation of the hand is of importance), the complexity of stereo vision increases considerably. Especially for complex visual scenes, the correspondence or matching problem (Marr, 1982; Ballard & Brown, 1982) becomes significant: which point features in the left image correspond with which point features in the right image?

In conclusion, the advantages of model-based monocular vision are the increasing precision, the avoidance of the correspondence problem, and simpler image processing. An additional advantage is that the problem of occlusion of marker points or parts of the observed object (e.g., due to the rotating robot arm) is avoided. A disadvantage of this method is the requirement of a priori knowledge of the observed scene.

Relative depth information can be obtained by using sequences of visual images. By measuring the divergence in an image when approaching an object (e.g., how much an observed object gets larger when it is approached), the visual distance divided by the visual velocity can be determined. Note that the absolute distance cannot be measured: e.g., a large object far away cannot be distinguished from a nearby small object.
3.1.1 The OSCAR visual system
Taking these advantages and disadvantages into account, the preferred visual setup was chosen to consist of a robot-based camera, viz. a camera mounted in the gripper of the robot, combined with model-based vision. As mentioned, the disadvantage of this method is that a priori knowledge of the object is required. For most industrial (conveyor-belt) applications, however, this knowledge is available. Secondly, in chapter 5 a setup will be discussed which uses additional sensing through multiple images, thus not requiring
absolute positional information of the object but only relative positions (i.e., the change in the position when the camera is moving).

Figure 3.1 shows the placement of the camera and optical system in the centre of the robot's gripper. The robot hand is always positioned in such a way that the camera looks `straight down' at the table placed at the robot's base. On or above the table a single white object is placed, from which the relative 3D coordinates must be measured. Due to the camera fixture, the following simple model suffices to describe the (idealised) camera mapping. In the optical system used, the image plane (CCD) is placed at the focal distance f from the lens, such that the point of focus is at infinity. An object placed at distance d_z + f from the lens will be projected at a distance h behind the lens, such that (Hecht & Zajac, 1974)

\frac{1}{d_z + f} + \frac{1}{h} = \frac{1}{f} \qquad (3.1)

as depicted in figure 3.2.

Figure 3.1: Camera placed in the manipulator of the robot.
Figure 3.2: An optical system with the CCD placed at focal distance from the lens. The symbols are referred to in the text.
In the proposed system, the depth d_z will be derived from the projected (or observed) area A in relation to the `real' area A* of the object; A* is defined as the projected area of the object when d_z = f. This observed area is measured as the number of white pixels (constituting the object) on the CCD. Assuming a pinhole camera, d_z and A are related as

d_z = f \sqrt{\frac{A^*}{A}}. \qquad (3.2)

Note that f and A* are constants for one particular lens and object.

Given the placement of the camera, and the data that are obtained from it, the visual processing system can be discussed in more detail. The system must accomplish the following tasks (Fu et al., 1987):
1. image acquisition: the image, projected on the camera's image plane, must be transferred to the memory of the image processing hardware. This renders a discretised image I;

2. image segmentation: from the discrete image I it is determined which parts represent components, and which represent background. This is typically done using the basic principles of (dis)continuity (i.e., edge-based) and homogeneity (i.e., region-based);

3. image description: for the purpose of component recognition and for subsequent use in the control algorithm, for each component the features describing its properties as well as its position are determined;

4. component recognition: the identified components are labeled as target object, robot hand, background, etc. For the purpose of this thesis, only the target object and the background need to be identified.

The visual system used for the experiments described in this thesis needs only to identify the 3D position of an observed object; the depth is calculated from the area of the object. It is advantageous when the visual preprocessing does not form a bottleneck in the proposed control algorithms. The time delay presented by the camera system has a lower bound of 40 ms, which is the time needed for image acquisition. During acquisition of I[i+1], the image processing system has time to extract features from image I[i]. By keeping the visual scene sufficiently simple, the required features can be extracted from the approximately 480 × 512 pixels of 8 bits width within 40 ms by using a DataCube MaxVideo-20 image processing board. The visual scene is therefore chosen to consist of a single white object against a dark background. From this object, the position (x, y) on the image as well as its area A are extracted. Here, x and y are defined as the number of pixels measured in x and y direction between the centre of the image and the centre of the object; A is defined as the number of white pixels in the image. Details of this operation are described in (Bartholomeus, 1993). These three features carry all information for the 3D localisation of the object; however, the choice of these features is not essential for the control algorithm. The simple visual setup is chosen to get a good performance in the preprocessing and suffices for the current research purposes. For the neural controller, it is only essential that the visual features are complete, i.e., contain all necessary information for obtaining disambiguating positional information of the object.
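As a numerical illustration of the feature triple (x, y, A) and the depth-from-area relation (3.2): the sketch below assumes an already binarised image, and the toy square object is invented; f = 4.8 mm and A* = 25,000 are the values quoted later in this chapter.

```python
import numpy as np

def object_features(image, f, A_star):
    """Extract (x, y, A) from a binarised image and estimate depth via (3.2).

    image: 2D array of {0, 1}; white (1) pixels constitute the object.
    f: focal distance; A_star: projected area of the object at d_z = f."""
    ys, xs = np.nonzero(image)
    A = xs.size                                  # area: number of white pixels
    cy, cx = (image.shape[0] - 1) / 2.0, (image.shape[1] - 1) / 2.0
    x, y = xs.mean() - cx, ys.mean() - cy        # object centre w.r.t. image centre
    d_z = f * np.sqrt(A_star / A)                # depth from area, eq. (3.2)
    return x, y, A, d_z

# toy example: a 50x50-pixel white square in a 480x512 image
img = np.zeros((480, 512))
img[100:150, 200:250] = 1
print(object_features(img, f=4.8e-3, A_star=25_000))   # f in metres (4.8 mm)
```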
3.1.2 Measurement errors

Although the above models are sufficient for describing the principle of the camera system, the visual measurements are influenced by noise and errors, which we will discuss in more detail.
Error in visual position

In a study by Lagerberg et al. (Lagerberg, van Albada, & Visser, 1993) of the optical system used, the effects of the wide-angle lens and the errors due to inexact placement of the CCD are investigated. We will consider the results of this study. According to a model which is based on measurements with the OSCAR camera, the following deformations have to be taken into account:

1. A point (x_c, y_c, z_c) in the camera coordinate system (which coincides with the robot end-effector coordinate system) is projected on a point (u, v, w) in the image-plane coordinate system as

\begin{pmatrix} 509.5 & -2.47 & 0.0 \\ 0.0 & -619.5 & 0.0 \\ -0.0735 & -0.0484 & 1.0 \end{pmatrix} \begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = \begin{pmatrix} u \\ v \\ w \end{pmatrix}.

This means that there is an angle of 5.0° between the normal to the image plane and the z axis of the camera coordinate system.

2. The optical centre of the lens is not the centre of the CCD. These centres are located at a distance (Δx = 46, Δy = 30) pixels from each other.

3. Deformation in the x and y direction due to the wide-angle lens. The (x, y) coordinate which would have been obtained with a pinhole camera is projected on the deformed (x_v, y_v) on the CCD as

x_v = x - \frac{k_1 r^2 (x - \Delta x) + k_2 r^4 (x - \Delta x)}{1 + k_1 r^2 + k_2 r^4} \qquad (3.3a)
y_v = y - \frac{k_1 r^2 (y - \Delta y) + k_2 r^4 (y - \Delta y)}{1 + k_1 r^2 + k_2 r^4} \qquad (3.3b)
r^2 = (x - \Delta x)^2 + (y - \Delta y)^2 \qquad (3.3c)

where k_1 = 6.24 × 10^{-7}, k_2 = -4.97 × 10^{-13}.
Errors 1 and 3 introduce systematic errors; error 2 can be ignored, since it only results in a shift of the end position, which will automatically be learned by the adaptive controller. The effect of error 1, which results in a cosine relationship rendering up to one pixel inaccuracy, is negligible with respect to error 3. The effect of error 3 can be measured by calculating the Euclidean pixel distance \sqrt{(x_v - x)^2 + (y_v - y)^2} between the ideal and the deformed pixel. In figure 3.3 this deformation is shown as a function of the position on the CCD. As the figure shows, a measurement can be as much as 40 pixels, or 8%, off. Secondly, there is a stochastic error in the observed position. This error is related to the errors in the area measurement and is discussed in that section.
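A small sketch of the deformation model (3.3), using the constants k_1, k_2, Δx, Δy quoted above; evaluating it on a grid mimics how a plot like figure 3.3 can be produced, but the code itself is only an illustration.

```python
import numpy as np

# wide-angle lens deformation, eqs. (3.3a)-(3.3c)
K1, K2 = 6.24e-7, -4.97e-13
DX, DY = 46.0, 30.0            # offset of the optical centre, in pixels

def deform(x, y):
    """Map ideal pinhole coordinates (x, y) to deformed CCD coordinates."""
    r2 = (x - DX) ** 2 + (y - DY) ** 2
    c = (K1 * r2 + K2 * r2 ** 2) / (1.0 + K1 * r2 + K2 * r2 ** 2)
    return x - c * (x - DX), y - c * (y - DY)

# deformation magnitude over a grid of CCD positions, as in figure 3.3
x, y = np.meshgrid(np.linspace(-200, 200, 9), np.linspace(-200, 200, 9))
xv, yv = deform(x, y)
print(np.hypot(xv - x, yv - y).max())    # worst-case deviation in pixels
```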
Figure 3.3: Deformation \sqrt{(x_v - x)^2 + (y_v - y)^2} in the camera image. The deformation is calculated with the model described in (3.3) and is given by the Euclidean distance between (x, y) and (x_v, y_v).
Error in area
Now, what is the error in the measurement of A, given an ideal lens? Two sources of error are distinguished: (1) the discretisation of the image on the CCD, which gives a stochastic error; and (2) blurring (out of focus), which gives a systematic error. We will discuss both errors in turn.
Discretisation error. To investigate the discretisation error, a series of tests was performed in which the area of a single white object is measured while the robot arm is kept at a constant position relative to the stationary object. An image I was thresholded using a fixed threshold and the number of white pixels was counted. A typical probability density for the measured area of an object is shown in figure 3.4a. The standard deviation in the area measurement increases linearly with the square root of the area, as can be seen from the measurements in figure 3.4b. Therefore, we take the standard deviation of the area to be proportional to the length of the edge of the object (circumference), i.e., approximately proportional to √A:

\sigma_{A_{disc}} = c_d\, s \sqrt{A} \qquad (3.4)

where s is a shape factor, which is 4 for a square object and 2√π for a circle. From the measurements in figure 3.4b the constant c_d s can be found to be approximately equal to 0.0036; i.e., the error is in the order of 0.36% of the circumference of the object for a circular object.
Blurring error. The reader is referred back to figure 3.2. In this figure, the blur radius r_b can be combined with the lens equation (3.1) to obtain

\frac{q + r_b}{r_b} = \frac{h}{h - f} \qquad (3.5)
Figure 3.4: a. Distribution of the measured area for a single, stationary circular object. b. Average circumference (square root of the area) and standard deviation of the measured area for several object sizes. The A is varied by varying A*. In all tests, a sample size of 5,000 was used.
where q is the aperture of the lens. Thus we assume an error model similar to (3.4) for the blurring error. Since, from (3.1), h/(h - f) = (d_z + f)/f, we find with (3.2)

\frac{q}{r_b} = \frac{d_z}{f} = \sqrt{\frac{A^*}{A}} \qquad (3.6)

such that

r_b = \frac{q}{\sqrt{A^*}} \sqrt{A}. \qquad (3.7)
So the resulting error ε_{A_blur} in the area is the length of the object edge multiplied by the blur radius:

\varepsilon_{A_{blur}} = c_b A \qquad (3.8)

with

c_b = \frac{q}{\sqrt{A^*}}\, s. \qquad (3.9)

For the camera used in the experiments, the values of the variables are f = 4.8 mm and q ≈ 1 mm (by definition, the aperture q is related to the `f-number' (written as f/#) as q = f/(f/#); in the experiments, a lens with f/# ≈ 5 was used). So we conclude that c_b ≈ A*^{-1/2} s. When the object is placed at focal distance from the lens (i.e., A = A*), the r_b from figure 3.2 equals the aperture q, i.e., total blur. In the experiments, the system is typically used up to values of d_z = 10 cm with a corresponding A* = 25,000; these values lead to c_b = 0.05, i.e., 5% of the edge of the object is unsure. The error that is introduced by this effect is systematic rather than stochastic; therefore, it will have no negative effect on the adaptive controller that is used to control the camera-robot system, since its effects can be learned.
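The two area-error models can be summarised in a few lines; the constants c_d s ≈ 0.0036 and c_b ≈ 0.05 are the values found above, and the function name is of course only illustrative.

```python
import numpy as np

def area_errors(A, cd_s=0.0036, c_b=0.05):
    """Area measurement errors for a circular object.

    cd_s: the constant c_d * s found from figure 3.4b;
    c_b: the blur coefficient from (3.9) in the regime quoted in the text."""
    sigma_disc = cd_s * np.sqrt(A)   # stochastic discretisation error, eq. (3.4)
    eps_blur = c_b * A               # systematic blurring error, eq. (3.8)
    return sigma_disc, eps_blur

# example: an object observed at A = A* = 25,000 pixels
print(area_errors(25_000.0))
```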
Finally, the positional error is stochastic, and measurements show that the following relationships exist:

\sigma_x \approx 0.010 \sqrt{A}, \qquad \sigma_y \approx 0.016 \sqrt{A}.

The difference in error between the x and y components is a result of the construction of the camera's CCD.
Discussion

Further errors are introduced by the vision software, variations in light intensity, etc. For instance, the choice of the threshold that is used in the binarisation of the image has an effect on the final image size. In this case we chose a fixed threshold, since most advanced thresholding techniques (Chow & Kaneko, 1977; Ballard & Brown, 1982) cannot easily cope with an image which is mostly black and has only a very small (it may be as small as 100 out of 2.5 × 10^5 pixels) region of white pixels. Furthermore, the error introduced by using a fixed threshold will be systematic rather than stochastic, and can thus be learned by an adaptive controller. In conclusion, we have found the following measurement errors for the area: a stochastic error of

\sigma_{A_{disc}} \approx 3.6 \times 10^{-3} \sqrt{A} \quad \text{(measured)}
"Ablur 5 10;2A (computed)
plus a systematic error of up to 40 pixels in the measured position in the image with respect to the position expected for an ideal lens. All systematic errors, as long as they do not conflict with the models used for the visual system (in particular, the assumption of linearity of the lens transformation), will be compensated for by an adaptive controller.
3.2 Robot Motion

In this section, the definition of a `robot move' is examined. A distinction between the static and the dynamic description is made. In the static description, the position and motion of the robot are considered in terms of its joint values. That is, for a given non-compliant robot, the position and orientation of the robot's end-effector are always uniquely determined by the robot's joint values. This correspondence is known as the robot's kinematics. Secondly, the dynamics are concerned with the forces (torques) necessary to move the robot. A robot is moved by applying forces (torques) to its joints. The motion that is induced is generally described in terms of the variables θ (joint position), θ̇ (joint velocity), and θ̈ (joint acceleration), and parameter matrices describing the inertia, the Coriolis coefficients, the centrifugal coefficients, friction, and gravity. The relationship between the path followed in joint space (i.e., in terms of θ, θ̇, and θ̈) and the torques applied to the
joints is known as the robot's dynamics. For a stiff robot, this relationship can usually be modelled with acceptable accuracy, such that a relatively simple controller can be made which calculates the torques required to effectuate a path in joint space. Dynamics are discussed in section 3.2.2. In the two subsequent chapters, only knowledge of the position and motion of the arm is used in the controllers; the dynamics (i.e., the forces that cause the motion) are not taken into account at this control level. This means that it is assumed possible to control the robot arm by specifying paths in joint angles rather than torques; the underlying robot dynamics controller, currently not of interest, will actually perform the calculations necessary to move the arm. Section 3.2.3 discusses the influence of the limited accuracy of the dynamic controller used in the experiments.
3.2.1 OSCAR kinematics
A flexible way of describing robot kinematics is by means of the Denavit-Hartenberg notation. With each joint i of the robot, four parameters θ_i, b_i, e_i, and α_i are associated (Craig, 1986; Fu et al., 1987). For a rotational joint, θ_i is the variable; for a translational joint, b_i is the variable. All the other parameters are constant for that particular joint. Figure 3.5 shows the definitions of the parameters. When with each link i a coordinate frame is associated, the transformation from the coordinate frame attached to link i−1 to the coordinate frame attached to link i is given by the homogeneous matrix ^{i-1}H_i:

^{i-1}H_i = \begin{pmatrix}
\cos\theta_i & -\cos\alpha_i \sin\theta_i & \sin\alpha_i \sin\theta_i & e_i \cos\theta_i \\
\sin\theta_i & \cos\alpha_i \cos\theta_i & -\sin\alpha_i \cos\theta_i & e_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & b_i \\
0 & 0 & 0 & 1
\end{pmatrix}. \qquad (3.10)
The position and orientation of link i relative to the base frame are then given by the matrix

^0H_i = {}^0H_1\, {}^1H_2 \cdots {}^{i-1}H_i \qquad (3.11)

where i is more than 0 but less than or equal to the number of degrees of freedom (DoF) of the robot. Using this notation, the product ^iH_j x expresses the vector x in coordinate frame j, when x is originally expressed in coordinate frame i (see figure 3.6). Thus it is possible to determine the position of an object relative to the robot's end-effector when it is known relative to the robot's base (frame 0), or vice versa. Equation (3.11) describes the forward kinematics of the robot. The inverse kinematics calculates the joint angles that the robot must assume to reach (with the end-effector) a given point in space with a given orientation; this point is expressed with respect to the base frame (frame 0) of the robot. The joint values are computable when the target state is uniquely described in terms of the position and orientation of the coordinate frames. Often, however, only the position and orientation of the end-effector are of importance, in which case multiple solutions (such as elbow-up and elbow-down) may exist. As mentioned above, this is solved by simply choosing a preferred solution;
Figure 3.5: Parameters θ_i, α_i, b_i, and e_i for joint i. In this case, the joints are rotational, such that the corresponding θ's are variable and the other parameters are constant. The parameters are defined as follows:

- θ_i is the joint angle from the x_{i-1} axis to the x_i axis about the z_{i-1} axis (using the right-hand rule);
- b_i is the distance from the origin of the (i-1)st coordinate frame to the intersection of the z_{i-1} axis with the x_i axis, along the z_{i-1} axis;
- e_i is the offset distance from the intersection of the z_{i-1} axis with the x_i axis to the origin of the ith frame, along the x_i axis (or the shortest distance between the z_{i-1} and z_i axes);
- α_i is the offset angle from the z_{i-1} axis to the z_i axis, about the x_i axis (using the right-hand rule).
Figure 3.6: Multiplication by homogeneous matrices. The vector x expressed with respect to coordinate frames 0, 1, 2, and 3.
 i   θ_i    b_i    e_i    α_i
 1    *    46cm    0cm   -90°
 2    *     0cm   51cm     0°
 3    *     0cm    0cm    90°
 4    *    50cm    0cm   -90°
 5    *     0cm    0cm    90°
 6    *    21cm    0cm     0°

Table 3.1: Denavit-Hartenberg parameters for the OSCAR-6 robot. An `*' indicates that the corresponding parameter is a variable.
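A minimal sketch of the forward kinematics (3.10)-(3.11), filled with the OSCAR-6 parameters of table 3.1; the zero joint configuration at the end is an arbitrary test input, not a pose used in the experiments.

```python
import numpy as np

def dh_matrix(theta, alpha, b, e):
    """Homogeneous link transform (i-1)H_i, eq. (3.10)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -ca * st,  sa * st, e * ct],
                     [st,  ca * ct, -sa * ct, e * st],
                     [0.0,      sa,       ca,      b],
                     [0.0,     0.0,      0.0,   1.0]])

# OSCAR-6 constants from table 3.1: (alpha, b, e) per joint, lengths in metres
OSCAR6 = [(-np.pi/2, 0.46, 0.00), (0.0, 0.00, 0.51), (np.pi/2, 0.00, 0.00),
          (-np.pi/2, 0.50, 0.00), (np.pi/2, 0.00, 0.00), (0.0, 0.21, 0.00)]

def forward_kinematics(joints):
    """0H_6 as the product of link transforms, eq. (3.11)."""
    H = np.eye(4)
    for theta, (alpha, b, e) in zip(joints, OSCAR6):
        H = H @ dh_matrix(theta, alpha, b, e)
    return H

H = forward_kinematics(np.zeros(6))
print(H[:3, 3])    # end-effector position in the base frame
```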
such a choice is often given by the underlying path planner, e.g., by taking the shortest path from the current to the target position. The robot used in the experiments is a 6-DoF anthropomorphic robot called OSCAR-6. Figure 3.7 gives a schematic representation of this robot, showing the coordinate frames and the definitions of the joint angles. The corresponding Denavit-Hartenberg parameters are given in table 3.1.
Figure 3.7: Line drawing representation of the OSCAR-6 robot used in the experiments.
As defined in the introduction, the control system is restricted to three DoF. Therefore, only three of OSCAR's six joints are controlled by the adaptive controller. Specifically, joints 4 and 6 are always kept at a constant value; joint 5 is adjusted such that the camera, fixed in the robot's hand, is always looking down. To extend the problem to four DoF by adding the rotation of the robot manipulator, the controller must also generate a fourth rotation, namely for joint 6. Knowledge of the kinematic structure of the robot (table 3.1), however, shows that, in theory, this extra degree of freedom can be regarded separately from the other three, such that the problem can be split up into two different controllers:
one for the first three joints, taking care of the position of the end-effector, and one for joint 6, to obtain a correct orientation.
3.2.2 Dynamics
The dynamics of any d degree of freedom robot with rotational joints can be described by the equation (Craig, 1986)
T = F_1(\theta)\,\ddot{\theta} + F_2(\theta)\,[\dot{\theta}\dot{\theta}] + F_3(\theta)\,[\dot{\theta}^2] + F_4(\dot{\theta}) + F_5(\theta) \qquad (3.12)

where T is a d-vector of torques exerted by the links, and θ, θ̇, and θ̈ are d-vectors denoting the positions, velocities, and accelerations of the d joints. [θ̇ θ̇] and [θ̇²] are the vectors

[\dot{\theta}\dot{\theta}] = (\dot{\theta}_1\dot{\theta}_2,\ \dot{\theta}_1\dot{\theta}_3,\ \ldots,\ \dot{\theta}_{d-1}\dot{\theta}_d)^T, \qquad [\dot{\theta}^2] = (\dot{\theta}_1^2,\ \dot{\theta}_2^2,\ \ldots,\ \dot{\theta}_d^2)^T. \qquad (3.13)
F_1(θ) is the matrix of inertia, F_2(θ) is the matrix of Coriolis coefficients, F_3(θ) is the matrix of centrifugal coefficients, F_4(θ̇) is a friction term, and F_5(θ) is the gravity working on the joints. When the robot has to move from one joint position to another, a torque must be applied which generates T. The problem of calculating the correct torques (forces) to have the robot arm follow a specified trajectory is known as inverse dynamics. Industrial robots, such as the OSCAR robot used in the experiments, are generally designed to eliminate the interdependence between the joints, such that the robot arm can be regarded as d independently moving bodies. In that case, F_1 and F_3 are diagonal matrices and F_2 is zero. This reduces the 3d-valued vector field described in (3.12) to d independent functions of three variables, for which the coefficients have to be found. Also, the link actuators are usually made so powerful that F_1, F_3, F_4, and F_5 can be considered independent of θ. For this simplified (and common) case, various standard methods exist to compute the inverse dynamics (Fu et al., 1987). Such a controller eliminates the requirement of knowledge of the robot arm dynamics in order to control it. When using such a control method, e.g., the PID controller which is used for the OSCAR robot, the robot can be controlled by specifying joint values, velocities, and accelerations; knowledge of the required forces is not needed. This method is used in the controllers presented in chapters 4 and 5.
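To make the decoupled case concrete: with F_2 = 0, F_1 and F_3 diagonal, and F_1, F_3, F_4, F_5 taken independent of θ, (3.12) reduces to d independent scalar relations. The sketch below uses placeholder coefficient values; they are not identified OSCAR parameters.

```python
import numpy as np

d = 3   # three controlled joints

def inverse_dynamics(theta, theta_dot, theta_ddot):
    """Decoupled torques per eq. (3.12): F2 = 0, F1 and F3 diagonal,
    and F1, F3, F4, F5 taken independent of theta (placeholder values)."""
    F1 = np.diag([1.2, 0.9, 0.4])       # inertia
    F3 = np.diag([0.05, 0.04, 0.01])    # centrifugal coefficients
    F4 = 0.1 * theta_dot                # viscous friction term
    F5 = np.array([0.0, 9.5, 2.1])      # gravity load
    return F1 @ theta_ddot + F3 @ theta_dot**2 + F4 + F5

print(inverse_dynamics(np.zeros(d), np.ones(d), np.zeros(d)))
```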
3.2.3 Error in arm positioning
The simplifications made in the dynamic model, as well as the mechanical construction and the joint-sensor precision of the robot, lead to errors in positioning. The precision of the robot is tested by moving it randomly at various joint speeds, and checking the resulting position against the desired position. For each joint speed, 100 trials are performed. The result of this test is shown in figure 3.8. It can be seen that the average joint positioning error hardly depends on the rotational velocity, except for joint 5. The construction of joint 5 explains the better positioning behaviour of this joint at higher joint velocities:
Figure 3.8: Precision of PI controller at various joint speeds.
it is expected to suffer mostly from stiction (OUG, 1992). The maximum error is in the order of 0.5° for joints 1, 2, and 3; it is therefore not useful to demand a higher accuracy of the adaptive controller used for this robot. With table 3.1, this leads to an end-effector positioning error of up to 12mm when the robot arm is stretched and the error in joints 1 and 2 is 0.5°. It is therefore not required that the control algorithms, when applied to the OSCAR robot, exceed 1cm in positioning accuracy.

Most important is the error in θ₅. When the camera is assumed to be always looking down, the rotation of the image plane due to an error ε₅ in the rotation θ₅ is not measured and corrected for. Refer to figure 3.9. To determine the influence of this measurement error, consider that a rotation of θ₅ results in an apparent translation of the object over the image plane, since the latter is no longer parallel to the table on which the object is placed. This translation can be quantised as follows. When considering the position of the projected centre of the camera image on the table, it is shifted over a distance Δx = (z + 17cm) sin ε₅ due to a rotation ε₅. The constant 17cm is the distance from the lens to the point of rotation; the latter is placed 5cm above the point where the gripper's fingers join when grasping. For ε₅ ≈ 0.5° (see figure 3.8) and z = 50cm, the shift on the table is Δx ≈ 12mm.

Figure 3.9: Translation and rotation of the image plane due to an error in θ₅. Δx is the corresponding shift on the table.

The errors in the joint values influence the grasping precision. Obviously, requesting a precision of much more than 0.5 degree per joint will be difficult to attain, since the robot cannot reach that accuracy. In summary, an accuracy exceeding 0.5° per joint is not to be expected with the robot used.
3.3 System Description

Having qualitative models of the sensors and the robot, a controller must be designed to solve the task set out in the introduction. The design of a controller depends on the knowledge that is available of the process that is to be controlled. Three stages of control are distinguished (Bellman & Kalaba, 1959; Narendra & Annaswamy, 1989):

1. when the controller has complete information about the behaviour of the inputs of the process, and this process is fully specified, the process is called a deterministic control process;
2. when unknown variables are present as random variables with known distribution functions, the process is called a stochastic control process;
3. finally, when even that information is not available, but the controller must learn to improve its performance by observing the behaviour of the controlled process, this process is referred to as an adaptive control process.

Clearly, within the scope of the task set out in the introduction, the process that is controlled is in the third of the above categories. An adaptive controller faces the following duality (Feldbaum, 1965; Narendra & Annaswamy, 1989): first, it must identify the process that is to be controlled and find its parameters, and secondly, it must find which actions are required for control of the process. Two solutions exist (Narendra & Annaswamy, 1989; Narendra & Parthasarathy, 1990; Åström & Wittenmark, 1989): direct adaptive control and indirect adaptive control. In indirect control (figure 3.10b), the parameters of the plant (i.e., the system that is controlled) are estimated on-line, and these estimates are used to update the parameters of the controller. In direct control, however, plant parameters are not estimated but the control parameters are directly updated to improve the behaviour of the system (figure 3.10a).

The direct control method in figure 3.10a works as follows. The plant is controlled by a signal u, and outputs a signal y_p. The reference signal is a signal r which is input to the controller; r can be regarded as the desired situation or desired state of the system. The reference model is used to compute the desired plant output y_m from the reference signal or setpoints r. The reference model translates r to the domain of y_p, resulting in y_m. The task for the controller C is to generate a signal u = C(r) which minimises ‖y_m − y_p‖, i.e., it minimises the difference between the actual and the desired situation. This error signal is subsequently used in the update of the controller.

The direct control method is normally not used, for the following reason. The error signal ‖y_m − y_p‖ carries information on how the output of the plant must change, and not the output of the controller. In order to adapt the controller, however, the error must be available in terms of u. This can only be determined when ∂u/∂y_p is known, which
Figure 3.10: a. Direct adaptive control. b. Indirect adaptive control.
requires an inverse model of the plant. This is a serious drawback of the approach, since this inverse model is usually not available. A solution is given by the indirect control method. An extra model, called the identification model, is introduced in the system, which learns to copy the behaviour of the plant, i.e., it is a forward model of the plant. This model will be some parametric model of which the resultant parameters p describe the behaviour of the plant. These parameters are then used in the update of the controller.
3.3.1 Structure of the proposed controller
However, the direct control method can still be used when no reference model is needed. This is the case when the reference signal r is expressed in quantities which can be directly measured from the plant, i.e., r and y_p are in the same domain. The robot/camera system can be seen as a discrete state machine, where the controller C is assumed to be delay-free, and we write x_d[i+1] instead of r, and (x[i+1], θ[i+1]) for y_p. From i to i+1 two measurable transitions occur: from θ[i] to θ[i+1] and from x[i] to x[i+1]. Both are representations of the state of the camera-robot system: the θ the internal robot state (viz. joint values) and the x the visual observation. Since the camera is hand-held and only observes the object, neither representation alone suffices to uniquely describe the state of the system; this state is only given by pairs (x[i], θ[i]). Since the robot moves from internal joint state θ[i] to θ[i+1], we will denote the controller output which effectuates this move by Δθ[i]. Also, note that x[i+1] is the signal for which x_d[i+1] is the desired value. This means that, when x[i+1] = x_d[i+1], a `successful' Δθ[i] had been generated and applied to the robot. Differently put, we know that

C′(x[i], θ[i], x_d[i+1]) = Δθ[i]
where C′ is the ideal controller. In all other cases a valid transition is still available:

C′(x[i], θ[i], x[i+1]) = Δθ[i]

even though x[i+1] ≠ x_d[i+1]. Thus we can focus on direct control only, and do not require an identification model; after all, the plant output and the controller input are in the same domain. The resulting controller is depicted in figure 3.11.
Figure 3.11: Structure of the proposed controller. After receiving input x[i], θ[i], and the desired x_d[i+1], the controller has enough information to determine the plant input Δθ[i]. The z⁻¹ is a delay.
The studied direct adaptive controller, used in all subsequent chapters, consists of a feed-forward neural network trained with conjugate gradient optimisation. We will refer to this neural controller by the symbol N instead of C. The universal approximation capabilities of such networks allow them to be used in a direct control scheme while eliminating the requirement for a reference model.
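To make this concrete: the controller of figure 3.11 is simply a feed-forward network evaluated on the concatenated inputs (x, θ, x_d). Below is a minimal sketch; the 27 hidden units follow the resource analysis in section 4.2.5, while the random weights, layer sizes, and tanh/linear layer choices are illustrative assumptions (in the thesis the weights are found by conjugate gradient training, not picked by hand):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 27, 3      # x (3) + theta (2) + x_d (3) -> dtheta (3)
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in + 1))
W2 = rng.normal(0.0, 0.1, (n_out, n_hidden + 1))

def N(x, theta, x_d):
    """Delay-free controller: dtheta[i] = N(x[i], theta[i], x_d[i+1])."""
    inp = np.concatenate([x, theta, x_d, [1.0]])     # append a bias unit
    hidden = np.tanh(W1 @ inp)                       # hidden layer
    return W2 @ np.concatenate([hidden, [1.0]])      # linear output layer

dtheta = N(np.array([10.0, -4.0, 250.0]),            # x = (x, y, A) in pixels
           np.array([0.30, -0.80]),                  # theta_2, theta_3
           np.array([0.0, 0.0, 300.0]))              # visual setpoint x_d
```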
3.3.2 Generating learning samples
The task of the network is to learn the relationship between the transition from θ[i] to θ[i+1] on the one hand, and from x[i] to x[i+1] on the other. The neural network must be trained to generate robot motion commands Δθ[i] which control x[i] towards x_d[i+1]. To teach the neural network from the transition i → i+1, two approaches are considered: learning by interpolation and input adjustment learning.
Learning by interpolation. In learning by interpolation (van der Smagt et al., 1992a), besides the robot state θ[i], not only the reference signals x[i] are input to the neural controller, but also their desired values at i+1, x_d[i+1]. The ideal controller C′ would generate a correct Δθ[i] which makes x[i+1] = x_d[i+1]. When x[i+1] ≠ x_d[i+1], the generated Δθ[i] was not correct, but still it is known that the camera-robot system makes a transition from x[i] to x[i+1], which can be used as a learning sample. The closer x[i+1] and x_d[i+1], the more useful information this transition carries. In any case, a learning sample is available: input (x[i], θ[i], x[i+1]) maps on Δθ[i].
Figure 3.12: Learning by interpolation. All learning samples are positioned on a hyperplane (in the figure shown in two dimensions). The only part of the hyperplane that is used for control of the robot arm describes, in the figure, a line. Because of the increased dimensionality of the plane, the learning samples are distributed around the line, instead of on it. By using input adjustment learning, the input of the samples is transformed such that they move onto the line.
This process of generating learning samples is illustrated in figure 3.12. The learning samples will, in general, be situated `around' the line where x[i+1] = x_d[i+1], but by interpolating those learning samples, an approximation of that line is assumed to be generated. Note that the dimensionality of the input space has been increased by the dimensionality of x, leading to a sparseness of the learning samples. This sparseness can be understood from figure 3.12: samples are positioned on the hyperplane, whereas only the values situated on the `hyperline' are used by the controller.
Input adjustment learning. In input adjustment learning (Kröse, van der Korst, & Groen, 1990), the input of the neural controller consists only of the x[i] and θ[i]. Again, this renders a Δθ[i] which, when given to the robot, leads to x[i+1]. Again, when x[i+1] = x_d[i+1], the controller generated a correct move, and a learning sample that lies on the `hyperline' in figure 3.12 is available. Otherwise, the following question is asked: if x[i] led to x[i+1] with that robot move, which x′[i] should we have had such that the same robot move would have led to x_d[i+1]? This is summarised as follows:
x[i]                          leads to x[i+1] by Δθ[i]
x′[i] = f(x[i], x[i+1])       leads to x_d[i+1] by Δθ[i]

e.g., when x-space is linear:

x′[i] = x[i] − x[i+1] + x_d[i+1]    leads to x_d[i+1] by Δθ[i].
3.3.3 Coupling the robot with the camera

As was mentioned in section 3.1, the output of the camera alone does not portray sufficient information about the position of the target object, but must be augmented with the state of the robot θ. In the 3 DoF problem, this would mean knowledge of the joint values θ₁, θ₂, and θ₃. However, the value of θ₁ is not necessary, as can be seen from the robot structure (figure 3.7) and the robot kinematics (see appendix D).
Figure 3.13: View of the robot arm from above. The left figure shows the robot arm before and after the rotation (dotted lines); the right figure shows the rotation that the arm has to make with respect to coordinate frame 6.
Why this is so can be explained as follows (Jansen, 1992; van der Linden, 1994). Consider figure 3.13. This figure shows the robot arm from above. In order to reach the point p₀ (in world coordinates, frame (x₀, y₀)), which is the same as p₆ with respect to the coordinate system 6 of the end-effector, the robot arm has to rotate over Δθ₁. The distance in 2D from the origin to the gripper is given by
d = e₂ cos(−θ₂) + b₄ sin θ₃
where e₂ is the length of link 2 and b₄ the length of link 4 (table 3.1). In order to reach the object at p₆, joint 1 has to effectuate a rotation Δθ₁ such that

tan(Δθ₁) = y₆ / (|d| − x₆),

which can be derived from the figure. Clearly, this rotation is independent of θ₁. Therefore, the neural controllers do not require the position of OSCAR's joint 1 as an input. To reduce the dimensionality of the input space, this simplification will be made in the experiments.
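As a quick numeric check of this relation, the required base rotation can be computed directly from the end-effector-frame coordinates. A small sketch; the function name and the use of atan2 (which keeps the correct quadrant, unlike a naive tangent inversion) are my choices:

```python
import math

def base_rotation(y6, x6, d):
    # tan(dtheta_1) = y6 / (|d| - x6); atan2 also handles |d| = x6 gracefully.
    return math.atan2(y6, abs(d) - x6)

dtheta_1 = base_rotation(y6=0.10, x6=0.02, d=0.55)   # radians; sample values
```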
3.4 Conclusions

In this chapter we have laid out the basis for constructing an adaptive controller for robot hand-eye coordination. The following basic design considerations have been made:

- the robot arm is effectively 3 DoF, since that suffices for most pick-and-place problems encountered in industry;
- in order to get high visual accuracy, to avoid the correspondence problem, and to simplify the image processing, a monocular visual system is located in the robot's end-effector;
- the accuracy of the visual system has a stochastic component ε_{A,disc} ≈ 3.6·10⁻³ √A and a systematic component ε_{A,blur} ≈ 5·10⁻² A, where A is the observed area of the single object. The systematic error can be compensated for by the adaptive controller, whereas the stochastic error cannot;
- the accuracy of the robot system is bounded by 1cm in the end-effector position, or 0.5 degree in joint accuracy;
- the task of the system is to place the end-effector (i.e., camera) directly above the observed object; this goal is expressed in the visual domain, such that direct learning can be used.

These design considerations are used in the neural network controllers described in chapters 4 and 5.
Chapter 4
Eye-in-hand positioning(*)

(*) Part of this text was previously published as a technical report (van der Smagt, Groen, & Kröse, 1993).

The positioning of a robot hand in order to grasp an object is a problem fundamental to robotics. The task we want to perform can be described as follows: given a visual scene, the robot arm must reach an indicated point in the visual scene. This marked point indicates the observed object that has to be grasped. In order to accomplish this task, a mapping from the visual scene to the corresponding robot joint values must be available. The task set out in this chapter is to design a self-learning controller that constructs that mapping without knowledge of the geometry of the camera-robot system.

When the position of the object is unequivocally known with respect to the robot's base, and the robot geometry (the kinematics) is known, a single computation followed by a robot joint rotation suffices to reach the indicated position. In the case of the robot arm equipped with a monocular hand-held camera, this static positioning can be obtained when a sufficient model of the observed object is available such that depth can be measured; for instance, the area A of the viewed object surface must be known. This approach is very typical of many industrial applications, where robots are used to pick up or work on objects on conveyor belts.

As was set out in the requirements in the previous chapter, it is assumed that no explicit model of the robot is available. Moreover, the robot is controlled by specifying changes of its joint values, and its `current' joint values can be measured. Also, the visual system needs no precise calibration, except for determining the relationship between depth and area close to the point where the vertical distance is defined as zero (0). Given the visual data and the robot positional data, a model-free adaptive neural controller learns to generate robot joint rotations which position the end-effector directly above the target object.

The system is set up such that the relation between sensor input and robot motion is many-to-one, i.e., given a sensor reading, a robot motion can be uniquely determined. This simplification means that there is always only one posture (arm configuration) to reach a specific position; for instance, there is no elbow-up/elbow-down ambiguity. As customary in robotics, this situation is realised by choosing a preferred situation (c.q., the elbow up configuration) and never presenting the conflicting (elbow down) configuration to
the adaptive controller. Note that it is not guaranteed that the elbow down configuration will never occur: when the controller parameters are sufficiently disturbed, this case is, in theory, not excluded.

Similar systems have been studied in the past. In (Weiss et al., 1987) an indirect control mechanism is described for eye-hand coordination with visual servoing. This closed-loop system generates robot motor commands (cf. torques) directly from its visual input. However, models of the camera and manipulator are required. A similar problem is tackled by Miller (Miller III, 1989), who uses an adaptive cerebellar-based CMAC neural network for model-free visual servoing. Kohonen networks are used in (Ritter et al., 1990) and (Hesselroth, Sarkar, van der Smagt, & Schulten, 1994) for model-free binocular hand-eye coordination. Although a high accuracy is obtained in only two feedback steps, the method needs several thousands of trials before this accuracy is reached: a run that takes hours using a real robot.

In order to design a system which can be successfully used in real-world applications, there are two important issues which have to be considered. First, in real-world applications of robot systems, a `reasonable' training time must be ensured. Real robots move slowly in comparison with simulated robots, so it is important that the goal is reached after only a few trials. Secondly, the added value of self-learning systems must be fully exploited: it is essential that the method adapt to unforeseen gradual or sudden changes in the robot-camera system. The combination of these two points has been ignored in previous approaches.

In our approach no explicit models of the camera or the robot are available. The camera-hand mapping must be learned by the neural network based on the (measured) behaviour of the camera-robot system. To this end, learning samples are gathered during the control process and added to the learning set. The size and exact implementation of this learning set is directly related to the adaptability and accuracy of the system: a large learning set leads to a sluggishly adapting system, whereas a small learning set cannot be used to construct an accurate controller. Therefore its size must be varying, as described in section 4.2.2.

Two points are stressed. First, it is important that the neural controller exhibits fast learning: the adaptive control of the real robot should have effect in the order of seconds, rather than hours. This requirement means that long learning sessions are not acceptable, since they are not realistic in real-world applications. Secondly, the system must be capable of adaptation. When the robot-camera mapping changes (e.g., due to an unforeseen bending of a joint or rotation of the hand-held camera, or when a precise model of the robot (such as the rubbertuator robot discussed in chapter 6) cannot be determined), be it gradually or suddenly, the neural controller must be able to relearn the new behaviour, again fast. In order to combine these requirements with a highly accurate controller, the structure of the system that is proposed in this chapter allows multiple controllers, ranging from coarse but fast-adapting to accurate but slowly adapting, to be used in parallel. Note that learning is performed over a continuously changing set of learning samples, instead of learning one sample at a time (see chapter 2).(†)
(†) Nevertheless learning is on-line, since after each learning iteration the adapted network parameters, resulting from the newly obtained learning samples, are used in the controller.
A detailed description of the problem at hand is given in section 4.1. A description of the controller that solves the set task is given in section 4.2. The resulting system is applied to a simulated and a real robot, leading to the results in section 4.3. Finally, a discussion is given in section 4.4.
4.1 Formal statement of the problem

As concluded in the previous chapter, the visual observation that is minimally required to grasp the object consists of the observed position (x, y) and area A of an object projected on the camera's CCD. We will write x = (x, y, A) to denote the visual observation. Secondly, the position of the robot is described by its joint values θ = (θ₁, θ₂, θ₃). As was shown in chapter 3, in order to control the robot, θ₂, θ₃, and x suffice to describe the state of the camera-robot system at any time; this however only leads to (and is used as) a simplification of the controller, and has no conceptual implications. Therefore we will continue writing θ as controller input, while we know that (θ₂, θ₃) suffice.

In this chapter, a neural controller will be constructed which, given the state (x, θ) of the camera-robot system, generates a robot setpoint Δθ = (Δθ₁, Δθ₂, Δθ₃) to move the system to the goal state, i.e., reaching the state where

x = x_d.    (4.1)
When this state is reached, we say that the goal is attained. The desired visual state is defined as x_d ≡ (0, 0, A_desired), i.e., the object must be in the centre of the camera image at the desired object size A_desired; the latter requirement means that the distance between the camera and the observed object agrees with some pre-specified value. This visual state x_d is called the visual setpoint for x.
Open loop control

We are trying to construct an adaptive controller which generates a robot joint rotation Δθ such that, when this rotation is applied to the robot, the observed object will be located in the centre of the camera image at the predefined size. Using the symbol N for the neural controller mapping which we are looking for, and R for the given robot-camera mapping, the following two transformations are performed:
N(x, θ) = Δθ  such that  R(θ′) = x_d,

where x_d ≡ (0, 0, A) and θ′ = θ + Δθ. When N indeed generates the correct joint rotation Δθ for all possible combinations of x and θ, we call N the ideal controller and write C′. So, by definition,

∀(x, θ):  R ∘ C′(x, θ) := x_d.
The neural controller N consists of a feed-forward neural network. N is trained from learning samples (x, θ; Δθ) from which it is known that C′(x, θ) = Δθ. The method for creating these learning samples is explained in section 4.2.3. In order to find optimal weight values in the network, the Polak-Ribière conjugate gradient learning algorithm with Powell restarts, concluded in chapter 2 to be a superior nonlinear optimisation algorithm, is used.
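For concreteness, the optimiser can be sketched as follows. This is a generic Polak-Ribière conjugate gradient loop with a Powell restart test, not the thesis code; the backtracking line search and all constants are simplifying assumptions:

```python
import numpy as np

def cg_polak_ribiere(f, grad, w, n_iter=200, tol=1e-8):
    """Minimise f (e.g. the summed squared error over the learning set) as a
    function of the weight vector w, with gradient grad."""
    g = grad(w)
    d = -g
    for _ in range(n_iter):
        # Backtracking line search along d (Armijo condition).
        alpha, fw = 1.0, f(w)
        while f(w + alpha * d) > fw + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        w = w + alpha * d
        g_new = grad(w)
        if g_new @ g_new < tol:
            break
        beta = g_new @ (g_new - g) / (g @ g)          # Polak-Ribiere beta
        # Powell restart: revert to steepest descent when successive
        # gradients are far from orthogonal, or when beta turns negative.
        if beta < 0 or abs(g_new @ g) >= 0.2 * (g_new @ g_new):
            beta = 0.0
        d = -g_new + beta * d
        g = g_new
    return w

# Toy usage: minimise a quadratic bowl.
w_min = cg_polak_ribiere(lambda w: float(w @ w), lambda w: 2.0 * w,
                         np.array([3.0, -2.0]))
```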
Closed loop control
However, N will in general not be equal to C′. The reasons for this were explained in chapter 2: there will always, depending on the structure of the neural network and the optimisation method chosen, remain an error in the approximation of the underlying function.

It must be noted that the approximation must be very accurate. For instance, in order to obtain a positioning accuracy of less than 5mm in the end-effector, from the Denavit-Hartenberg table 3.1 it can be concluded that an accuracy of less than 0.28° in joints 1 and 2 is needed; these joints have a range of 180° and 90°, respectively. This means that the error in the network output should never exceed 0.16% of the maximum value. Thus, when controlling the robot arm in an open loop, it is likely that the hand-held camera loses track of the object when it is approached. A single joint rotation which places the end-effector only a few cm next to the target location will mean that the camera no longer sees the object, because it is outside the camera's field of view, such that no information at all is available about the correctness of the previous move. By using closed loop control, thus adapting the path towards the object during the move, this problem is solved.

We introduce feedback in the control loop as follows. To indicate the sequence of control, the variables x, θ, and Δθ will be time-indexed. Thus we write

N(x[i], θ[i]) = Δθ[i].

Now, instead of waiting for the robot to complete the move Δθ[i], during that move the state of the robot-camera system is measured anew and fed into the neural controller:

N(x[i+1], θ[i+1]) = Δθ[i+1].

The resulting joint rotation Δθ[i+1] is sent to the robot, which immediately changes the trajectory it follows. This scheme is repeated until the target object is reached. Figure 4.1 illustrates the influence of feedback control on the trajectory followed by the robot arm. Note that the delay between iteration i and i+1 is not defined; it is dictated by the visual processing and communication delays.

Why will the system be more accurate when a feedback loop is introduced? There are two reasons:

1. sensor readings close to the target are more accurate;
2. the approximation to C′ close to the target can be made more accurate.
Figure 4.1: 2D motion plan of the robot arm (θ[i]: a position in joint space, marked by a dot; Δθ[i]: a move in joint space, marked by an arrow). A planned move Δθ is, in general, not completed. While the robot is moving towards the new setpoint, a new setpoint is received, and the motion plan is updated.
Ad 1. The analysis of the robot's optical system that is given in chapter 3 shows that, close to the target, the visual accuracy increases due to the fact that area measurements are most accurate there. There is a second reason, however: as was shown in figure 3.3, a visual position measurement can be as much as 40 pixels or 8% off due to the type of lens used in the experiments. For small moves, this error will occur in both the start and end position, and will therefore not be disastrous. Only for moves across the whole image will an 8% error result. When the object is close to the centre of the camera image, this error will be the least prominent.
Ad 2. When a uniformly spaced set of learning samples is approximated by a feed-forward neural network, the approximation error in the samples will, on average, be the same for all samples. Specifically, it will not be smaller for small Δθ (used when the target object is close to the goal state) than for large Δθ. One method to tackle this problem is to weigh the learning samples, such that learning samples close to the goal state have a larger weight than those far away. Another method is to increase the sample density close to the goal state. This is automatically obtained, since a successful grasping trial always ends in this goal state, thus creating samples in that part of the input space.

Furthermore, we know that, before any learning samples have been obtained, the only knowledge that is available of the system is that the robot should be stationary when the visual measurement of the object corresponds with x_d. That is to say, we know that
∀θ:  C′((0, 0, A), θ) = (0, 0, 0)    (4.2)
i.e., independent of the value of θ. Although the multi-dimensional approximation cannot be guaranteed to coincide with C′ at those points, it can be made more likely by training the neural controller on samples adhering to (4.2). We will refer to these samples as the null-samples.

The proposed experimental setup is shown in figure 4.2. At time i, the visual and robot state are fed into the neural network, which generates a joint rotation. That rotation is realised by the inverse robot dynamics module, which calculates torques to make the robot move. One time slot later, new visual and robot state data are available and given to the neural network.
Figure 4.2: The experimental setup. The visual state x[i] and robot state θ[i] are input to the neural controller, which generates a joint rotation Δθ[i]. The robot's PID controller (marked `robot dynamics' in the figure) calculates the required torques to make the robot move. At the next step i+1, the new robot and visual state are available.
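The control cycle of figure 4.2 can be summarised in a few lines. In this sketch, observe, joint_state, send_setpoint, and at_goal are hypothetical interfaces to the vision preprocessing, the robot, and the goal test; N is the neural controller:

```python
def feedback_loop(N, observe, joint_state, send_setpoint, at_goal, max_steps=200):
    for _ in range(max_steps):
        x_i = observe()                 # visual state x[i] (position and area)
        theta_i = joint_state()         # robot state theta[i]
        if at_goal(x_i):                # object centred at the desired size
            break
        dtheta_i = N(x_i, theta_i)      # feed-forward sweep through the network
        send_setpoint(dtheta_i)         # the robot starts moving; the next
                                        # iteration interrupts this move with a
                                        # setpoint computed from fresh data
```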
4.2 Constructing the controller In this section, the details of the implementation are discussed. A description of the input/output behaviour, as well as a typical control loop, is given.
4.2.1 The neural network

The neural network consists of a feed-forward network trained with conjugate gradient back-propagation as described in chapter 2. The visual inputs and robot state inputs together make 5 network inputs (as discussed before, the structure of the robot eliminates the use of θ₁ as a network input); three outputs Δθ₁, Δθ₂, and Δθ₃ constitute the network output and are given to the robot as a rotation (delta joint value) for joints 1, 2, and 3. The visual input consists of the position (x, y), measured in pixels relative to the camera centre, plus the area A of the object, also measured in pixels. With specialised hardware these quantities are measured in real time, i.e., within the time needed for grabbing one frame. At the time a frame is grabbed, and before the position and area of the object are extracted, the robot position is measured and stored in a register.

The neural network must learn the relationship between displacements in the visual domain and displacements in the robot domain. This relationship is contained in the measured visual data x[i] and x[i+1] in relation to the robot data θ[i] and θ[i+1]. Knowing that a robot move from θ[i] to θ[i+1] corresponds with an observed object move from x[i] to x[i+1], this data can be used in the construction of learning samples which describe the actual behaviour of the camera-robot system. We will discuss the implementation of the two methods mentioned in chapter 3, learning by interpolation and input adjustment learning, and how the samples are kept.
4.2.2 Bins

The conjugate gradient method that is used to update the weights in the neural network minimises the summed squared error over a set of learning samples. Therefore the learning samples which are generated during control of the camera-robot system are collected in bins. When n steps are used to move towards the object, n(n−1) learning samples can be constructed. This can be seen from figure 4.1: for instance, apart from the data obtained from the moves θ[0] → θ[1] and θ[1] → θ[2], also the move θ[0] → θ[2] can be constructed by combining the previous two moves. With a typical value of n set to 100, a single trial may lead to nearly 10,000 samples. Clearly, the number of samples would quickly grow out of bounds, leading to unacceptable delays in the learning algorithm, if all learning samples were kept.

Therefore a selective binning structure is set up as follows. Along each input dimension d of the neural network (1 ≤ d ≤ 5 in this case) a partition into b[d] parts is made. This partitioning leads to a 5-dimensional structure of bins: a hypercube. When a learning sample is available, its input values uniquely determine which bin in the hypercube this learning sample fits in. When the corresponding bin is empty, the new sample is put in it; otherwise, the sample in that bin is replaced by the new sample. Thus each bin contains only one learning sample. An obvious advantage of the hypercube method of storing the samples is that a uniform or otherwise desired distribution of the learning samples can be realised. This means that the neural network approximation will not lose accuracy in those parts of the input space which have not been visited for a long time.
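A minimal sketch of such a binning memory follows; the bounds and partition counts b[d] are illustrative, not the values used in the experiments:

```python
import numpy as np

class BinHypercube:
    """One learning sample per bin of a 5-D hypercube spanned by the
    network inputs; a new sample replaces the occupant of its bin."""

    def __init__(self, lo, hi, b):
        self.lo = np.asarray(lo, float)
        self.hi = np.asarray(hi, float)
        self.b = np.asarray(b)          # partitions b[d] per input dimension
        self.bins = {}                  # bin index tuple -> (input, target)

    def index(self, inp):
        rel = (np.asarray(inp, float) - self.lo) / (self.hi - self.lo)
        idx = np.clip((rel * self.b).astype(int), 0, self.b - 1)
        return tuple(idx)

    def store(self, inp, target):
        # Empty bin: insert; occupied bin: replace the old sample.
        self.bins[self.index(inp)] = (np.asarray(inp), np.asarray(target))

    def samples(self):
        return list(self.bins.values())

cube = BinHypercube(lo=[-256, -256, 0, -1.6, -1.6],   # (x, y, A, theta_2, theta_3)
                    hi=[256, 256, 4000, 1.6, 1.6],
                    b=[100, 100, 100, 20, 20])
```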
Bin size and adaptation speed

The size of the hypercube depends on the maximum and minimum expected values of the corresponding network inputs, and on the number of desired partitions per dimension. A maximum of ∏_{d=1}^{5} b[d] samples can be stored at any time. When a system adapts itself quickly to changes, it will forget the information stored in earlier learning samples (Grossberg's stability-plasticity dilemma). This is also the case with the binning structure: a hypercube consisting of fine-grain bins leads to accurate representations, but it takes very long before all learning samples in that hypercube are replaced by new samples when the system has to relearn its behaviour. Conversely, a coarse-grain hypercube with only a few bins is quickly updated, but cannot be used to build an approximation of high accuracy (see section 4.2.5). The tradeoff can be alleviated by combining large and small bin structures in a single controller. When more than one neural network, each with its own hypercube of different granularity, is used to control the robot, the system can use discrepancies between the networks to detect and react to changes. This tradeoff is application dependent and is discussed in section 4.3.
4.2.3 Learning samples

The neural network N(·) is trained with learning samples (x, θ; Δθ). Unfortunately, it is not possible to analytically compute a Δθ from a given (x, θ), since that would require knowledge of the ideal controller C′. How can learning samples be constructed? We will now discuss both methods mentioned in chapter 3 in further detail.
Learning by interpolation

In learning by interpolation the inputs to the neural controller do not only consist of x[i] and θ[i], but also of x_d. The network, which we will indicate by N′, thus has eight inputs and three outputs. The task of the network is to generate a joint rotation

N′(x[i], θ[i], x_d) = Δθ[i]

such that

R(x[i], θ[i], Δθ[i]) = (x_d, θ[i+1]).

The learning samples that are gathered, however, give information on how the robot moves from x[i] to x[i+1], where x[i+1] need not coincide with x_d. Thus the neural network learns the mapping N′(x[i], θ[i], x[i+1]) = Δθ[i] such that R(x[i], θ[i], Δθ[i]) = (x[i+1], θ[i+1]). This has the advantage that the neural controller can be used to move the robot to any target point, and that certain systematic errors (e.g., from the camera) are taken care of. As is shown in (van der Smagt et al., 1992a; van der Smagt, Groen, & van het Groenewoud, 1994), learning by interpolation leads to lower accuracy (when using the simulated robot) due to the large network input dimensionality. This also leads to noticeably slower learning, since a much larger number of learning samples is required. To circumvent these problems, we consider the input adjustment method.
The input adjustment method

By using certain a priori information on the camera system (viz. that, in a first approximation, it performs a linear mapping), the number of inputs of the network can be reduced as follows: given a Δθ, find the corresponding begin position (x[i], θ[i]) such that the system, after applying Δθ, ends in the state (x_d, θ[i+1]). This is equivalent to finding a sample which corresponds with the ideal network C′: R(x[i], θ[i], Δθ[i]) = (x_d, θ[i+1]). However, normally there are only measurements R(x[i], θ[i], Δθ[i]) = (x[i+1], θ[i+1]). By imaginarily rotating and translating the camera over a transformation matrix T such that x[i+1] is translated to x_d, we have T x[i+1] = x_d. The same T is used in translating x[i]. However, T is expressed in the image frame of x[i+1], so before it can be applied to x[i] the latter must be expressed in the same image frame. This is done
through the rotation matrix ⁱRᵢ₊₁ which expresses the rotation of the image from x[i] to x[i+1]. Thus a learning sample

( x[i] − ⁱRᵢ₊₁ x[i+1], θ[i] ; θ[i+1] − θ[i] )    (4.3)

is obtained. This method, called the input adjustment method and introduced in (Kröse et al., 1990), is illustrated in figure 4.3.
Figure 4.3: The position of the target in the coordinate frames before and after the movement of the arm.
Since the rotation is only about the z-axis, the rotation matrix can be calculated from θ₁, the rotation about the robot base. Given an adequate shape of the object (e.g., a non-isosceles triangle), the rotation angle can also be visually derived.
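A sketch of constructing one such learning sample according to (4.3); the image rotation between steps i and i+1 is taken from the change in the base rotation θ₁, which this sketch assumes to be the first component of the recorded joint state (the function names are mine):

```python
import numpy as np

def image_rotation(angle):
    # iR_{i+1}: rotation of the image plane about the z-axis; the area
    # component A of x = (x, y, A) is unaffected by the rotation.
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def input_adjusted_sample(x_i, x_next, theta_i, theta_next):
    """Learning sample (4.3): input (x[i] - iR_{i+1} x[i+1], theta[i]),
    target theta[i+1] - theta[i]."""
    R = image_rotation(theta_next[0] - theta_i[0])    # from the base joint
    adjusted = np.asarray(x_i) - R @ np.asarray(x_next)
    return ((adjusted, np.asarray(theta_i)),
            np.asarray(theta_next) - np.asarray(theta_i))
```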
Influence of measurement errors

Note that this method uses a priori information on the camera: the camera mapping should be linear to allow the above translations. The input adjustment will generate a correct learning sample if the projection of the Cartesian world coordinates onto the CCD (x) is linear in the translation over ⁱRᵢ₊₁ x[i+1]. However, this holds only in a first approximation. The measurement errors discussed in section 3.1.2 lead to incorrect learning samples. In that section we found that, due to deformations in the lens, an error of up to 8% in the measured (x, y) position of a pixel can occur for moves over the whole image.
Solutions to the error problem

The problem can be solved by calculating the goodness of a learning sample. Applying the input adjustment to a move introduces a systematic error, depending on how far the
camera-robot system is from the target state after making this move. A sample is perfect if the corresponding move has made the robot reach the target state. The `error' in every sample can be weighed by the distance ‖x_d − x‖, where x is the visual state after the corresponding move; these weights are available after the goal state has been reached. These samples, which are used in finding the optimal parameters of the neural controller, are then collected in such a way that samples with a smaller error replace samples with a larger error; the binning technique described above is ideal to implement this. In the implementation, a binary approach is taken: `perfect' learning samples, those that bring the camera-robot system `close enough' to the target state, always replace `imperfect' samples. To reduce the error on all samples as much as possible, the input adjustment method is applied for each sample individually. The error introduced in the `perfect' learning samples is very small, since the camera deformation around the camera centre is negligible, as can be seen from figure 3.3 (see page 38). Such `perfect' samples will become available as soon as the neural network becomes successful in controlling the robot. When these samples appear, the set of learning samples will be updated by having perfect learning samples replace imperfect ones. Note that, when the robot-sensory system changes during the experiments (e.g., an unforeseen rotation of the camera), this method will not work properly: perfect `old' learning samples will not be replaced by imperfect `new' samples. This problem can again be solved by using weighted samples, or by time-stamping samples such that `too old' samples are always replaced by new samples.
4.2.4 Implementation of the learning algorithm

The conjugate gradient learning method is computationally expensive. To prevent delays in the neural network when it is used for controlling the robot, the neural network is implemented as a parallel algorithm, consisting of a learning part (called `lummel') and a controlling part (called `connel'). Each of the two parts runs on a different machine, in this case a Sun workstation. The interrupt-driven communication between the two parts consists of learning samples being sent from connel to lummel, and weight matrices in the opposite direction. This modular structure allows a setup in which multiple `lummel's are connected to one `connel.' Each `lummel' can then have its own number of learning samples, defining the accuracy and adaptation time. Thus it is made possible that coarse but fast-adapting controllers and exact but slowly adapting controllers can be used at the same time. It is also possible to have one `lummel' always accept perfect as well as imperfect learning samples, in order to obtain better detection of changes in the robot-camera system. Figure 4.4 shows how the various parts are connected.

The communication between the processes is illustrated in figure 4.5. At step i+1, connel sends requests for x[i+1] and θ[i+1] to the vision and robot software, and waits for the result. Next, connel sends a weight matrix request to lummel (the reply to which usually arrives one feedback step later) and performs a feed-forward sweep through the network with the weight matrix received at an earlier time. The resulting joint rotation Δθ[i+1] is sent to the robot.
Figure 4.4: Implementation of the controlled system. The controlling part (`connel') and the learning part (`lummel') each run on a SPARCstation LX and exchange learning samples and weight matrices over ethernet; connel handles the robot and hand-held camera data.
Figure 4.5: Schematic depiction of the steps performed in the feedback loop.
4.2.5 Accuracy of the neural network
In this section, we follow the analysis in (Vysniauskas et al., 1993) in order to determine the optimal number of hidden units and learning samples for the solution of the posed problem. We recall the relation, given in chapter 2, between the average approximation error ε_a of a feed-forward neural network, its number of hidden units h, and its number of learning samples N:

ε_a²(h, N) ≈ Σ_p Σ_{i+j=p} α_ij / (hⁱ Nʲ).    (4.4)

Here, the α_ij are the parameters of the estimation, and i, j, and p are natural numbers. The obtained model for the error depends on the values of p over which (4.4) is computed; we refer to (4.4) as AMEF(p_min, p_max) when p_min ≤ p ≤ p_max. The values of p_min and p_max are determined by measuring the relative influence of the terms for the various p in (4.4). This is done by fitting (4.4) to curves describing the approximation error vs. the number of samples and hidden units, and determining for which values of p (4.4) fits best. For the robot-camera mapping, an optimal value of p_min = 1, p_max = 3 was found. Next, for the AMEF(1, 3) model, Monte Carlo simulations are used to estimate the parameters α_ij. The result of this procedure is shown in figure 4.6. The figure shows the approximation by the AMEF(1, 3) function for the estimated α_ij of the robot-camera mapping function as defined by the learning samples obtained from (4.3), for a network with 10, 20, 30, and 40 hidden units, for increasing numbers of learning samples.

To find the network with minimal resources, i.e., the network with a certain desired accuracy ε₀ for which the computational complexity r ≈ hN (cf. (2.16)) is minimal, the AMEF(1, 3) model must be evaluated such that ε_a(h, N) = ε₀, the desired accuracy (2.17a); cf. figure 2.3. In the case ε₀ = 1° (corresponding with the attainable accuracy of the camera-robot system, as was concluded in section 3.2.3), the optimal values for h and N are found to be h = 27 ± 3, N = 200 ± 30 (Vysniauskas et al., 1992). In the experiments, networks with 27 hidden units are used; due to the binning techniques, the number of learning samples was not strictly adhered to, but gradually increased during the learning process, as can be seen in the next section.
4.3 Results

Results of the algorithm are obtained both with the simulated robot (see appendix E) and with the OSCAR robot. In the former case, the system can be used in an open loop, and the distance between the robot end-effector and the object can be measured after a single move. This leads to an expression of the error in the approximation of C′ by N. With the real robot, this approach is not possible. The feedback loop is always used (i.e., the robot moves are always interrupted) to prevent the object being lost out of sight, and the correctness of the controller can be expressed in the number of feedback steps necessary to reach the object.
Figure 4.6: Accuracy in approximation of the camera-robot mapping by the feed-forward network with 10, 20, 30, and 40 hidden units. The learning error is represented by error bars at the bottom, the generalisation error by error bars at the top; AMEF(1, 3) is plotted with the thick solid line. On the vertical axes is printed the error in the network output in degrees. An error in the network output of 1° in joint rotations is not met by the network with 10 hidden units. (From: (Vysniauskas et al., 1993). Reprinted with permission.)
Note that this number of steps has a lower bound, since in each feedback step a certain amount of time is used for visual and robot preprocessing, augmented with the time for communication.
4.3.1 Simulated robot
With the simulated robot all positioning data is available. Therefore it is possible to measure the Euclidean distance between the end-effector and the target object. In order to test the accuracy of the neural controller, in the simulated environment the feedback loop is only used to make a second, third, etc. move towards the target, after the previous move has been finished. The tests are performed as follows:

1. the target object is placed at a random position in the reach space of the robot. The reach space used in the experiments is depicted in figure 4.7;
2. the robot is given a random initial joint position, such that the end-effector resides in the reach space;
3. the learning algorithm is started, where the robot finishes each move. The algorithm is continued until the distance between the end-effector and the object is not more than 0.5cm, or too many steps are made;
4. back to 1.

Figure 4.7: Reach space of the robot used in the simulator experiments, indicated by dots.

The functions that are approximated by the neural controller when controlling the simulated robot are, in essence, the robot's inverse kinematics using an egocentric viewpoint. The resulting equations are given in appendix C.

In the first experiment, 400 goals are run using a network without null-samples. Figure 4.8 shows the Euclidean distance between the end-effector and the object after one, two, and three steps towards the object. The average error after one step lies in the order of 2.5cm. This error corresponds with the expected value according to the analysis in section 4.2.5 for ε₀ = 1°. The same figure also shows the grasping error after a camera rotation has taken place, for the same granularity of hypercube.

Next, the effect of adding null-samples is investigated. The results, which are shown in the top row of figure 4.9, demonstrate a drastic improvement over the experiment without null-samples. The average error in the first step is less than 1cm (compared to 2.5cm in the experiment without null-samples, figure 4.8), and in two steps approximately 1mm (compared to 12mm in the experiment without null-samples). Note that after a few iterations, the target can already be reached in the feedback loop; such fast learning makes the method very well suited for real applications.

Secondly, the method's adaptability is tested. The state of the network after the initial 800 goals is taken as the begin state. The system is changed by rotating the hand, and hence the hand-held camera, by 45°. When a very large number of bins is used, the replacing of the old samples by the new samples takes many trials. Therefore the
Figure 4.8: Grasping error in cm with the simulated robot after 1 (left column), 2 (second column), and 3 (third column) steps. In this experiment, no null-samples are put in the hypercube. On the horizontal axes, the number of goals is plotted. Top row: initial learning, i.e., results when learning from scratch. The hypercube size is 100³ × 20², i.e., virtually unlimited. Second row: after initial learning, rotation of the camera by 45°. The hypercube size is the same as during the initial learning. Note the different vertical scales.
Figure 4.9: Grasping error in cm with the simulated robot after 1 (left column), 2 (second column), and 3 (third column) steps. In this experiment the hypercube was initialised with null-samples. On the horizontal axes, the number of goals is plotted. The desired accuracy was a Euclidean distance of 0.5cm between the end-effector and the target object. Top row: initial learning, i.e., results when learning from scratch. The hypercube size is 100³ × 20², i.e., virtually unlimited. Second row: after initial learning, rotation of the camera by 45°. The hypercube size is the same as during the initial learning. Third row: after initial learning, rotation of the camera by 45°. In this case, a hypercube of reduced size is taken: only 5⁵ elements maximum can be stored. The relearning is much faster, and the final accuracy is better. Note the different vertical scales.
performance of the neural controller will be bad for a long time. This is illustrated in the second row of figure 4.9. When fewer learning samples are stored (shown in the third row of the same figure) the old samples are more rapidly replaced, leading to much faster adaptation. By connecting more than one lummel to connel, each of which has a different granularity of bin structure, faster-adapting lummels with small hypercubes can be used when the lummels with larger hypercubes degrade in performance. Using this setup, an optimal relearning strategy can be devised.

The number of samples that is stored in the bin is shown in figure 4.10. Due to the structure of the sample memory, this number grows asymptotically towards a maximum. The same figure also shows the effect of relearning. The old samples, which were created
Figure 4.10: (Left) Number of stored learning samples gathered during the initial 500 goals of the simulator experiment. (Right) Replacement of old samples during relearning. The upper curve shows the percentage of old samples during training for the large (fine-grain) hypercube; the lower curve the percentage of old samples for the small hypercube.
before the camera rotation, are only slowly replaced by new samples. Especially in the fine-grain hypercube, there remains a large number of old patterns in the hypercube. This is also caused by the fact that, due to the camera rotation, some parts of the input space are not reached anymore. This effect is less cumbersome in the coarse-grain hypercube.

When compared with other methods, learning in this algorithm is very fast. By applying a topology-conserving neural network to a very similar task, i.e., hand-eye coordination, Ritter et al. report a positioning accuracy of approximately 0.5cm after two moves (a result which should be compared with the network used for figure 4.8), but need in the order of 5,000 learning goals to obtain this result (Ritter et al., 1989, 1990). Results, when run on the monocular OSCAR simulator, are the same (Jansen, van der Smagt, & Groen, 1995). The topographic map used by Kuperstein (Kuperstein, 1988) needs approximately 4,000 trials for convergence, and reaches the same accuracy as Ritter et al.
4.3.2 Results with OSCAR
For the OSCAR robot, it is not possible to measure the distance in world coordinates between object and end-effector, since not only is the position of the object unknown, but such a calculation would also include uncertainties in the available robot model. Also, as explained above, the feedback loop makes such an evaluation of the algorithm impossible. Consider figure 4.11. This figure shows the object observed by the hand-held camera
Figure 4.11: An example of how the object is observed to move towards the centre of the camera image during a grasping trial. This figure was created by taking a picture of the video screen, while keeping the shutter of the photo camera open during the whole trajectory.
during a grasping trial. From the figure it can be clearly seen that it is difficult to decide how `good' this deceleration trial is; after all, the object does reach the right position in the camera frame (i.e., the hand moves to the right position). Therefore it is chosen to evaluate the difference between the finally reached position in joint space and the previously predicted positions during the feedback loop. This is thus an error in the setpoints Δθᵢ for joint i:

Σ_{j=0}^{n} | Δθᵢ[j] + θᵢ[j] − θᵢ[n] |    (4.5)
where n is the number of feedback steps for this goal. The difference θᵢ[n] − θᵢ[j] between the goal and the current position equals the exact Δθᵢ needed to reach that goal position. This sum gives the average predicted error in joint space for one trial (i.e., one target position). We reiterate that the error is expressed relative to the finally reached joint position, which need not exactly coincide with the desired joint position, but may vary by the average uncertainty in the final joint position, which is shown in figure 3.8. That means that the curve would never be expected to be lower than approx. 0.25° for joint 1, 0.15° for joint 2, and 0.35° for joint 3. The results are shown in figure 4.12.

The number of imperfect and perfect learning samples which are stored in the bin maintained by lummel is also shown in figure 4.12. Note that, in this experiment, not all imperfect samples are replaced by perfect samples. This is because many more imperfect samples can be created than perfect ones; however, in the limit all imperfect samples will be replaced by perfect ones.
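For clarity, the per-joint evaluation measure (4.5) can be written out as follows. In this list-based sketch, dtheta_log[j] is the setpoint Δθᵢ[j] generated at feedback step j and theta_log[j] the measured joint value θᵢ[j], with theta_log[-1] the finally reached position:

```python
def setpoint_error(dtheta_log, theta_log):
    # Equation (4.5): sum_j | dtheta_i[j] + theta_i[j] - theta_i[n] |,
    # the accumulated difference between each predicted goal position
    # and the joint position that was finally reached.
    goal = theta_log[-1]
    return sum(abs(d + t - goal) for d, t in zip(dtheta_log, theta_log))
```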
Figure 4.12: (Left) Error in degrees in the generated Δθ, depicted for joints 1, 2, and 3. The data is smoothed with a moving average (MA) filter of width 100. (Centre) Number of perfect and imperfect learning samples stored in the bin, divided by 1,000. (Right) Number of steps between the beginning of the grasping and the moment when the object is reached. The data is smoothed with a 200 MA filter, and has a lower bound depending on the rotational speed of the robot arm and the delay in the feedback loop.
The number of perfect learning samples grows asymptotically towards the optimal number as found in (Vysniauskas et al., 1992); the same steepness is encountered in the error curve of figure 4.12 (left), indicating that a stable solution has been found.

Another indication to illustrate the learning behaviour of the system is the number of feedback steps needed to reach the target. This number will have a lower bound depending on the speed of the robot arm. In the current implementation, this is set to 5° per second. The average joint rotation that the robot has to perform to reach an object is in the order of 30°–35°, i.e., on average 6–7 seconds. With a single feedback step taking 100ms, this would mean a minimum average of 60–70 feedback steps. This number is matched by the results. A second way of checking this is the `straightness' of a followed trajectory. In figure 4.13 two trajectories are displayed: the first trial, and
Figure 4.13: The left figure shows the trajectory in the x, y, and z dimensions towards the goal during the first trial. Clearly, the path towards the 0-point is not straight. After various learning trials, however, the neural controller manages to guide the robot arm in a straight path towards the object, as shown in the right figure.
Clearly, in the beginning the camera does not approach the target in a straight line. After extensive learning, however, the path is close to optimal.
4.4 Discussion

We have designed a method which is able to control the positioning of a monocular robot arm without knowledge of the arm or the camera, and without requiring extensive calibration. From the chosen number of hidden units and the number of learning samples that is effectively stored in the bins, according to (Vysniauskas et al., 1992), an error of approximately 1° per joint, or approximately 2.5 cm in the end-effector, is expected. This error is
indeed found in the results; however, when the set of learning samples is augmented by null-samples to decrease the system's limit cycle, much improved results (1 cm in one step, 1 mm in two steps) are attained. Using the theory presented in (Vysniauskas et al., 1992, 1993), the desired precision can be adapted by changing the number of learning samples and hidden units. The simulator experiments also show that the system recovers well from excessive changes (e.g., the 45° camera rotation). This adaptivity depends on the granularity of the binning technique used to store learning samples, and it has been shown that the presented system is ideally suited to incorporate fast adaptivity. Since the size and construction of the sample memory play an important role, faster adaptation can be obtained with smaller sets of learning samples. The solution to the adaptation problem then is to connect multiple learning modules (`lummel's), each with its own number of hidden units, as well as its own hypercube granularity in which the learning samples are stored. More accurate representations (`fine approximations') can then be built over larger datasets, while their smaller counterparts (`coarse approximations') are used for immediate control when the fine approximations are not learned well enough. By using the difference between the coarse and fine approximations, the life cycle of the feed-forward networks and their samples can be optimally controlled. This idea has been extended to a setup in which a feed-forward network splits itself automatically in those regions where the approximation (i.e., the grasping precision of the robot) is worst, creating sub-approximations for those parts of the input space. This leads to a multi-resolution approximation tree (van der Smagt et al., 1992a; Jansen et al., 1995; van der Smagt et al., 1994), in which the worsening of the approximation at various levels of the tree can be used to detect and adapt to changes in the controlled system.
Chapter 5
Visual feedback in motion

Living organisms, in contrast with the static robot control system described in the previous chapter, use continuous feedback from their eyes in order to interact with their dynamically changing environment. Meanwhile, the sensory activity due to ego-motion must also be accounted for. While the information that is used in a static system only takes into account the position of points of interest, in a moving system such visual observations are interpreted in the spatio-temporal domain. Optic flow, which is defined as the motion of the observer's environment projected on its image plane, is much more commonly used by living organisms than information obtained from static retinal patterns. The fact that optic flow is fundamental to vision has only been realised since the pioneering work of Gibson (Gibson, 1950). A classical example of the use of optic flow in biological systems is the gannet(*). When hunting for fish, this bird drops itself from the sky from a great height. Since the fish is moving, the bird needs its wings to correct its path while falling down; however, at the moment of contact with the seawater its wings must be folded to prevent them from breaking. It has been shown that the time remaining between the moment that the bird folds its wings and the moment that it hits the water is always the same for one particular bird. It is not controlled by its height or velocity separately, but by the quotient of the two. This remaining time is known as the time-to-contact and indicated by the symbol τ. When the system is not accelerating, τ is given by the quotient of the distance from an approaching surface and the velocity towards it. Exactly the same information can be obtained from the divergence of the approaching surface (Koenderink & van Doorn, 1975; Lee, 1980), a

Figure 5.1: The gannet.

Part of the material in this chapter was previously printed in (van der Smagt, Groen, & Kröse, 1995a) and submitted as (van der Smagt, Groen, & Kröse, 1995b).
(*) The gannet is a seabird from the Sulid family (related to the pelican) and is common in the northern parts of the Atlantic Ocean, as well as in more temperate regions (Africa, Australasia, and South America). The bird, which has a span of 1.8 m, has white feathers except for its black primary wing feathers and its yellowish neck and head, equipped with a long, straight beak. In the brooding season it nests in large colonies of up to 60,000 birds on cliffs and islands. It feeds itself by diving for fish from a great height, and following the fish under water.
feature that can be observed monocularly(†). Since this bird cannot measure its velocity, it measures the time until it hits the water from time derivatives of the visual observation. It is this mechanism that underlies the method presented in this chapter for controlling a monocular robot arm such that it `lands' on an object. In one aspect, the task that is tackled is the same as in the previous chapter: at some time, the distance between the object and the hand-held camera must be zero. We go one step beyond that requirement: the distance must be zero at some time (which, we now know, is related to the time-to-contact), while the system must be at rest: the velocity, acceleration, and higher derivatives must also be zero. We will call this the goal state. This can be seen as the endpoint of a trajectory towards that point in time. In the case of the bird, the decision to fold its wings can be made with the available visually measured quantities. In the sequel it will be shown that by extending the above example to higher-order time derivatives of the visual data, criteria can be developed which specify a trajectory ending in a rest state (i.e., zero velocity, acceleration, etc.) at the end point. These criteria will lead to visual setpoints along the followed trajectory, which are used as inputs to a neural controller that must generate robot joint accelerations in order to follow the setpoints in the visual domain. Thus it is possible that the eye-in-hand robot arm exactly stops on an observed object by use of optic flow. Extending the system presented in the previous chapter with time derivatives of the visual field leads to an important advantage: a model of the object is no longer needed to obtain depth information. The system need not be calibrated for each object that has to be grasped, but can approach objects with unknown dimensions. The use of optic flow for robot navigation is no novelty. Cipolla and Blake (Cipolla & Blake, 1992), for example, describe a system for estimating time-to-contact from the flow field, and use this for obstacle avoidance of a robot arm with known kinematics. Sharma (Sharma, 1992) uses time-to-contact derived directly from the optic flow as a basis for mobile robot trajectory planning. Vernon and Tistarelli (Vernon & Tistarelli, 1990) use visual velocity vectors to estimate depth in a complex visual scene, but again for a robot with known kinematics. In this chapter we generalise previous time-to-contact based robot arm guidance methods to the generation of 3D motion trajectories from optic flow, and use this to construct a model-free self-learning robot arm controller (in fact, not the optic flow field will be used, but rather the visual position, velocity, acceleration, etc. of a single white object to be grasped against a black background). Of importance is the fact that no model of the robot arm or the observed world is necessary to obtain depth information with monocular vision. First, we will investigate which visual criteria are necessary to let the robot arm follow the desired trajectory towards an object. Secondly, it will be shown how these criteria can be used as reference signals for a self-learning neural controller. This controller, as in the previous chapter, will use no model of the robot arm or camera, and will also not need a model of the approached object. In section 5.1, the trajectory followed by the hand-held camera point will be mathematically investigated.
(†) Considering the fact that birds have very small binocular fields (Martin, 1993), it is indeed very unlikely that binocular vision can be used for such a task.
This leads to the time-independent motion constraints in section 5.2, and the time-dependent motion constraints in section 5.3. Next, the question of how such motion can be observed with a camera moving in its environment will be studied in section 5.4. The conclusions of the preceding sections lead to a control algorithm, described in section 5.5. Results of the algorithm applied to a robot are given in section 5.6. A discussion in section 5.7 concludes the chapter.
5.1 The trajectory of an eye-in-hand system

A typical configuration of the object relative to the manipulator is shown in figure 5.2. The task of the neural controller is to generate robot joint accelerations which make the
Figure 5.2: Setup of OSCAR trying to grasp an object. The object is observed by the camera at position d(t = 0), relative to the coordinate system of the end-effector. Subsequent measurements of the object occur at subsequent time steps. The figure also shows the projections of the vectors and objects on the z = 0 plane.
robot manipulator follow a trajectory leading to the goal state, in which the end-effector (i.e., the camera) is placed on the object and the robot arm is at rest. This trajectory will be expressed in the available visual domain. Therefore, visual criteria will be determined to which the visually measured trajectory of the observed object will have to adhere. These criteria have to be defined from the visual observations ξ(t), and we can write the goal state as

$$\forall t \geq \tau : \quad \boldsymbol{\xi}(t) = \boldsymbol{\xi}_d \tag{5.1}$$

i.e., we require a rest state in the visual domain. Notice the difference with equation (4.1). In the previous chapter, ξ_d was equal to (0, 0, A); here, however, we assume that no model of the object is available, such that A is unknown. We will first consider the goal state in the world domain, and thereafter the goal state in the visual domain. The act of grasping can be described as the joining of two coordinate frames, one for the robot's end-effector and one for the object. Correct position and orientation of the robot's end-effector is obtained when those two coordinate frames join. As we will only discuss position and neglect the orientation, both the object and the robot end-effector
78
Chapter 5. Visual feedback in motion
can be described by a single point, and grasping simply corresponds to joining those points. Just as in chapter 4, the world positions of these points are given by x_o and x_r(t), respectively. Furthermore, we introduce the distance d(t) in world space as follows.
Definition 5.1 The distance between the camera and object is denoted by the symbol d(t), and is determined from the Cartesian positions x_r(t) of the camera and x_o of the object as

$$\mathbf{d}(t) \equiv \mathbf{x}_r(t) - \mathbf{x}_o. \tag{5.2}$$
The distance of the observer relative to the object is described by d(t). We will use d(t) to determine the trajectory that must be followed in the x, y, and z directions by the manipulator to reach the target. How can the goal state then be expressed in the world domain? We can formulate it as a stopping criterion:
Definition 5.2 The stopping criterion is equivalent to reaching the state where the relative distance is zero at all times t ≥ τ:

$$\forall t \geq \tau : \quad \mathbf{d}(t) = \mathbf{0}. \tag{5.3}$$
I.e., the system is at rest. Equation (5.3) can also be interpreted as: all time derivatives of d(t) must be zero at t = τ.
Corollary 5.1 The stopping criterion of definition 5.2 can be expressed as

$$\forall k \geq 0 : \quad \mathbf{d}^{(k)}(\tau) = \mathbf{0}. \tag{5.4}$$
In other words, the distance, velocity, and higher derivatives must become zero at the moment of grasping. We will now derive how the stopping criterion (5.4) can be used to determine the trajectory d(t) before the time of contact (i.e., when t < τ).
5.1.1 The trajectory as a function of time
In order to solve this problem we will first partition d(t) into its x, y, and z components, such that d(t) = (d_x(t), d_y(t), d_z(t)), as is illustrated in figure 5.2. In the sequel, the symbol d(t) will be understood to mean any one of d_x(t), d_y(t), or d_z(t). As the trajectory is described in terms of its time derivatives, and these time derivatives describe physical concepts such as velocity and acceleration, a natural choice is to approximate d(t) by a finite-order polynomial expansion around t = 0,
$$d(t) = \sum_{j=0}^{n} a_j t^j + \varepsilon \tag{5.5}$$
where a_n ≠ 0 and ε is the approximation error, which equals o(t^n) for small t. Here, the nth derivative is the highest derivative of d(t) that is taken into consideration. Therefore, also, the stopping criterion (5.4) will only be taken into consideration for 0 ≤ k < n. The derivatives of d(t) are given by

$$d^{(k)}(t) = \sum_{j=k}^{n} \frac{j!}{(j-k)!}\, a_j t^{j-k}.$$

Differently put, the derivatives of d(t) and the a_k are related as:

$$a_k = \frac{1}{k!}\, d^{(k)}(t = 0). \tag{5.6}$$
Figure 5.3: A possible trajectory followed by the manipulator. Apart from d(t), the first and second derivatives of d(t) are also shown. At t = τ, a discontinuity is applied to d̈(t): the deceleration is `switched off.' The relative scale of the three curves is distorted. Although the sign of d̈(t) is positive, we will refer to it as `deceleration' due to the fact that its sign is opposite to that of ḋ(t).
What does a trajectory d(t) look like? Consider figure 5.3, which shows a system where d(t) is taken to be a quadratic trajectory in time, i.e., where n = 2, leading to a trajectory with constant deceleration. Note that at t = τ the criterion (5.4) has indeed been satisfied for 0 ≤ k < 2. The system has reached the target and is at rest, as required. In order to prevent the system from moving away from 0 again, the deceleration is externally `switched off' at t = τ as shown. The definition of d(t) as a polynomial is related to its physical interpretation: we consider the trajectory followed by the manipulator in terms of position d(t), velocity ḋ(t), deceleration d̈(t), etc. with respect to time. Thus the n = 2 case is a physically familiar one: gravity-induced motion, or, more generally, motion resulting from a constant force. The n = 3 case, where the jerk (the time derivative of deceleration) is constant, will also be considered; large deceleration differences are undesirable because they can cause vibrations in mechanical structures (robot arms). In the n = 3 case, the system is controlled such that the deceleration also goes to 0 at t = τ, and the jerk is switched off at t = τ. We proceed with substituting d(t) and its derivatives in the criteria (5.4). These criteria will lead to constraints on the parameters of the trajectory, which can be used by the adaptive controller.
5.2 The time-independent constraints

The parameters a_k uniquely determine the trajectory d(t) that is followed by the manipulator in the x, y, and z directions of motion. The stopping criteria (5.4) therefore lead to conditions on these parameters a_k. Thus the conditions on the trajectory at the end can be translated to conditions along the trajectory. Although the conditions on the parameters a_k can be expressed in various ways, since they are used as input to the controller it is imperative that they be measurable from the visual data. In section 5.4 it will be derived that, although the parameters a_k cannot be measured, they are known scaled by an unknown constant. Therefore, only the relative quantities a_j/a_k are known, since only in those expressions the constants of proportionality disappear. We therefore have to find criteria in which only the relative quantities a_j/a_k appear. In the previous section we have formulated the requirements for the trajectory that is followed by the manipulator in time. These requirements can be summed up as follows:
1. d(t ≥ τ) = 0;
2. d(t) can be well approximated by a polynomial of finite order.
The criteria for the a_i to meet these conditions can be found from the stopping criteria (5.4), which require that d(τ) = ḋ(τ) = d̈(τ) = … = 0. This results in:
Theorem 5.1 Let each of the components d(t) of d(t) be described by a different polynomial d(t) = a_0 + a_1 t + … + a_n t^n. Then the stopping criterion

$$\exists \tau\; \forall k,\ 0 \leq k < n : \quad d^{(k)}(\tau) = 0,$$

where τ > 0 is an unspecified moment in time, leads to the following set of simultaneous constraints on the parameters of the trajectory:

$$\frac{a_n^{\,n-k-1}\, a_k}{a_{n-1}^{\,n-k}} = \frac{1}{n^{n-k}} \binom{n}{k}, \qquad 0 \leq k < n. \tag{5.7}$$
Proof Given in appendix B.1.

The k = n − 1 case reduces to a trivial equation; therefore, eqs. (5.7) lead to n − 1 nontrivial constraints. Since τ is not a parameter of these constraints, we will refer to (5.7) as the time-independent constraints. The problem that is tackled can be formulated as: `given a moving system d(t) = Σ_{i=0}^{n} a_i t^i which is in state a_0, a_1, …, a_{n−1}, find the a_n such that at some time τ, d(τ) = ḋ(τ) = … = d^{(n−1)}(τ) = 0.' Formulated like this, we see that there are n equations with two unknowns (τ and a_n). Therefore a solution can only be guaranteed when n = 2. In practice this means that, given a system at position d(t) moving with velocity ḋ(t), you
can always find a constant deceleration d̈ such that d(t) = ḋ(t) = 0 at some time. Cases for n > 2 are, in general, not soluble. For instance, considering the n = 3 case, it is not always possible to find a linearly decreasing deceleration with which the velocity and position also go to 0 at some time t = τ. To clarify these statements, we will consider the case n = 2 in more depth, and illustrate the impossibility of the n = 3 case for a constant jerk a_3 along the whole trajectory.
Quadratic polynomials. For n = 2, (5.7) reduces to the single equation

$$n = 2: \qquad \frac{a_2 a_0}{a_1^2} = \frac{1}{4}. \tag{5.8}$$
Lemma 5.1 (Model equation.) The time-independent constraints for the n = 2 case lead to an optimal deceleration

$$a_2 = \frac{a_1^2}{4 a_0}. \tag{5.9}$$

Proof Immediately from (5.7) when τ is finite.
Equation (5.9) gives the required deceleration to lead the system to the state where the position and velocity are zero at the same time. Thus the condition (5.4) is satisfied for k < 2. Equation (5.9), when written as a_2 a_0/a_1² = 1/4, is a constraint on the measurable relative quantities a_2/a_1 and a_0/a_1. It is therefore possible to design a controller which uses these measurable quantities to make a system decelerate perfectly towards an object, by having it satisfy the constraint on those quantities. Note that the time until contact
τ cannot be controlled; it is, in fact, equal to −2a_0/a_1 and thus depends on the initial position and velocity of the system only. This case has been investigated by (Lee, 1980).
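To make lemma 5.1 concrete, the following sketch (ours; the numbers are illustrative) simulates a one-dimensional approach under the constant deceleration of equation (5.9), and checks that position and velocity vanish together at τ = −2a_0/a_1:

```python
import numpy as np

# Initial state: position a0 > 0, velocity a1 < 0 (moving towards the object).
a0, a1 = 100.0, -4.0
a2 = a1**2 / (4.0 * a0)        # optimal constant deceleration, eq. (5.9)
tau = -2.0 * a0 / a1           # the resulting, uncontrollable time-to-contact

t = np.linspace(0.0, tau, 501)
d = a0 + a1 * t + a2 * t**2    # position d(t)
d_dot = a1 + 2.0 * a2 * t      # velocity

print(tau)                     # 50.0
print(d[-1], d_dot[-1])        # both 0 at t = tau
```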
Cubic polynomials. As mentioned above, for n > 2 no solutions exist in the general
case. As an example, for n = 3, (5.7) reduces to the combined equations

$$n = 3: \qquad \frac{a_3^2 a_0}{a_2^3} = \frac{1}{27}, \qquad \frac{a_3 a_1}{a_2^2} = \frac{1}{3}. \tag{5.10}$$

This set of equations leads to two, in general different, values for a_3 which solve the equations. In order to bring the system to the desired rest state (5.4), where the position, velocity, and deceleration are zero, both solutions must give the same result; otherwise, the desired rest state cannot be reached. The solution to equations (5.10) describes a line in the three-dimensional space spanned by a_0, a_1, and a_2. Clearly, there exist initial values of a_0, a_1, and a_2 which do not lie on this line; thus, for those initial values a constant a_3 does not exist to follow that line. This problem can be solved by taking a two-step approach, where a different a_3 is given in the first and second steps. In the first step the system will be brought onto the desired (a_0, a_1, a_2) line, and in the second step this line will be followed.
Discussion. There are a few problems with the above solution using quadratic poly-
nomials. First, although, in the n = 2 case, the optimal deceleration can be found to bring the system to the desired state where the position and velocity are zero, the time over which this deceleration is performed cannot be controlled. This is clear from (5.7): τ is not a parameter of the system. Therefore, the time needed for deceleration depends solely on the starting conditions, e.g., the position and velocity in a second-order system at the moment that deceleration is initiated. Figure 5.4 illustrates this effect. It means that, with a system controlling quadratic trajectories, it may happen that the
Figure 5.4: Time-to-contact depends on beginning conditions: τ depends on the values of a_0 (position) and a_1 (velocity) along the trajectory.
distance d(t) is large while the velocity ḋ(t) is small, such that it will take a very long time before the system reaches the steady state. A result of this first problem is that, if τ cannot be controlled, it will in general be different for d_x(t), d_y(t), and d_z(t). In other words, even though the form of the 1D trajectory is controlled, the trajectory followed in 2D or 3D Cartesian space cannot be controlled; it need not be straight. One such possible trajectory for the 2D case is depicted in figure 5.5. The figure illustrates that, while the
Figure 5.5: A planar trajectory followed when the x and y trajectories are not synchronised. Both the x and y trajectories that lead to this figure are quadratic polynomials. Note that the x coordinate will become 0 before the y coordinate does.
y component of d(t) is still non-zero and decreasing, the x component is already 0. A second problem is that the above solution only works for n = 2. To solve these problems, the stopping criteria have to be expanded to incorporate τ.
5.3 The time-dependent constraints

The solution found in the previous section is one with a constant deceleration. In this case we used a single-step approach: during the whole deceleration, the a_2 is kept at a constant value. This method, however, has a disadvantage: the large acceleration differences are difficult to realise by mechanical structures, leading to increased wear and tear. Secondly, as we saw in the previous section, it is imperative that the desired time-to-contact τ_d be a parameter of the system. A multi-step approach, in which the deceleration a_2 is not constant along the whole trajectory, must be used to tackle these two problems. Since the deceleration is no longer constant along the trajectory, it must be repeatedly adapted, which can be realised by feedback control. Here the deceleration is assumed constant within each time interval, but varies from interval to interval.
5.3.1 Feedback control of the trajectories

We consider time in intervals i, i + 1, …. In each interval i a time axis t[i] is defined, with corresponding trajectory parameters a_k[i]. This is illustrated in figure 5.6 (Figure 5.6: The time-axis), where each local time axis starts at t[i] = 0. Note that the intervals need not be equidistant. Consider once again the polynomial expansion of d(t) in (5.5). This polynomial expansion of order n is globally valid over the whole time interval, i.e., the whole trajectory of the robot arm. After splitting up the global time axis into intervals, d(t) can be repeatedly approximated in these intervals by polynomials with parameters a_j[i]. These approximations are written as
$$d[i](t[i]) = \sum_{j=0}^{n} a_j[i]\, t[i]^j + \varepsilon. \tag{5.11}$$
Note that d(t) ≡ d[0](t[0]), but that the parameters a_j[i] are in general not equal to a_j[i+1]! The starting time t at which d[i], and thus the a_j[i]'s, are defined is repeatedly changed. As set out above, the task of the feed-forward based neural controller is to make the robot manipulator follow a pre-specified trajectory. During a time interval i the system measures the ξ[i] (note that, due to the discrete-step feedback loop, the ξ are now discrete variables indexed by the step number instead of varying continuously in time). From these measurements the controller generates joint accelerations θ̈[i+1] and sends them to the robot. This marks the beginning of a new time interval i + 1.
In figure 5.7 a trajectory is shown which is realised by feedback control. In this figure, the deceleration is computed for each time interval i, i + 1, …, leading to a continuously adapted d(t) and ḋ(t).
Figure 5.7: Exemplar trajectory followed by a feedback-controlled manipulator on a time scale. Shown are the position d(t) and velocity ḋ(t) during the decelerated motion (both are scaled to fit in the figure, and their signs are not shown). The figure starts at i, when the first deceleration is applied. From i to i + 1, new measurements lead to a newly estimated deceleration, and thus the motion profile changes at i + 1. Ditto at i + 2, from when on the motion profile of i + 2 is maintained in the subsequent time slices. The dashed lines give the predicted d(t) from i and i + 1, which would have been followed if the deceleration had not been changed.
To find the relationship between a_k[i] and a_k[i+1], we equate d[i](t[i]) and d[i+1](t[i] − Δt[i]), where Δt[i] equals the time difference between the t = 0 of d[i] and that of d[i+1]:

$$\begin{aligned}
a_0[i+1] &= a_0[i] + a_1[i]\,\Delta t[i] + \ldots + a_n[i]\,\Delta t[i]^n \\
a_1[i+1] &= a_1[i] + 2 a_2[i]\,\Delta t[i] + \ldots + n\, a_n[i]\,\Delta t[i]^{n-1} \\
&\;\;\vdots \\
a_k[i+1] &= \sum_{j=k}^{n} \frac{j!}{k!\,(j-k)!}\, a_j[i]\,\Delta t[i]^{\,j-k} \\
&\;\;\vdots \\
a_n[i+1] &= a_n[i].
\end{aligned} \tag{5.12}$$
From these equations we can see a clear correspondence between the a_k[i]'s and the d[i]^{(k)}'s:

$$d[i]^{(k)}(0) = k!\; a_k[i] \tag{5.13}$$
in correspondence with (5.6) and (5.11).
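The re-expansion (5.12) is easy to verify numerically. The sketch below (ours; the function name is illustrative) shifts the origin of a polynomial by Δt using the binomial form of (5.12) and checks the result against direct evaluation:

```python
import numpy as np
from math import comb

def shift_coefficients(a, dt):
    """Re-expand d(t) = sum_j a[j] t^j around an origin shifted by dt, eq. (5.12)."""
    n = len(a) - 1
    return [sum(comb(j, k) * a[j] * dt**(j - k) for j in range(k, n + 1))
            for k in range(n + 1)]

a = [100.0, -4.0, 0.04]            # a_0[i], a_1[i], a_2[i]
b = shift_coefficients(a, 1.0)     # a_k[i+1] for a unit time interval

# d[i](t) and d[i+1](t - 1) must describe the same trajectory:
t = 3.7
d_i  = sum(aj * t**j for j, aj in enumerate(a))
d_i1 = sum(bj * (t - 1.0)**j for j, bj in enumerate(b))
print(np.isclose(d_i, d_i1))       # True
```

For n = 2 and unit intervals this reduces exactly to eqs. (5.19) and (5.20) below.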
5.3.2 Finding the time-dependent constraints

We will expand the criteria (5.7) by introducing a desired time-to-contact τ_d. The conditions that now lead to the trajectory are:
1. d(t ≥ τ_d) = 0;
2. d(t) can be well approximated by a polynomial of finite order.
This can be done as follows. Since d(t) is approximated by a polynomial of order n, the nth derivative of the approximation of d(t) must be constant in time; a_n is constant. Therefore, the (n−1)st derivative must be linear in time. Consequently, the time needed to bring the (n−1)st derivative to 0 is equal to the quotient of the two. For n = 2, this is the velocity divided by the acceleration. In the general case (cf. appendix B.1),

$$\tau = -\frac{a_{n-1}}{n\, a_n}. \tag{5.14}$$

This can be combined with the constraints (5.7), such that the system is now faced with n non-trivial constraints:

$$\frac{a_n^{\,n-k-1}\, a_k}{a_{n-1}^{\,n-k}} = \frac{1}{n^{n-k}} \binom{n}{k}, \quad 0 \leq k < n \qquad \text{and} \qquad \tau_d = -\frac{a_{n-1}}{n\, a_n}. \tag{5.15}$$

These constraints can be rewritten as

$$\frac{a_k}{a_{n-1}} = (-\tau_d)^{n-k-1}\, \frac{1}{n} \binom{n}{k}, \qquad 0 \leq k \leq n. \tag{5.16}$$
Note that, again, a trivial case results for k = n − 1. Similarly to the time-independent case, satisfying these n non-trivial constraints leads to the desired trajectory. However, the constraints are all related, and a simplification is in order. This leads to the following theorem.
Theorem 5.2 Let each of the components d(t) of d(t) be described by a polynomial d(t) = a_0 + a_1 t + … + a_n t^n. Then the stopping criterion

$$\forall k,\ 0 \leq k < n : \quad d^{(k)}(\tau_d) = 0,$$

where τ_d is the desired `time-to-contact', leads to the following constraint on the parameters of the successive intervals i of the trajectory:

$$\frac{a_0[i]}{a_1[i]} = -\frac{\tau_d - t[i]}{n} \qquad \text{for all } i \text{ with } 0 \leq t[i] < \tau_d, \tag{5.17}$$

where n > 0 and (τ_d − t[i]) ≥ 0.

Proof Given in appendix B.2.

We will refer to (5.17) as the time-dependent constraint.
The time-dependent constraint is obtained by extending the local time intervals towards the moment when (τ_d − t[i]) = 0. Although d[i](t[i]) is a local approximation, we can just pretend that it fits the whole d(t) starting at t[i] = 0, and let the time-dependent constraint be valid for that trajectory. This is clearly illustrated in figure 5.7; in this case, however, the constraint is repeatedly applied to each local approximation d[i](t[i]). Note that, since the time-dependent constraint is valid for each beginning point of a local approximation to d(t), it is valid for each t[i].
τ_d is usually chosen equal for the x, y, and z components of d(t[i]), to ensure that all components go to zero at the same time.
Discussion. Theorem 5.2 shows that the trajectory of any n-order system can be described by a single constraint on the ratio of position and velocity of that system. The n simultaneous criteria (5.16) which have to be satisfied at some time τ are replaced by satisfying a single constraint (5.17) during n subsequent steps. The `simultaneous conditions' are `laid out in time.' Equation (5.17) leads to a true feedback system: not only is the feedback loop necessary to manage the imperfect control system, but it is inherent to the control law.
Figure 5.8: Desired time-to-contact and the resulting constraint. Whereas τ decreases linearly in time, the constraint (τ_d − t[i])/n (shown for n = 2) changes only between interval i and i + 1. Note that the intervals need not be equidistant in time.
Note the important advantage with respect to the time-independent constraints: the trajectory can be expressed in only the setpoint (τ_d − t[i]) and a_0[i] and a_1[i]; higher-order a_k[i]'s are not required. Figure 5.8 shows an example of how the setpoints (τ_d − t[i])/n vary from time interval i to i + 1 for the case n = 2. As before, we formulate the problem that is tackled as: `given a moving system d(t) = Σ_{i=0}^{n} a_i t^i which is in state a_0, a_1, …, a_{n−1}, find the a_n such that at the given time τ_d, d(τ_d) = ḋ(τ_d) = … = d^{(n−1)}(τ_d) = 0.' At each step there is one equation (5.17) that has to be satisfied, for which the required deceleration a_2[i] in the time interval i can always be found.
Smoothness of the deceleration path. As an illustration, let us assume quadratic polynomial trajectories in the x, y, and z directions of motion.
Lemma 5.2 (Model equation.) The time-dependent constraints for quadratic polynomials lead to an optimal deceleration for unit time intervals
$$a_2[i] = -\frac{n\, a_0[i] + a_1[i]\,\bigl(n + (\tau_d - t[i]) - 1\bigr)}{n + 2(\tau_d - t[i]) - 2}. \tag{5.18}$$
Proof When taking (5.12) for n = 2, i.e.,

$$a_0[i+1] = a_0[i] + a_1[i] + a_2[i] \tag{5.19}$$
$$a_1[i+1] = a_1[i] + 2 a_2[i] \tag{5.20}$$
and substituting a0 and a1 in (5.17) for i + 1, (5.18) is obtained.
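The algebra behind lemma 5.2 can be checked mechanically. The following sketch (ours, using sympy) substitutes (5.19) and (5.20) into constraint (5.17) at step i + 1, writing r for the remaining time τ_d − t[i], and solves for a_2[i]:

```python
import sympy as sp

a0, a1, a2, n, r = sp.symbols('a0 a1 a2 n r')   # r = tau_d - t[i]

a0_next = a0 + a1 + a2          # eq. (5.19), unit time step
a1_next = a1 + 2 * a2           # eq. (5.20)

# Constraint (5.17) applied at i+1, where the remaining time is r - 1:
constraint = sp.Eq(a0_next / a1_next, -(r - 1) / n)

sol = sp.solve(constraint, a2)[0]
print(sp.simplify(sol))
# -(a0*n + a1*(n + r - 1))/(n + 2*r - 2), equivalent to eq. (5.18)
```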
In figure 5.9 this model is applied taking n = 2 (globally quadratic) for three curves with different initial conditions a_0, a_1. Consequently, the resulting a_2 differs for each of the curves. Notice that all curves go to 0 at the same moment. The figure clearly shows
Figure 5.9: Time-dependent deceleration with an n = 2 system. Three globally quadratic trajectories are displayed. For each of the trajectories, the velocity ḋ(t) at the beginning of deceleration is equal, but the distance from which the system decelerates differs. The resulting decelerations (also shown, but not to scale) are different, such that the trajectories all have the same τ_d. Note that after i = 0 (the first time interval) the decelerations are constant.
the disadvantage of using motion profiles which are assumed to be globally quadratic. In order to bring the velocity `in step with' the position, the first deceleration that is applied can be very large. This is a disadvantage, resulting in problems already mentioned above: large acceleration differences are difficult to realise by mechanical structures (i.e., robots), and such large steps resulting from small changes in the input variables (the switching-on effect) are difficult to represent with continuous approximators such as feed-
forward neural networks. Therefore a more gradual approach, i.e., more feedback steps, is required. Choosing a value for n can be interpreted as the assumption that a_n will be constant along the trajectory, while the adjustment takes place in n steps; figure 5.9 illustrates this for n = 2. The same figure also illustrates that n = 2 was not a good choice for those begin conditions {a_0, a_1}: a_{n=2} could not be kept constant along the whole trajectory. So, instead, we have to choose n based on the initial values of a_0 and a_1.
Varying order polynomials
Given a moving system, the measured a_0/a_1 contains the information about the optimal trajectory, given the initial condition, that has to be followed until contact. With a_0/a_1 = −τ_d/n, n > 0, following a deceleration trajectory of order n renders the smoothest path to contact. In theorem 5.2 we have seen that when we choose a_2 in each time interval such that (5.17) is satisfied, an optimal trajectory of order n is followed. A deceleration algorithm, expressed in the desired a_2, could look like this (a sketch in code follows below):

1. determine the desired order

$$n = \left\lceil -\frac{(\tau_d - t[i])\, a_1[i]}{a_0[i]} \right\rfloor \tag{5.21}$$

(we use the nearest integer, although, strictly speaking, n need not be integer-valued);

2. determine the desired deceleration using (5.18);

3. apply the a_2[i] to the system;

4. go to 2.
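A minimal simulation of this algorithm (our illustration; one motion component, unit time steps, and a guard forcing n ≥ 2) shows the position and velocity decaying to zero at the desired time-to-contact:

```python
def decelerate(a0, a1, tau_d):
    """Varying-order deceleration: n from eq. (5.21), then repeated eq. (5.18)."""
    n = max(2, round(-tau_d * a1 / a0))      # desired order, eq. (5.21)
    for t in range(int(tau_d)):
        r = tau_d - t                        # remaining time tau_d - t[i]
        a2 = -(n * a0 + a1 * (n + r - 1)) / (n + 2 * r - 2)   # eq. (5.18)
        a0, a1 = a0 + a1 + a2, a1 + 2 * a2   # advance one interval, (5.19)-(5.20)
    return a0, a1

# Equal initial velocities, different initial distances (compare figure 5.10,
# where n = 3, 4.5, and 9):
for d0 in (30.0, 20.0, 10.0):
    p, v = decelerate(d0, -4.5, tau_d=20)
    print(f"d0 = {d0:5.1f}: final position {p:.1e}, final velocity {v:.1e}")
```

In this example the final positions vanish to machine precision, and the final velocities end several orders of magnitude below their initial values.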
5.4 Visual measurement of the stopping criteria The deceleration algorithms described above is expressed in parameters ak . From chapter 3 we know that the d(t) and thus the ak cannot be measured with one camera when there is no model at hand of the observed object(s). Instead of looking at dx(t), dy (t), and dz (t), we have to consider the quantities that are actually measured: x(t), y (t), and z (t). The rst of these indicate the observed
Figure 5.10: Time-dependent deceleration with a system where n is automatically chosen. For each of the trajectories, the velocity ḋ(t) at the beginning of deceleration is equal, but the distance from which the system decelerates differs. This time, the three trajectories have a different n: n = 3, n = 4.5, and n = 9. The resulting decelerations (also shown) are different, such that the trajectories all have the same τ_d (n = 9 and n = 4.5 appear to go to 0 earlier, but they remain positive until t = τ_d).
position of the object with respect to the centre of the camera image(*)(†), and ξ_z(t) is defined as

$$\xi_z(t) \equiv \frac{1}{\sqrt{A(t)}} \tag{5.22}$$

as in chapter 3. Assuming a pinhole camera, the following model can be used for the observed area A(t):

$$A(t) = \frac{f^2 A}{d_z(t)^2} \tag{5.23}$$

such that

$$\xi_z(t) = \frac{d_z(t)}{f \sqrt{A}}$$

where f is the focal length of the lens and A is the `real' area of the object. Since d_z and ξ_z are linearly related, the assumption that a second- or third-order polynomial fits d_z(t) can also be used for ξ_z(t). Next, the models for ξ_x(t) and ξ_y(t) are investigated. If the camera-lens system is idealised by a pinhole camera, then the observed x and y positions ξ_x(t) and ξ_y(t) are (figure 5.11)

$$\xi_x(t) = f\, \frac{d_x(t)}{d_z(t)}, \qquad \xi_y(t) = f\, \frac{d_y(t)}{d_z(t)}.$$

Substituting (5.23),

(*) Without loss of generality; the `centre' can be defined anywhere in the image frame.
(†) This will generally be the number of pixels counted on the line between the centre of the object and the centre of the image frame.
Figure 5.11: Exemplar mapping of an observed object in front of the lens onto the CCD behind it. The object is placed at position (120, 60, 100) `units' w.r.t. the lens, which has f = 10. The area of the object is 50², such that the area of the observed object is 5².
$$\xi_x(t) = d_x(t)\, \sqrt{\frac{A(t)}{A}} \tag{5.24}$$

and similarly for ξ_y. The observed x, y positions are linearly related to their Cartesian counterparts, however, scaled with a factor √A(t). Thus, if the Cartesian motion can be described by a second (third) order polynomial in the x, y, and z directions, this cannot be done in the observed visual domain ξ_x and ξ_y. This problem can be solved by scaling ξ_x and ξ_y by the same √A(t), i.e.,

$$x'(t) = \frac{\xi_x(t)}{\sqrt{A(t)}} \tag{5.25}$$

and similarly for y'. In conclusion, we find that

$$x'(t) = \frac{d_x(t)}{\sqrt{A}}, \qquad y'(t) = \frac{d_y(t)}{\sqrt{A}}, \qquad \xi_z(t) = \frac{d_z(t)}{f \sqrt{A}}. \tag{5.26}$$

The resulting x'(t), y'(t), and ξ_z(t) are just as well fitted by the polynomial as x, y, and z, and from (5.5) follows

$$\xi(t) = \sum_{i=0}^{n} \xi_i t^i + o(t^n) \tag{5.27}$$

where ξ(t) is either x'(t), y'(t), or ξ_z(t). Similarly, ξ_i indicates the x, y, or z parameters. Once the parameters ξ_i are known, the polynomials for x'(t), y'(t), and ξ_z(t) are known. Knowledge of these parameters, however, does not give sufficient information on the position, velocity, etc. of the end-effector relative to the object, since they are scaled by the constants A and f. However, the constraints can still be expressed in visual
parameters: using the polynomial expansions for d(t) and ξ(t), and combining these with equations (5.26), the ξ_i's have a common constant

$$\xi_i = c\, a_i \tag{5.28}$$

where c is c_x, c_y, or c_z for the x, y, and z components of d, given by

$$c_x = A^{-1/2}, \qquad c_y = A^{-1/2}, \qquad c_z = (f^2 A)^{-1/2}. \tag{5.29}$$

This means that the time-dependent constraint from theorem 5.2 can be written as

$$\frac{\xi_0[i]}{\xi_1[i]} = -\frac{\tau_d - t[i]}{n}. \tag{5.30}$$
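In code, the passage from raw image measurements to the scaled observations of (5.22) and (5.25) is one line per component. The sketch below is ours; it assumes the object's centre (in pixels, relative to the image centre) and its pixel area have already been extracted from a frame:

```python
import math

def scaled_observation(x_pix, y_pix, area_pix):
    """Scaled visual observations (x'(t), y'(t), xi_z(t)), eqs. (5.22) and (5.25).

    These equal the Cartesian d_x, d_y, d_z up to the unknown constants
    1/sqrt(A) and 1/(f*sqrt(A)) of eq. (5.29), so constraints on ratios
    such as xi_0/xi_1 carry over unchanged, cf. eq. (5.30).
    """
    s = math.sqrt(area_pix)
    return x_pix / s, y_pix / s, 1.0 / s
```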
5.4.1 Measuring ξ(t)

Finally, we have to find the parameters ξ_i. The method for finding these parameters is the same for x', y', and ξ_z. In order to find the parameters of the polynomial expansions, least-mean-square optimisation is used. Since the parameter dependency is linear, we can even revert to linear least squares, and no iterative minimisation method is required. Hereto we define the
chi-square merit function

$$\chi^2 = \sum_{i=1}^{M} \left( \frac{\xi(t_i) - \sum_{j=0}^{n} \xi_j t_i^j}{\sigma_i} \right)^2 \tag{5.31}$$

which is the quantity that must be minimised. M indicates the number of observations ξ(t) that are available. As usual, σ_i is the standard deviation in ξ(t_i). Appendix B.3 shows how the ξ_i can be found from ξ(t). Also, the variances of the ξ_j are the diagonal elements C_jj of the covariance matrix C. Using the covariance matrix for the n = 2 case in appendix B.3 we find

$$\sigma^2(\xi_0) = 3\sigma^2\, \frac{3M^2 + 3M + 2}{M(M-1)(M-2)} \tag{5.32a}$$

$$\sigma^2(\xi_1) = 12\sigma^2\, \frac{(8M + 11)(2M + 1)}{M(M^2 - 1)(M^2 - 4)}. \tag{5.32b}$$
Figure 5.12 shows the standard deviations σ(ξ_0) and σ(ξ_1) for several M when σ² = 1. From the figure it can be derived which M must be taken so as to obtain a required accuracy in the parameters ξ_0 and ξ_1.
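The closed forms (5.32) are the diagonal of the covariance matrix C = σ²(XᵀX)⁻¹ of an ordinary least-squares quadratic fit; the numerical check below (ours) reproduces them under the assumption of M equally spaced, equally weighted measurements at t = 1, …, M:

```python
import numpy as np

def quadratic_fit_variances(M, sigma2=1.0):
    """Variances of xi_0 and xi_1 for a quadratic LS fit at t = 1..M."""
    t = np.arange(1, M + 1, dtype=float)
    X = np.vander(t, 3, increasing=True)    # columns: 1, t, t^2
    C = sigma2 * np.linalg.inv(X.T @ X)     # covariance of (xi_0, xi_1, xi_2)
    return C[0, 0], C[1, 1]

M = 10
v0, v1 = quadratic_fit_variances(M)
v0_closed = 3 * (3*M**2 + 3*M + 2) / (M * (M - 1) * (M - 2))             # (5.32a)
v1_closed = 12 * (8*M + 11) * (2*M + 1) / (M * (M**2 - 1) * (M**2 - 4))  # (5.32b)
print(np.isclose(v0, v0_closed), np.isclose(v1, v1_closed))              # True True
```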
5.5 Controlling the manipulator

We have thus developed a method for determining constraints in the visual domain which lead to the desired goal state of rest in the visual domain. In this section we will show how a controller for a robot arm can be constructed which uses the above theory. The controller consists of two parts: (1) a system which generates trajectories in the visual domain using the above theory; (2) a neural network which translates the trajectory in the visual domain to joint motor commands.
Figure 5.12: (Left) relative and (right) absolute standard deviation of ξ_j, plotted against M. The curves in the left figure express the standard deviations of ξ_0 and ξ_1 with respect to the standard deviation σ of the measurements. The right figure shows the calculated standard deviations when taking σ²(ξ_z) ≈ 0.05/√A and A = 20,000 pixels².
5.5.1 A trajectory in visual domain

How can the previously developed theory be used to construct a controller which decelerates a robot arm such that its hand-held camera `lands' on an object and comes to a standstill? In the deceleration method described above, the visual acceleration of the camera ξ_2[i] is determined from its position ξ_0[i] and velocity ξ_1[i] relative to the object, and the desired time-to-contact τ_d. So we can say that ξ_2[i] is the desired visual acceleration necessary to satisfy the time-dependent deceleration criterion; hence we prefer to write ξ_2^d[i]. During deceleration, this ξ_2^d[i] is applied to the system, which leads to a new (ξ_0[i+1], ξ_1[i+1]) in the next time step. Why do we choose to control the acceleration ξ_2[i]? Well, the construction of most industrial robots is such that the joints are controlled through constant torques, and therefore constant θ̈[i]. Assuming small time intervals, a constant θ̈[i] approximately corresponds with a constant ξ_2[i]. We thus assume that the repeated approximations d[i](t[i]) by second-degree polynomials are precise enough when terms higher than the second order are ignored. The error introduced by this simplification is corrected for by refitting d at i + 1, i + 2, etc. Instead of considering ξ_2^d[i] as the control variable for the controller, it is also possible to express a setpoint as a desired position and velocity at i + 1. So we say that the next setpoint is a pair (ξ_0^d[i+1], ξ_1^d[i+1]), the elements of which are computed as
$$\xi_0^d[i+1] = \xi_0[i] + \xi_1[i] + \xi_2^d[i] \tag{5.33a}$$
$$\xi_1^d[i+1] = \xi_1[i] + 2\xi_2^d[i] \tag{5.33b}$$
where ξ_2^d[i] is computed with

$$\xi_2^d[i] = -\frac{n\, \xi_0[i] + \xi_1[i]\,\bigl(n + (\tau_d - t[i]) - 1\bigr)}{n + 2(\tau_d - t[i]) - 2} \tag{5.34}$$
cf. (5.18), and n using

$$n = \left\lceil -\frac{(\tau_d - t[i])\, \xi_1[i]}{\xi_0[i]} \right\rfloor \tag{5.35}$$

cf. (5.21), assuming local quadratic polynomials for the trajectory d(t[i]) that is followed. Thus we end up with a system which has as inputs the measured ξ_0[i], ξ_1[i], and the remaining (τ_d − t[i]), and as output the desired ξ_0^d[i+1] and ξ_1^d[i+1].
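Put together, the `criterionator' introduced in the next subsection reduces to a few lines per motion component. This sketch is ours (unit time steps, with the same n ≥ 2 guard as before):

```python
def visual_setpoint(xi0, xi1, remaining):
    """Next visual setpoint (xi_0^d[i+1], xi_1^d[i+1]) from eqs. (5.33)-(5.35).

    xi0, xi1  : measured visual position and velocity (one component)
    remaining : remaining time-to-contact, tau_d - t[i]
    """
    n = max(2, round(-remaining * xi1 / xi0))        # eq. (5.35)
    xi2d = -(n * xi0 + xi1 * (n + remaining - 1)) \
           / (n + 2 * remaining - 2)                 # eq. (5.34)
    return xi0 + xi1 + xi2d, xi1 + 2 * xi2d          # eqs. (5.33a), (5.33b)
```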
5.5.2 Translating the trajectory to joint domain
It has been shown that a trajectory can be determined in visual setpoints (ξ_0^d[i+1], ξ_1^d[i+1]) so as to satisfy the time-dependent constraint. The final question we have to ask is the following: how can we compute joint accelerations that make the robot-camera system follow those visual setpoints? The trajectory in the visual domain must be translated to a trajectory in the joint domain. If the area A of the object and the focal length f of the camera were known constants, then the relationship between ξ_i and a_i would be known. In that case the Cartesian trajectory corresponding to (ξ_0^d[i+1], ξ_1^d[i+1]) can be calculated from the trajectory in the visual domain, and with known kinematics of the robot this can be used to compute a trajectory in the joint domain. We know that the mapping

$$\bigl(a_0[i],\, a_1[i],\, a_0^d[i+1],\, a_1^d[i+1],\, \theta[i],\, \dot\theta[i],\, \ddot\theta[i]\bigr) \;\rightarrow\; \ddot\theta[i+1] \tag{5.36}$$

is a function (i.e., can be computed) for the given robot-camera system. Appendix D gives the exact equations to realise this computation. Of course, in our case the relationship between ξ_i and a_i is not known, and it even varies with the size of the object. The question is whether the mapping

$$\bigl(\xi_0[i],\, \xi_1[i],\, \xi_0^d[i+1],\, \xi_1^d[i+1],\, \theta[i],\, \dot\theta[i],\, \ddot\theta[i]\bigr) \;\rightarrow\; \ddot\theta[i+1] \tag{5.37}$$

is a function also, i.e., does not have multiple solutions. The reason that this is so is that the end-effector velocity can be measured (in joint space), such that the visual measurements are disambiguated. A mathematical argument proceeds as follows. As will be shown in appendix D, eqs. (D.4)-(D.6), a constant function ḋ(θ, θ̇) = (ḋ_x(θ, θ̇), ḋ_y(θ, θ̇), ḋ_z(θ, θ̇)), changing only with the kinematics of the robot, exists with which the velocity of the end-effector in world space can be computed from θ and θ̇. Recalling the relationship between ξ_i and a_i in (5.29), with the (unknown yet constant) inverse kinematics of the robot we have an expression for a_1 in terms of ḋ(θ, θ̇) in the joint domain, while we can measure ξ_{1x}, ξ_{1y}, ξ_{1z} with the camera. Therefore we have an expression for A and f:

$$A = \frac{a_{1x}^2}{\xi_{1x}^2} = \frac{\dot d_x(\theta, \dot\theta)^2}{\xi_{1x}^2}, \qquad f = \frac{a_{1z}}{\sqrt{A}\,\xi_{1z}} = \frac{\dot d_z(\theta, \dot\theta)\,\xi_{1x}}{\dot d_x(\theta, \dot\theta)\,\xi_{1z}}. \tag{5.38}$$

Therefore we conclude that the left-hand sides of (5.36) and (5.37) carry the same information. The mapping (5.37), even though it cannot be computed without knowledge of the robot's inverse kinematics, can be learned using a neural network. We will show in the next section how this is done.
5.5.3 Controller structure

The resulting controller consists of three parts (the interpolator, the criterionator, and the neural network), which operate in a loop as follows:
1. the interpolator uses the measured ξ(t) and θ(t) to determine the joint positions θ[i] and their derivatives θ̇[i] and θ̈[i], as well as the visual trajectory parameters ξ_j[i];
2. these visual trajectory parameters are input to the criterionator, which applies equations (5.34), (5.35), (5.33a), and (5.33b), in that order, to calculate the next visual setpoint (ξ_0^d[i+1], ξ_1^d[i+1]);
3. the current visual measurement (ξ_0[i], ξ_1[i]), the visual setpoint, and θ[i], θ̇[i], and θ̈[i] are input to the neural network;
4. the neural network generates a joint acceleration θ̈[i+1], which is sent to the robot;
5. the robot-camera system closes the control loop.
Figure 5.13: The structure of the time-to-contact control loop. The interpolator derives ξ_0[i], ξ_1[i], θ[i], θ̇[i], and θ̈[i] from the measured ξ(t) and θ(t); the criterionator turns these into the setpoint (ξ_0^d[i+1], ξ_1^d[i+1]); the neural network N outputs θ̈[i+1] to the robot-camera system, which closes the loop.
Figure 5.13 shows the components of the time-to-contact controller in the control loop. Note that the criterionator, in order to determine (τ_d − t[i]), needs a clock.
Implementational details. The neural network consists, as in chapter 4, of a separate
control process called `connel' and a learning process called `lummel.' These two processes cooperate as follows. Each newly available learning sample is sent from connel to lummel, which adds it to its bin of recent learning samples. Each time before a forward propagation through the feed-forward neural network is done in order to compute the next joint acceleration, a weight matrix is sent from lummel to connel.
5.5.4 Generating learning samples
As with the controller described in the previous chapter, it is imperative that the learning samples be generated on-line. We investigate which data is necessary to create learning samples which describe the desired behaviour of the neural network N, and first investigate the functions that are performed by the neural network and the robot-camera system.
The neural network. The input to the neural network consists of the desired trajec-
tory in visual setpoints (measured visual state at i and desired visual state at i + 1), in conjunction with the state of the robot. The output of the controller consists of the joint accelerations which must be applied to the robot to instantiate the supplied trajectory:

$$N\bigl(\xi_0[i],\, \xi_1[i],\, \xi_0^d[i+1],\, \xi_1^d[i+1],\, \theta[i],\, \dot\theta[i],\, \ddot\theta[i]\bigr) = \ddot\theta[i+1] \tag{5.39}$$

where the ξ_k[j] are separately input for the x, y, and z components. The total number of inputs to N(·) equals 20.
The robot-camera system. The robot, observing a stationary object, performs the following transition:
$$R\bigl(\xi_0[i],\, \xi_1[i],\, \theta[i],\, \dot\theta[i],\, \ddot\theta[i+1]\bigr) = \bigl(\xi_0[i+1],\, \xi_1[i+1],\, \theta[i+1],\, \dot\theta[i+1]\bigr). \tag{5.40}$$
In this case we take ξ_0[i] and ξ_1[i] as inputs of R, since they describe the position of the object w.r.t. the moving hand-held camera.
A learning sample. From the two above transitions it is concluded that a valid learning sample is constructed as follows:
$$C'\bigl(\xi_0[i],\, \xi_1[i],\, \xi_0[i+1],\, \xi_1[i+1],\, \theta[i],\, \dot\theta[i],\, \ddot\theta[i]\bigr) = \ddot\theta[i+1] \tag{5.41}$$
where C' indicates the ideal controller. The neural network thus approximates a function of the inverse kinematics of the robot-camera system, where the visual observations are scaled by a `varying constant' c. In effect, learning by interpolation is used.
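Each completed feedback step thus yields one supervised pair: the visual states actually measured at steps i and i + 1, together with the robot state at i, map to the acceleration that caused the transition. A sketch (ours; plain tuples):

```python
def make_learning_sample(xi0_prev, xi1_prev, xi0, xi1,
                         theta_prev, theta_dot_prev, theta_ddot_prev,
                         theta_ddot):
    """One learning sample per eq. (5.41): the applied acceleration is the target.

    The xi arguments each hold the x, y, and z components; the theta
    arguments each hold the three joint values.
    """
    network_input = (*xi0_prev, *xi1_prev, *xi0, *xi1,
                     *theta_prev, *theta_dot_prev, *theta_ddot_prev)
    return network_input, theta_ddot
```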
5.5.5 Experimental procedure
The exact experimental procedure is as follows.

1.1 the target object, which has randomly chosen dimensions, is placed at a random position in the reach space of the robot (see figure 4.7). Starting from a random initial joint position, the robot is given random joint accelerations for joints 1, 2, and 3;

1.2 choose a desired time-to-contact τ_d (in the simulator experiments it was randomly chosen between 10 and 40 simulator time steps) and set i = 0;
1.3 the learning algorithm is started:

1.3.1 measure (ξ_0[i], ξ_1[i]) for the x, y, and z directions, and measure θ[i], θ̇[i], and θ̈[i] for joints 1, 2, and 3;

1.3.2 if i > 0, a learning sample

(ξ_0[i−1], ξ_1[i−1], ξ_0[i], ξ_1[i], θ[i−1], θ̇[i−1], θ̈[i−1]) ⇒ θ̈[i]

is created and added to the set of learning samples;

1.3.3 if (τ_d − t[i]) ≤ 0, go to 1.4;

1.3.4 ξ_0^d[i+1] and ξ_1^d[i+1] are determined using the criterionator;

1.3.5 the network requests a new weight matrix from lummel, and generates a θ̈[i+1] for joints 1, 2, and 3, which is sent to the robot;

1.3.6 i ← i + 1; go to 1.3.1;

1.4 back to 1.1.
5.5.6 Exploration
Naturally, the self-learning controller can never learn what it has not seen. Some method of exploration of the input space and output space is required. This is done as follows. The neural controller is used to approach the target. When the final distance between the target and the end-effector at (τ_d − t[0]) = 0 exceeds some threshold (10 cm in the simulator experiments), the test is performed again with the same initial robot position and object position, but the output of the neural network is augmented with additive white noise. This is repeated up to three times with increasing noise levels.
5.5.7 Finding the optimal network
As in the previous chapter, we use the theory presented in (Vysniauskas et al., 1993) to determine the optimal network size and number of learning samples for this problem. The procedure for this is as follows:

1. 5,500 samples are generated;

2. for different numbers of hidden units ρ and learning samples N, perform the minimisation procedure and note the resultant summed squared error. In this case we take 5 ≤ ρ ≤ 40 and 100 ≤ N ≤ 1500; 4,000 separate samples are used for cross-validation;

3. compute the AMEF(p_min, p_max) function (as described in chapter 2) to fit the data obtained in the previous step, and determine the optimal parameters p_min, p_max, and the model coefficients. In this case, p_min = 1 and p_max = 4 was taken;

4. the resulting model can be used to find the minimal resources line (i.e., where the computational resources to obtain a certain error are minimal).
However, it appears that the AMEF(·) function does not fit the data very well. We follow a suggestion by Vysniauskas [(Vysniauskas, 1994); see also (Wilks, 1962)] and change the expansion of the AMEF(·) function of equation (2.18) (see page 16) to

$$\varepsilon_a(N, \rho) \approx \sum_{p=p_{\min}}^{p_{\max}} \; \sum_{i+j=p} \frac{a_{ij}}{N^{i/2}\, \rho^{j}}. \tag{5.42}$$
The dependency of ε_a on √N is also used by (Barron, 1991). The above steps are repeated for this adapted AMEF(·) function, leading to p_min = 1 and p_max = 4. The results of these experiments are shown in figure 5.14.
Figure 5.14: Error graphs with model. Accuracy in approximation of the camera-robot mapping by the feed-forward network with ρ = 2, 4, 6, and 8 hidden units, for increasing numbers of learning samples N. The learning error is represented by error bars at the bottom, the generalisation error by error bars at the top; the adapted AMEF(1, 4) is plotted with the thick solid line. On the vertical axes is printed the error in the network output (joint acceleration) in °/tick², where a tick indicates a simulator time unit. Note that the AMEF(1, 4) model is not very successful for small N and ρ.
The figures show the approximation by the adapted AMEF(1, 4) function of the estimated error of the robot-camera mapping function as defined by the learning samples obtained from (4.3), for a network with 2, 4, 6, and 8 hidden units, for increasing numbers of learning samples. Apparently, the neural network overfits the data when the number of hidden units
increases: although the error for the learning samples decreases, the error for the test samples does not always decrease when more hidden units are taken. As before, to find the network with minimal resources, i.e., the network with a certain desired accuracy ε_0 for which the computational complexity r = ρN (cf. (2.16)) is minimal, the adapted AMEF(1, 4) model must be evaluated such that ε_a(N, ρ) = ε_0 (cf. (2.17a) and figure 2.3). Figure 5.15 shows the minimal resources line for the
Figure 5.15: Minimal resources for the time-to-contact neural network. The figure shows the contour lines of the adapted AMEF(1, 4) model as a function of ρ and N, as well as the minimal resources line (dots) for this problem. Each contour line indicates an increase in the error of 0.001. As the figure shows, it is not useful to take more than 6 hidden units in the neural network; this will lead to an expected average error in the network output of just below 0.013 °/tick², where a tick indicates a simulator time unit. Note that the minimal resources line makes a wrong prediction for large N.
time-to-contact robot control neural network. The figure clearly shows that a network with on the order of 5-6 hidden units is optimal, and that the number of learning samples is not very relevant when chosen larger than approximately 400. This graph was used to choose the optimal network and learning set size as follows. If it is assumed that the relationship between the visual acceleration ξ_2 and the joint acceleration θ̈ is locally linear, then the error of the network output in figure 5.14 can be coarsely interpreted as the error in visual acceleration. Thus it was determined that in this case, a network with 6 hidden units and at least 400 learning samples is optimal. This network size was used in the experiments.
5.6 Results

In order to measure the success of the method when applied to the simulated robot, we measure d(t), ḋ(t), and d̈(t) during the trajectory; with the simulator, these data are available. A correct deceleration leads to d(t) = ḋ(t) = 0 when τ = 0, i.e., at the end of the trajectory. The results of a run with the simulated OSCAR robot are shown in figure 5.16. This graph shows the distance between the end-effector and the target object at τ = 0.
Figure 5.16: Distance and velocity at τ = 0. The left figure shows the distance $\sqrt{d_x(\tau{=}0)^2 + d_y(\tau{=}0)^2 + d_z(\tau{=}0)^2}$ (in cm) between the end-effector and the approached object. The right figure shows the velocity $\sqrt{\dot d_x(\tau{=}0)^2 + \dot d_y(\tau{=}0)^2 + \dot d_z(\tau{=}0)^2}$ of the end-effector in cm/tick. Typical end-effector velocities during the trajectory are between 0.5 and 2.0. The horizontal axis indicates the trial number.
The results show that after only a few trials (in this case, 4), the positional error at (τ_d − t[0]) = 0 is below one millimetre, while the velocity is below 0.1 cm per simulator time unit (typical end-effector velocities during deceleration are 0.5-2.0 cm per simulator time unit). To illustrate the followed trajectories, the velocities in the x, y, and z directions for a trial at the beginning (not successful) and a trial after learning (successful) are plotted against τ in figure 5.17.
5.6.1 Influence of input noise

The influence of noise on the input to the network on the results is experimentally determined. Thereto we use the noise estimates from chapter 3 for typical cases, and see how the addition of such noise to the network input influences the results of the deceleration. To be exact, the following noise levels are added. We take as standard deviations
Figure 5.17: Two exemplar trajectories followed with time-to-contact control. Both figures show the velocities ḋ_x(τ) (solid), ḋ_y(τ) (dashed), and ḋ_z(τ) (dotted). In both figures, the order n of the deceleration was chosen to be 2. (Left) A trial at the beginning is not successful: the velocities vary greatly during the trajectory, and do not go to 0. (Right) A later trial is successful: the velocities in the x, y, and z directions decay smoothly to 0.
$$\sigma_x = 0.010\sqrt{A}, \qquad \sigma_y = 0.016\sqrt{A}, \qquad \sigma_A = 0.0036\sqrt{A}.$$

Furthermore, the x and y positions are computed by taking the average position over the pixels of the measured area, such that σ_x and σ_y must both be divided by A (which is measured in pixels). Next we have to find the standard deviation of the z component of motion. It is shown in appendix B.4 that this z component also has a normal distribution when A is a normally distributed random variable. In fact, ξ_z is normally distributed with mean 1/√A and standard deviation σ_A/(2A√A). Finally, we have to take into account the relationships (5.32). The resulting standard deviations are

$$\sigma(\xi_{0x}) = 2l \cdot 0.010/\sqrt{A}, \qquad \sigma(\xi_{1x}) = 7l \cdot 0.010/\sqrt{A},$$
$$\sigma(\xi_{0y}) = 2l \cdot 0.016/\sqrt{A}, \qquad \sigma(\xi_{1y}) = 7l \cdot 0.016/\sqrt{A},$$
$$\sigma(\xi_{0z}) = 2l\,\sigma_A/(2A\sqrt{A}), \qquad \sigma(\xi_{1z}) = 7l\,\sigma_A/(2A\sqrt{A}),$$

where l is a noise level factor; the noise levels derived in chapter 3 result for l = 1. These standard deviations are used in the computations. Note that the standard deviation of the joint measurements is taken to be 0. More importantly, the positioning inaccuracy of the robot is not taken into account. This positioning inaccuracy would, however, not influence the behaviour of the neural network.
101
5.7. Discussion
Figure 5.18 shows the results of the control algorithm when tested with dierent noise levels. Noise is present in the measurements, and also in the learning samples which are taught to the neural network. A graceful degradation is shown up to very high noise levels, as high as 7 times the expected amount of noise. avg. error / cm 0.16 0.14 0.12 0.1 0.08 0.06 0.04
0
2
4
6 noise level
Figure 5.18: Results of the simulation q when noise in the visual input is applied. The gure shows the distance d( = 0)2x + d( = 0)2y + d( = 0)2z between the endeector and the approached object averaged over 200 goals (marked by dots in the graph). The vertical axis shows the noise level l. The gure clearly shows that the noise level and the grasping error are linearly related. At values of l 7 the signal-tonoise ratio is thus large that the system cannot always reach the target at d = 0, but sometimes overshoots. The average error at l = 7 goes up to 2.0cm at l = 8 it is as high as 4.0 due to overshoots.
5.7 Discussion A method has been proposed which can control the end-eector of a monocular robot in three-dimensional space by using optical ow feedback to obtain depth-information. The system does not use knowledge of the robot arm or camera, and does not require any calibration at all. Instead, from the visual feedback setpoints in visual domain are calculated, and these setpoints are translated to joint domain using an adaptive neural controller. Results show that this system can be used to create a robust feedback robot arm control system. The error in the simulator experiments is very low: after only a few trials, the grasping distance at the desired time-to-contact is around 0.05cm while the gripper velocity is nearly 0. The system has also been shown to remain stable under the inuence of noise in the input parameters and learning samples.
102
Chapter 5. Visual feedback in motion
The choice of the controller input signals, by requiring only rst derivative visual information, makes the system very viable for implementation on a real robot arm. However, a few questions remain to be asked:
Where is `contact'? Recalling the basic lens equation (3.1), the moment of `contact'
where dz = 0 is that moment where the object is placed at distance f from the (pinhole) lens. But what happens when we want to place this nal point further away from or closer to the lens? Changing the position where dz = 0 can be best done by changing the optical system to get a dierent focal distance after all, by decoupling the x, y, and z components of the visual observation, only the latter depends on the focal distance. Alternatively, the time-dependent deceleration constraint (5.17) can be augmented with an oset constant, depending on the actual size A of the object. On the downside, this would mean that the resulting distance would be dierent for dierent size objects.
Accelerating towards the object. In this chapter we have only concentrated on the
deceleration of the manipulator. Naturally, the acceleration is also of importance, yet no absolute positional information of the object is available at all when the camera is not moving. Therefore, the only way to solve that problem is to assume a quasi-random motion, and correct from that moment as soon as the rst visual measurements are available. This drawback is, in fact, not untypical for various biological systems.
Extension to the controller. In the controller described above, the values of 0d i +1]
and 1d i + 1] are calculated using the criterionator. However, this can also be replaced by a neural network. This neural network N 0 has to generate the following transition:
N (0 i] 1 i] ( d ; t i + 1])) = (0d i + 1] 1d i + 1]) : 0
(5.43)
After the transitions (5.39) and (5.40) the actually reached 0 i + 1] and 1 i + 1] are available. The transition from 0 i] and 1 i] to these signals at i +1 contains the necessary information on which transitions are reasonably performed by the robot these samples thus form the learning samples of the robot. Secondly, N 0 should always map those inputs where ( ; t i + 1]) = 0 on (0 0). Finally, the resultant `grasping error', i.e., the remaining 0 and 1 when = 0, can be used as the variance of the learning samples generated for N 0 along the trajectory leading to that grasping error. Thus it is also possible to generate dierent types of trajectories in visual domain, e.g., generated by human operators, which can be translated to trajectories in joint domain by the neural network.
Chapter 6
Dynamic control of a rubbertuator arm When a robot system is designed, the focus generally is a design such that friction, gravity, and payloads can be practically neglected. Therefore, robots are built extremely sti (i.e., non-compliant) and are equipped with joint actuators which are strong enough to overcome threshold friction, position-dependent gravity, and payloads. The merit of such an approach is that relatively simple control algorithms can be used to position the endeector with high accuracy. However, apart from the high cost of such robot systems and their high energy consumption because of their heavy construction, their large strength makes their use in environments where humans operate, such as hospitals and household environments, too dangerous. Therefore the search for simpler, more compliant robot systems is of importance. One of such systems is the SoftArm robot. The pneumatically driven actuators of this robot consist of rubbertuators, which are modelled after skeletal muscle systems. The rubbertuators have a high force-to-weight ratio and are very compliant, such that the robot is safe for operation in direct contact with human operators. Yet control of such a system is a dicult problem. Naturally, coarse positional control can be obtained with simple feedback algorithms. As is shown in this chapter, it is possible to use a standard PID controller in a feedback loop to control the joint values of the robot towards their desired values. The resulting precision, however, is rather bad the desired trajectory is only coarsely followed (lagging and hysteresis problems), and the error in joint position is up to 10. The algorithms that can be used for controlling industrial non-compliant robots are not usable to control compliant robots due to their complex, highly nonlinear dynamics. The neural-network based adaptive algorithms developed in the previous chapters have been implemented on non-compliant robots with known dynamics and kinematics. Of course, complex adaptive algorithms are|strictly speaking|not necessary when these robots are controlled. However, it has been shown that neural networks can be well applied to robot control. But how do these algorithms behave when applied to robots where selfimposed problems such as changing kinematics are no academic assumption anymore, but are reality? In this chapter we want to demonstrate the usability of neural-network based Some of the material in this chapter was previously printed in (Hesselroth et al., 1994) and (van der Smagt & Schulten, 1993). ()
103
104
Chapter 6. Dynamic control of a rubbertuator arm
adaptive algorithms in those cases where conventional algorithms cannot be used anymore. The research presented in this chapter is only exploratory, and is intended to demonstrate the potential of neural network to control such complex robot systems. It should therefore be noted that the results presented in this chapter can only be interpreted in their limited test environment nevertheless, this work, which shows good results in short learning times, may serve as a basis for further research. Neural networks have previously been applied to the control of this robot. In (Hesselroth, Sarkar, van der Smagt, & Schulten, 1992), a visual observation of the desired position in world domain is directly translated to rubbertuator pressure by using a Kohonen-type neural network. This network learns to position the end-eector within 1cm of its desired position after learning, by using visuo-motor coordination. However, learning sessions are very time-consuming (in the order of hours), and each positioning trial takes in the order of 30s|a system like that has limited applicability. Furthermore, the trajectory of the robot arm which connects one endpoint to another is uncontrolled and oscillatory. In (Katayama & Kawato, 1992), a single joint of a similar robot arm is dynamically controlled to follow a trajectory in joint space, resulting in an error in the order of about 1 for a fast movement. Again, these good results are only obtained after long training sessions: in this case, the authors report to have followed a trajectory 2,000 times before this acceptably high accuracy obtained. In this chapter a feed-forward network based dynamic control system for the SoftArm robot as produced by Bridgestone Corp. is proposed. As before, it is required that the system learns correct behaviour quickly. In section 6.1 the various parts of the robot system are described. In section 6.2 the measured dynamical behaviour of the robot arm is described, and theoretically explained. Section 6.3 describes a simple neural system for stably controlling a single joint of the robot, the results of which are given in section 6.4. A wrapup of the chapter is given in section 6.5.
6.1 The robot system The robot arm was built mainly from components manufactured by Bridgestone Corporation of Tokyo, Japan. The whole robot system consists of a robot arm, an air compressor, servo-drive and servo-valve units, and a gripper.
6.1.1 Kinematic system
The robot is a four-link manipulator with ve degrees of freedom. It is mounted by suspending it from its top joint. A labeled picture of the SoftArm is reproduced in gure 6.1). The arrangement of the joints and their range of movement is basically modelled after the human arm. Because its pneumatic actuators, each consisting of two or four inatable rubber tubes named rubbertuators, are relatively light, the arm weighs only 12 kg and can lift 3 kg. Because of its weight and compliant characteristics, this arm can be employed for application around human operators or fragile equipment. Intended uses are in hospitals, around the handicapped, for household tasks and in areas where
105
6.1. The robot system
joint 0
base
joint 1 articial muscle `rubbertuator' joints 3, 4, 5
joint 2 gripper Figure 6.1: The structure of the SoftArm robot.
electrical circuits cannot be introduced. The dimensions and range of movement of the joints are given in table 6.1. The torque applied to each joint can be controlled by setting the pressures 1 of the agonist and 2 of the antagonist rubbertuator pairs. The rubbertuators to drive link i are xed parallel to each other in link i ; 1. The free ends are connected to each other by a chain. The chain goes around a sprocket xed in link i ; 1 and connected to link i. The angular position of joint i thus depends on the relative lengths of the tubes as shown in g. 6.2. This relationship can be expressed as
= l12;rl2
(6.1)
where r is the radius of the sprocket and l1 and l2 are the respective lengths of the rubbertuators. One of the greatest advantages of a rubbertuator is its very high force-to-weight ratio, about 240, compared to a value of about 16 for DC servo motors. This is especially good for robotics applications in which the actuators for the extreme joints are in motion as part of the arm. The stiness of any joint is de ned as the total pressure 0 = 1 + 2 of the rubbertuators that drive it. When this total pressure is high, the joint behaves sti, whereas a low 0 results in a compliant joint.
106
Chapter 6. Dynamic control of a rubbertuator arm
Table 6.1: Dimensions of the links and motion range of the joints.
Specication
Item
model degree of freedom
rst (shoulder) rota- second tion (upper arm) angle third and (lower arm) arm fourth length (wrist pitch)
fth (wrist roll) lifting capability
FAS{501 5 60 { 50 410 mm 50 370 mm 45 270 mm 90 { max. 3 kg
angle length angle length angle length angle length angle length
l1 T1
1
link i ; 1
2 l2
T2
i to link i
Figure 6.2: An agonist and an antagonist rubbertuator are connected via a chain across a sprocket their relative lengths determine the joint position i .
107
6.2. Dynamics of the SoftArm
6.1.2 The rubbertuator drive system
The robot is supplied with compressed air of constant pressure. Five servo-drive units (SDU's) provide the internal control circuitry for the robot. Each unit receives 11-bit precision pressure signals from the host computer, converts them to analogue signals, and sends them to a servo-valve unit (SVU). The servo-valve unit senses the pressure of each of the two rubbertuators it controls, and regulates this pressure by opening or closing electric valves.
6.1.3 The gripper and its controlling valves
A gripper of about 1kg is installed at the end of the arm. It has a simple two- ngered clamping action and is powered by air pressure. The ngers are approximately 10 cm long. Two inlets are required, one for opening and the other for closing. The air pressure is supplied through electric valves which can be controlled by the computer.
6.2 Dynamics of the SoftArm
position/degrees
60
position/degrees
position/degrees
position/degrees
The servo-drive units allow the robot to be controlled in two modes: position control mode (closed loop control) and pressure control mode (open loop control). When the SoftArm is controlled in position control mode, an internal PID controller (see, e.g., (Craig, 1986)) is used in a feedback loop. This PID controller uses joint position feedback from the optical shaft encoders mounted on each joint to determine the pressure of the joints in a closed loop. Figure 6.3 shows a representative move of one joint of the robot arm. The
40
20
66 64 62 60 58
0 0
1
2
time/s time/s 3
4
5
6
56
0
50
time/s time/s 100
150
200
Figure 6.3: Joint 2 of the rubbertuator robot moving in position and pressure control mode. The left gure shows the position control mode, i.e., closed loop control. Notice the jagged curve due to the feedback in the internal PID controller this is probably caused by an incorrect parameter setting for the Integrator part of the PID controller. In the right gure, pressure control mode (open loop control) is used. The gure clearly shows that, due to the behaviour of the rubber, it takes a long time before the joint settles to its steady state due to the elastic behaviour of the rubber.
108
Chapter 6. Dynamic control of a rubbertuator arm
feedback mechanism should generate a smooth motion, but due to non-optimal feedback control the move is oscillatory. In pressure control mode, the pressure values sent by the host computer are directly translated to currents for the valves and the rubbertuator pressures are set correspondingly. The pressure generates a force in the rubbertuators which makes the joint rotate to assume a new equilibrium position.
6.2.1 Behaviour of a rubbertuator driven joint
In order to further understand the dynamics of the SoftArm robot, we will rst have to investigate the behaviour of a single rubbertuator. Figure 6.4 shows the structure of a rubbertuator. a. increasing pressure
tting ber cord rubber tube b. Figure 6.4: The structure of a single rubbertuator. In a., a photograph of a rubbertuator is shown. In b., a schematic representation of the structure of the rubbertuator tube is given.
Each actuator consists of a rubber tube sealed on one end and with an air inlet on the other end. The contraction force Tj exerted by rubbertuator j 2 f1 2g for each joint is speci ed by the manufacturer as (6.2) Tj = j Dj2 a(1 ; $j )2 ; b where j is the supply pressure, a and b are constants depending on the particular tube, 0 $j < 0:2 is the contraction ratio which is directly (approximately linearly) related to the rubbertuator length lj , and Dj is the eective diameter of the tube before displacement. Although (6.2) is not a precise model of the rubbertuators, it suces to qualitatively describe their behaviour. The driving force of a rubbertuator varies with pressure and the contraction ratio. For instance, under constant pressure it has such a spring characteristic that the contracting force becomes stronger as the degree of contraction becomes smaller. When the contraction ratio is constant, the force increases with increasing pressure. Thus the rubbertuator has spring-like characteristics, while allowing a varying spring constant.
6.2. Dynamics of the SoftArm
109
Pressure{position relation. From (6.2) it can be seen that for any speci c choice of
j there exists an in nite number of values $j and Dj which realise a speci c exerted force Tj . Therefore, when a joint is in equilibrium, i.e., the external forces (gravity) are equal to T1 ; T2 , the joint angle is not only dependent on the pressure but also on the diameter of the tube before displacement. Since the diameter depends on the pressure and the elongation (before the displacement), the new joint position depends on the new pressure as well as on the previous position. Figure 6.5a demonstrates this hysteresis eect for joint 1. This hysteresis can be shown by moving a joint along a pressure trajectory from 1 = 0 2 = max to 1 = max 2 = 0 and back again by incrementing and decrementing the pressures by a constant value (. This results in the behaviour shown in gure 6.5b. The width of the gap between the two curves depends on how fast the pressures are changed the slower the change of the pressures, the narrower the gap. The trajectory and velocity in joint space followed for a constant pressure increase and decrease is depicted in gures 6.5c and d. The velocity is numerically computed from the position. Near the extreme values the joint velocity decreases since the increase in exerted force for a constant change in pressure is less.
Elasticity of the rubbertuators. The long-term settling behaviour of the rubber has
a large eect on the position of a joint after the desired pressure is reached and the joint seems to have reached its position. Figure 6.3 shows the position of joint 1 in time when the rubbertuators are allowed to settle for 200 seconds in pressure control mode. During this settling time, the joint rotates for about 1. Inuence of the temperature of the rubbertuators (which can occur due to varying climate conditions or simply by using the arm for extended periods of time) has also a large inuence on the pressure{position relation. When repeatedly moving the robot to the same pressure, the system drifts gradually to dierent positions (see g. 6.6).
6.2.2 Analysis of the rubbertuator behaviour
In order to explain and attempt to model the behaviour described above, we have to consider the structure of a pair of rubbertuators as shown in gure 6.2. The total force (T which the combined rubbertuators exert on the joint is, according to (6.2), (T = 1 (a(1 ; $1)2 ; b)D12 ; 2 (a(1 ; $2)2 ; b)D22: (6.3)
Pressure-force relation
If we assume that D = D1 = D2, i.e., the rubbertuators are in their `middle' position, then h i (T = 1 (1 ; $1)2 ; 2 (1 ; $2)2) aD2 + (2 ; 1)bD2 (6.4) By letting ( = 1 ; 2 (the `dierence pressure') and 0 = 1 + 2 (the `base pressure' or stiness) we get h i (T = 21 0((1 ; $1)2 ; (1 ; $2 )2) + (((1 ; $1)2 + (1 ; $2)2) aD2
110
a.
10
position/degrees
position/degrees
position/degrees
Chapter 6. Dynamic control of a rubbertuator arm
0
-10
b.
60 30 0
-30 -60 -20 1
2
trial number trial number 3
4
5
-2000
velocity / (/ms)
50
joint velocity/(deg/ms)
position/degrees
position/degrees
0
0
-50
c. 0
500
time/ms 1000
time/ms
1500
2000
-1000
( 0
1000
2000
4
2
0
-2
d.
-4
0
500
time/ms 1000
1500
2000
time/ms
Figure 6.5: Hysteresis in SoftArm positioning for joint 1. In a., the joint angle reached when 1 =2 is changed from 600/900 to 675/825 to 750/750 to 825/675 to 900/600 to 825/675 to 750/750 to 675/825 and back again. The position reached depends both on the previous position as well as on the new pressures. In b., the joint is moved by applying a constant pressure increment ( to rubbertuator 1 and the same decrement to rubbertuator 2. When the extreme pressures are reached, the direction is reversed. Figure c shows the joint position while the trajectory of gure b is followed d shows the velocity. An extreme joint position is reached at t = 0 ms and t = 1000 ms. From t = 0 to t = 1000 the change in pressures for the rubbertuators is constant. Since the exerted force is constant near the equilibrium point, the joint has a constant acceleration prole.
111
6.2. Dynamics of the SoftArm
position/degrees
-4
-5
-6
0
2
4
6 time/min
8
10
12
Figure 6.6: Drift of the rubbertuators when the robot is used for a long period of time. The pressures of the rubbertuators are repeatedly increased/decreased by 1% of the total pressure.
112
Chapter 6. Dynamic control of a rubbertuator arm
;(bD2
By setting c = $1 + $2 and K = $1 ; $2,
"
#
2 2 (T = 0($1 ; $2)(c ; 2) + 2((1 ; c) + ($1 ; $2) ( + c ( aD2 2 2 ;"bD2( 2 2 # ( K ) c 1 = 2 0K(c ; 2) + 2((1 ; c) + 2 ( + 2 ( aD2 ; bD2( "2 2 # 2K 2 c aD aD 2 2 2 = 21 aD K (c ; 2) 0 + 4 ( + 4 ; bD (: } | {z | {z } {z } | 1 2
1
2
3
In conclusion, we can write (T = 10 + 2 (2 + 3(
(6.5)
To understand this result, we must look at those values of and ( where (T in (6.5) is 0, i.e., the system is in equilibrium. The trajectory followed in gure 6.7 corresponds with the measured trajectories of gure 6.5b. 0.6
0.4
0.2
-1
-0.5
00
( 0.5 dp
1
-0.2
-0.4
-0.6
Figure 6.7: Equilibrium line of eq. (6.5). For the gure we take 1 > 0, 2 > 0, and 3 > 0.
Explaining hysteretic behaviour Secondly, the hysteretic behaviour of a rubbertuator driven joint can be modelled by substituting dierent values for D in (6.2). This hysteretic behaviour is, in fact, an understood result of the material used in the rubbertuators (Holownia, 1977).
113
6.2. Dynamics of the SoftArm
If we assume that rubbertuator 1 has a diameter D1 = D + (D before displacement, and that rubbertuator 2 has a diameter D2 = D ; (D before displacement, then (6.2) can be written as (T = P1(a(1 ; $1)2 ; b)(D + (D)2 + P2(a(1 ; $2)2 ; b)(D ; (D)2 = P1(a(1 ; $1)2 ; b)(D2 + (D2 + 2D(D) +P2(a(1 ; $2)2 ; b)(D2 + (D2 ; 2D(D):
(6.6) (6.7)
By splitting this equation in three parts for D, (D, and D(D we can apply (6.5) by substituting rst D and then (D in (6.4), such that (T = 10 + 2(2 + 3( + 010 +h02(2 + 03( + i 2D(D 1 (a(1 ; $1)2 ; b) + 2 (a(1 ; $2)2 ; b) = (1 + 01)0 + (2 + 02)(2 + (3 + 03)( + 001 ( + 002 0 2 + 003 0
(6.8) (6.9)
by applying similar transformations as above. Thus the parameters are related as 2 0i = i (DD2 00i = i (DD : Again we can plot the equilibrium lines of (6.9) by solving (T = 0. This has to be done for (D > 0 and (D < 0, which results in dierent signs for the parameters 00i . The result is shown in gure 6.8. From this gure it is clear that the hysteresis can be
(
Figure 6.8: Equilibrium lines of eq. (6.9). In this case we take 1 + 01 > 0, 2 + 02 > 0, 3 + 03 > 0, 001 > 0, 002 < 0, and 003 > 0 for the dotted line, and 001 > 0, 002 < 0, and 003 > 0 for the solid line.
explained from the model of a rubbertuator driven joint.
114
Chapter 6. Dynamic control of a rubbertuator arm neural network
(t + (t)
( Θ Θd Θd ) (t) d
(t) ( Θ Θ Θ ) (t)
interpolate
Θ (t)
Figure 6.9: The neural robot control system in a feedback loop with the robot.
6.3 Control of the SoftArm From the above it is obvious that a precise model for the pneumatic actuators cannot be easily constructed. When the robot arm is used for accurate positioning and orientation of the end-eector, an adaptive algorithm is preferred for controlling the robot. In this section an adaptive system will be described which is capable of making one of the robot's joints follow a prescribed trajectory.
6.3.1 System setup
From equation (6.5) it can be seen that the torque depends on the pressure dierence (, the stiness 0 , and the diameter before displacement D12. Since the value of D12 is not known and cannot be easily measured, we instead can use 12 before displacement in combination with to carry the same information. pressures 1(t) for the rst muscle of The task of the robot controller is to generate _ a joint, such that a speci ed trajectory d (t) d (t) d (t) is followed. The `stiness' 1 + 2 is always kept constant, such that the pressure from the second rubbertuator can be derived from the rst. Since the system is a discrete-step closed loop, we will write index i] instead of continuous time (t). The robot control system, which is depicted in gure 6.9, receives values i] from the robot at intervals of approximately 20 ms. In order to obtain estimates of _ and which are not too much noise-sensitive, these values are tted to orthonormal polynomials following an incremental algorithm derived from (Forsythe, 1957 Hayes, 1970) and described in section B.5. Thus we can, with some accuracy, nd , _, and at each desired time. The measured pressure 1 i], i], _ i], and i], and the desired d i], _d i], and d i] are input to the neural network. The network then generates a target pressure 1 i + 1] which is sent to the robot. The obtained rotation, after the pressure change has been applied, is used as a new learning sample. Thus, the network performs the mapping N 1 i] i] _ i] i]d i] _d i] d i] = 1 i + 1]: The new pressure is sent to the robot: R i] _ i] 1 i] 1 i + 1] = i + 1] _ i + 1]
115
6.4. Results
Thus the following learning sample is available:
C0 1 i] i] _ i] i] i + 1] _ i + 1] i + 1] = 1 i + 1]:
6.3.2 Network structure As in the previous two chapters, the control system consists of two programs `lummel' and `connel' running on two processors in parallel. One program gathers the data from the robot and calculates the joint velocity and acceleration. These data are transmitted to the neural network. The neural network is a feed-forward network trained with conjugate gradient optimisation with Powell restarts as described in chapter 2. Newly generated samples are continuously added to the bin of available samples, upon which minimisation is performed. For the single joint problem, the network consists of 7 inputs, 15 hidden units, and one output.
6.4 Results In order to evaluate the success of the neural controller, it is compared with the internal PID controller on two trajectories. For the rst trajectory one of the joints of the robot arm has to follow a trajectory d (t) = c sin(t) for the second, the trajectory is more complicated: d (t) = c sin(t) cos2(11t). The success of each controller is measured by comparing d (t) with the realised trajectory (t).
6.4.1 A simple trajectory First the internal PID controller is tested on the trajectory. The result of this trial is shown in gure 6.10. As this gure shows, the PID controller suers from three problems. First, the internal parameters of the controller are not set correctly, leading to the jagged form of the curve. Second, the controller is lagging behind with the desired signal. The third problem, however, is most serious: as the right gure in 6.10 shows, the PID controller does not solve the hysteresis problem. The relationship (t){d (t) depends on the direction of motion. Furthermore, this position dierence is larger than the error caused by the incorrect PID parameters (jaggedness). Next, the neural controller is tested on the same trajectory. Figure 6.11 shows the result of a long training session. As the gure shows, the trajectory is accurately followed except in the extrema where an error of approx. 1:5 remains. The right gure 6.11 shows the desired vs. the realised trajectory. It is clear that the system does not suer from hysteresis anymore. The initial learning behaviour is shown in gure 6.12. It is clearly shown in this gure that the network learns to control the robot after only a few trials. However, accurate control is only possible after a longer training session.
116
Chapter 6. Dynamic control of a rubbertuator arm
4000
2000
2000 0
d
4000
0
100
200
;4000 ;2000 0 0
300 t
;2000
;2000
;4000
;4000
;6000
;6000
2000 4000
Figure 6.10: Using the internal PID controller to follow the trajectory d (t) = c sin(t). The left gure shows the desired and realised trajectories vs. t the right gure depicts the desired (horizontal axis) vs. the realised (vertical axis) trajectory.
4000
4000
2000 0
d
2000 0
100
200
300 t
;4000 ;2000
;2000
;2000
;4000
;4000
00
2000
4000
Figure 6.11: Using the neural network controller to follow the trajectory d (t) = c sin(t). The left gure shows the desired (solid line) and realised (dotted line) trajectories vs. t the right gure depicts the desired (horizontal axis) vs. the realised (vertical axis) trajectory. This behaviour is recorded after ve minutes of learning.
117
6.5. Discussion 6000
4000
2000
0
500
1000
1500
0
2000
2500
3000
t
-2000
-4000
-6000
Figure 6.12: Learning the control. The gure shows the desired (grey line) and realised (solid line) trajectories vs. t. This gure clearly shows that the network learns to follow the trajectory very quickly.
6.4.2 A more complex trajectory
The system, with no a priori knowledge, is trained on a trajectory sin(t) cos2 (11t). Initially, the trajectory is only followed very coarsely. After 16 trials, however, accurate trajectory following is obtained, with an error of 1 near the extrema, and less than 0:1 on the slope (see gure 6.13).
6.5 Discussion In this chapter we have applied a neural-network based controller to a pneumatic robot arm with complex, highly nonlinear dynamics, which change in time due to external inuences. This controller has been shown to perform better than the manufacturerspeci ed PID controller, and learns correct trajectory following after only a few trials. The investigations have led to a better understanding of the SoftArm robot. It has been shown that the pressure-position relationship of the joints are a direct consequence of the behaviour of the rubbertuators. Furthermore, it has been shown that the time integral of the base pressure 1 + 2 is directly related to the joint rotation that this pressure change instigates. Nevertheless, further research is required. First, more extensive testing of the neural controller is required to fully assess its capabilities. Secondly, the controller is applied to only one joint at a time. This means that the system is reduced from a 35-dimensional (or 21-dimensional when the gripper is treated separately) to a 7-dimensional one the scalability of the approach has not been tested.
118
Chapter 6. Dynamic control of a rubbertuator arm
angle/degrees 60 40 20 0 -20 -40 -60 0
100
200
300
400
500 time/s
Figure 6.13: Training the system on a sin(t) cos2 (11t) trajectory. The solid line is the target trajectory, the dotted line the realised trajectory. Initially, the trajectory is reasonable followed where velocity is constant after only 16 trials, the whole trajectory is followed with an average error of 0:1 .
6.5. Discussion
119
Investigation of the rubbertuator SoftArm is only in an early stage. The dynamics control problem at hand remains a tricky one because of its extremely high dimensionality. In that aspect the approach taken in (Hesselroth et al., 1994) shows, despite its slow learning and very slow robot movements, promise: it makes a direct translation from (stereo) visual coordinates (a four-dimensional vector) to dierence pressures (5-dimensional). It would be a genuine challenge to attempt to generate trajectories of joint pressures to satisfy the time-to-contact constraints given in chapter 5, thus eliminating the need for a separate dynamic controller.
120
Chapter 6. Dynamic control of a rubbertuator arm
Chapter 7
Discussion In this thesis we have focused on the basic ingredient for grasping an object: to position the end-eector of a robot arm above a stationary target. In order to obtain an adaptive control system, the monocular robot arm was controlled with a neural network using no a priori information of the robot properties or of its sensors.
The best neural network
In chapter 2 we have described a method to nd the optimal feed-forward neural network structure, as well as a heuristic to nd good weight values. The optimal neural network structure, viz. the number of hidden units and the number of learning samples N in order to reach a certain approximation error, can be found by determining an asymptotic model (`AMEF') of the error as a function of and N as described in (Vysniauskas et al., 1993). Secondly, the Polak-Ribi$ere conjugate gradient optimisation method with Powell restarts appears to be a good method for estimating correct weight values in a neural network.
Limitations and open questions The AMEF, as proposed in (Vysniauskas et al., 1993), has a linear dependency on the number of learning samples N . However, for the experiments in chapter 5, a square root dependency is more realistic (Vysniauskas, 1994). This is also proposed in (Barron, 1991). A second problem with the AMEF model is that it requires extensive (and very time-consuming) simulation to estimate the model, each time it is applied to a certain problem. Although the conjugate gradient method is mathematically well-understood, there is little theoretical knowledge about avoiding local minima and saddle points in the optimisation process. The seriousness of this problem is, apart from the optimisation method, also dependent of the structure of the neural network. The question of whether a global minimum is reached or not remains unanswered. Stochastic search methods (Aarts & Korst, 1989) overcome these problems but are too slow nevertheless, new feed-forward network training algorithms such as Alopex (Unnikrishnan & Venugopal, 1994) seem promising alternatives. 121
122
Chapter 7. Discussion
The problem of the approximation of high-dimensional functions both fast and accurately is not solved. There is a tradeo involved: when local representation is used for approximating learning samples of an unknown function, an accurate representation can be obtained, yet at the cost of very extended training times to get good generalisation. On the other hand, global approximation methods are very successful in generalising with only few learning samples, but generally do not obtain a high approximation accuracy. A compromise is proposed in (van der Smagt et al., 1992a Jansen et al., 1995 van der Smagt et al., 1994), by creating a hierarchical multi-resolution approximation of a function in a tree. At the root of the tree, the coarsest level of approximation (by a linear approximator) is kept as the tree grows deeper because the approximation at higher levels is not precise enough, more precise representations are created. Thus exact representation and fast learning are combined in one method.
Static hand-eye coordination With the methods described in this thesis to determine the optimal neural network to solve a speci c task, we can design a self-learning controller which positions a monocular robot arm above a stationary target. A simple model of the environment (viz. a known area of an object above which the manipulator with camera must be positioned) is used for positioning a monocular robot arm however, the method can be used with more complex visual front-ends which extract the necessary features (two or more marker points on the object) from the image. Simulation studies as well as implementation on a real robot show that the proposed neural network learns to position the robot within 1cm of the target after one step, and less than 1mm after two steps (one feedback step), after only several tens of trials, which takes in the order of a few minutes to learn. Furthermore, the system automatically relearns the hand-eye coordination when the robot-camera system changes: a rotation of 45 of the hand-held camera is adjusted for in several tens of trials.
Limitations The presented control system generates learning samples from the assumption that the target is static and not moving. A wider applicable system would be obtained when it can also position itself above moving targets while generating learning samples. The existing controller has been extended to this purpose. In (van der Linden, 1994) the input-adjustment method has been extended for moving objects by using the measured visual velocity of the object (Schram, van der Linden, & Krose, 1995) replace the velocity estimates by a tapped delay line. Secondly, in this thesis we have used a simple visual setup consisting of a white object against a black background, such that simple blob detection algorithms suce for the visual front-end. The detection of a moving target against a cluttered, moving background (due to the ego-motion of the camera) is a more dicult problem but can also be solved in real-time, as is shown in (Bartholomeus, Krose, & Noest, 1993) however, the robustness and precision of the latter method needs to be investigated more fully.
123
Open questions
Although an accurate adaptive controller has been designed, the accuracy can still be improved by using dierent neural network architectures, such as the hierarchical neural network mentioned above. Secondly, a better theoretical understanding of fast adaptivity (relearning) vs. stability (accurate representation) is required.
Visual feedback in motion
Due to the fact that monocular vision is used, the algorithm proposed in chapter 4 assumes knowledge of the object that is grasped. This limitation is relieved by the method proposed in chapter 5. We have shown that simple optic ow measurements (viz. visual measurement of distance divided by velocity, which can be measured via divergence) can be used to generate a trajectory in visual space. In chapter 5 it is assumed that the goal is that the robot arm with the camera equipped in the end-eector reaches a position where the visual position 0 and velocity 1 and possibly higher derivatives are all 0. It is shown that this requirement can be interpreted as a constraint on those two measured quantities, such that 0 i] = ; ( d ; t i]) i] n 1
during successive time intervals i, where d is the desired time that the trajectory may take, t i] the time in interval i, and n a smoothness parameter. This expression can subsequently be used to compute visual reference signals for the controller, without the requirement of a model of the observed scene. A self-learning neural network can be used to translate this trajectory to joint space (joint accelerations) such that the robot manipulator decelerates towards a stationary object. This controller, when used in a simulator testbed, leads to a grasping accuracy of less than 1mm in the end-eector position at the desired time, while the robot arm is at rest. Furthermore, the system has been shown to degrade gracefully with increasing levels of noise in its measured data. The attained grasping precision is directly related with the number of feedback steps that is executed during one trial, and can be increased by slowing the robot down.
Limitations As in chapter 4, it is assumed that the object is stationary. This limitation again eects the generation of learning samples otherwise, the method generates a path towards a stable state in visual domain, which also includes tracking an object. Secondly, the moment of `contact' is that moment where the object is placed at focal distance from the lens. But what happens when we want to place this nal point further away from or closer to the lens? The time-dependent deceleration constraint can be augmented with an oset constant, depending on the actual size of the object. On the downside, this would mean that the resulting distance would be dierent for dierent size objects. As an alternative, changing the `contact' position can be best done by changing the optical system to get a dierent focal distance.
124
Chapter 7. Discussion
Open questions In the controller described above, the values of the visual parameters along the desired trajectory are analytically determined. However, this could also be replaced by a neural network, which must determine the next desired position on the visual trajectory from the current position and the remaining time of deceleration. The transition from the visual parameters at step i to step i + 1 can be used to compute learning samples. The resultant `grasping error' can be used in a reinforcement-type neural network to determine the `goodness' of the previously obtained learning samples. The proposed method should also be able to be used for object avoidance. Furthermore, in this thesis we have only concentrated on three-dimensional robot arm control the optic ow measurements for a problem where the orientation (more than roll) of the end-eector is also of importance is more dicult.
Low-level robot arm control Finally, in chapter 6 we have investigated the control of the dynamics of a robot arm. For this problem, the well-analysed dynamics of the OSCAR robot are not subject of investigation, but instead those of the SoftArm robot. This robot is a 5-DoF robot arm, where each joint is controlled by an agonist-antagonist muscle pair the arti cial muscles consist of rubber tubes which can be inated or deated, resulting in length dierences. As measurements of that arm show, it exhibits a hysteretic pressure-position relationship, its behaviour is strongly dependent on `external inuences.' This behaviour is qualitatively explained from a simple model of the rubbertuators and the rubbertuator-driven joints. Since the input-output behaviour of the system is easily measured (pressures vs. joint values), a neural direct adaptive controller has been be constructed to control the robot arm. Tested for the control of a single joint, this method is shown to be successful after only a few learning trials, and performs better than the manufacturer-provided PID controller.
Open questions Investigation of the rubbertuator SoftArm is only in an early stage. The dynamics control problem at hand remains a tricky one because of its extremely high dimensionality. In that aspect the approach taken in (Hesselroth et al., 1994) shows, despite its slow learning and very slow robot movements, promise: it makes a direct translation from (stereo) visual coordinates (a four-dimensional vector) to dierence pressures (5-dimensional). It would be a genuine challenge to attempt to generate trajectories of joint pressures to satisfy the time-to-contact constraints given in chapter 5, thus eliminating the need for a separate dynamic controller. Therefore, further research is necessary to see if the feed-forward neural network approach is viable for this dicult problem, when more than one joint is controlled, and when the joints are arbitrarily controlled instead of repeatedly following the same trajectory.
125
Final remarks
As we have shown in this thesis, adaptive control of visually guided robot arms at a reex-level is very well possible with current technology. Nevertheless, robotics is still in an infant stage where applicability is concerned. This problem, which also inuences the impact on robot-related research funding, could be solved by examining novel robot structures in conjunction with intelligent and highly adaptive control algorithms. The theoretical framework presented in chapter 5 shows that such control algorithms can be developed using very simple sensory feedback, and in conjunction with an adaptive control mechanism leads to a stable and computationally cheap control method. It is therefore concluded that neural networks are useful mechanisms for control of high-dimensional systems such as robot-camera systems.
126
Chapter 7. Discussion
Appendix A
Orthogonal basis functions When a set of one-dimensional data points (xp yp) is tted with polynomials, the most obvious choice of basis functions would be i(x) = xi. The set of m data points then can be tted by minimising the error function, also known as chi-square merit function
2
=
2 m X 4
p=1
yp ;
n X j =0
32
aj xjp5
(A.1)
where n is the desired degree of the approximation. Minimum values for (A.1) are reached where its derivatives vanish, i.e., 0=
2 m X 4
p=1
yp ;
n X j =0
3 aj xjp5 xkp
k = 0 : : : n:
(A.2)
Solving (A.2) involves inversion of the matrix Q with
Qij =
m X p=1
xipxjp m
Z1 0
xi xj = i + m j + 1:
(A.3)
The matrix Q, written like this, is the principal minor of order n +1 of the in nite Hilbert matrix 2 1 1 1 3 66 1 21 31 77 66 21 31 41 77 (A.4) 5 4 3. 4. 5. .. .. .. which is extremely dicult to invert for large n, since roundo errors will have a tremendous impact. When orthogonal basis functions are used, however, all the elements of Q except those on the diagonal disappear, and inversion reduces to a simple scalar division.
127
128
Appendix A. Orthogonal basis functions
Appendix B
Proofs for chapter 5 This appendix contains proofs and deductions needed in chapter 5.
B.1 Derivation of the stopping criteria The derivation of the stopping criteria given in chapter 5 is presented here.
Theorem
A signal d(t) is approximated by a nite Taylor expansion
d(t) =
n X
aiti
(B.1)
d( )(k) = 0
(B.2)
i=0
where an 6= 0. Then the boundary conditions
80 k < n :
imply
! ann;k;1ak = 1 n : nn;k k ann;;k1
(B.3)
Proof
Using the Taylor expansion for d(t) the kth derivative of d(t) equals
d(t)(k) =
n X
i! a ti;k : i i=k (i ; k)!
(B.4)
Therefore, the constraints (B.2) can be rewritten as
80 k < n :
n X
i! a i;k = 0: i i=k (i ; k)! 129
(B.5)
130
Appendix B. Proofs for chapter 5
If we rst look at the constraint for k = n ; 1, then or, since an 6= 0,
(n ; 1)! an;1 + n! an = 0
(B.6)
n;1
= ; ana :
(B.7)
! n ah = h (; )n;han
(B.8)
n
Secondly, if it can be shown that for all 0 h < n,
then, by combining (B.7) and (B.8), we nd that
! ann;k;1ak = 1 n : nn;k k ann;;k1
(B.9)
To prove assumption (B.8), rst observe that, with the equation for in (B.7), it is clearly true for h = n ; 1. Now let us assume that eq. (B.8) is true for all ai with n > i h. By multiplying both sides of eq. (B.5) by 1=h! we nd that n i! X 0= ai i;h: h i=h
Similary for k = h ; 1,
n X
i
!
i;(h;1) a i
i=h;1 h ; 1 ! n X i i;(h;1) : = ah;1 + a i
h;1
0=
(B.10)
i=h
(B.11)
Using the assumption (B.8) for ai with i h,
! ! n n;i (;1)n;ia i;(h;1) 0 = ah;1 + n i=h h ; 1 " i ! !# n X i n : n ;(h;1) n ;i (;1) = ah;1 + an
h;1 i i=h n n;h n X
i
(B.12)
The factor between brackets is equal to h;1 (;1) , which we will show below, such that ! n ah;1 = h ; 1 (; )n;(h;1) an: (B.13) By induction we prove eq. (B.8) for all 0 h < n. Now we only need to prove the following.
131
B.2. Proof of theorem 5.2
Lemma
For positive integer n and positive integer k n, n X i=k
(;
1)n;i
! ! ! n = n (;1)n;k : k;1 i k;1 i
(B.14)
Proof
If we introduce new variables j = i ; k and m = n ; k, then n X i=k
(;
1)n;i
! ! ! ! X n = m (;1)m;j j + k m + k k;1 j+k k;1 i j =0 ab aa;c (since b c = c b;c ) ! !X m m + 1 m + k j m (;1) j + 1 = (;1) k ; 1 j =0 ! 3 ! 2mX +1 + 1 n m j 5 4 = ;(;1)n;k k ; 1 j=0 (;1) j ; 1 (using the binomial theorem for (a + b)m+1 with a = ;1 and b =! 1) n (0 ; 1) = ;(;1)n;k k ; 1 ! n (;1)n;k = k; 1 i
which completes the proof.
B.2 Proof of theorem 5.2
Theorem
Using the equivalence of d i](k) and ak i], we have to show that, if 8i : dd0 ii](](tt ii])]) = ;( d n; t i]) where n nX ;1 X d i](t i]) = aj t i]j and d0 i](t i]) = (j + 1)aj+1t i]j then
j =0
ak = (; )n;k;1 1 n d an;1 n k
!
j =0
for 0 k < n.
(B.15) (B.16) (B.17)
132
Appendix B. Proofs for chapter 5
Proof
Substituting (B.16) in (B.15) we get
8i :
n X
j =0
naj t i] = (t i] ; d ) =
nX ;1 j =0
= t i] such that
8i : or
na0 +
8i :
n X j =1
naj t i]j ;
nX ;1 j =0
(j + 1)aj+1t i]j
(j + 1)aj+1t i]j+1 ; nX ;1 j =0
nX ;1 j =0
(j + 1)aj+1t i]j ; d
n X j =1
jaj t i]j + d
d (j + 1)aj+1t i]j nX ;1 j =0
(j + 1)aj+1t i]j
nX ;1 j =0
(j + 1)aj+1t i]j = 0
(B.18)
na0 + (nan ; nan)t i]n nX ;1 + (n ; j )aj + (j + 1) daj+1] t i]j + d a1 t i]0 = 0: j =1
Therefore, it follows that 8j 0 j < n : (n ; j )aj = ; d(j + 1)aj+1 which can be written as 8j 0 j < n : aaj+1j = ; dn(j;+j1) : This solution is equivalent to (B.17), which proofs the theorem.
(B.19) (B.20)
B.3 Least-mean-square minimisation In order to nd the parameters i from the measurements (t), least-mean-square minimisation is used (Press et al., 1986). Let us de ne the a quantity that has to be minimised, the chi-square merit function
0 1 N (ti ) ; Pn j tj 2 X j =0 i A :
2 = @ (B.21) i i=1 Here, N is the number of observations (t) that are available. As usual, i is the standard deviation in (ti). Let also be de ned the design matrix j t Dij = i (B.22) i
133
B.3. Least-mean-square minimisation
and a vector b
bi = ( ti) :
(B.23)
i
We are looking for minimum values of the eq. (B.21) for j . The minimum for 2 is reached when the rst derivatives vanish:
2 3 n N 1 X X j 0 = 2 4 (ti) ; j ti 5 tki j =0 i=1 i
or
n X j =0
k = 0 : : : n
pkj j = qk
(B.24) (B.25)
which are known as the normal equations for the least-squares problem. Here, N tj +k N tj tk X X i i pkj = 2 = i 2 i=1 i i=1 i
and
qk =
N (t )tk X i i 2 i=1 i
or p] = DT D or q] = DT b:
(B.26) (B.27)
Thus, a solution to eq. (B.24) can be found by solving the set of equations (DT D) = DT b:
(B.28)
Propagation of the error in (t) to We de ne Cjk p];jk1, such that
j =
n X k=0
Cjk
N (t )tk X i i 2 : i=1 i
(B.29)
Assuming independency in the noise between subsequent measurements,
2(
j) =
while also Therefore, it can be derived that
N X i=1
i2
@j @ (ti)
!
n C tk @j = X jk i @ (ti ) k=0 i2 :
2 (j ) =
"X N
k tl # t i i : CjkCjl 2 i=0 i k=0 l=1 n X N X
(B.30) (B.31)
(B.32)
134
Appendix B. Proofs for chapter 5
The factor between brackets is equal to p]kl , while also Cjk p];jk1 , such that
2 (j ) = Cjj :
(B.33)
Consequently, given the variances in the measurements, we can calculate what the variances in the estimated motion parameters are. The matrix C is known as the covariance matrix of the parameters i, and gives the covariances of the parameters that result from the parameter estimation method. In order to get an idea of the values of 2 (i), let us for the moment assume that all the measurements have the same variance 2 , which is approximately true if the measurements are all close to each other in time. In this case the design matrix p] can be written as N X N pjk = 2 tji +k : i=1
(B.34)
We thus see that the elements of C , the inverse of p], have a factor 2=N in common. As an example, consider the case where the measurements are tted by a second order polynomial. Then, 0 1 p0 p1 p2 p] = B (B.35) @ p1 p2 p3 CA p2 p3 p4 N j X where pj = ti2 . Then i=1 i
C = p];1 = 0 (;p32 + 2p1p2p3 ; p0 p23 ; p21p4 + p0p2 p4);11 2 + p p p p ; p p ;p2 + p p ; p B@ p2p33 ; p21p44 ;2p23 + p01p44 p1p22 ; p10 p33 CA : (B.36) 2 2 2 ;p2 + p1p3 p1p2 ; p0p3 ;p1 + p0p2 measurements have the same deviation, i.e., i = for Now, if it is assumed that all 1 i N , then pj = 12 PNi=1 tji . This assumption is reasonable since, in the case of estimating the parameters of the trajectory described by the observed object, the measurements over which the t is made will be close to each other (in time and space), such that their variances will not dier too much. Secondly, let us assume that the measurements are .quidistant and scaled between 0 and 1, such that ti = i=N . In that case N X 1 p0 = 2 1 = N2 (B.37a) i=1 N X p1 = 12 Ni = N2 +2 1 (B.37b) i=1 N i 2 (N + 1)(2N + 1) X 1 p2 = 2 (B.37c) = 6N 2 i=1 N
B.4. Nonlinear transform of a noisy signal
135
N i 3 (N + 1)2 X 1 = 4N 2 (B.37d) p3 = 2 i=1 N N i 4 (N + 1)(2N + 1) N (N +1) ; 1 X 1 2 6 p4 = 2 = : (B.37e) 3 2 N 5 N i=1 The standard deviations of the i are the diagonal elements of the matrix C. By substituting the p0 : : : p4 in (B.36), we obtain 2 2 2) 2 (0) = 3 N ((3NN; +1)(3NN ;+ 2) (B.38a) 2 N (8N + 11)(2N + 1) 2 (B.38b) (1) = 12 (N 2 ; 1)(N 2 ; 4) 3 2 2 (2) = 180 (N 2 ;N1)( N 2 ; 4) : (B.38c)
Computational requirements
For inversion of the matrix D, Gauss-Jordan elimination can be used, for a total cost of N 3 operations (Press et al., 1986). If the covariance matrix C is not needed, one can switch to LU decomposition, for a factor 3 gain in speed. Furthermore, realisation that the matrix p] is a Toeplitz matrix brings the operation count back to N (N ; 1).
B.4 Nonlinear transform of a noisy signal In the previous section, it was assumed that the measurements (t) are normally distributed. How reasonable is this assumption? After all, by tting measurements z (t) instead of the measured area A(t), the noise in the original measurement is also transformed. Recall that z (t) is de ned as z (t) q 1 : (B.39) A (t) In general, the probability distribution P (z ) of z = f (A) can be derived from the probability distribution P (A) of A as Z P (z ) = (z ; f (A))P (A)dA A
df = P f ;1(z ) d f ;1(z ) : z
As can be seen from measurements shown in gure 3.4, it is reasonable to assume a normal distribution for A, such that the probability distribution of A is ( )2 ; A 2A 1 2A : (B.40) P (A) = q 2 e 2 A ;
136
Appendix B. Proofs for chapter 5
The resulting distribution for z is then ((1=z ) A ) ; P (z ) = 3q2 2 e 2A2 z 2 A q = 2A AP (A): 2
;
2
This is not a normal distribution anymore. With knowledge of the standard deviation of A , however, it can be shown to be approximately normal. If the rst three terms of the Taylor series for z (t) are considered,
z (t) q1 + A
1q
2A A
jA ; Aj +
3q ( ; )2: A A 2
4A A
(B.41)
If the second term of this Taylor series is `suciently small' in comparison with the rst, then z can be approximated by a linear qfunction of A, such that qz is indeed approximately normal distributed with mean 1= Aqand variance A2 =2(A A). From jA ; Aj 0:0036 A, such that the quadratic term section 3.1.2 it is known that p in (B.41) is approximately 150 e times smaller than the linear term. With a typical circumference of 200 pixels, the quadratic term will be over a factor 25,000 smaller, and can be safely ignored.
B.5 Robust parameter estimation methods If the order of the Taylor approximation in chapter 5 is large, then the normal equations tend to become rather instable, especially for large ti . A rst remedy for this problem is rescaling ti such that all values are bounded within (;1 1) and higher powers of ti do not explode (Hayes, 1970). However, this remedy alone does not suce. Why this is so can be seen as follows. In the general case, we are trying to t data points yi = y(xi), for 1 i N . If the xi are assumed to be uniformly space on the interval (0 1), then the elements pjk look like N X i=1
xji xki
N
Z1 0
xj xk
=N
Z1 0
xj+k = j + Nk + 1 :
(B.42)
The matrix p], written like this, is the principal minor of order n +1 of the in nite Hilbert matrix 2 1 1 1 3 66 1 21 31 77 66 21 31 41 77 (B.43) 5 4 3. 4. 5. .. .. .. It has been observed that minors of this matrix are very dicult to invert.
137
B.5. Robust parameter estimation methods
We can solve the ill-conditioned nature of the problem by using orthogonal polynomials. Let us generalise the above discussion for polynomials in ti to polynomials pi(t). These polynomials are called orthogonal when
Z1
;1
pi(x)pj (x)dx = cij
(B.44)
where ij is the Kronecker delta. When the constant c equals unity the polynomials are called orthonormal. However, instead of having the polynomials orthogonal over the whole input space, we would rather have them orthogonal over our data set. In that case, all elements in (B.28) except those on the diagonal will vanish, and the matrix inversion will reduce to a simple scalar division. We will discuss a method after Forsythe (Forsythe, 1957). Let us de ne polynomials pi(x) using a Gram three-term recurrence relation
p;1(x) = 0 p0(x) = 1 pi+1(x) = (x ; ai+1)pi (x) ; bipi;1 (x): We are given a set of measurements (xk yk ), and want to minimise
Pn c p (y ) !2 X y ; k i=0 i i k
2n = : k k
(B.45)
The nite orthogonality for the polynomials then means that
X k
Let us de ne and
pi(xk )pj (xk ) = 0 if i 6= j: wij =
(B.46)
X pi(xk )pj (xk ) k2 k
(B.47)
X yk pi(xk ) k2 k
(B.48)
!i =
such that the set of equations that has to be solved reduces to w00 c0 = !0 w11c1 = !1 ... ... wncn = !n: It can be easily shown that, when we set
P x p (x )2 = 2 ai+1 = Pk pk (ix )k2 = 2 k k i k
k
(B.49)
P p (x )2= 2 bi = P kp i (xk )2= k 2 k i;1 k
k
(B.50)
138
Appendix B. Proofs for chapter 5
the orthogonality is satisfied. Since we assume the polynomials are orthogonal,

    \epsilon_n^2 = \sum_k \frac{y_k^2}{\sigma_k^2} - 2 \sum_{h=0}^n c_h \sum_k \frac{y_k\,p_h(x_k)}{\sigma_k^2} + \sum_{h,i=0}^n c_h c_i \sum_k \frac{p_h(x_k)\,p_i(x_k)}{\sigma_k^2}
               = \sum_k \frac{y_k^2}{\sigma_k^2} - 2 \sum_{h=0}^n c_h \omega_h + \sum_{h=0}^n c_h^2 w_{hh}
               = \sum_k \frac{y_k^2}{\sigma_k^2} - \sum_{h=0}^n c_h^2 w_{hh},

where the last step uses c_h w_{hh} = \omega_h. Therefore, \epsilon^2 adheres to the recurrence relation

    \epsilon_n^2 = \epsilon_{n-1}^2 - w_{nn} c_n^2   (B.51)

and the polynomials can be added incrementally. To determine the optimal degree of the fit, we consider the residual mean square \epsilon_n^2/(N - n - 1). This quantity will initially decrease considerably as n increases, but will stabilise at a certain degree of polynomials. The optimal stopping criterion is determined by experimentation. From the parameters c_i, a_{i+1}, and b_i we can reconstruct the coefficients of the fitted polynomial.
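The complete procedure is easy to state in code. The following Python sketch is an illustrative reconstruction of Forsythe's method (not the thesis's original implementation): it builds the polynomials with the recurrence coefficients (B.49)-(B.50) and obtains each coefficient by the scalar division c_i = \omega_i / w_{ii}:

    import numpy as np

    def forsythe_fit(x, y, sigma, n):
        # Weighted least-squares fit with polynomials orthogonal over the
        # data set, built by the three-term recurrence
        #   p_{i+1}(x) = (x - a_{i+1}) p_i(x) - b_i p_{i-1}(x).
        # Thanks to the finite orthogonality (B.46), every coefficient is
        # a scalar division c_i = omega_i / w_ii instead of a matrix inverse.
        w = 1.0 / sigma**2
        p_prev, p_cur = np.zeros_like(x), np.ones_like(x)
        coeffs = []
        eps2 = np.sum(w * y**2)                   # running epsilon_n^2
        for i in range(n + 1):
            w_ii = np.sum(w * p_cur**2)
            c_i = np.sum(w * y * p_cur) / w_ii    # c_i = omega_i / w_ii
            coeffs.append(c_i)
            eps2 -= w_ii * c_i**2                 # recurrence (B.51)
            a_next = np.sum(w * x * p_cur**2) / w_ii          # (B.49)
            w_prev = np.sum(w * p_prev**2)
            b_i = w_ii / w_prev if w_prev > 0 else 0.0        # (B.50)
            p_prev, p_cur = p_cur, (x - a_next) * p_cur - b_i * p_prev
        return np.array(coeffs), eps2

The returned \epsilon_n^2 follows the recurrence (B.51), so the residual mean square \epsilon_n^2/(N - n - 1) can be monitored while n grows in order to choose the degree, as described above.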
Appendix C
The OSCAR Robot System

C.1 OSCAR mechanical structure
Figure C.1: The OSCAR robot.
The OSCAR robot, depicted in figure C.1, is a Philips-built anthropomorphic robot with six degrees of freedom, each of which consists of a rotational joint. The first three joints influence the position of the gripper; the last three, which have intersecting axes of rotation, implement a pitch, yaw, and roll angle. These three joints are used to set the orientation of the gripper. A schematic representation of the OSCAR robot can be found in figure 3.7 (page 43).
In order to increase the stiffness of the robot, each of the joints 1-5 is driven by two DC motors. The gravitational force on the arm is compensated by two air-pressurised cylinders.
C.2 Hardware structure

The robot is controlled via a VME-based computer architecture. The effectively used hardware consists of:

- a Force-30 processor card, running the VxWorks real-time operating system on a 68030 microprocessor;
- a DataCube MaxVideo-20 vision card;
- a Philips bus master;
- in-house A/D and D/A converters;
- Philips voltage amplifiers.
C.3 Software structure

Apart from the VxWorks operating system and the MaxVideo-20 libraries, all the software is home-brewed.
C.3.1 OSCAR controller
The controller at the dynamic level for the OSCAR robot is a PID controller. Originally this software was derived from a Philips product, but the only parts that survive from that version are some magic constants. The controller consists of two parts: a trajectory controller and a P(I)D controller. An important feature of the current setup of this software is that the trajectory computation must be interruptible. That is to say, when a new setpoint arrives over the communication channel, the current motion of the robot arm must be changed to move towards that setpoint (see also figure 4.1 on page 57). This is required to support closed loop control.
The trajectory controller

When a new setpoint is received by the software, the trajectory controller computes the required acceleration, time for acceleration, maximal velocity, time for deceleration, and deceleration for each joint. In fact, the current version of the software only supports a constant acceleration and deceleration time MAX_AC_DEC_TIME, and computes the trajectory with that constant and the user-specified maximal velocity. This computation is performed in the function process_endp().
The clock of the Force-30 card is set to generate an interrupt every 12.5 ms. When such an interrupt is received, the next via-point is computed by updating the current joint position, velocity, and acceleration for all the joints in the function next_setp().
The P(I)D controller

This via-point, which is the desired joint position projected 12.5 ms into the future, is translated by a P(I)D controller to a voltage by adding the Proportional and Derivative parts together, and mixing in the Integral part depending on the velocity of the joint. For a velocity of 0, the integral part is included with a factor of 1; for higher velocities, this factor is smaller. This scheme is used to prevent oscillations during motion.
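To make the velocity-dependent mixing concrete, here is a minimal Python sketch. It is hypothetical: the text above only fixes the endpoints of the mixing factor (1 at standstill, smaller when moving), so the mixing function and all gain names are illustrative assumptions, not the actual controller source:

    def pid_voltage(error, d_error, integral, velocity, kp, ki, kd):
        # P and D parts are always added; the I part is mixed in with a
        # velocity-dependent factor: 1 at standstill, smaller when moving.
        # The mixing function is a guess; the text only fixes its endpoints.
        i_factor = 1.0 / (1.0 + abs(velocity))
        return kp * error + kd * d_error + i_factor * ki * integral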
C.3.2 Vision software

Every 40 ms an image I is grabbed from the camera's CCD. During acquisition of I[i+1], the image processing system has time to extract features from image I[i]. By keeping the visual scene sufficiently simple, the required features can be extracted from the approximately 480 x 512 pixels of 8 bits width within 40 ms by using a DataCube MaxVideo-20 image processing board as follows. The original image is thresholded and subsequently subsampled until a 16 x 16 grey-level image is obtained and sent from the MaxVideo-20 image processing board to the Force-30 processor card. From this image the centre of mass of the pixels is determined and the number of white pixels in the original image is reconstructed. This renders the desired position and area of (what is supposed to be) the single white blob against a black background. Details of this operation are described in (Bartholomeus, 1993).
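A minimal NumPy sketch of this feature extraction (purely illustrative: in the real system the thresholding and subsampling run on the MaxVideo-20 hardware, and the function name is invented):

    import numpy as np

    def blob_position_and_area(image, threshold):
        # Threshold, subsample to 16x16, and take the centre of mass of the
        # white pixels; the summed grey levels reconstruct the white-pixel
        # count (the blob area).  image is e.g. a 480x512 uint8 array.
        binary = (image > threshold).astype(np.float64)
        h, w = binary.shape
        small = binary.reshape(16, h // 16, 16, w // 16).mean(axis=(1, 3))
        total = small.sum()                  # assumed > 0: one white blob
        ys, xs = np.mgrid[0:16, 0:16]
        centre = (np.sum(ys * small) / total, np.sum(xs * small) / total)
        area = total * (h // 16) * (w // 16) # white pixels in the original
        return centre, area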
C.3.3 Putting the parts together

The parts are used together as follows. A workstation is connected to the Force-30 processor card via (currently) an Ethernet connection. Socket communication is used to convey setpoints, computed by the neural controller, to the trajectory generator. Furthermore, the current joint values of the robot as well as the observed object position are communicated by the Force-30 card to the workstation. In the meantime, images are sent over the VME bus from the MaxVideo-20 card to the Force-30 card; the latter uses the same VME bus to send joint voltages to the D/A controller. Furthermore, the workstation running the neural controller (`connel') also has a socket connection to another workstation running a learning process (`lummel'). This setup is shown in figure 4.4.
Appendix D
OSCAR inverse kinematics

For the robot kinematics the following equations are used. They calculate the position, velocity, and acceleration of the end-effector. In this appendix we use the following notation: l_1, l_2, and l_3 are the lengths of links 1, 2, and 3, written as b_1, e_2, and b_4 in the Denavit-Hartenberg notation; p_x, p_y, and p_z indicate the position of the robot's end-effector with respect to the robot base. The following is derived:

    p_x = (l_2 \cos\theta_2 + l_3 \sin\theta_3) \cos\theta_1   (D.1)
    p_y = (l_2 \cos\theta_2 + l_3 \sin\theta_3) \sin\theta_1   (D.2)
    p_z = l_1 - l_2 \sin\theta_2 - l_3 \cos\theta_3   (D.3)
    \dot p_x = [-l_2 \sin\theta_2\,\dot\theta_2 + l_3 \cos\theta_3\,\dot\theta_3] \cos\theta_1 - [l_2 \cos\theta_2 + l_3 \sin\theta_3] \sin\theta_1\,\dot\theta_1   (D.4)
    \dot p_y = [-l_2 \sin\theta_2\,\dot\theta_2 + l_3 \cos\theta_3\,\dot\theta_3] \sin\theta_1 + [l_2 \cos\theta_2 + l_3 \sin\theta_3] \cos\theta_1\,\dot\theta_1   (D.5)
    \dot p_z = -l_2 \cos\theta_2\,\dot\theta_2 + l_3 \sin\theta_3\,\dot\theta_3   (D.6)
    \ddot p_x = [-l_2 \cos\theta_2\,\dot\theta_2^2 - l_2 \sin\theta_2\,\ddot\theta_2 - l_3 \sin\theta_3\,\dot\theta_3^2 + l_3 \cos\theta_3\,\ddot\theta_3] \cos\theta_1
              - 2 [-l_2 \sin\theta_2\,\dot\theta_2 + l_3 \cos\theta_3\,\dot\theta_3] \sin\theta_1\,\dot\theta_1
              - [l_2 \cos\theta_2 + l_3 \sin\theta_3] [\cos\theta_1\,\dot\theta_1^2 + \sin\theta_1\,\ddot\theta_1]   (D.7)
    \ddot p_y = [-l_2 \cos\theta_2\,\dot\theta_2^2 - l_2 \sin\theta_2\,\ddot\theta_2 - l_3 \sin\theta_3\,\dot\theta_3^2 + l_3 \cos\theta_3\,\ddot\theta_3] \sin\theta_1
              + 2 [-l_2 \sin\theta_2\,\dot\theta_2 + l_3 \cos\theta_3\,\dot\theta_3] \cos\theta_1\,\dot\theta_1   (D.8)
              + [l_2 \cos\theta_2 + l_3 \sin\theta_3] [-\sin\theta_1\,\dot\theta_1^2 + \cos\theta_1\,\ddot\theta_1]   (D.9)
    \ddot p_z = l_2 \sin\theta_2\,\dot\theta_2^2 - l_2 \cos\theta_2\,\ddot\theta_2 + l_3 \cos\theta_3\,\dot\theta_3^2 + l_3 \sin\theta_3\,\ddot\theta_3   (D.10)

The measurements at time t are p_x \ddot p_x / \dot p_x^2, p_y \ddot p_y / \dot p_y^2, p_z \ddot p_z / \dot p_z^2, \theta_1, \theta_2, \theta_3, \dot\theta_1, \dot\theta_2, \dot\theta_3, \ddot\theta_1, \ddot\theta_2, and \ddot\theta_3. The objective is to calculate \ddot\theta_1', \ddot\theta_2', and \ddot\theta_3' such that p_x' \ddot p_x' / \dot p_x'^2 = 1/2, p_y' \ddot p_y' / \dot p_y'^2 = 1/2, and p_z' \ddot p_z' / \dot p_z'^2 = 1/2. Suppose a \ddot\theta_1', \ddot\theta_2', and \ddot\theta_3' are sent to the robot; then at the next iteration the joint values and velocities are (for simderella)

    \theta_1' = \theta_1 + \dot\theta_1 + \tfrac{1}{2}\ddot\theta_1', \qquad \dot\theta_1' = \dot\theta_1 + \ddot\theta_1'   (D.11)
    \theta_2' = \theta_2 + \dot\theta_2 + \tfrac{1}{2}\ddot\theta_2', \qquad \dot\theta_2' = \dot\theta_2 + \ddot\theta_2'   (D.12)
    \theta_3' = \theta_3 + \dot\theta_3 + \tfrac{1}{2}\ddot\theta_3', \qquad \dot\theta_3' = \dot\theta_3 + \ddot\theta_3'.   (D.13)

The correct way of finding the desired \ddot\theta_1', \ddot\theta_2', and \ddot\theta_3' is by substituting the above equations in (D.1)-(D.3), and inverting the result. However, this inversion is not feasible. Instead, equations (D.1)-(D.3) are inverted:
    \theta_1 = \arctan2(p_y, p_x)   (D.14)
    \theta_2 = -\arctan\frac{|p_y|}{|p_x|} - \arccos\left( \frac{p_x^2 + p_y^2 + p_z^2 + l_2^2 - l_3^2}{2 l_2 \sqrt{p_x^2 + p_y^2 + p_z^2}} \right)   (D.15)
    \theta_3 = -\arctan\frac{|p_y|}{|p_x|} - \arccos\left( \frac{p_x^2 + p_y^2 + p_z^2 + l_2^2 - l_3^2}{2 l_2 \sqrt{p_x^2 + p_y^2 + p_z^2}} \right) + \pi - \arccos\left( \frac{l_2^2 + l_3^2 - p_x^2 - p_y^2 - p_z^2}{2 l_2 l_3} \right)   (D.16)

where one of the two solutions for \theta_3 is the preferred one and shown above. Equations (D.14)-(D.16) are differentiated, giving the desired results:
    \ddot\theta_1 = \left( \frac{\ddot p_y}{p_x} - \frac{2\dot p_y \dot p_x}{p_x^2} + \frac{2 p_y \dot p_x^2}{p_x^3} - \frac{p_y \ddot p_x}{p_x^2} \right) \left( 1 + \frac{p_y^2}{p_x^2} \right)^{-1}
                  - \left( \frac{\dot p_y}{p_x} - \frac{p_y \dot p_x}{p_x^2} \right) \left( \frac{2 p_y \dot p_y}{p_x^2} - \frac{2 p_y^2 \dot p_x}{p_x^3} \right) \left( 1 + \frac{p_y^2}{p_x^2} \right)^{-2}.

The corresponding closed-form expressions for \ddot\theta_2 and \ddot\theta_3 combine the same arctan derivative with the second time derivatives of the arccos terms in (D.15) and (D.16); they follow mechanically from the chain rule but are very lengthy, involving repeated derivatives of the quantities p_x^2 + p_y^2 + p_z^2 and p_x \dot p_x + p_y \dot p_y + p_z \dot p_z.
Note that this is not the exact solution as required! In this solution, the exact inverse kinematics for joints 1, 2, and 3 at time t is computed, and no prediction for the desired joint position will ensue. However, due to the small step size this approach will still give a useful result. By measuring p_x and \dot p_x, the value \ddot p_x = \dot p_x^2 / (2 p_x) (and similarly for p_y and p_z) can be combined with the derivatives of (D.14), (D.15), and (D.16) to find a near-optimal solution. This solution is used in the experiments.
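As an illustration, a minimal Python sketch of the forward kinematics (D.1)-(D.3) and of the desired-acceleration rule \ddot p = \dot p^2 / (2p) just described; the link lengths are taken from the DH table in figure E.2, everything else (names, example angles) is illustrative:

    import numpy as np

    def forward_kinematics(theta, l1, l2, l3):
        # End-effector position according to (D.1)-(D.3).
        t1, t2, t3 = theta
        r = l2 * np.cos(t2) + l3 * np.sin(t3)
        return np.array([r * np.cos(t1),
                         r * np.sin(t1),
                         l1 - l2 * np.sin(t2) - l3 * np.cos(t3)])

    def desired_acceleration(p, p_dot):
        # Per-coordinate acceleration satisfying p * pdd / pd^2 = 1/2.
        return p_dot**2 / (2.0 * p)

    # Example, using the OSCAR link lengths from the DH table (cm):
    p = forward_kinematics(np.radians([10.0, 30.0, 60.0]), 46.0, 51.0, 50.0)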
Appendix E
Simderella

E.1 Introduction

Simderella is a general-purpose public domain simulator of robot manipulator kinematics. (This text was previously published in Neurocomputing (van der Smagt, 1994b).) It was developed over the course of several years to aid my research in neural networks for robot arm control. Simderella is intended to be a tool for speeding up the development of controllers for robot arms. Computational efficiency is essential for controllers, and can best be evaluated by separating controller and simulator processes. Such a separation facilitates code modularity and code migration to extant platforms connected to real robot arms. The operation of the separate processes is synchronised by message passing. Under Unix, sockets are the most widely supported method for interprocess communication; for example, shell pipes are implemented using sockets. The three programs which constitute Simderella (connel, simmel, and bemmel) communicate over sockets. Connel is the controller, and should be programmed by the neural network experimenter. Simmel is the actual robot simulator, and calculates (among other things) the forward kinematics. Bemmel is an X-window-based program for robot visualisation. In the following sections, the functionality of connel, simmel, and bemmel is explained in finer detail.
E.2 Simulator Organisation

The communication channels between connel, simmel, bemmel, and the user/experimenter world are depicted in figure E.1. Each program is self-contained and interacts with the other programs by passing messages back and forth. Since sockets are used for communication, the programs can run on different computers connected via an Ethernet.
E.2.1 Connel

Figure E.1: Communication channels in simderella.

Due to the flexible communication setup, connel can control multiple simmels and/or real robots simultaneously, if necessary. Note, however, that while outgoing data from connel is
broadcast to all connected robots (real and/or simulated), incoming data is only requested from the robot or simmel which first established a socket connection with connel. The inverse kinematics for a Philips OSCAR-6 anthropomorphic arm are included in the simderella distribution, since this is the robot arm I use for my research. These algorithms (e.g., (van der Smagt et al., 1993; Jansen et al., 1995; van der Smagt, Krose, & Groen, 1992b)) were developed initially with simmel and subsequently ported to control the real OSCAR-6 with only minor (mostly timing- and error-handling-related) adaptations. For example, in the research reported in (van der Smagt et al., 1993) the following algorithm is the heart of the neuro-controller, connel (a schematic sketch of this loop is given after the list):

1. call get_joint_values() and get_camera() (see below): two messages are sent to simmel, which after appropriate computation responds by communicating the relevant data (the current joint values and the object position x as seen by the hand-held camera) to connel, the neural controller;
2. propagate the joint values and x through the neural network, which computes a joint rotation for the robot arm;
3. call send_joint_values() to send this joint rotation to simmel;
4. from the data obtained at this and the previous iteration, calculate a learning sample, and update the neural network.

In fact, for reasons of computational economy, the neural network was split into two communicating processes: a controller process computing a forward pass through the neural network, and a learning process performing conjugate gradient learning. The two processes communicated learning samples and updated network weight matrices using an interrupt-driven protocol via a mutual socket. For the research in (van der Smagt et al., 1992b), not only the observed object position is required, but also the apparent velocity and acceleration of the object due to motion of the (hand-held) camera. Note that the neural network code is not included in the distributed version of simderella.
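A schematic Python sketch of this control loop; the robot and network objects are hypothetical placeholders standing in for the socket calls named above and for the neural controller:

    def control_loop(robot, network, n_iterations):
        # robot and network are placeholder objects standing in for the
        # socket calls named above (get_joint_values(), get_camera(),
        # send_joint_values()) and for the neural controller.
        theta, x = robot.get_joint_values(), robot.get_camera()
        for _ in range(n_iterations):
            dtheta = network.forward(theta, x)         # step 2
            robot.send_joint_values(theta + dtheta)    # step 3
            theta_new, x_new = robot.get_joint_values(), robot.get_camera()
            network.learn((theta, x, dtheta, theta_new, x_new))  # step 4
            theta, x = theta_new, x_new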
E.2.2 Simmel
Simmel, the core of the package, is a simulator for any robot with rotational and/or translational joints. The robot is described by a Denavit-Hartenberg (DH) matrix which
is read in at startup. The assignment of frames to links is chosen in accordance with (Fu et al., 1987) (see figure E.2).

Figure E.2: Assigning frames to OSCAR, with the DH-matrix for the robot:

    i   theta_i    alpha_i (deg)   e_i (cm)   b_i (cm)
    1   theta_1        -90             0         46
    2   theta_2          0            51          0
    3   theta_3         90             0          0
    4   theta_4        -90             0         50
    5   theta_5         90             0          0
    6   theta_6          0             0         18
During normal operation, target joint values \theta_d, velocities \dot\theta_d, and accelerations \ddot\theta_d (6 of each for a 6-degree-of-freedom robot) are received from the robot socket connected to the controller. The values \theta_d, \dot\theta_d, and \ddot\theta_d specify the new path. This new path results, at the following time slot, in setting the current acceleration to \ddot\theta_d. The acceleration remains constant until either \dot\theta = \dot\theta_d or \theta = \theta_d. In the former case, \ddot\theta is set to 0 until \theta = \theta_d; in the latter, both \dot\theta and \ddot\theta are set to 0 and the robot stops. Note that deceleration is not automatically provided by simmel; the user must give the correct value for \ddot\theta_d, and a 0 for \dot\theta_d, to implement this. In its current structure, simmel simulates a robot manipulator with a hand-held camera, but the projections of two fixed cameras are also computed. At each time step, the current position of the robot is translated into a set of homogeneous matrices describing the position and orientation of the links in Cartesian space. From the DH-matrix of the last link, the position, velocity, and acceleration of the last link in the Cartesian domain relative to the object are calculated and available to the controller on request; likewise for the observed motion of the object, which is dependent on the rotation of the hand-held camera. The DH-matrices are then sent to each connected bemmel, followed by a matrix describing the position of the target object.
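The path logic just described can be sketched per joint as follows (an illustrative reconstruction in Python, not the actual simmel source; the function name and the crossing tests are assumptions):

    def simmel_step(theta, vel, acc_d, theta_d, vel_d, dt):
        # Integrate one time slot with the commanded acceleration; once the
        # desired velocity is reached, coast; once the desired position is
        # reached, stop.  Scalar, per-joint version.
        vel_new = vel + acc_d * dt
        if (vel_d - vel) * (vel_d - vel_new) <= 0.0:    # crossed vel_d
            vel_new = vel_d
        theta_new = theta + vel_new * dt
        if (theta_d - theta) * (theta_d - theta_new) <= 0.0:  # crossed theta_d
            return theta_d, 0.0                          # the robot stops
        return theta_new, vel_new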
E.2.3 Bemmel
Bemmel is a program responsible for graphically representing the state of the robot manipulator. At startup, bemmel reads the robot description file. This file contains, for each link of the robot, a list of points and the connections between the points. A point is given by its (x, y, z) coordinate in the frame of the link. E.g., a joint consisting of a cube which has its coordinate frame in its centre and sides of length 4 would be described by the points

    1:  2  2  2        5:  2  2 -2
    2: -2  2  2        6:  2 -2 -2
    3:  2 -2  2        7: -2  2 -2
    4: -2 -2  2        8: -2 -2 -2

with the connections

    1-2  1-3  2-4  3-4
    5-6  5-7  6-8  7-8
    1-5  2-6  3-7  4-8
In the simderella distribution, there are two extra `links': the world, and the object. For each of the links described in the robot description file (except for the first, `world', one), bemmel expects to receive a homogeneous matrix from simmel. Each of these matrices describes the position and orientation of the corresponding link with respect to the world coordinates. In effect, the only thing that bemmel does is place the objects in its description file on the canvas as described by the (continuous flow of) homogeneous matrices. Thus, bemmel can also be used as a stand-alone program for 3D object placement. In addition, bemmel has a user interface for object rotation, translation, scaling, and for picture dumping (in fig or pic format). In order to present the user with a clearer picture, bemmel can also render a z-projection of all shown objects, i.e., a shadow on the floor. Bemmel could be replaced by fancier graphical representations utilising hidden line removal, illumination, etc. However, since this is tangential to issues in neurocontrol and would reduce computational efficiency, it has been decided not to pursue this path. The robot pictures in figures E.1 and E.2 are directly created from bemmel output.
E.3 Availability

The software can be obtained in various ways:

1. from the ftp site of the Dutch Foundation for Neural Networks: galba.mbfys.kun.nl (ip 131.174.82.73), directory pub/neuro-software/pd;
2. from the CMU Artificial Intelligence Repository: ftp.cs.cmu.edu (ip 128.2.206.173), directory user/ai/software/testbeds/simderella/;
3. from the `Prime Time Freeware for AI, Issue 1-1' (`Prime Time AI' for short) CD-ROM, edited by Mark Kantrowitz.

Follow-up versions will be announced on the comp.robotics newsgroup. Permission is granted to use, change, and redistribute the software, in agreement with the distribution notice included. All matters arising from simderella should be addressed to the author of this article.
E.4 Acknowledgements

All software was written with financial support of the Dutch Foundation for Neural Networks. The basic idea behind the simulator was inspired by a robot simulation package developed by F. J. Jungen at the Department of Computer Systems at the Vrije Universiteit at Amsterdam. Conor Doherty from Aston University, Birmingham helped considerably to debug the documentation. The inverse kinematics controller for the simulated OSCAR robot has been implemented by Arjen Jansen, then at the University of Amsterdam. Other people used simderella and suggested where the software or documentation lacked clarity. Ports to systems other than SUNs and SGIs were instigated and supported by many people from the computing community. Among others, Floyd Moore provided the port for the HP700, while Scott Heavner from Case Western Reserve University provided corrections for compilation under (POSIX and) Linux. Simderella is in use by various researchers, not all of them in neurocontrol. For example, it has been used at CMU to facilitate software development for the Shuttle servicing robot before the hardware and mechanics were available to test the various parts of the controller. Ports to RS/6000 and DEC Alpha, as well as to PC/DOS, are on their way; an addition of robot arm dynamics is being investigated.
Bibliography

Aarts, E., & Korst, J. (1989). Simulated Annealing and Boltzmann Machines. John Wiley & Sons. [Cited on pp. 18, 121.]
Albus, J. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement and Control, Transactions of the ASME, series G, 97, 270-277. [Cited on p. 2.]
Åström, K. J., & Wittenmark, B. (1989). Adaptive Control. Addison-Wesley Publishing Company. [Cited on p. 46.]
Ballard, D. H., & Brown, C. M. (1982). Computer Vision. Prentice Hall, Inc. [Cited on pp. 34, 40.]
Barnard, E. (1992). Optimization for training neural nets. IEEE Transactions on Neural Networks, 3(2), 232-240. [Cited on p. 10.]
Barron, A. R. (1991). Approximation and estimation bounds for artificial neural networks. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pp. 243-249. [Cited on pp. 10, 17, 97, 121.]
Bartholomeus, M. (1993). Vision. Tech. rep., Department of Computer Systems, University of Amsterdam. [Cited on pp. 36, 141.]
Bartholomeus, M. G. P., Krose, B. J. A., & Noest, A. J. (1993). A robust multi-resolution vision system for target tracking with a moving camera. In Wijshof, H. (Ed.), Proceedings of CSN93, Computer Science in the Netherlands, pp. 52-63. [Cited on p. 122.]
Battiti, R. (1992). First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4, 141-166. [Cited on pp. 10, 20.]
Beale, E. M. L. (1972). A derivation of conjugate gradients. In Lootsma, F. A. (Ed.), Numerical Methods for Nonlinear Optimization. Academic Press, London. [Cited on p. 24.]
Bekey, G. A., & Goldberg, K. Y. (Eds.). (1993). Neural Networks in Robotics. Kluwer Academic Publishers. [Cited on p. 2.]
Bellman, R., & Kalaba, R. (1959). On adaptive control processes. IRE Transactions on Automatic Control, 4, 1-9. [Cited on p. 46.]
Bromberg, M., & Chang, T.-S. (1992). One dimensional global optimization using linear lower bounds. In Floudas, C. A., & Pardalos, P. M. (Eds.), Recent Advances in Global Optimization, pp. 200-220. Princeton University Press. [Cited on p. 18.]
Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Robotics and Automation, 2. [Cited on p. 32.]
Chen, A. M., & Hecht-Nielsen, R. (1989). On the geometry of feed-forward neural network weight spaces. In Proceedings of the Second IEEE International Conference on Neural Networks, pp. 1-4. [Cited on p. 17.]
Chow, C. K., & Kaneko, T. (1977). Automatic boundary detection of the left ventricle from cineangiograms. Computers and Biomedical Research, 5(4), 288-410. [Cited on p. 40.]
Cipolla, R., & Blake, A. (1992). Surface orientation and time to contact from image divergence and deformation. In Sandini, G. (Ed.), Computer Vision - ECCV '92, pp. 187-202. Springer-Verlag. [Cited on p. 76.]
Connell, J. H., & Mahadevan, S. (Eds.). (1993). Robot Learning. Kluwer Academic Publishers. [Cited on p. 2.]
Craig, J. J. (1986). Introduction to Robotics. Addison-Wesley Publishing Company, Reading, MA. [Cited on pp. 3, 41, 44, 107.]
Cun, Y. L. (1985). Une procédure d'apprentissage pour réseau à seuil assymétrique. Proceedings of Cognitiva, 85, 599-604. [Cited on p. 10.]
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303-314. [Cited on p. 6.]
Fahlman, S. E. (1988). An empirical study of learning speed in back-propagation networks. Tech. rep. CMU-CS-88-162, Carnegie Mellon University. [Cited on p. 25.]
Feldbaum, A. A. (1965). Optimal Control Systems. Academic Press. [Cited on p. 46.]
Forsythe, G. E. (1957). Generation and use of orthogonal polynomials for data-fitting with a digital computer. J. Soc. Indust. Appl. Math., 5(2). [Cited on pp. 13, 114, 137.]
Fu, K. S., Gonzalez, R. C., & Lee, C. S. G. (1987). Robotics: Control, Sensing, Vision, and Intelligence. McGraw-Hill Book Company, New York. [Cited on pp. 3, 4, 34, 35, 41, 44, 149.]
Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), 183-192. [Cited on p. 6.]
Gavrila, D. M., & Groen, F. C. A. (1992). 3D object recognition from 2D images using geometric hashing. Pattern Recognition Letters, 13, 263-278. [Cited on p. 34.]
Gibson, J. J. (1950). The Perception of the Visual World. Houghton Mifflin. [Cited on p. 75.]
Gorse, D., & Shepherd, A. (1992). Adding stochastic search to conjugate gradient algorithms. In Proc. 3rd International Conference on Parallel Applications in Statistics and Economics, Prague. [Cited on p. 30.]
Hayes, J. G. (1970). Curve fitting by polynomials in one variable. In Hayes, J. G. (Ed.), Numerical Approximation to Functions and Data. University of London, The Athlone Press, London. [Cited on pp. 114, 136.]
Hecht, E., & Zajac, A. (1974). Optics. Addison-Wesley Publishing Company. [Cited on p. 35.]
Hesselroth, T., Sarkar, K., van der Smagt, P., & Schulten, K. (1992). Neural network control of a pneumatic robot arm. Tech. rep. UIUC-BI-TB-92-15, Theoretical Biophysics, University of Illinois at Urbana/Champaign. [Cited on p. 104.]
Hesselroth, T., Sarkar, K., van der Smagt, P., & Schulten, K. (1994). Neural network control of a pneumatic robot arm. IEEE Transactions on Systems, Man, and Cybernetics, 24(1), 28-38. [Cited on pp. 54, 103, 119, 124.]
Holownia, B. P. (1977). Temperature buildup in bonded rubber blocks due to hysteresis. Rubber Chemistry and Technology, 50(1), 186-193. [Cited on p. 112.]
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366. [Cited on pp. 6, 14.]
Jansen, A. (1992). Neural approaches in the approximation of the inverse kinematics function: A comparative study. Master's thesis, Universiteit van Amsterdam, Faculteit Computer Systemen. [Cited on p. 50.]
Jansen, A., van der Smagt, P., & Groen, F. C. A. (1995). Nested networks for robot control. In Murray, A. F. (Ed.), Neural Network Applications, pp. 221-239. Kluwer Academic Publishers, Dordrecht, the Netherlands. [Cited on pp. 69, 73, 122, 148.]
Kanade, T. (1981). Recovery of the three-dimensional shape of an object from a single view. Artificial Intelligence, 17, 409-460. [Cited on p. 34.]
Katayama, M., & Kawato, M. (1992). A parallel-hierarchical neural network model for motor control of a musculo-skeletal system. Tech. rep. TR-A-0145, ATR Auditory and Visual Perception Research Laboratories. [Cited on p. 104.]
Kinsella, J. A. (1992). Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feed-forward neural networks with backward error propagation. Network, 3, 27-35. [Cited on p. 10.]
Knuth, D. E. (1986). The TeXbook. Addison-Wesley Publishing Company. [Cited on p. 171.]
Koenderink, J. J., & van Doorn, A. J. (1975). Local structure of the motion parallax. Optica Acta, 22(9). [Cited on p. 75.]
Krose, B. J. A., van der Korst, M. J., & Groen, F. C. A. (1990). Learning strategies for a vision based neural controller for a robot arm. In Kaynak, O. (Ed.), IEEE International Workshop on Intelligent Motor Control, pp. 199-203. IEEE. [Cited on pp. 49, 61.]
Kuperstein, M. (1988). Neural model of adaptive hand-eye coordination for single postures. Science, 239, 1308-1311. [Cited on pp. 2, 69.]
Kuperstein, M. (1991). INFANT neural controller for adaptive sensory-motor coordination. Neural Networks, 4, 131-145. [Cited on p. 2.]
Kuperstein, M., & Wang, J. (1990). Neural controller for adaptive movements with unforeseen payloads. IEEE Transactions on Neural Networks, 1(1), 137-142. [Cited on p. 2.]
Lagerberg, J., van Albada, D., & Visser, A. (1993). Esprit 2 project CAR (5220). (Personal communication.) [Cited on p. 37.]
Lamdan, Y., & Wolfson, H. J. (1988). Geometric hashing: A general and efficient model-based recognition scheme. In Proceedings of the Second International Conference on Computer Vision, New York. IEEE. [Cited on p. 34.]
Lee, C. S. G. (Ed.). (1988). Sensor-Based Robotics: Algorithms and Architectures. Springer-Verlag. [Cited on p. 2.]
Lee, D. N. (1980). The optic flow field: the foundation of vision. Phil. Trans. R. Soc. Lond. B, 290, 169-179. [Cited on pp. 75, 81.]
Marr, D. (1982). Vision. W. H. Freeman and Company. [Cited on p. 34.]
Martin, G. R. (1993). Form and function in the optical structure of bird eyes. In Green, P., & Davies, M. (Eds.), Perception and Motor Control in Birds. Springer-Verlag. [Cited on p. 76.]
Martinetz, T., & Schulten, K. (1991). A "neural-gas" network learns topologies. In Proceedings of the 1991 International Conference on Artificial Neural Networks, Vol. 1, pp. 397-402, Espoo, Finland. [Cited on p. 2.]
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133. [Cited on p. 5.]
Meijer, G. R. (1991). Autonomous Shopfloor Systems. Ph.D. thesis, Universiteit van Amsterdam, Faculteit Computer Systemen. [Cited on p. 32.]
Miller III, W. T. (1989). Real-time application of neural networks for sensor-based control of robots with vision. IEEE Transactions on Systems, Man, and Cybernetics, 19(4), 825-831. [Cited on pp. 2, 13, 54.]
Miller III, W. T. (1995). Dynamic balance of a biped walking robot: Adaptive gait modulation using CMAC neural networks. In van der Smagt, P., & Omidvar, O. (Eds.), Neural Systems for Robotics, Progress in Neural Networks. Ablex Publishing Corporation. In preparation. [Cited on p. 2.]
Miller III, W. T., Hewes, R. P., Glanz, F. H., & Kraft III, L. G. (1990). Real-time dynamic control of an industrial manipulator using a neural-network-based learning controller. IEEE Transactions on Robotics and Automation, 6(1), 1-9. [Cited on p. 2.]
Møller, M. F. (1990). A scaled conjugate gradient algorithm for fast supervised learning. Tech. rep. PB-339, Computer Science Department, University of Aarhus, Denmark. [Cited on p. 30.]
Moody, J., & Darken, C. (1988). Learning with localized receptive fields. In Touretzky, D., Hinton, G., & Sejnowski, T. (Eds.), Proceedings of the 1988 Connectionist Models Summer School, pp. 133-143, San Mateo. Morgan Kaufmann. [Cited on p. 13.]
Narendra, K. S., & Annaswamy, A. M. (1989). Stable Adaptive Systems. Prentice Hall, Inc. [Cited on p. 46.]
Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4-27. [Cited on p. 46.]
OUG (1992). OSCAR user's meeting. [Cited on p. 45.]
Parker, D. B. (1985). Learning-logic. Tech. rep. TR-47, Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science, Cambridge, MA. [Cited on p. 10.]
Polak, E. (1971). Computational Methods in Optimization. Academic Press, New York. [Cited on pp. 20, 22.]
Powell, M. J. D. (1977). Restart procedures for the conjugate gradient method. Mathematical Programming, 12, 241-254. [Cited on pp. 20, 22, 165.]
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1986). Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge. [Cited on pp. 18, 20, 30, 132, 135.]
Ritter, H. J., Martinetz, T. M., & Schulten, K. J. (1989). Topology-conserving maps for learning visuo-motor-coordination. Neural Networks, 2, 159-168. [Cited on pp. 2, 34, 69.]
Ritter, H. J., Martinetz, T. M., & Schulten, K. J. (1990). Neuronale Netze. Addison-Wesley Publishing Company. [Cited on pp. 2, 54, 69.]
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536. [Cited on pp. 10, 19.]
Schram, G., van der Linden, F. X., & Krose, B. J. A. (1995). Predictive robot control with neural networks. In Proceedings of the 1995 International Conference on Intelligent Autonomous Systems. (In print.) [Cited on p. 122.]
Sharma, R. (1992). Active vision in robot navigation: Monitoring time-to-collision while tracking. In Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2203-2208. IEEE. [Cited on p. 76.]
Shewchuk, J. R. (1994). An introduction to the conjugate gradient method without the agonizing pain. Tech. rep. CMU-CS-94-125, Carnegie Mellon University. [Cited on p. 20.]
Silva, F. M., & Almeida, L. B. (1990). Speeding up backpropagation. In Eckmiller, R. (Ed.), Advanced Neural Computers, pp. 151-160. North-Holland. [Cited on p. 28.]
Sofge, D., & White, D. (Eds.). (1992). Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York. [Cited on p. 2.]
Stoer, J., & Bulirsch, R. (1980). Introduction to Numerical Analysis. Springer-Verlag, New York-Heidelberg-Berlin. [Cited on pp. 9, 13, 20.]
Unnikrishnan, K. P., & Venugopal, K. P. (1994). Alopex: A correlation based learning algorithm for feedforward and recurrent neural networks. Neural Computation, 6, 469-490. [Cited on p. 121.]
van der Linden, F. X. (1994). Predictive robot control with neural networks. Master's thesis, Universiteit van Amsterdam, Faculteit Computer Systemen. [Cited on pp. 50, 122.]
van der Smagt, P. (1994a). Minimisation methods for training feed-forward networks. Neural Networks, 7(1), 1-11. [Cited on p. 9.]
van der Smagt, P. (1994b). Simderella: a robot simulator for neuro-controller design. Neurocomputing, 6(2), 281-285. [Cited on p. 147.]
van der Smagt, P., Groen, F., & Krose, B. (1993). Robot hand-eye coordination using neural networks. Tech. rep. CS-93-10, Department of Computer Systems, University of Amsterdam. Available by ftp from archive.cis.ohio-state.edu. [Cited on pp. 53, 148.]
van der Smagt, P., Groen, F., & Krose, B. (1995a). A monocular robot arm can be neurally positioned. In Proceedings of the 1995 International Conference on Intelligent Autonomous Systems. (In print.) [Cited on p. 75.]
van der Smagt, P., Groen, F., & Krose, B. (1995b). Visual feedback in motion: A neural cyclops has learned to grasp. Machine Learning. (Submitted.) [Cited on p. 75.]
van der Smagt, P., Groen, F., & van het Groenewoud, F. (1994). The locally linear nested network for robot manipulation. In Proceedings of the IEEE International Conference on Neural Networks, pp. 2787-2792. [Cited on pp. 60, 73, 122.]
van der Smagt, P., Jansen, A., & Groen, F. (1992a). Interpolative robot control with the nested network approach. In Proceedings of the 1992 IEEE International Symposium on Intelligent Control, pp. 475-480. IEEE Control Systems Society. [Cited on pp. 13, 48, 60, 73, 122.]
van der Smagt, P., Krose, B. J. A., & Groen, F. C. A. (1992b). A self-learning controller for monocular grasping. In Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 177-182. IEEE. [Cited on p. 148.]
van der Smagt, P., & Omidvar, O. (Eds.). (1995). Neural Systems for Robotics. Progress in Neural Networks. Ablex Publishing Corporation. In preparation. [Cited on p. 2.]
van der Smagt, P., & Schulten, K. (1993). Control of pneumatic robot arm dynamics by a neural network. In Proceedings of the 1993 World Congress on Neural Networks, pp. III/180-III/183, Hillsdale, NJ. Lawrence Erlbaum Associates, Inc. [Cited on p. 103.]
Vernon, D., & Tistarelli, M. (1990). Using camera motion to estimate range for robotic parts manipulation. IEEE Transactions on Robotics and Automation, 7(5), 509-521. [Cited on p. 76.]
Vysniauskas, V. (1994). Personal communication. [Cited on pp. 97, 121.]
Vysniauskas, V., Groen, F. C. A., & Krose, B. J. A. (1992). Function approximation with a feedforward network: the optimal number of learning samples and hidden units. Tech. rep. CS-92-15, Department of Computer Systems, University of Amsterdam, Amsterdam, Netherlands. [Cited on pp. 14, 26, 28, 64, 72, 73.]
Vysniauskas, V., Groen, F. C. A., & Krose, B. J. A. (1993). The optimal number of learning samples and hidden units in function approximation with a feedforward network. Tech. rep. CS-93-15, Department of Computer Systems, University of Amsterdam, Amsterdam, Netherlands. [Cited on pp. 10, 14, 16, 17, 30, 64, 65, 73, 96, 121.]
Weiss, L. E., Sanderson, A. C., & Neuman, C. P. (1987). Dynamic sensor-based control of robots with visual feedback. IEEE Journal of Robotics and Automation, 3(5), 404-417. [Cited on pp. 4, 54.]
Werbos, P. J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Harvard University. [Cited on p. 10.]
Wilks, S. S. (1962). Mathematical Statistics. John Wiley & Sons, New York. [Cited on p. 97.]
Summary

The particular interest of this thesis lies with the coordination between the positioning of a robot manipulator and the visual observation used to solve that task (hand-eye coordination). Hand-eye coordinating systems are particularly interesting because of their wide applicability, both in industrial and domestic environments. The goal of such a controller is to accurately position a robot manipulator directly above a visually observed target, and to be robust to unforeseen changes. We investigate both a system which uses static visual sensing and a system using dynamic visual sensing. This latter system obtains additional visual information by using a component of the optic flow resulting from the motion of a camera mounted in the robot's end-effector. The proposed neural network controller learns from learning samples, which are obtained during operation of the robot-camera system. Continuous adaptivity is important: when the robot system is subject to unforeseen changes, be they gradual or sudden, the controller must remain able to control the robot successfully. The problems that have to be solved are (1) network structure (how does one get an optimal neural network); (2) controller structure (what is measured, and what is controlled); (3) learning sample generation.
The optimal neural network. In chapter 2 we have described a method to find the optimal feed-forward neural network structure, as well as a heuristic to find good weight values. The optimal neural network structure, viz. the number of hidden units and the number of learning samples needed to reach a certain approximation error, can be found by determining an asymptotic model (`AMEF') of the error as a function of the number of hidden units and learning samples. The best fit of this model to the approximation error of the feed-forward neural network is used to predict the minimal computational resources needed to attain a certain error in the approximation. Secondly, it has been shown that, although standard error back-propagation is less sensitive to local minima, second order minimisation methods are far superior with respect to learning time. Conjugate gradient algorithms were shown to be a better choice for feed-forward neural network training. In particular, the Polak-Ribière conjugate gradient optimisation method with Powell restarts appears to be a very good method for estimating good weight values in a neural network.
Static hand-eye coordination. With the methods described in chapter 2 to determine the optimal neural network to solve a specific task, we have designed a self-learning controller, described in chapter 4, which positions a monocular robot arm above a stationary target, without requiring knowledge of the arm or camera. A simple model of the environment (viz. a known area of an object above which the manipulator with camera must be positioned) is used for positioning the monocular robot arm. Simulation studies as well as implementation on a real robot show that the proposed neural network learns very fast and reaches the expected accuracy of approximately 2.5cm in the end-effector. However, by adding knowledge of the desired mapping, viz. that when the target object is in the camera centre the robot need not move anymore (so-called `null-samples'), the error goes down to 1cm in one step, and 1mm in two steps. Furthermore, it is shown that the system automatically relearns the hand-eye coordination when the robot-camera system changes (e.g., a 45° camera rotation). This adaptivity depends on the precision of the learning modules, and it is possible that both accurate but rigid and less accurate but adaptive learning modules work in parallel, thus allowing for an optimal control of the life cycle of the feed-forward neural networks.
Visual feedback in motion. Due to the fact that monocular vision is used, the algorithm proposed in chapter 4 assumes knowledge of the object that is grasped. This limitation is relieved by the algorithm proposed in chapter 5. We have shown that simple optic flow measurements (viz. visual measurement of distance divided by velocity, which can be measured via divergence) can be used to generate a trajectory in visual space. In chapter 5 it is assumed that the goal is that the robot arm, with the camera mounted in the end-effector, reaches a position where the visual position τ0, velocity τ1, and possibly higher derivatives are all zero. It is shown that this requirement can be interpreted as a constraint on those two measured quantities, such that

    τ0[i] = −(t_d − t[i]) τ1[i] / n

during successive time intervals i, where t_d is the desired time that the trajectory may take, t[i] the time in interval i, and n a smoothness parameter. This expression can subsequently be used to compute visual reference signals for the controller, without the requirement of a model of the observed scene. Thus the system does not use any knowledge of the robot arm or camera, and does not require any calibration at all. A self-learning neural network is used to translate this trajectory to joint space (joint accelerations) such that the robot manipulator decelerates towards a stationary object. This controller, when used in a simulator testbed, leads to a grasping accuracy of less than 1mm in the end-effector position at the desired time. This grasping precision is directly related to the number of feedback steps that are executed during one trial, and can thus be increased by slowing the robot down. Furthermore, it has been shown that the system performs well under the influence of noise in the measurements as well as in the learning samples (graceful degradation).
Low-level robot arm control. Finally, in chapter 6 we have investigated the control of the dynamics of a robot arm. For this problem, the well-analysed dynamics of the OSCAR robot are not the subject of investigation, but instead those of the SoftArm robot. This robot is a 5-DoF robot arm, where each joint is controlled by an agonist-antagonist muscle pair; the artificial muscles consist of rubber tubes which can be inflated or deflated, resulting in length differences. As measurements of that arm show, it exhibits a hysteretic pressure-position relationship, and its behaviour is strongly dependent on `external influences.' This behaviour is qualitatively explained from a simple model of the rubbertuators and the rubbertuator-driven joints. Since the input-output behaviour of the system is easily measured (pressures vs. joint values), a neural direct adaptive controller has been constructed to control the robot arm. Tested on two trials for the control of a single joint, this method is shown to be very successful after only a few learning trials, and is remarkably better than a PID controller. However, further research is required to fully assess the value of this approach in the context of hand-eye coordination.
Conclusions. We have shown in this thesis that neural networks are well suited for the coordination of perception and motion, and allow for a high degree of adaptivity without requiring models of the controlled system. Hopefully this has laid another brick on the paved road towards the production of more widely applicable autonomous robot systems.
Samenvatting

In this thesis we have concerned ourselves with control systems, based on visual observation, for positioning the gripper of a robot arm. The research has concentrated on this because visually guided robot arms have a very wide range of application, both for industrial and domestic purposes. The goal of building a control system for such robots is to ensure that the gripper of the robot is placed exactly above an object observed by the camera. In addition, a requirement is that this control system can adapt to changing circumstances, such as unforeseen deformation or wear of the robot arm or camera. First, a system was investigated that makes use of static camera images, and subsequently a system that makes use of the change of the image while the camera, which is mounted in the gripper of the robot, is in motion. The so-called "feed-forward" neural networks investigated for this purpose learn from a number of learning samples obtained during the control of the robot arm. Continuous adaptivity is important: if the robot or camera is subject to sudden or gradual changes, the neural network must remain able to control the system with the same precision. To this end we have concentrated on (1) determining the optimal network structure; (2) the structure of the control system; and (3) how learning samples are generated.
The optimal neural network. In chapter 2 we have described a method with which the optimal feed-forward neural network can be determined. The optimality depends firstly on the number of neurons in the middle layer ("hidden" neurons), and secondly on the number of learning samples with which such a network is trained. The optimal structure is determined on the basis of an asymptotic model ("AMEF") of the learning error as a function of the number of hidden neurons and the number of learning samples. Secondly, it has been shown that, although the most widely used learning method "error back-propagation" is less sensitive to reaching local minima in the error landscape, learning methods that use second order information (the derivative of the gradient) of the error landscape arrive at an optimal result much faster. As the best learning method for training feed-forward neural networks, the so-called conjugate gradient learning methods were chosen, in particular the Polak-Ribière variant with extensions suggested by Powell (Powell, 1977).
Static robot-camera coordination. With the methods described in chapter 2 for determining the optimal neural network to solve a given task, a self-learning control system has been designed with which the gripper of a robot arm, with a single camera mounted in it, can be positioned above a non-moving object. This method, described in chapter 4, uses no prior knowledge of the robot or of the camera. Only a simple model of the environment is used (viz. knowledge of the true area of the object). The method has been applied both to a simulated and to a real robot, and leads to a positioning error of 2.5cm as predicted by the AMEF model; this precision is already reached after a few dozen trials. By extending the system with knowledge of the task, namely that the robot should no longer move at the moment the object is observed at the correct position in the image, this precision can be improved to 1cm after one step, and 1mm after two steps. Furthermore, the system is able to adapt to changing circumstances. This has been demonstrated by rotating the camera mounted in the gripper by 45° during learning; the original precision is again reached after a few dozen trials. The adaptive capability depends on the precision of the learning methods, and the chosen implementation allows both less accurate but quickly adapting and accurate but slowly adapting learning modules to be used simultaneously. This method ensures an optimal lifetime of feed-forward neural networks.
Feedback of the visual observation. Because the method described in chapter 4 uses only a single camera, knowledge of the dimensions of the object is used to obtain depth information. This limitation is resolved in chapter 5, in which simple "optic flow" quantities (viz. the divergence of a plane towards which the camera moves) are used to compute a trajectory in the visual domain. The goal of the control system described in this chapter is to steer the robot arm (with the camera in the gripper) such that an end position is reached where the measured visual position τ0, velocity τ1, and higher derivatives all go to zero. In other words: the gripper is at the object and stands still. It has been shown that this goal can be reached by imposing, during the trajectory, the following requirement on these visual quantities:

    τ0[i] = −(t_d − t[i]) τ1[i] / n

during successive time intervals i, where t_d indicates the desired time for the trajectory, t[i] the time at the beginning of interval i, and n a parameter with which the smoothness of the trajectory can be regulated. This expression is used to determine a trajectory in the visual domain without requiring a model of the camera or the object. Calibration of the system is therefore also unnecessary. A self-learning neural network is subsequently used to translate the trajectory in the visual domain into torques for the actuators of the robot arm. This method, tested on a simulated robot, leads to a precision better than 1mm in the position of the gripper of the robot at the desired time (indicated by t_d). This precision is directly related to the number of time intervals i over which the motion is executed, and can thus be improved by letting the robot move more slowly. Simulation experiments have shown that the system keeps functioning well under the influence of noise in the measurements and learning samples.
Control of a flexible robot arm. Finally, in chapter 6 a neural network was tested for the control of a robot arm that is partly modelled after the skeleton/muscle structure of human arms. Each joint of this arm is driven by a pair of antagonistically acting artificial muscles. These muscles consist of flexible rubber tubes that can be inflated, which leads to a contracting motion. Measurements on this robot arm show a hysteresis in the relation between the air pressure of the muscles and the position of the arm. In addition, the arm is strongly subject to external temperature influences. The behaviour of the arm can, as shown, be explained qualitatively using a simple model of the artificial muscles. A neural network has been used to control this robot arm. This neural network learns the relation between the joint angle of the arm and the corresponding air pressure of the muscles needed to reach that angle. It has been shown experimentally that this method works considerably better than the PID control system supplied by the manufacturer. Tested on a simple and a more complex trajectory, the system shows correct behaviour after only a few learning iterations. The control system has, however, only been tested for the control of a single arm segment. Further research is required to evaluate the value of this approach.
Conclusions. In this thesis it has been shown that neural networks are very well suited for the control of visually guided autonomous systems, and lead to a high degree of adaptivity without the use of detailed models of the controlled system.
Acknowledgements

The first `thank-you', when it comes to preparing this thesis, has to go to my research supervisor Frans Groen. He was always patient enough to put up with my stubbornness, sometimes resulting in long discussions about using square brackets or subscripts, but more often about the contents of the work. Frans seems to magically know an answer to any scientific question posed to him. But this thesis would not have existed if my co-promotor, Ben Krose, had not been there. Ben and I regularly disagreed on articles we produced together... but we always got them out successfully. Thanks, Ben, it's a pleasure working with you. And: thanks to all those students who were willing to do parts of the dirty work as well as engage in scientific discussions without ever complaining... The department of Computer Systems at the University of Amsterdam is ever-inspiring. For instance, Vytas Vysniauskas (a recurrent visitor at the department) was very kind in helping me to run his software for computing the minimal resources lines. All the other PhD students and post-docs: one could always barge in to bother them with trivialities. And, of course, the system management group and the other supporting groups... without them, no thesis. Then, of course, there should be a Very Large ThankYouGuys to all those people out there who have a computer and an Internet connection. Especially concerning irrelevant references and remarks, but also more fruitful discussions and, of course, all the people who were friendly enough to distribute that free software such as gcc, gnuplot, simderella, tex, ups, xfig, xgraph, ... This thesis was typeset with LaTeX 2.09, TeX 3.141, and BibTeX 0.99. All figures were made with xfig, Mathematica, Maple, and gnuplot, and in a few cases repaired with CorelDRAW or Adobe Photoshop. Text in the figures was made with psfrag. Bibtag was used to put references in the bibliography. My stay at the Theoretical Biophysics group in the Beckman Institute, which I always claim to have been a year but was in fact only six months, has had a wonderful life-long impact on me. But apart from that, I am grateful to the Schulten Group for always being open to a discussion about science. Of course, I'm very grateful to Michael Zeller of SG who ran some experiments for me, from which I derived figure 6.10. Last but definitely not least, all my thanks go to Britta Platt, the real reason that I'll never regret having gone to Illinois, who never failed to support me in her self-critical (and Patrick-critical) way, and had to put up with my `thesis comes first!' moods for too long. There's only one thing I know more about than Britta, and as I have just shown, that one thing can simply be condensed to 175 pages of text... B, I'm forever indebted.
Index The author tried to make this index as complete as possible, with the side eect of creating multiple entries for single occurances of words in the text. However, no references for authors are present these can be found in the bibliography. Entries are sorted in word order e.g., `red lion' will follow `redken'. Following Knuth (Knuth, 1986), denitions of extries are marked by underlined page numbers.
A
acceleration, 78
activation function, 5
  Heaviside, 6
  threshold, 6
adaptive control, 32, 46
Alopex, 121
ambiguity in arm posture, 53
AMEF, 64, 96–98
  variation on, 97
anthropomorphic robot arm, 3
approximation error, 14
  model, 16f.
autonomous system, 1

B
back-propagation, 10, 19, 9–30
  vs. conjugate gradient, 22
backpropagation, see spelling errors
basis function, 10, 11–14
  local, 12f.
  non-orthogonal, 14
  orthogonal, 13f.
  parametrised, 12
  sigmoid, 14
batch learning, 10
bemmel, 147, 149f.
  as 3D display program, 150
BFGS learning, 20
binning of learning samples, 59
binocular vision, 34
blob extraction, 141
blur radius, 38
blurring error, 38
Broyden-Fletcher-Goldfarb-Shanno learning, see BFGS learning

C
calibration, 4, 32
camera
  external vs. internal, 33
  pinhole, 35, 89, 102
  precision, 37f.
CCD, 35
centrifugal forces, 44
chi-square merit function, 91, 127, 132
closed loop control, 56f.
comp.robotics, 150
component recognition, 36
conjugacy of vectors, 21
conjugate gradient, 20, 9–30, 48, 56, 115
  Fletcher-Reeves, 21
  Polak-Ribière, 23
  understanding, 20
  vs. back-propagation, 22
connel, 62, 94, 115, 141, 147f.
constraint
  time-dependent, 85, 83–88
    proof, 131f.
  time-independent, 80, 80–83
    proof, 129–131
control
  adaptive, 32, 46
  direct, 46f.
  indirect, 46f.
  closed loop, 56f.
  deterministic, 46
  open loop, 55f.
  stochastic, 46
controller
  ideal, 48, 55, 95
convergence rate, 22
conveyor belt, 31
Coriolis forces, 44
correspondence problem, 34
covariance matrix, 91, 134, 135
criterionator, 94
D
DataCube MaxVideo-20, 36, 140
Davidon-Fletcher-Powell learning, see DFP learning
degrees-of-freedom, 41
delta function, 12f.
Denavit-Hartenberg matrix, 41, 148
  OSCAR, 43
design matrix, 134
deterministic control process, 46
DFP learning, 20, 24, 26
dilemma, 59
direct adaptive control, 46f.
discretisation error, 38
distance, 78
DoF, see degrees-of-freedom
dynamic control, 103–119
dynamics, 40, 44
E
elbow up–elbow down ambiguity, 53
error
  blurring, 38
  discretisation, 38
  in robot positioning, 44
  learning samples, 61f.
  visual area, 38, 57
  visual position, 37, 57
external vs. internal camera, 33
eye-in-hand positioning, 53–73
F
f-number, 39
feed-forward network, 6, 11
  basis function, 10, 11–14
    local, 12
    non-orthogonal, 14
    orthogonal, 13f.
    sigmoid, 14
feedforward, see spelling errors
fish, 75
Fletcher-Reeves conjugate gradient, 21
Ford, Henry, 31
Fourier transform, 12
friction, 44, 103
function approximation, 10
G
Galba, 150
gannet, 75
Gauss-Jordan elimination, 135
generalisation error, 9, 15
global minimisation method, 14
goal, 55, 76
gradient descent, 19
gravity, 44, 103
greedy algorithm, 10
H
hand-eye coordination, 2
Heaviside activation function, 6
Hessian, 19, 22
hidden unit, 6, 10
Hilbert matrix, 127, 136
I
ideal controller, 48, 55, 95
image acquisition, 36
image description, 36
image processing hardware, 36
image segmentation, 36
implementation, 62, 139–141
incremental learning, 10
indirect adaptive control, 46f.
input adjustment learning, 49f., 60–62
  errors, 61f.
internal vs. external camera, 33
interpolation learning, 48f., 60
interpolator, 94
inverse dynamics, 3, 44
inverse kinematics, 3, 41, 95
J
jerk, 79
joint, 3
K
kinematics, 40
Kohonen network, 104
Kronecker delta, 13, 137
L
Landau, vii
learning, 7
  approximation error, 14
  back-propagation, 19, 9–30
    learning rate, 19
    momentum, 19
    vs. conjugate gradient, 22
  batch, 10
  BFGS, 20
  Broyden-Fletcher-Goldfarb-Shanno, see learning, BFGS
  by interpolation, 48f., 60
  conjugate gradient, 20, 9–30
    vs. back-propagation, 22
  DFP, 20, 24, 26
  generalisation error, 9, 15
  incremental, 10
  input adjustment, 49f., 60–62
    errors, 61f.
  optimisation error, 9, 15
  quasi-Newton, 20
  representation error, 9, 14
learning rate, 19
learning samples, 2, 5, 7, 48–50, 95
  binning, 59
  input adjustment, 49f., 60–62
    errors, 61f.
  interpolation, 48f., 60
  null-samples, 57
  set of, 54
least-mean-square, 132–135
Levenberg-Marquardt optimisation, 30
linear convergence, 22
line minimisation, 18
link, 3
local representation, 12
LU decomposition, 135
lummel, 62, 94, 96, 115, 141
M
matching problem, 34
MaxVideo-20, 36, 140
model equation, 81, 87
momentum, 19
monocular vision, 34
N
nested network method, 122
neural gas, 104
neural network, 5–7
  Turing machine, 6
  universal approximator, 6
neuron, 5
normal distribution
  transformed, 136
null-samples, 57
O
occlusion, 34
open loop control, 55f.
optical system model, 37
optic flow, 75
optimisation error, 9, 15
orthogonal, 13, 137f.
orthogonal polynomials, 13
orthonormal, 13, 137
OSCAR, 43, 139–141, 148
overfitting, 7, 9
P
parameter estimation methods, 17–24
payloads, 103
pelican, 75
perception, 33–36
pick-and-place problem, 32
PID control, 44, 107
  implementation, 141
pinhole camera, 35, 89, 102
Polak-Ribière conjugate gradient, 23
precision
  camera, 37f.
  robot, 44
Prime Time AI, 150
Q
quasi-Newton learning, 20
R
radial basis function, 13
RBF, see radial basis function
redken, 171
red lion, 171
reinforcement learning, 102, 124
representation error, 9, 14
robot
  ambiguity in posture, 53
  anthropomorphic, 3
  dynamic control, 103–119
  dynamic description, 40
  dynamics, 40, 44
  inverse dynamics, 3, 44
  inverse kinematics, 3, 41, 95
  joint, 3
  kinematics, 40
  link, 3
  motion, 40–46
  perception, 33–36
  precision, 44
  SoftArm, vi, 103–119, 124
  static description, 40
  trajectory planning, 3
  vision, 33–36
rubbertuator, vi, 104, 103–119
  characteristics, 108f.
  elasticity, 109
  hysteresis, 109, 112f.
  force-to-weight ratio, 105
S
self-learning controller, 32
servo-drive unit, 107
servo-valve unit, 107
setpoint, 3f., 55, 76, 92
sigmoid function, 14
Simderella, 64, 147–151
simmel, 147–149
sockets, 56, 141, 147
SoftArm, vi, 103–119, 124
  gripper, 107
  position control mode, 107
  pressure control mode, 107
SSQ, see summed squared error
stability-plasticity dilemma, 59
static hand-eye coordination, 53–73
steepest descent, 10, 19
stereoscopic vision, 34
stiffness, 105
stochastic control process, 46
stopping criterion, 78
Sulidae, 75
summed squared error, 9
super-linear convergence, 22
switching-on effect, 87
T
table-lookup, 12
Taylor expansion, 12
threshold activation function, 6
thresholding, 40
time-dependent constraint, 85, 83–88
  proof, 131f.
time-independent constraints, 80, 80–83
  proof, 129–131
time-to-contact, 75, 75–102
Toeplitz matrix, 135
trajectory, 76
trajectory controller
  joint level, 140f.
trajectory planning, 3
transfer function, see activation function
V
velocity, 78
via point, 3
vision, 33–36
  area error, 38, 57
  binocular, 34
  component recognition, 36
  correspondence problem, 34
  image acquisition, 36
  image description, 36
  image segmentation, 36
  matching problem, 34
  monocular, 34
  occlusion, 34
  position error, 37, 57
  stereoscopic, 34
  thresholding, 40
  timing, 36, 56
VxWorks, 140
W
wavelets, 13
wear-and-tear, 83
weight, 6