Genetic Algorithms for Free-Form Surface Matching
K. Brunnström and A. J. Stoddart
Dept. of Electronic and Electrical Engineering, University of Surrey, Guildford, Surrey GU2 5XH, UK
Abstract
The free-form surface matching problem is important in several practical applications, such as reverse engineering. An accurate, robust and fast solution is therefore of great significance. Recently genetic algorithms have attracted great interest for their ability to robustly solve hard optimization problems. In this work we investigate the performance of such an approach for finding the initial guess of the transformation, a translation and a rotation, between the object and the model surface. This would be followed by a local gradient descent method such as ICP to refine the estimate. Promising results are demonstrated on accurate real data.

K. Brunnström: S & T Datakonsulter AB, Box 24183, S-104 51 Stockholm, Sweden. Tel: +46-8-7832714, Fax: +46-8-6678230.
A. J. Stoddart: Tel: +44-1483-300800 ext. 3012, Fax: +44-1483-34139.

1. Introduction
The free-form surface matching problem remains difficult. Apart from being interesting in its own right, it has a number of direct applications. These include inspection and recognition of free-form surfaces from dense depth data. In these problems correspondences need to be established between models and the data. For example, in reverse engineering, range data is acquired by a range scanner, and separate views need to be registered and fused [11]. The registration is usually based on gradient descent, e.g. the iterative closest point (ICP) algorithm. To find the right solution the user must input an initial guess. For these applications an accurate, robust and fast solution is of vital importance.

The problem of matching two free-form surfaces can be cast as a search or an optimization problem. This leads to a 6-dimensional optimization problem (three translation and three rotation parameters) with many local extrema. Hence, a gradient-descent method would most likely fail when nothing is known about where the global optimum is located. Recently genetic algorithms have attracted attention as a powerful tool for optimization problems, and a growing number of applications of these techniques can be seen in the computer vision community. Jacq and Roux [7] used a genetic algorithm for registration of 3D images in a medical application. Cross and Hancock [5] report fast convergence with a genetic search based on Hamming distance. However, to the best of our knowledge, genetic algorithms have not previously been used for matching free-form surfaces. In this paper we present a simple algorithm applied to this problem.

The problem we address is as follows. Suppose there is some detailed surface model of an object, and this object is imaged using a range sensor giving 3D information. We then seek the transformation that will bring the model and the data into correspondence. In a single view the object is only partly imaged, and the image may contain measurements of other objects. Of course the “model” may be the same object imaged from another view.

We do not address the problem of accurate registration. Algorithms based on gradient descent, or iterative closest point, already exist [3]. They can accurately register two surface datasets given an initial guess reasonably close to the actual solution, and they typically require a user-supplied initial guess. This paper presents a method that could automatically supply this guess.

In some sense the free-form surface matching problem is a special case of the general model-based object recognition problem. Many algorithms have been proposed for object recognition, but only a few are appropriate for free-form surfaces. In particular we do not wish to make overly restrictive assumptions such as convexity or the absence of clutter. On the other hand, the algorithm presented in this paper is not suited to matching objects composed of “simple” surfaces such as planes, cylinders and spheres. For such objects we would recommend methods that extract those features and perform the matching based on them.
There are very few published algorithms that convincingly address the problem of free-form surface matching without an initial guess. One algorithm that apparently does solve the problem is a multi-stage algorithm of some complexity [1]. It requires in the region of 30 minutes to compute a solution. Besl’s survey article [2] on free-form surface matching convincingly specifies the problem. It cites 111 other articles, almost none of which actually address the general free-form matching problem. Those that do assume that reliable features (e.g. zero crossings of curvature) can be extracted. We make no such assumption.

This paper, which is a shortened version of the technical report [4], is organized as follows. First we present the genetic algorithm for free-form surface matching, assuming that the reader is familiar with how a genetic algorithm works in general. For such a description see e.g. Goldberg [6]. We then describe our experiments and results. Finally we conclude with a discussion.

2. The genetic algorithm
To use a genetic algorithm for solving our problem, the problem must be formulated in a suitable way. First we need to convert the continuous problem of free-form surface matching to discrete form. This is done by randomly sampling the model surface to a ‘sufficient’ density, and randomly sampling the scene surface to some other density. We now pose the problem as a point matching problem. We wish to label (associate) each point in the scene with a point from the model, or with a null label. Because the points are randomly chosen we do not expect an exact match. However, when sampling to a sufficient density we hope to match each scene point to a relatively close model point.

We also need a way to measure the fitness of the chromosomes. A chromosome represents an assignment of model points to scene points. The fitness function used is described in detail below. It is based on a function that counts the number of good matches, using translational and rotational invariants such as the relative orientation of surface normals and the relative distances between points.

Finally, we must decide when a suitable result has been obtained. The fitness function measures the number of good matches. Ideally, when all points are correctly matched, it will correspond to the number of scene points. However, due to the random sampling, noise and occlusion this cannot be achieved in practice. Furthermore, there is no way from within the genetic algorithm to check whether we have found the right solution or not. In the field of GAs the choice of termination criterion is still an open question. We have investigated three different approaches: the optimization is stopped when a certain percentage of the total number of correct matches has been found; it is stopped when the maximum fitness has not increased for a fixed number of generations; or it is terminated after a certain number of generations.

There are a few free parameters that need to be selected. The number is small compared to many other methods; the exact values are given together with the presentation of the results. The parameters are: the population size, i.e. the number of chromosomes; the crossover probability; the mutation probability; the stopping criterion; and one parameter in the fitness function, which we call the temperature. The problem of selecting parameters for GAs is an open question. They are usually chosen heuristically, based on the experience of the user community. This has guided us in our choice of parameters.

2.1. The fitness function
A genetic algorithm needs a cost function or fitness function to be defined for a given problem. In this section we develop a fitness function appropriate for the free-form surface matching problem. Because the fitness must measure match quality in correspondence space, it is designed to be invariant to translation or rotation of either the scene or the model. It should be robust in the presence of clutter as well. Because of the random sampling we can never achieve an exact match, but there will usually be reasonably “good” approximate matches. The quality of the best match will be a function of the average spacing between the randomly selected points on the surface.

In genetic algorithms the choice of a fitness function needs careful consideration. The convergence of the algorithm depends crucially on the relative sizes of the fitness values over the solution space. A choice we make is to build up a cost that has some meaning. In particular we aim to find an intermediate quality measure $Q$ which counts the number of good matches. This is then transformed into $f$, which is the fitness passed to the GA.

We suppose that the $M$ model points are labeled by $\alpha = 1..M$ and the $N$ scene points are labeled by $i$ or $j = 1..N$. If scene point $i$ takes on label $\alpha$ this is denoted by $\alpha_i$. We denote a joint labeling of all objects by the set $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. As a shorthand we define the multi-index $\vec{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_N\}$.

Now suppose that we choose 2 points $i, j$ from the scene surface. They carry 4 vectors (2 positions and 2 normal vectors) of information between them, i.e. $\vec{r}_i, \hat{n}_i, \vec{r}_j, \hat{n}_j$. There are 10 independent parameters (6 for the two positions and 4 for the two unit normals) and, after removing the 6 degrees of freedom of a rigid motion, 4 invariants under rigid translation and rotation. We are interested in features that remain invariant to rotation and translation. The most obvious is the length of the vector $\vec{v}_{ij}$ from point $i$ to point $j$, i.e. $\vec{v}_{ij} \equiv \vec{r}_j - \vec{r}_i$. Pairwise relative orientation is also an invariant. Finally, there will be a twist angle between the two normals representing the extent to which they are out of plane. Thus the four invariants are

$$\|\vec{v}_{ij}\| = \|\vec{r}_j - \vec{r}_i\| \qquad (1)$$
$$\cos(\theta_{ij}) = \hat{n}_j \cdot \vec{v}_{ij} \qquad (2)$$
$$\cos(\theta_{ji}) = \hat{n}_i \cdot \vec{v}_{ji} \qquad (3)$$
$$\cos(\phi_{ij}) = (\hat{n}_j \times \vec{v}_{ij}) \cdot (\hat{n}_i \times \vec{v}_{ij}) \qquad (4)$$

where, for the right-hand sides of (2)-(4) to be cosines, $\vec{v}_{ij}$ and the cross products are understood to be normalized to unit length.

Now consider two model points and two scene points. We wish to define a pairwise match quality $Q(\alpha_i, \alpha_j)$ which will be a product over four terms. The first term $q_d(\alpha_i, \alpha_j)$ can then be expressed as

$$q_d(\alpha_i, \alpha_j) = \exp\left(-\frac{\left[\|\vec{v}_{ij}\| - \|\vec{v}_{\alpha_i \alpha_j}\|\right]^2}{2\sigma^2}\right) \qquad (5)$$

For a perfect match this returns 1, and it decays as a Gaussian with standard deviation $\sigma$. Clearly $\sigma$ must be related to measurement noise and sampling density. We expect the sampling density to dominate here, and we use a simple probabilistic model to determine $\sigma$. We do not tune $\sigma$.

Suppose we have $M$ points randomly distributed on the model surface. Given an arbitrary scene point, how far will it be to the nearest model point (when correctly registered)? For an infinite plane with Poisson-distributed points [9], the closest distance is a random variable whose expected value can be calculated to be $\Gamma(\tfrac{3}{2})\sqrt{\rho/\pi} = \tfrac{1}{2}\sqrt{\rho}$. Here $\rho$ is the surface area per point (the inverse of the point density), which we estimate as the area of the surface divided by the number of randomly sampled points. This agrees well with the empirical average closest distance.

The angular quality measure is defined as

$$q_n(\alpha_i, \alpha_j) = \exp\left(-\frac{(\theta_{\alpha_i \alpha_j} - \theta_{ij})^2 + (\theta_{\alpha_j \alpha_i} - \theta_{ji})^2 + (\phi_{\alpha_i \alpha_j} - \phi_{ij})^2}{2\mu^2}\right) \qquad (6)$$

As the normal is a derivative quantity we assume that the error will be dominated by the measurement error and not by the random sampling process. This is true on gently varying surfaces, but not on those with many small details. We select the parameter $\mu$ from an estimate of the typical error in a surface normal. Standard methods exist for this [10], and we use a value of $\mu = 20^\circ$ throughout this paper. We have not tested other values. Finally we obtain the pairwise quality measure we require

$$Q(\alpha_i, \alpha_j) = q_d(\alpha_i, \alpha_j)\, q_n(\alpha_i, \alpha_j) \qquad (7)$$

This quantity is equal to 1 for a perfect match and degrades to zero. From this expression we can form a total quality measure for a particular match by summing up all the pairwise relations for that match, i.e.

$$Q(\alpha_i) = \sum_{j \neq i} Q(\alpha_i, \alpha_j) \qquad (8)$$

This measure ranges between 0 and $N - 1$. We can now build up from this a global match quality

$$Q(\vec{\alpha}) = \sqrt{2 \sum_i \sum_{j < i} Q(\alpha_i, \alpha_j)} \qquad (9)$$

which, because $Q(\alpha_i, \alpha_j)$ is symmetric in $i$ and $j$, equals $\sqrt{\sum_i Q(\alpha_i)}$. If all $N$ matches are correct then this will give (approximately) $N$.

Thus far we have designed a heuristic measure of match quality with an easy intuitive definition. The next step, also heuristic, is to design a “fitness function” that is appropriate for the GA. We choose the following

$$f(\vec{\alpha}) = \exp(\ldots)$$

The fitness contains the temperature parameter mentioned above [...] 1, so we have used the value 1 throughout this work.

Figure 1. Range data of a rabbit from a Cyberware range scanner taken from different viewing directions, rotated around one axis.

2.2. Finding the transformation
The result of the GA is a good labeling $\vec{\alpha}$, but we actually want a transformation, a translation and a rotation, which takes the object data set into the model. When the genetic algorithm is stopped, the chromosome of the best scoring individual will contain both good and bad matches. If we can pick out the subset that contains only good matches, we can use a standard method to find the transformation. This can be done by looking at the point-wise matching score $Q(\alpha_i)$, Equation 8. If this is close to $N - 1$ we can use the match for calculating the transformation. In this work we have simply used a threshold, typically set to about 30% of $N$. The transformation is then found by least-squares fitting, which is solved using an SVD-based method [8].
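The thresholding and least-squares step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses the widely known SVD solution of the rigid alignment problem (in the spirit of [8]), assumes NumPy is available, and the function names `select_good_matches` and `fit_rigid_transform`, as well as the row-per-point array layout, are hypothetical choices made for the example.

```python
import numpy as np


def select_good_matches(scores, n_scene, fraction=0.3):
    """Indices of scene points whose point-wise score passes the threshold.

    `fraction=0.3` mirrors the "about 30% of N" threshold quoted in the text.
    """
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores >= fraction * n_scene)


def fit_rigid_transform(scene_pts, model_pts):
    """Least-squares rot, trans with rot @ scene + trans ~= model.

    Row k of `scene_pts` is assumed to be matched to row k of `model_pts`.
    """
    scene_pts = np.asarray(scene_pts, dtype=float)
    model_pts = np.asarray(model_pts, dtype=float)
    c_scene = scene_pts.mean(axis=0)
    c_model = model_pts.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    h = (scene_pts - c_scene).T @ (model_pts - c_model)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that det(rot) = +1.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = c_model - rot @ c_scene
    return rot, trans
```

In use, one would pass only the rows selected by `select_good_matches`, so that poorly matched points do not bias the fit. At least three non-collinear matched pairs are needed for the rotation to be well determined.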
2.3. Complexity
The core computation is the fitness function, which is evaluated for each individual in the population to determine its relative strength. As can be seen from Equation 9, this computation is $O(N^2)$. With a population of $L$ individuals and the algorithm stopped after $G$ generations, the overall complexity is $O(GLN^2)$.
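To make the quadratic cost concrete, the pairwise sum behind the match quality can be sketched as below. This is an illustrative sketch only. The function names, the tuple-based point representation, the explicit normalization inside the angle computation, the treatment of null labels as contributing zero, and in particular the form exp(Q/T) used for the fitness are assumptions made for the example; the text only establishes that the fitness is an exponential with a temperature parameter. `sigma` and `mu` stand for the distance and angular widths of Section 2.1, with `mu` in radians.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _angle(a, b):
    """Angle in radians between two vectors (0.0 if either has zero length)."""
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, _dot(a, b) / (na * nb))))


def invariants(p_i, n_i, p_j, n_j):
    """Distance, two normal-to-chord angles and the twist angle of a point pair."""
    v = _sub(p_j, p_i)
    return (math.sqrt(_dot(v, v)),
            _angle(n_j, v),
            _angle(n_i, _sub(p_i, p_j)),
            _angle(_cross(n_j, v), _cross(n_i, v)))


def pair_quality(scene_inv, model_inv, sigma, mu):
    """Product of the Gaussian distance term and the Gaussian angular term."""
    q_d = math.exp(-(scene_inv[0] - model_inv[0]) ** 2 / (2.0 * sigma ** 2))
    ang = sum((a - b) ** 2 for a, b in zip(scene_inv[1:], model_inv[1:]))
    return q_d * math.exp(-ang / (2.0 * mu ** 2))


def match_quality(scene, model, labels, sigma, mu):
    """Point-wise scores and global quality for one labelling.

    scene, model: lists of (position, unit_normal) tuples.
    labels[i]: index into `model` for scene point i, or None for the null label.
    """
    scores = [0.0] * len(scene)
    for i in range(len(scene)):          # N(N-1)/2 pair evaluations: O(N^2)
        for j in range(i):
            a, b = labels[i], labels[j]
            if a is None or b is None:   # assumption: null labels score zero
                continue
            q = pair_quality(invariants(*scene[i], *scene[j]),
                             invariants(*model[a], *model[b]), sigma, mu)
            scores[i] += q               # the pair quality is symmetric,
            scores[j] += q               # so each pair is credited to both points
    return scores, math.sqrt(sum(scores))


def fitness(global_q, temperature=1.0):
    """Assumed form of the GA fitness: exp(Q / T)."""
    return math.exp(global_q / temperature)
```

Evaluating each pair once and crediting both points halves the work compared with a full double loop, but the cost per chromosome remains quadratic in the number of scene points.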
Figure 2. High precision 3D data of a bust of Beethoven produced by Viewpoint Animation Engineering.

Figure 3. High precision 3D data of a cow produced by Viewpoint Animation Engineering. (d)-(f) The result when using the above described set of parameters, except that the number of labels for the GA has been doubled (M = 200) to reflect that the model surface has double the area. (g)-(i) The result when using N = 100 and M = 400.

3. Experiments and Results
We will here present a few experiments illustrating the strengths and weaknesses of the presented approach to free-form surface matching. It should be stressed that our objective is not accurate registration; rather it is to develop a technique to find an approximate pose.

The parameters of the algorithm have been kept constant throughout the experiments. In one case the number of points has been increased to obtain a better match, and both results are presented. In the standard case the population consists of 50 individuals, each having 50 points, i.e. N = 50 (see Section 2.1). The genetic algorithm searches for matches among 100 label points, i.e. M = 100. The crossover probability was 90% and the mutation rate 1% per label. The temperature has been fixed at 1.0. In all cases the GA was run for 1000 generations. The GA-specific parameters have been obtained by testing a few values around those given in Goldberg’s book on GAs [6] and selecting the set with the fastest convergence for a particular example. The computation time in the examples with these parameters is about 2 minutes on an AlphaStation 250.

In the first example, Figure 1, an image of a rabbit obtained by a Cyberware laser range finder has been used. The scene, 1(a), and the model, 1(b), have been taken from different directions, with an overlap of more than 50%, 1(c). The resulting transformation applied to the scene is shown in Figure 1(d), and in 1(e) and 1(f) it is shown together with the model from two different viewing directions. The data was used in Turk and Levoy [11] and made available by ftp. It has been registered with an ICP-based algorithm starting from a user-supplied initial position. The ICP transformation is given for comparison with the GA-computed transformation in the table of Figure 5(a). As can be seen, there is good agreement between the values computed by the two methods.

In the next example, Figure 2, we have used 3D data of a bust of Beethoven. The object, or scene, is shown in Figure 2(a). The model is a translated and rotated version of the object, Figure 2(b). This data was produced by Viewpoint Animation Engineering using point measurements of high accuracy. The two data sets are shown together in Figure 2(c). The result of applying the computed transformation to the object can be seen in Figures 2(d)-(f). The table in Figure 5(b) gives the computed transformation and the errors in these parameters.

In Figure 3 the model is a cow, 3(b); this data comes from Viewpoint Animation Engineering as well. The scene, 3(a), has been generated by cutting out half the cow from head to tail and then adding a rotation and a translation to its position. The first match has been obtained by using the same parameters as above, except that we have increased the number of labels (M = 200) to reflect the fact that the size of the model is double that of the object. The transformation found was not very good, Figure 3(d)-(f) and 5(c) (Estimate 1). This is improved considerably, Figure 3(g)-(i) and 5(c) (Estimate 2), by increasing the number of points on the object surface (N = 100) as well as the number of label points on the model surface (M = 400).

In the final example, Figure 4 and 5(d), the front legs of the cow above have been cut out and rotated relative to each other. A rather surprising solution is found, even though we expected the algorithm to have problems in this case. The points around the cylindrical part of the body dominate the solution, which makes the algorithm get stuck in a local optimum.
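The settings used in these experiments (a population of 50, crossover probability 0.9, mutation rate 0.01 per label, a fixed number of generations) can be collected into a minimal GA loop over label chromosomes. The sketch below is not the authors' implementation. Roulette-wheel selection, one-point crossover and keeping the best individual (elitism) are common textbook choices assumed here, null labels are omitted for brevity, and `evolve` and its argument names are hypothetical. The `fitness` callback stands in for the fitness function of Section 2.1 and is assumed to return non-negative values.

```python
import random


def evolve(fitness, n_scene, n_labels, pop_size=50, p_cross=0.9,
           p_mut=0.01, generations=1000, seed=None):
    """Search for a labelling: a list with one model index per scene point."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_labels) for _ in range(n_scene)]
           for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(c) for c in pop]
        elite = pop[weights.index(max(weights))]
        if sum(weights) <= 0.0:
            weights = None               # degenerate case: uniform selection
        nxt = [elite[:]]                 # elitism: the best individual survives
        while len(nxt) < pop_size:
            mum, dad = rng.choices(pop, weights=weights, k=2)
            child = mum[:]
            if n_scene > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n_scene)   # one-point crossover
                child = mum[:cut] + dad[cut:]
            # Mutation: each label is redrawn with probability p_mut.
            child = [rng.randrange(n_labels) if rng.random() < p_mut else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Because the best individual is always copied into the next generation, the best fitness in the population never decreases, which makes the fixed-generation stopping rule easy to reason about. The other two stopping rules mentioned in Section 2 would replace the fixed loop count with a test on the best fitness.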
Figure 4. High precision 3D data of a cow produced by Viewpoint Animation Engineering. The scene and model are cut out pieces from the front leg region of the cow. Note the near cylindrical shape of most of the data.

Parameter            xt       yt       zt       xr       yr       zr    angle
a) ICP estimate    0.00     0.00     0.00    0.019    -1.00   -0.013     34.3
   Estimate       0.038   0.0044    0.039  -0.0016    -1.00   -0.026     31.4
b) True            4.00     9.24    -1.83     1.00     0.00     0.00    180.0
   Estimate        3.64     8.95    -2.31    -0.99   -0.070   -0.091    176.2
c) True           -1.00     0.71    -3.54     1.00     0.00     0.00     45.0
   Estimate 1     -0.91    -0.57    -2.43     0.99   -0.085   -0.054     21.2
   Estimate 2     -1.14     1.15    -2.81     1.00   -0.010    0.010     51.4
d) True            1.00     2.00     3.00    -1.00     0.00     0.00     45.0
   Estimate        3.40     2.11     2.74    0.064    -0.91     0.41    171.5

Figure 5. The computed transformation for the examples shown in Figures 1, 2, 3 and 4. The first three columns represent the computed translation, the next three the rotation axis and the last the rotation angle. a) Result for the example in Figure 1, b) Figure 2, c) Figure 3, d) Figure 4.

4. Discussion
We have in this paper presented a simple approach, based on a genetic algorithm, for finding the initial guess for the free-form matching problem, with promising results. The basic steps of the algorithm are: 1) randomly sample the object and model surfaces; 2) search for a solution with a genetic algorithm; 3) calculate the transformation from the matched points.

Our goal is to find a pose in the close vicinity of the global optimum. Usually the fitness function has many local optima. We use a stochastic search technique that can in principle locate the global optimum to any desired probability. In this sense it resembles Simulated Annealing. Naturally there is a tradeoff between the success rate and the resources allocated. On the datasets we studied we obtain good results in 2-10 minutes on a workstation.

The results fall short of the performance that would be desired in practical applications such as multi-view fusion of range images. However, we believe that there is considerable scope for refinement of the algorithm. Recent advances in the field of Genetic Algorithms suggest that small modifications can significantly improve their performance. On the other hand, almost all optimization algorithms benefit from the inclusion of domain-specific knowledge into the basic algorithm. This we have not yet done, but it seems likely to lead to a much more efficient algorithm. Work is in progress on these points.

References
[1] R. Bergevin, D. Laurendeau, and D. Poussart. Registering range view of multipart objects. IEEE Trans. Pattern Analysis and Machine Intell., 61(1):1–16, Jan. 1995.
[2] P. Besl. The free-form surface matching problem. In H. Freeman, editor, Machine Vision for Three-Dimensional Scenes, pages 25–69. Academic, New York, 1990.
[3] P. Besl and N. McKay. A method for registration of 3-d shapes. IEEE Trans. Pattern Analysis and Machine Intell., 14(2):239–256, 1992.
[4] K. Brunnström and A. J. Stoddart. Genetic algorithms for free-form surface matching. Technical Report ISRN KTH/NA/P--95/19--SE, Dept. of Numerical Analysis and Computing Science, KTH (Royal Institute of Technology), Oct. 1995.
[5] A. Cross and E. Hancock. Genetic search for structural matching. In R. Cipolla, editor, Proc. 4th European Conference on Computer Vision, volume 1064 of Lecture Notes in Computer Science, pages 514–525, Cambridge, United Kingdom, Apr. 1996. Springer Verlag, New York.
[6] D. Goldberg. Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, Mass., 1988.
[7] J.-J. Jacq and C. Roux. Registration of 3-d images by genetic optimization. Pattern Recognition, 16(8):823–856, 1995.
[8] K. Kanatani. Analysis of 3-d rotation fitting. IEEE Trans. Pattern Analysis and Machine Intell., 16(5):543–549, 1994.
[9] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., Singapore, 1984.
[10] A. J. Stoddart, J. Illingworth, and T. Windeatt. Optimal parameter selection for derivative estimation in range images. Image and Vision Computing, 13(8):629–635, 1995.
[11] G. Turk and M. Levoy. Zippered polygon meshes from range images. In Proc. SIGGRAPH ’94, Computer Graphics Proceedings, Annual Conference Series, pages 311–318, Orlando, Florida, 1994.