Multi-resolution Genetic Algorithm and Its Application in Motion Estimation

Minglun Gong and Yee-Hong Yang
Department of Computing Science, University of Alberta, Edmonton AB, Canada

Abstract

Many computer vision problems can be formulated into optimization problems. In this paper, we propose a multi-resolution genetic algorithm, which can be applied to solving many of these problems. The motion estimation problem is used as a vehicle for demonstration. A matching-based estimator is proposed through setting up a proper fitness evaluation function and letting the multi-resolution genetic algorithm do the global searching. The results show that the proposed estimator is robust and can produce consistent velocities for real image sequences.

Keywords: Multi-resolution genetic algorithm, Motion estimation, Optical flow.

1 Introduction

Many problems in computer vision are ill-posed, i.e., there is insufficient information available to solve them. Regularization adds constraints so that a solution can be determined. Among the many proposed constraints, the smoothness constraint [6] is widely used. Using this constraint, the problem can be formulated as the following optimization problem:

$$\hat{s} = \arg\min_{s} \iint \bigl( E(s(u,v)) + \lambda \cdot R(s(u,v)) \bigr) \, du \, dv \qquad (1)$$

where s is the solution of the problem, E is the error function, which measures the accuracy of the solution, and R is the roughness function, which measures the interactions between neighbors in the solution. The parameter λ controls the tradeoff between the error term and the roughness term.

Various approaches, such as simulated annealing and neural networks, have been used to solve the above optimization problem. However, although genetic algorithms (GAs) [3] are a successful global optimization technique, they are rarely used. This is partly because GAs encode solutions as 1D strings, while the solutions of these image-related problems have an inherent 2D structure. Therefore, if we simply use pixels as genes directly, good results are unlikely, since this straightforward encoding does not preserve the 2D structure of the solutions.

One solution to this problem is to combine the GA with the quadtree structure [4,5]. The 2D structure can then be preserved by encoding the quadtrees that represent the solutions. The results show that good segmentations or disparity maps can be produced by minimizing energy functions that have a form similar to Eq. 1.

Here, the motivation is to generalize and improve our approach [4,5] into a multi-resolution genetic algorithm, MGA for short. We believe that the MGA can be applied to many optimization problems in computer vision. In this paper, the problem of motion estimation is used as a vehicle to demonstrate this idea. We show that by formulating motion estimation as an optimization problem and using the MGA to do the global searching, smooth and detailed velocity fields are produced.

The organization of this paper is as follows. In the next section, the MGA is discussed. Its application in motion estimation and the experimental results are presented in Section 3. The paper concludes in Section 4.

2 Multi-resolution Genetic Algorithm

Although the technique is based on work described in our previous papers, for completeness, the generalized MGA is briefly discussed below with the new improvements highlighted. In the following subsections, detailed issues, such as the encoding mechanism and the crossover and mutation operators, are discussed first. An outline of the optimization process is given later.

2.1 The Encoding Scheme for MGA

In the MGA, image-related solutions are encoded through their corresponding quadtrees. The quadtree for a given solution is generated as follows: (1) select all the nodes in the lowest level of the quadtree as leaves and assign each leaf the value that the solution has at the corresponding location; (2) if a leaf and its siblings have the same value, remove them, select their parent as a leaf, and assign it that value. Now we need a way to encode all possible quadtrees. Unlike our previous approach, two 1D arrays are used. Array L stores all leaves and their assigned values in arbitrary order. Array N stores all the nodes using the array representation of a complete quadtree. At any given position of array N, if the corresponding node is selected as a leaf, the index of that leaf in array L is stored; otherwise, the value -1 is used. This data structure not only enables us to find the parent or children of a node efficiently using array N, but also allows us to pick a random leaf quickly using array L.
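The two-array encoding can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a square 2^d × 2^d solution grid, the usual breadth-first layout of a complete quadtree (children of node k at indices 4k+1 to 4k+4), and solution values that are never `None`. The names `encode_quadtree`, `parent`, and `children` are hypothetical.

```python
def parent(k):
    """Index of the parent of node k (k > 0) in the complete-quadtree array."""
    return (k - 1) // 4


def children(k):
    """Indices of the four children of node k."""
    return [4 * k + 1 + c for c in range(4)]


def encode_quadtree(solution):
    """Encode a square 2^d x 2^d grid as the pair of arrays (N, L).

    L lists the leaves as (node index, value) pairs in arbitrary order.
    N has one slot per node of the complete quadtree: the index of the
    leaf in L if the node is a leaf, otherwise -1.
    """
    size = len(solution)
    depth = size.bit_length() - 1
    assert size == 1 << depth and all(len(row) == size for row in solution)
    num_nodes = (4 ** (depth + 1) - 1) // 3
    N = [-1] * num_nodes
    L = []

    def add_leaf(k, value):
        N[k] = len(L)
        L.append((k, value))

    def visit(k, row, col, span):
        """Return the common value of the block, or None if it is not uniform."""
        if span == 1:
            return solution[row][col]
        half = span // 2
        values = []
        for c, child in enumerate(children(k)):
            dy, dx = divmod(c, 2)
            values.append(visit(child, row + dy * half, col + dx * half, half))
        if values[0] is not None and all(v == values[0] for v in values):
            return values[0]  # four equal siblings merge into their parent
        for child, v in zip(children(k), values):
            if v is not None:
                add_leaf(child, v)  # uniform child under a non-uniform parent
        return None

    root_value = visit(0, 0, 0, size)
    if root_value is not None:
        add_leaf(0, root_value)
    return N, L


grid = [[1, 1, 2, 2],
        [1, 1, 2, 3],
        [5, 5, 7, 7],
        [5, 5, 7, 7]]
N, L = encode_quadtree(grid)
```

For this grid, three quadrants are uniform and become leaves at the second level, while the top-right quadrant is split down to pixel level, giving seven leaves in total. Picking a random leaf is then a single random index into L, and moving up or down the tree is index arithmetic on N.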

2.2 Fitness Evaluation

The fitness of a given candidate controls the evolution process. The fitter the candidate, the greater its probability of surviving from one generation to the next. For ill-posed problems, Eq. 1 can be used as the fitness function, with the error function and the roughness function defined according to the application. Generally speaking, the error function should evaluate the color error or energy of a solution. The roughness function can be defined based on the first-order or second-order derivative of the solution, and therefore penalizes solutions that have high gradients or curvatures. Since Eq. 1 is defined as an integral, the fitness value can be updated efficiently when the solution is changed only locally.

1051-4651/02 $17.00 (c) 2002 IEEE
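The efficient local update of Eq. 1 can be sketched as follows. This is an illustrative discretization, not the paper's code: it assumes a scalar-valued solution on a pixel grid, a roughness term built from absolute differences between 4-connected neighbors, and a hypothetical callable `error(y, x, value)` supplied by the application.

```python
def total_energy(s, error, lam):
    """Discrete form of Eq. 1: sum of error terms plus lam times roughness."""
    h, w = len(s), len(s[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += error(y, x, s[y][x])
            if x + 1 < w:
                total += lam * abs(s[y][x + 1] - s[y][x])
            if y + 1 < h:
                total += lam * abs(s[y + 1][x] - s[y][x])
    return total


def local_terms(s, error, lam, y, x):
    """Sum of all energy terms that involve pixel (y, x)."""
    h, w = len(s), len(s[0])
    e = error(y, x, s[y][x])
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            e += lam * abs(s[ny][nx] - s[y][x])
    return e


def update_pixel(s, error, lam, y, x, new_value, energy):
    """Change one pixel in place and return the updated total energy."""
    before = local_terms(s, error, lam, y, x)
    s[y][x] = new_value
    after = local_terms(s, error, lam, y, x)
    return energy - before + after


target = [[0, 1], [0, 0]]
error = lambda y, x, v: (v - target[y][x]) ** 2  # toy error function
s = [[0, 0], [0, 0]]
energy = total_energy(s, error, 0.5)
energy = update_pixel(s, error, 0.5, 0, 1, 1, energy)
```

Only the terms touching the changed pixel are recomputed, so the cost of an update does not depend on the image size. The same idea extends to changing all pixels covered by one leaf.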

2.3 Initial Population Generation

Before we initialize the population, the suggested values for all the nodes in the quadtree are calculated by minimizing the error function locally for each node. Since nodes at different levels cover different numbers of pixels, minimizing the error function for these nodes may produce different suggested values for the same location. The suggested value for node k is defined as:

$$S[k] = \arg\min_{s} \iint_{k} E(s(u,v)) \, du \, dv \qquad (2)$$

where the integral is taken over the area covered by node k.
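A discrete reading of Eq. 2 is sketched below. The assumptions are not from the paper: candidate values come from a finite list `labels`, a hypothetical table `err[y][x][i]` holds the error of assigning `labels[i]` to pixel (y, x), and nodes use the breadth-first complete-quadtree indexing (children of node k at 4k+1 to 4k+4).

```python
def suggested_values(err, labels):
    """Return S, where S[k] minimizes the summed error over node k's block."""
    size = len(err)
    depth = size.bit_length() - 1
    assert size == 1 << depth
    num_nodes = (4 ** (depth + 1) - 1) // 3
    cost = [None] * num_nodes  # cost[k][i]: total error of labels[i] on node k

    def visit(k, row, col, span):
        if span == 1:
            cost[k] = list(err[row][col])
            return
        half = span // 2
        total = [0.0] * len(labels)
        for c in range(4):
            dy, dx = divmod(c, 2)
            child = 4 * k + 1 + c
            visit(child, row + dy * half, col + dx * half, half)
            total = [a + b for a, b in zip(total, cost[child])]
        cost[k] = total  # a parent's cost is the sum of its children's costs

    visit(0, 0, 0, size)
    return [labels[min(range(len(labels)), key=c.__getitem__)] for c in cost]


# 2 x 2 image, two candidate labels; only the last pixel prefers label 1.
err = [[[0, 1], [0, 1]],
       [[0, 1], [5, 0]]]
S = suggested_values(err, labels=[0, 1])
```

In this toy case the root node prefers label 1 because the summed error is 5 for label 0 and 3 for label 1, while three of its four pixels individually prefer label 0. This is the situation the text describes: nodes at different levels can suggest different values for the same location.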

The initial candidates of the MGA can then be generated using a recursive procedure. Inside the procedure, whether or not the given parameter, the node k, is selected as a leaf depends on a random number. If the decision is to select, then S[k] will be assigned to node k. Otherwise, the procedure invokes itself recursively using the four child nodes of k until the bottom level is reached.
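The recursive procedure can be sketched in a few lines of Python. The paper only states that the decision depends on a random number, so the fixed leaf probability `p_leaf` used here is an assumption, as are the function names and the dictionary representation of a candidate (leaf node index mapped to its value).

```python
import random


def node_level(k):
    """Level of node k in the complete quadtree (the root is level 0)."""
    level = 0
    while k > 0:
        k = (k - 1) // 4
        level += 1
    return level


def random_candidate(S, depth, p_leaf=0.3, rng=random):
    """Generate one initial candidate as a dict {leaf node index: value}."""
    leaves = {}

    def grow(k, level):
        # Nodes at the bottom level are always leaves; higher nodes become
        # leaves with probability p_leaf (an assumed, hypothetical parameter).
        if level == depth or rng.random() < p_leaf:
            leaves[k] = S[k]
        else:
            for c in range(1, 5):
                grow(4 * k + c, level + 1)

    grow(0, 0)
    return leaves


depth = 3
S = list(range((4 ** (depth + 1) - 1) // 3))  # placeholder suggested values
candidate = random_candidate(S, depth, rng=random.Random(0))
covered = sum(4 ** (depth - node_level(k)) for k in candidate)
```

Whatever random choices are made, the selected leaves tile the image exactly once, so `covered` always equals the number of pixels (64 for `depth = 3`).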

2.4 Crossover Operator

After encoding the solution using the corresponding quadtree, conventional crossover operators, such as two-point crossover, cannot guarantee that the offspring produced are still legal quadtrees. The graft crossover operator was proposed to address this problem [4]. Basically, this operator searches for two branches, one from each parent quadtree, that cover the same rectangular area. All leaves on these two branches are swapped, which has the effect of swapping the solutions within that rectangular area. By construction, this operator guarantees that the results of crossover are still legal quadtrees. After crossover, one of the offspring may be fitter than either of its parents.
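The swap performed by the graft crossover can be sketched as below. The sketch assumes candidates are stored as dictionaries from leaf node index to value, with breadth-first complete-quadtree indexing, so that the same node index in both parents covers the same area. How the paper chooses the branch is not specified here, so the node `k` is simply passed in; all names are hypothetical.

```python
def in_subtree(n, root):
    """True if node n is root itself or one of its descendants."""
    while n > root:
        n = (n - 1) // 4
    return n == root


def has_leaf_ancestor(leaves, k):
    """True if some proper ancestor of node k is a leaf of the candidate."""
    while k > 0:
        k = (k - 1) // 4
        if k in leaves:
            return True
    return False


def graft_crossover(parent_a, parent_b, k):
    """Swap the branches rooted at node k and return the two offspring."""
    if has_leaf_ancestor(parent_a, k) or has_leaf_ancestor(parent_b, k):
        raise ValueError("node k is not a branch in both parents")

    def split(leaves):
        inside = {n: v for n, v in leaves.items() if in_subtree(n, k)}
        outside = {n: v for n, v in leaves.items() if not in_subtree(n, k)}
        return inside, outside

    a_in, a_out = split(parent_a)
    b_in, b_out = split(parent_b)
    return {**a_out, **b_in}, {**b_out, **a_in}


parent_a = {1: 10, 2: 20, 3: 30, 4: 40}
parent_b = {5: 1, 6: 2, 7: 3, 8: 4, 2: 21, 3: 31, 4: 41}
child_1, child_2 = graft_crossover(parent_a, parent_b, k=1)
```

Here node 1 is a single leaf in `parent_a` and a branch with four leaves in `parent_b`. After the swap, each offspring still covers every pixel exactly once, which is why the result remains a legal quadtree.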

2.5 Mutation Operator

As with the crossover operator, conventional mutation operations cannot guarantee that the mutation result is a legal quadtree. However, based on the quadtree structure, we can randomly select a leaf and try to mutate it, provided that the change improves the fitness of the candidate. Three mutation operators can be used: (1) split the leaf into four children and use their own suggested values; (2) merge the leaf with its siblings and use the suggested value for their parent; (3) alter the value assigned to the leaf to one of its neighbors' values. Obviously, these operations preserve the quadtree structure. It is worth noting that because of the alteration operator, the final solution at a given location is not limited to the suggested values calculated initially under different window sizes.

In our previous approach, a leaf is selected if it covers a randomly selected pixel. Therefore, leaves that cover larger areas have higher probabilities of being mutated than those at the lower levels. Our experiments show that coarse results are likely to be produced, since leaves at the pixel level have very few chances to be optimized. In our MGA approach, this problem is addressed by giving all leaves equal probability of mutation. The experiments show that much more detailed results can be generated.

In addition, we can also assign the probability of mutating a leaf according to the leaf's local fitness value. In this way, we can concentrate on optimizing leaves that have relatively high energies. This technique produces a faster convergence rate. However, it may sometimes cause the program to terminate prematurely.
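The three leaf-selection strategies discussed above differ only in the weights given to the leaves, which the following sketch makes explicit. The function names, the `mode` strings, and the `local_energy` dictionary are hypothetical; leaves are given as a list of node indices in a complete quadtree of the stated depth.

```python
import random


def node_level(k):
    """Level of node k in the complete quadtree (the root is level 0)."""
    level = 0
    while k > 0:
        k = (k - 1) // 4
        level += 1
    return level


def selection_weights(leaves, depth, mode, local_energy=None):
    """Relative probability of choosing each leaf for mutation."""
    if mode == "pixel":    # previous approach: the leaf under a random pixel
        return [4 ** (depth - node_level(k)) for k in leaves]
    if mode == "uniform":  # MGA: every leaf is equally likely
        return [1] * len(leaves)
    if mode == "energy":   # optional: favor leaves with high local energy
        return [local_energy[k] for k in leaves]
    raise ValueError(mode)


def pick_leaf(leaves, depth, mode, rng=random, local_energy=None):
    """Draw one leaf; weights must be non-negative and not all zero."""
    weights = selection_weights(leaves, depth, mode, local_energy)
    return rng.choices(leaves, weights=weights, k=1)[0]


# Three quadrant-sized leaves and four pixel-sized leaves in a 4 x 4 image.
leaves = [1, 2, 3, 17, 18, 19, 20]
pixel_w = selection_weights(leaves, 2, "pixel")
uniform_w = selection_weights(leaves, 2, "uniform")
```

With pixel-based selection, each pixel-level leaf in this example is chosen with probability 1/16, against 1/4 for each large leaf. With uniform selection every leaf has probability 1/7, which gives the fine-level leaves far more opportunities to be optimized.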

2.6 Optimization Process

After the above issues are addressed, the MGA can be implemented as an iterative procedure. When the population evolves from one generation to the next, two strings are picked randomly each time to do the crossover and mutation until all the strings in the population are processed. The elitist strategy [3] is applied, which guarantees that the best candidate in the population will not get any worse during the process of evolution. The above process is repeated until one of the two termination conditions is satisfied: (1) the fitness difference between the best and the worst candidates in the population is smaller than 0.01 percent of the average fitness value; (2) the best candidate of the population is unchanged during the last 20 generations.

3 Application in Motion Estimation

In this section, we apply the MGA to the motion estimation problem. We first briefly review previous work in this area and give our motivation. The application of the MGA and the experimental results are discussed in the following subsections.

Previous work in motion estimation can be classified into four categories: gradient-based, matching-based, energy-based, and phase-based approaches. Even though the gradient-based approach has been a more active research topic in recent years [8,9], the matching-based approach does have its own advantages: (1) it can be easily adapted to color video sequences; (2) it does not involve spatiotemporal derivative estimation, which is sensitive to noise and aliasing. The problem with existing matching-based approaches [1,7] is that they sometimes mistakenly choose a local minimum as the solution [2]. Therefore, our motivation is to solve this problem using the MGA, since the GA is an effective global search technique.

The performance study of Barron et al. [2] also shows that many estimators fail to give consistent velocities for real image sequences. Many approaches are not robust enough and give poor vector fields for the NASA and SRI trees sequences. For the TAXI sequence, some approaches can give reasonable estimations along the boundaries of the vehicles; however, no velocities or inconsistent velocities are given for the interiors of the vehicles. Solving these problems is also part of our motivation.

3.1 MGA-based Solution Searching

In order to apply the MGA, we need to set up an optimization objective. Since this is a matching-based technique, the error function is defined based on the Euclidean distance between color vectors, and the roughness function is defined based on the first-order derivative of the velocity:

$$E(t(u,v)) = \sum_{h=-f}^{f} \; \sum_{-w \le i,j \le w} \left\| I_0(u+i,\, v+j) - I_h\bigl(u+i+h\,t_x(u,v),\; v+j+h\,t_y(u,v)\bigr) \right\|$$

$$R(t(u,v)) = \left\| \frac{\partial t}{\partial x} \right\|_{(u,v)} + \left\| \frac{\partial t}{\partial y} \right\|_{(u,v)} \qquad (3)$$

where t(u,v) is the velocity at location (u,v), w is the window radius in the space domain, f is half of the window size in the time domain, and I_h is the image at frame h. Substituting Eq. 3 into Eq. 1 gives the optimization objective function. In this paper, we set w to 1 and f to 2, resulting in a neighborhood size of 3×3×5. To calculate the velocity at the subpixel level, we allow t to take on floating-point values. When the input is a floating-point value, the function I gives the color by linear interpolation.

The coarse-to-fine strategy based on the Laplacian pyramid is widely used in previous matching-based approaches [1,7], and it can effectively reduce the computation needed in the matching process. However, Barron et al. [2] point out that if a local minimum is picked at the coarse level, the coarse-to-fine strategy will propagate the error systematically to finer levels. We try to avoid this problem by doing all the matching at the finest level. To define "the finest level" for velocity, we quantize the continuous velocity according to the accuracy specified by the user. Although quantizing the velocities limits the accuracy of the velocities produced, it increases the robustness of the algorithm. In addition, we believe that expecting a high degree of subpixel accuracy from the intensity information is unrealistic, since the pixel intensities are also quantized and likely inherit errors caused by the imaging process.

After quantizing the velocities, the motion detection problem is converted to a discrete optimization problem. That is, we need to find a combination of velocities for different pixels that minimizes the fitness function defined by Eq. 1 and Eq. 3. The MGA discussed in Section 2 can then be applied to do the global searching. In practice, to accelerate the optimization process, we pre-calculate the matching errors, E, for different pixels under different velocities and store them in a 4D array.

3.2 Experimental Results

The TAXI, NASA, and SRI trees sequences are used to test our algorithm. Since the subsampled vector fields can hardly show the details, the velocities are also shown using grayscale images. An intensity of 128 in the grayscale image indicates zero velocity. A pixel with intensity higher than 128 denotes motion to the right or bottom, with a brighter pixel indicating higher speed. A pixel with intensity lower than 128 denotes motion to the left or top, with a darker pixel indicating higher speed.

Figure 1: Results for the TAXI sequence. (a) Central frame; (b) flow field; (c) x velocities; (d) y velocities.

Figure 2: Result for the NASA sequence. (a) Central frame; (b) flow field.

Fig. 1 shows the results for the TAXI sequence, which were generated by setting the maximum velocity to 3 pixels and the resolution of the quantization to 1/4 pixel. The results show that the movements of three vehicles and a pedestrian are detected. In Fig. 1(c) and (d), the shape of the moving taxi is nicely depicted. We can also tell that the rear of the taxi has a higher x velocity than the front part does, which is consistent with the fact that the taxi is making a turn. For the car in the lower left corner, the corresponding region in the velocity field is wider than the actual size of the car. This is caused by the shadow on the road, which moves with the car. Since the van in the lower right is partially covered by a tree, its shape is not well defined.

Fig. 2 shows the result for the NASA sequence, which was generated by setting the maximum velocity to 1 pixel and the resolution of the quantization to 1/8 pixel. Compared with the approaches surveyed by Barron et al. [2], the result shows that our approach is more robust and can give consistent velocity estimations.
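The quantization settings quoted above determine the size of the discrete search space at each pixel. The sketch below enumerates the candidate velocities under one possible reading, which is an assumption rather than something the paper states: the maximum velocity bounds each of the x and y components independently and symmetrically. The function name `velocity_labels` is hypothetical.

```python
from fractions import Fraction


def velocity_labels(max_velocity, resolution):
    """All quantized (vx, vy) pairs with each component in [-max, +max]."""
    step = Fraction(resolution)
    n = int(Fraction(max_velocity) / step)  # quantization steps per direction
    axis = [i * step for i in range(-n, n + 1)]
    return [(vx, vy) for vy in axis for vx in axis]


taxi = velocity_labels(3, Fraction(1, 4))  # TAXI: 3 pixels at 1/4 pixel
nasa = velocity_labels(1, Fraction(1, 8))  # NASA: 1 pixel at 1/8 pixel
```

Under this reading, the TAXI setting gives 25 values per component and 625 candidate velocities per pixel, and the NASA setting gives 17 values per component and 289 candidates. Exact fractions are used so that the label values carry no floating-point rounding error.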

Figure 3: Results for the SRI trees sequence. (a) Flow field; (b) x velocities.

Fig. 3 shows the results for the SRI trees sequence, which were generated by setting the maximum velocity to 2 pixels and the resolution of the quantization to 1/4 pixel. Since the camera moves mainly horizontally, very low velocities are detected in the y direction. In Fig. 3(b), the shapes of both the foreground tree and the background objects are clearly depicted.

4 Conclusions

In this paper, we propose a generalized multi-resolution genetic algorithm, which we believe can be applied to solving many image-related optimization problems, such as image segmentation, stereo vision, and motion estimation. Compared with our previous approaches [4,5], several improvements are made, which help to generate more detailed results.

We use the motion estimation problem to demonstrate the MGA. A matching-based motion estimator is proposed, which optimizes both the color distance and the smoothness. The local minimum problem of previous matching-based approaches is avoided by matching at the finest level and using the MGA to do the global searching. Since we quantize the velocities according to the accuracy requirement, our approach is more robust than those surveyed by Barron et al. [2]. Although no example is given in this paper, using our approach to process color image sequences is straightforward.

Furthermore, our approach can give consistent velocities in areas that do not contain enough intensity variation. This is because in the MGA, the interactions between neighbors are modeled using the roughness function and the alteration mutation is applied, and hence the velocities found at the boundary of an area propagate and influence the pixels of the interior. As a result, the velocity fields generated can clearly show the number of moving objects in the scene and their overall shapes. The multi-resolution scheme also plays an important role, since selecting nodes at different levels has an effect similar to adjusting the window size at different locations of the solution. Normally, matching using a larger window is less sensitive to noise, and matching using a smaller one preserves the details better.

The computational costs of GAs are relatively high. However, two features of the MGA make it more efficient than other genetic-based algorithms. First, since the multi-resolution scheme is introduced, the optimization is conducted on leaves rather than on pixels, and the number of leaves is normally far smaller than the number of pixels. Second, since the crossover and mutation operators of the MGA change the candidate only locally, we can efficiently re-evaluate the fitness of the candidate. On our 1.5 GHz Pentium 4 computer, our experiments show that the MGA can evolve for the 500 to 800 generations normally needed to converge in 1 to 1.5 minutes. However, about 5 minutes are needed to initialize all the matching errors. How to accelerate the calculation of the matching errors requires future investigation.

In summary, the proposed MGA is a useful image-related optimization approach. Because of the encouraging results presented in this paper, future research in this direction is warranted.

References

[1] P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Visual Motion," Int. J. of Comp. Vision, Vol. 2, pp. 283-310, 1989.
[2] J. L. Barron, D. J. Fleet, & S. S. Beauchemin, "Performance of Optical Flow Techniques," Int. J. of Comp. Vision, Vol. 12, No. 1, pp. 43-77, 1994.
[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[4] M. Gong & Y.-H. Yang, "Genetic-Based Multiresolution Color Image Segmentation," Vision Interface, pp. 71-80, Ottawa, Canada, Jun. 7-9, 2001.
[5] M. Gong & Y.-H. Yang, "Genetic-Based Stereo Algorithm and Disparity Map Evaluation," Int. J. of Comp. Vision, To Appear, 2002.
[6] B. K. P. Horn & B. G. Schunck, "Determining Optical Flow," Artificial Intelligence, Vol. 17, No. 1-3, pp. 185-203, 1981.
[7] A. Singh, "An Estimation-Theoretic Framework for Image-Flow Computation," ICCV, pp. 168-177, Osaka, Japan, 1990.
[8] C.-J. Tsai, N. P. Galatsanos, & A. K. Katsaggelos, "Optical Flow Estimation from Noisy Data Using Differential Techniques," ICASSP, pp. 3393-3396, Phoenix, USA, 1999.
[9] M. Ye & R. M. Haralick, "Optical Flow from a Least-Trimmed Squares Based Adaptive Approach," ICPR, Vol. 3, pp. 1052-1055, Barcelona, Spain, Sept. 3-8, 2000.
