
Wavelet-based Multiresolution Stereo Vision

Guy Caspary

Yehoshua Y. Zeevi

Technion - Israel Institute of Technology Haifa 32000, Israel

Abstract An efficient wavelet-based multiresolution approach to the stereo vision problem is presented. A cost function is defined and iteratively minimized. The minimization is performed on the images’ representation in the wavelet space. We employ the theory of representation of operators in spaces spanned by scaling functions and thereby take advantage of a simplified approximation of differentiation. Examples illustrate the advantages afforded by the application of our algorithm over correlation-based methods.

Columbia University New York, NY 10027

1. Introduction

The problem of reconstructing the three-dimensional (3D) structure of a scene from two image projections is known as the stereo vision problem. Both vision and pattern recognition systems can benefit from the application of efficient algorithms in stereo vision. Examples of such applications are automated cartography, face recognition, and navigation of autonomous vehicles and aircraft. In stereo vision several assumptions are made regarding the acquisition of the two projected images. It is assumed that the visual system geometry is completely known, in particular the position and orientation of the cameras used in the acquisition of the images. It is also assumed that both images were taken simultaneously and from identical (to a good approximation) cameras. Reconstructing the 3D structure of a scene is equivalent to finding the disparity map of the two images. The disparity map is a vector field mapping of one image onto the other. When the relative position of the cameras is known, the disparity map can be used to calculate the 3D structure by means of simple triangulation. Ignoring acquisition and processing noise, and the limitations imposed by occlusions, we can say that each point in one of the observed images differs from the corresponding point in the other image only because of geometric distortion. This implies that any disparity between the two images can be modelled and used, together with the vision system geometry, in the reconstruction of the 3D structure of the scene. Some of the difficulties encountered in the stereo vision problem are:

Real-life visual scenarios often exhibit occluded areas.

Smooth or textureless regions are locally ambiguous.

For good 3D scene reconstruction the calculated disparity map should have subpixel resolution, and that is seldom available in the original data.

Depth discontinuities that are typical of 3D data are difficult to compute.

Natural images contain detail information at several resolutions. Looking at an image of an urban area, for example, we may first see from a distance just houses and roads as a coarse image description. Looking into the image more closely we see windows and trees. At an even finer scale, we find fixtures like door handles and knobs. For this reason, to achieve robustness and accuracy, computer vision systems in general, and stereo vision algorithms in particular, must employ multiresolution methods. Another important advantage of multiresolution analysis is the reduction of computational complexity: at each scale the results of the previous scales are used as the initial estimate.

The rest of this paper is organized as follows. We start with a mathematical definition of the problem, wherein we present the problem in the context of minimization of a cost function. The cost function is minimized iteratively, using differential operators in the space spanned by a wavelet basis. Since the presented algorithm is wavelet-based, it is readily integrated into a pyramidal multiresolution data structure. We conclude with examples, illustrating the advantages afforded by the proposed multiresolution algorithm.
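The triangulation step mentioned in the introduction can be illustrated for the simplest camera geometry. The sketch below assumes two identical, parallel (rectified) pinhole cameras, a special case that the text does not state; under that assumption the depth of a point is Z = f B / d, with focal length f in pixels, baseline B and horizontal disparity d in pixels. The function name `depth_from_disparity` and the numerical values are hypothetical and serve only as an illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (in metres) of a point seen by two parallel pinhole cameras.

    Assumed geometry, not taken from the paper: identical rectified cameras
    separated by `baseline_m`, so that depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters: 800-pixel focal length, 10 cm baseline.
depths = [
    depth_from_disparity(d, focal_length_px=800.0, baseline_m=0.1)
    for d in (4.0, 8.0, 16.0)
]
print(depths)  # larger disparity corresponds to a closer point
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a depth error that grows with distance, which is one reason why the subpixel disparity resolution mentioned above matters.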

1051-4651/02 $17.00 (c) 2002 IEEE

2. Stereo vision

2.1. Problem definition

The visual scene is completely represented by a depth function and an intensity function. It is assumed that the scene is composed of one object and that all surfaces are Lambertian. The definition of the two-dimensional (2D) stereo vision problem is as follows: Find the disparity field describing the disparities in the $x$ and $y$ directions, denoted by $u(x, y)$ and $v(x, y)$, respectively. We require that $u$ and $v$ satisfy the following condition:

$$I_L(x, y) = I_R\big(x + u(x, y),\; y + v(x, y)\big), \qquad (1)$$

where $I_L$ and $I_R$ are the left and right image intensities, respectively. We define the following cost function:

$$E = \iint \Big[ \big(I_L(x, y) - I_R(x + u,\, y + v)\big)^2 + \lambda \big(u_x^2 + u_y^2 + v_x^2 + v_y^2\big) \Big]\, dx\, dy = E_0 + E_1 + E_2 + E_3 + E_4, \qquad (2)$$

where $u$ and $v$ are functions of $x$ and $y$; $u_x$, $u_y$, $v_x$ and $v_y$ are the derivatives of $u$ and $v$ in the $x$ and $y$ directions, respectively; and $\lambda$ is a weighting factor. The term $E_0$ provides a measure of the squared gray level error. The rest of the terms ($E_1$ to $E_4$) are measures of departure from smoothness of the $u$ and $v$ disparities in the $x$ and $y$ directions.

2.2. Differential operators

The theory of differential operators in the subspace spanned by the scaling function is presented in [1]. It can be shown that, for a function $f$ approximated in the space spanned by a scaling function $\varphi(x)$ with $M$ vanishing moments, one can write

$$f'(x) = \sum_{k} \Big( \sum_{l} r_l\, f_{k-l} \Big)\, \varphi(x - k), \qquad (3)$$

where $f'$ is the derivative of $f$, $f(x) = \sum_k f_k\, \varphi(x - k)$, and $r_l$ are the connection coefficients given by

$$r_l = \int_{-\infty}^{\infty} \varphi(x - l)\, \frac{d}{dx}\varphi(x)\, dx.$$

In our case, the connection coefficients for 2D wavelets are defined as

$$\Gamma_{i,j;k,l} = \iint \Big( \frac{\partial \varphi_{i,j}}{\partial x} \frac{\partial \varphi_{k,l}}{\partial x} + \frac{\partial \varphi_{i,j}}{\partial y} \frac{\partial \varphi_{k,l}}{\partial y} \Big)\, dx\, dy,$$

where $\varphi_{k,l}(x, y)$ are the 2D scaling functions. The disparities are expanded in this basis:

$$u(x, y) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} u_{k,l}\, \varphi_{k,l}(x, y), \qquad (6)$$

$$v(x, y) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} v_{k,l}\, \varphi_{k,l}(x, y), \qquad (7)$$

where $u_{k,l}$ and $v_{k,l}$ are the weighting coefficients of $u$ and $v$, respectively. Let $\Delta u_{i,j}$ and $\Delta v_{i,j}$ represent the updating differential steps of $u_{i,j}$ and $v_{i,j}$, respectively. Plugging these values into the cost function, with $I_R$ linearized about the current disparity, yields the update of the cost function [2]:

$$\Delta E = \Delta E_0 + \Delta E_1 + \Delta E_2 + \Delta E_3 + \Delta E_4 = 2\Delta u_{i,j} A_x + 2\Delta v_{i,j} A_y + \Delta u_{i,j}^2 A_{xx} + 2\Delta u_{i,j} \Delta v_{i,j} A_{xy} + \Delta v_{i,j}^2 A_{yy} + 2\lambda \Delta u_{i,j} B_u + \lambda \Delta u_{i,j}^2 C + 2\lambda \Delta v_{i,j} B_v + \lambda \Delta v_{i,j}^2 C, \qquad (8)$$

where

$$A_x = -\iint (I_L - I_R)\, \frac{\partial I_R}{\partial x}\, \varphi_{i,j}\, dx\, dy, \qquad A_y = -\iint (I_L - I_R)\, \frac{\partial I_R}{\partial y}\, \varphi_{i,j}\, dx\, dy,$$

$$A_{xx} = \iint \Big(\frac{\partial I_R}{\partial x}\Big)^2 \varphi_{i,j}^2\, dx\, dy, \qquad A_{xy} = \iint \frac{\partial I_R}{\partial x} \frac{\partial I_R}{\partial y}\, \varphi_{i,j}^2\, dx\, dy, \qquad A_{yy} = \iint \Big(\frac{\partial I_R}{\partial y}\Big)^2 \varphi_{i,j}^2\, dx\, dy,$$

$$B_u = \sum_{k} \sum_{l} u_{k,l}\, \Gamma_{i,j;k,l}, \qquad B_v = \sum_{k} \sum_{l} v_{k,l}\, \Gamma_{i,j;k,l}, \qquad C = \Gamma_{i,j;i,j},$$

and $I_R$ and its derivatives are evaluated at $(x + u,\, y + v)$.

For the decrease of the cost function to be maximized in each iteration of the optimization process of the algorithm, we require that

$$\frac{\partial \Delta E}{\partial \Delta u_{i,j}} = 0, \qquad \frac{\partial \Delta E}{\partial \Delta v_{i,j}} = 0. \qquad (9)$$

The above set of equations can be reformulated in the following matrix form:

$$\begin{pmatrix} A_{xx} + \lambda C & A_{xy} \\ A_{xy} & A_{yy} + \lambda C \end{pmatrix} \begin{pmatrix} \Delta u_{i,j} \\ \Delta v_{i,j} \end{pmatrix} = -\begin{pmatrix} A_x + \lambda B_u \\ A_y + \lambda B_v \end{pmatrix}, \qquad (10)$$

whose solution is obtained in a straightforward manner. We use the set of iterative equations, obtained by solving the above matrix equation, in our 2D algorithm.

[Figure 1 plots. Panel (a): true and calculated disparities; calculated disparity after iteration 25; error after iteration 25, MSE = 1.3846e-007. Panel (b): true and calculated disparities; error using correlation method, MSE = 0.00031551. Panel (c): error using correlation method, MSE = 7.4058e-005.]

Figure 1. (a) True (synthetic) and calculated disparity, obtained after 25 iterations, are superimposed. (b) True (synthetic) and calculated disparity obtained by using a correlation method. (c) Same as in (b), but the results of the correlation method are interpolated.

2.4. Multiresolution algorithm

We have implemented a multiresolution stereo vision algorithm employing the iterative set of equations presented in the previous section. Since our set of iterative equations is based on the wavelet decomposition, it extends naturally into a multiresolution algorithm. The most prominent use of multiresolution analysis in computer vision is its implementation in a coarse-to-fine strategy. The main advantages of this method, which can be extended to many other fields, are a reduction of the computational complexity and of the risk of 'falling' into local minima. The coarse-to-fine strategy is implemented as follows. First, a pyramidal representation of each image is created. This is done by creating approximations of the images at several scales. In our algorithm we used the wavelet pyramid, which is naturally incorporated into our equations. Pyramidal structures utilize the principle that structures at low resolutions can be used to measure displacements over a large range with low accuracy, whereas structures at higher resolutions can be used to measure displacements over short ranges with higher accuracy. Beginning at the coarsest level, an initial estimate is made. The algorithm then iterates through each successive level, where the results of the previous level are used as the initial estimate for the algorithm applied at the current level. The same algorithm is used at each level.

3. Examples

Two examples of the application of our algorithm are presented. In the first example, we apply the algorithm to 1D data and compare the results with those obtained by the application of conventional correlation-based methods. In the second example we reconstruct the 3D structure of a face.

In our first example the input left-side signal is an arbitrary line from a test image. A synthetic disparity function is used to generate the right-side signal. This process produces left and right signals with a known disparity map between them. Computed results obtained from 1D data are shown in Fig. 1. The disparity obtained after twenty-five iterations by the application of the proposed algorithm is depicted in Fig. 1 (a). Already after so few iterations there is a very good approximation of the true disparity function. The calculated disparity differs from the true synthetic disparity mainly at points of discontinuity in the disparity. This is a result of the smoothness term incorporated into the cost function and weighted by the weighting factor of Eq. (2). It is also affected by the choice of the wavelet scaling function.

To compare the results of our algorithm with those obtained by the application of a correlation-based method, we employed normalized correlation with a feature size of nine samples and a search area of seven samples. Matching errors were discarded using a five-sample median filter. The calculated disparity in this case is shown in Fig. 1 (b). Apart from a larger error we notice two main problems in the correlation-based method which do not exist in our method. First, it is quite difficult to identify false matches. For example, the correlation method completely failed around the center discontinuity at five consecutive samples; a false match of this magnitude is difficult to identify and resolve. Second, the correlation-based method has, inherently, an integer sample resolution. This means that only disparities which are an integer number of samples may be identified correctly. Our method calculates the disparity with subsample accuracy. To improve the subsample resolution of the correlation method, we performed cubic interpolation around the correlation maxima. This improved the correlation-based match, as shown in Fig. 1 (c). However, even after the maxima interpolation, the results of our method are much better. Specifically, the false matches of the correlation-based method have not been resolved. We also note that a Matlab implementation of our algorithm is almost twice as fast as that of the correlation-based method.

The second example is that of face reconstruction. We applied our stereo vision algorithm to the two stereo images shown in Figs. 2 (a) and (b). 3D views of the face computed at two different angles are shown in Figs. 2 (c) and (d). The example of face reconstruction illustrates that facial features, such as the nose and the eye sockets, are identified correctly. The proportion of the depth of the shoulders to that of the face is also good. However, there is a fault in the identification of the background. The background in these images is extremely far behind the face. We know this because the disparity of the background is very different from that of the face. The background disparity is of the order of 30 samples, which is more than our algorithm can handle. Also, the large depth of the background results in occluded areas. Our geometric model assumes that the scene does not contain occluded surfaces. This assumption is obviously violated in the background of the images.

Figure 2. A stereo pair of images processed by the proposed algorithm. (a) Left-side image. (b) Right-side image. (c) and (d) Reconstruction of the face.

4. Conclusion

The proposed stereo vision algorithm employs differential operators in the wavelet domain and iteratively minimizes a defined cost function. By virtue of being a wavelet-based algorithm, it lends itself to a coarse-to-fine strategy in the context of multiresolution stereo vision. In our implementation of the wavelet-based algorithm, we have used Daubechies wavelets of order 3. As in other applications of wavelets, it is important to match the choice of the wavelets to the context of the problem and the structure of the signals (images). This is particularly important in our case, since the method depends heavily on differential operators realized in the wavelet domain. Indeed, for this reason we selected smooth wavelets with three vanishing moments. However, the tradeoff that exists between smoothness and size of the effective support requires further investigation of the optimal number of vanishing moments in this case. Examples such as those shown for both one-dimensional signals and images illustrate the robustness and efficiency afforded by the proposed algorithm. Yet, the algorithm can most likely be further improved by implementing minimization techniques more advanced than the steepest descent used in the present study. Also, a special processing algorithm should be incorporated for the background.

5. Acknowledgement

This research has been supported by the Ollendorff Minerva Center. Yehoshua Zeevi is presently supported by the ONR MURI Program N000M-01-0625.

References

[1] G. Beylkin. On the representation of operators in bases of compactly supported wavelets. SIAM Journal on Numerical Analysis, 29(6):1716–1740, 1992.

[2] G. Caspary. Multiresolution image matching. Master’s thesis, Technion - Israel Institute of Technology, 2001.

[3] J. W. Hsieh, H. Y. M. Liao, M. T. Ko, and K. C. Fan. Wavelet-based shape from shading. Graphical Models and Image Processing, 57:343–362, 1995.
