Refine Stereo Correspondence Using Bayesian Network and Dynamic Programming on a Color Based Minimal Span Tree

Naveed I Rao¹, Huijun Di¹, and GuangYou Xu²

¹ Pervasive Computing Lab, Institute of Human Computer Interaction, Department of Computer Engineering, Tsinghua University, Beijing, China, {naveed03, dhj98}@mails.tsinghua.edu.cn
² {xgy-dcs}@mail.tsinghua.edu.cn
Abstract. Stereo correspondence is one of the most basic and important problems in computer vision. Better correspondence requires determining occlusion. Recently, dynamic programming on a minimal span tree (mst) structure has been used to search for correspondence. We extend this idea in two ways. First, the mst is generated directly from the color information in the image instead of converting the color image to gray scale. Second, we treat this mst as a Bayesian network. Novelty is attained by considering the local variances of the disparity and intensity differences in the conditional Gaussians as unobserved random parameters. These parameters are inferred iteratively by alternating estimation along the tree given the current disparity map, followed by dynamic programming estimation of the map given the current variance estimates, thus reducing the overall occlusion. We evaluate our algorithm on the benchmark Middlebury database. The results are promising for modeling occlusion in early vision problems.
1 Introduction
Occlusion is one of the major challenges in stereo vision. In stereo, occlusion refers to the situation in which some points in the scene are visible to one camera but not to the other because of the scene and camera geometries [12]. In this work, however, we call a point in the left image (L) occluded if it has no correspondence in the right image (R). Detecting these occluded pixels is ambiguous, so prior constraints must be imposed; for example, the ordering constraint [4] is exploited in the dynamic programming framework [11] because it reduces the search space. Occluded pixels are commonly excluded by thresholding [14],[3], and the scene is then assumed to be free of occlusion. This naturally entails piecewise smoothness of the recovered stereo correspondence map, or disparity map [4]. The motivation for this work is to develop a stereo correspondence algorithm that can model occlusion and then reduce it at low computational cost. We model occlusion with a Bayesian network and keep the computational cost low by running dynamic programming (DP) on a tree network.

J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 610–619, 2006. © Springer-Verlag Berlin Heidelberg 2006

The Bayesian methods (e.g., [8],[9],[6],[1],[2]) model discontinuities and occlusion globally. They can be classified into two categories [13] according to their computational model: dynamic-programming-based and MRF-based. Given the scope of this work, we cover the former category in detail. Geiger et al. [8] and Ishikawa and Geiger [9] derived an occlusion process and a disparity field from a matching process. The matching process is transformed into a path-finding problem by assuming the ordering and uniqueness constraints, and the global optimum is obtained by dynamic programming. Belhumeur [1] used a simplified relationship between disparity and occlusion to solve scan-line matching by dynamic programming, defining a set of priors ranging from a simple scene to a complex scene. In contrast to the above methods, which impose a piecewise-smooth constraint, Cox et al. [6] and Bobick and Intille [2] did not require a smoothing prior. They assumed a normal distribution of corresponding features and a fixed cost for occlusion and, using only the occlusion and ordering constraints, proposed a dynamic programming solution. The work of Bobick and Intille focused on reducing the sensitivity to the occlusion cost and the computational complexity of Cox's method by incorporating the ground control points constraint. These dynamic programming methods assume the same occlusion cost in every scan line. Ignoring the dependence between scan lines results in the characteristic streaking in the disparity maps. State-of-the-art results can be achieved by 2D optimization, but at the cost of time. In comparison, our approach is much simpler and also much more efficient. To seek a global optimum with a search of linear order, we take advantage of a minimal span tree (mst) network. The contribution of this paper is to treat the mst as a Bayesian network, for the first time in the literature known to us.
To deal with occlusion and to control smoothness in disparity space, two random parameters are introduced in the network. Instead of exact inference, the posterior distribution is approximated by computing the Helmholtz free energy using EM at each node of the tree. These energies are minimized by dynamic programming. In this way we fuse the mst, the Bayesian network, and dynamic programming into one method for finding the optimal disparity in the image. The algorithm is tested on the Middlebury stereo database, and the results are promising in comparison with state-of-the-art algorithms. The paper is organized as follows: Section 2 explains the general framework of the problem, the extraction of the mst from color information instead of the gray-scale values used in recent work [14], occlusion modeling, the computation of free energies, and their minimization using DP. Experimental results and a comparison with other algorithms are presented in Section 3. Section 4 concludes the paper.
2 Occlusion Modeling
In this section, we first introduce the general framework of the problem and then explain the concept of treating the mst as a Bayesian network. Local variances of the disparity and intensity differences in the conditional Gaussians are introduced as unobserved random parameters. To approximate the posterior probability, the Helmholtz free energy is calculated at every node, as explained in the subsection that follows. The parameters are inferred iteratively by applying expectation maximization given the current disparity map. Finally, a new optimal disparity map is obtained by minimizing the energy through dynamic programming on the tree, which forms the last subsection.

2.1 General Framework
In this subsection, we explain the general framework of our problem. The notation is borrowed from [14]. Let G(V, E) be a grid-connected graph with vertices V and edges E. All pixels of the left image form the vertices in V. For the edges E, every pixel p is connected to its 4-connected neighbors. We want to convert this graph into a tree graph G'(V, E') by choosing the most valuable edges E' for each pixel. How best to define the most valuable of the 4-connected edges remains under investigation. We exploit the fact that disparity discontinuities align with intensity discontinuities: if neighboring pixels i and j have similar intensity values Z(i) and Z(j), they are more likely a priori to have the same disparity. In [14], the gray intensity information provided by the left image is used to assign weights to the edges in G(V, E). Gray-level intensity alone, however, is not sufficient to decide the strength of the edges between pixels. Converting a color pixel to gray scale is a many-to-one mapping, so the distance obtained between pixels is not a true distance. Instead, we use a distance between pixels defined on their color components:

$$
d_2(v_1, v_2) > 0 \;\Leftrightarrow\; \begin{bmatrix} r_1 \\ g_1 \\ b_1 \end{bmatrix} - \begin{bmatrix} r_2 \\ g_2 \\ b_2 \end{bmatrix} \tag{1}
$$

G(V, E) is converted into a tree graph, i.e. the mst, using the standard Kruskal's algorithm [5]. Since the edge weights are integers in a small range, the edges can be sorted in linear time, so the mst can be computed in essentially linear time. Our experiments show that the mst extracted using color information gives much better results. These results do not depend on the specific data; rather, they indicate that color images provide better smoothness than gray-scale images.

2.2 Formulation of Problem in Bayesian Approach
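The network formulated in this section is defined on the color-based mst of Section 2.1, so it may help to see that construction spelled out. The following is a minimal sketch rather than the authors' implementation. It assumes the squared Euclidean distance between RGB triples as the edge weight (the exact form of the color distance in Eq. 1 is not fully recoverable here), row-major pixel indexing, and the hypothetical names `color_distance_sq` and `color_mst`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b); assumed 8-bit integer channels


def color_distance_sq(a: Pixel, b: Pixel) -> int:
    """Squared Euclidean distance between two RGB pixels (assumed edge weight)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def color_mst(image: List[List[Pixel]]) -> List[Tuple[int, int]]:
    """Kruskal's algorithm on the 4-connected pixel grid.

    Returns the tree edges as (u, v) pairs of row-major pixel indices.
    """
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            u = y * w + x
            if x + 1 < w:  # edge to the right neighbor
                edges.append((color_distance_sq(image[y][x], image[y][x + 1]), u, u + 1))
            if y + 1 < h:  # edge to the bottom neighbor
                edges.append((color_distance_sq(image[y][x], image[y + 1][x]), u, u + w))
    # Comparison sort for brevity; a bucket sort over the bounded integer
    # weights would give the linear-time behavior mentioned in the text.
    edges.sort()

    parent = list(range(h * w))  # union-find forest

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for _, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

On a 2×2 image whose left column is red and whose right column is blue, the two zero-weight vertical edges enter the tree first and a single red-to-blue edge then joins the two halves, so the tree follows the color boundary as the text intends.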
Bayesian networks are used to model uncertainties in parametric form. Fig. 1 shows the Bayesian network (based on the mst network) used for the current problem. Node $d_i$ is the optimal displacement for pixel i in the L image. $\tau_{i,j}$ is the variance in disparity between child node i and parent node j, and controls the smoothness in disparity space. $Z_i$ represents the intensity of each pixel, and $\sigma_i$ is the variance between $Z_L(i)$ and $Z_R(i + d)$. In the network, $d_i$, $d_j$, $\tau_{i,j}$, and $\sigma_i$ are hidden nodes and $Z_i$ is a visible node.

Fig. 1. Occlusion modeling using Bayesian network: τ is local variance of the disparity and σ is variance of the intensity difference

Based on this model, the joint distribution is given by the following product of distributions:

$$
p(\sigma_i, \tau_{i,j}, d_j, Z_i) = \prod_{i=1}^{n} p(\sigma_i) \cdot \prod_{i=1}^{n} p(\tau_{i,j}) \, p(d_{root}) \cdot \prod_{\substack{i=1 \\ (i,j)\in E}}^{n-1} p(d_i \mid d_j, \tau_{i,j}) \cdot \prod_{i=1}^{n} p(z_i \mid d_i, \sigma_i) \tag{2}
$$
Here n is the total number of nodes in the tree and $d_{root}$ represents the root node (since the root has no parent, it is written as a separate term for ease of understanding). Exact inference requires computing the posterior distribution over all hidden variables given the visible ones, which is often intractable. We therefore turn to approximation methods, which estimate a simple distribution that is close to the correct posterior distribution.

2.3 Computation of Helmholtz Free Energies
The idea is to approximate the true posterior distribution p(h|v) by a simpler distribution Q(h), where h and v denote the hidden and visible nodes, respectively. A natural measure of similarity between the two distributions is the relative entropy (a.k.a. Kullback-Leibler divergence), which can be formulated as [7]

$$
F(Q, P) = \sum_{h} Q(h) \log Q(h) - \sum_{h} Q(h) \log p(h, v) \tag{3}
$$
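As a small numerical check of Eq. 3, the sketch below evaluates the free energy for a toy model with two hidden configurations. The probabilities are made-up illustration values and the function name `free_energy` is hypothetical. It relies only on the standard identity that F(Q, P) equals the Kullback-Leibler divergence between Q(h) and p(h|v) minus log p(v).

```python
import math
from typing import Dict


def free_energy(q: Dict[str, float], joint: Dict[str, float]) -> float:
    """F(Q, P) = sum_h Q(h) log Q(h) - sum_h Q(h) log p(h, v), for a fixed v."""
    total = 0.0
    for h, qh in q.items():
        if qh > 0.0:  # terms with Q(h) = 0 contribute nothing
            total += qh * (math.log(qh) - math.log(joint[h]))
    return total


# Toy joint p(h, v) for one observed v and two hidden configurations.
joint = {"d=0": 0.1, "d=1": 0.3}
evidence = sum(joint.values())  # p(v)
posterior = {h: p / evidence for h, p in joint.items()}
uniform = {h: 0.5 for h in joint}

f_post = free_energy(posterior, joint)  # equals -log p(v)
f_unif = free_energy(uniform, joint)    # any other Q gives a larger value
```

With Q set to the exact posterior the free energy reduces to -log p(v); the uniform Q yields a strictly larger value, which is why minimizing F over Q pulls Q toward the posterior.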
In Eq. 3, F(Q, P) is the Helmholtz free energy, also called the Gibbs free energy or simply the free energy, between Q and P. Intuitively, the minimum of the energy is reached only when Q(h) equals the posterior p(h|v). For ease of display and understanding, the expression is split into two parts, $F(Q, P) = T_1 - T_2$, where $T_1$ is the first sum of Eq. 3 and $T_2$ the second. Point-inference techniques search for a single configuration $h_{approx}$ of the hidden variables, which turns the problem into a cost-minimization problem. We adopt the same approach: σ and τ are estimated by point estimation, and the Q-distribution for the entire model is

$$
Q(h) = \prod_{i=1}^{n} \delta(\sigma_i - \hat{\sigma}_i) \cdot \prod_{i=1}^{n} \delta(\tau_{i,j} - \hat{\tau}_{i,j}) \tag{4}
$$
δ is the Dirac delta function, so the distribution Q(h) is an infinite spike of density at $\hat{h}$. For detailed properties of the δ function, see [7]. Substituting Eq. 4 into $T_1$ of Eq. 3 and rearranging gives

$$
\begin{aligned}
T_1 ={}& \sum_{d_{root}} Q(d_{root} \mid \hat{\tau}) \log Q(d_{root} \mid \hat{\tau}) + \sum_{d_{i_1}} Q(d_{i_1} \mid d_{root}, \hat{\tau}) \log Q(d_{i_1} \mid d_{root}, \hat{\tau}) \\
&+ \Big[ \dots + \sum_{i_h \in C_{d_{i_{n-1}}}} \sum_{d_{i_n}} Q(d_{i_n} \mid d_{n-1}, \hat{\tau}) \log Q(d_{i_n} \mid d_{n-1}, \hat{\tau}) \Big] \dots \\
&+ \sum_{\sigma_i} Q(\hat{\sigma}_i) \log Q(\hat{\sigma}_i) + \sum_{\tau} Q(\hat{\tau}_{i,j}) \log Q(\hat{\tau}_{i,j})
\end{aligned}
\tag{5}
$$

The last two terms are the entropies of the delta functions, denoted $H_\delta$, and are constant with respect to the optimization. Since we intend to use the tree structure and apply recursive programming, we rearrange $T_1$ in recursive form:
$$
F_{QQ}(d_i, d_j) = \sum_{d_i} Q(d_i \mid d_j, \hat{\tau}_{i,j}) \log Q(d_i \mid d_j, \hat{\tau}_{i,j}) + \sum_{k \in C} F_{QQ}(d_k, d_i) \tag{6}
$$
Here C is the set of child nodes. For every node i, Q is a matrix of order l × l over all possible values of $d_i$ and $d_j$, where l is the total search range for every pixel. For ease of understanding, Eq. 2 is divided into two groups of factors, p1 and p2. Following the same procedure as above for $T_2$ in Eq. 3 gives the two expressions below, derived from p1 and p2 of Eq. 2, respectively.
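$$
F_{QPQ}(d_i, d_j) = \sum_{d_i} Q(d_i \mid d_j, \hat{\tau}_{i,j}) \log p(d_i \mid d_j, \hat{\tau}_{i,j}) + \sum_{k \in C} F_{QPQ}(d_k, d_i) + \sum_{\sigma_i} Q(\hat{\sigma}_i) \log Q(\hat{\sigma}_i) + \sum_{\tau} Q(\hat{\tau}_{i,j}) \log Q(\hat{\tau}_{i,j}) \tag{7}
$$

$$
F_{QPZ}(d_i, d_j) = \sum_{d_i} Q(d_i \mid d_j, \hat{\tau}_{i,j}) \log p(Z_i \mid d_i, \hat{\sigma}_i) + \sum_{k \in C} F_{QPZ}(d_k, d_i) \tag{8}
$$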
The generalized recursive expression for the total free energy can be written as

$$
F(Q, P) = F_{QQ}(d_{root}, d_{-1}) - F_{QPQ}(d_{root}, d_{-1}) - F_{QPZ}(d_{root}, d_{-1}) + H_\delta \tag{9}
$$
$d_{-1}$ is a dummy node, i.e. the parent of the root node, introduced only to simplify the expression. All terms related to the entropy of the delta functions are ignored, as they do not take part in the optimization. The total free energy can be viewed as the sum of the energies at all nodes of the tree.

2.4 Expectation Maximization
Various approximate inference techniques may be applied to compute an approximate value of the Q function. To avoid the local minimum problem, we prefer EM over ICM. Taking the conditional distributions to be Gaussian,

$$
\log p(d_i \mid d_j, \hat{\tau}_{i,j}) = -\frac{1}{2} \log 2\pi - \log \hat{\tau}_{i,j} - \frac{|d_i - d_j|^2}{2\hat{\tau}_{i,j}^2} \tag{10}
$$

$$
\log p(z_i \mid d_i, \hat{\sigma}_i) = -\frac{1}{2} \log 2\pi - \log \hat{\sigma}_i - \frac{|Z_R(i) - Z_L(i + d_i)|^2}{2\hat{\sigma}_i^2} \tag{11}
$$
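The free energy is an upper bound on the negative log-probability of the visible nodes, so minimizing it with respect to Q(h) tightens this bound. To minimize the energy with respect to the Q function, substitute Eqs. 10 and 11 into Eq. 9 and set the derivative to zero:

$$
\frac{\partial F(d_i, d_j)}{\partial Q(d_i = a \mid d_j = b, \hat{\tau}_{i,j})} = 0 \tag{12}
$$

$$
Q(d_i \mid d_j, \hat{\tau}_{i,j}) \propto \exp\left( -\frac{|d_i - d_j|^2}{2\hat{\tau}_{i,j}^2} - \frac{|Z_R(i) - Z_L(i + d_i)|^2}{2\hat{\sigma}_i^2} \right) \tag{13}
$$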
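One way to see how Eq. 13 follows from Eq. 12 is the short derivation below. It is a sketch under two assumptions that go beyond the text: the child terms of the recursion are treated as constants with respect to $Q(d_i \mid d_j)$, and the normalization of Q is enforced with a Lagrange multiplier $\lambda$ (a symbol introduced here for illustration only).

```latex
\begin{aligned}
\mathcal{L} &= \sum_{d_i} Q(d_i \mid d_j) \log Q(d_i \mid d_j)
  - \sum_{d_i} Q(d_i \mid d_j) \big[ \log p(d_i \mid d_j, \hat{\tau}_{i,j}) + \log p(z_i \mid d_i, \hat{\sigma}_i) \big]
  + \lambda \Big( \sum_{d_i} Q(d_i \mid d_j) - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial Q(d_i = a \mid d_j = b)} &=
  \log Q(a \mid b) + 1 - \log p(a \mid b, \hat{\tau}_{i,j}) - \log p(z_i \mid a, \hat{\sigma}_i) + \lambda = 0 \\
Q(a \mid b) &\propto p(a \mid b, \hat{\tau}_{i,j}) \, p(z_i \mid a, \hat{\sigma}_i)
  \propto \exp\left( -\frac{|a - b|^2}{2\hat{\tau}_{i,j}^2} - \frac{|Z_R(i) - Z_L(i + a)|^2}{2\hat{\sigma}_i^2} \right)
\end{aligned}
```

The factors $1/(\sqrt{2\pi}\,\hat{\tau}_{i,j})$ and $1/(\sqrt{2\pi}\,\hat{\sigma}_i)$ from Eqs. 10 and 11 do not depend on $a$, so they are absorbed into the proportionality constant.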
The Q function is constrained by $\sum_{d_i} Q(d_i \mid d_j, \hat{\tau}_{i,j}) = 1$ (constant terms that do not participate in the optimization are omitted). The uncertainties calculated from Q decide the significance of each pixel. Intuitively, $\sigma_i$ and $\tau_{i,j}$ can be viewed as competing parameters. For an occluded pixel and the pixel it is matched to, the intensity difference is large; the variance is therefore also large, which gives the pixel a low weight. Likewise, if the difference between the disparities of i and j is large, its influence is controlled by $\tau_{i,j}$. The minimizing values of $\sigma_i$ and $\tau_{i,j}$ are found by taking the derivative of the free energy F(Q, P) with respect to them and setting it to zero:

$$
\hat{\tau}_{i,j}^2 = \sum_{d_j} Q(d_j) \sum_{d_i} Q(d_i \mid d_j, \hat{\tau}_{i,j}) \, |d_i - d_j|^2, \qquad
\hat{\sigma}_i^2 = \sum_{d_i} Q(d_i) \, |Z_R(i) - Z_L(i + d_i)|^2 \tag{14}
$$

The constraint placed on $Q(d_i)$ is defined as $Q(d_i) = \sum_{d_j} Q(d_j) \, Q(d_i \mid d_j, \hat{\tau}_{i,j}, \hat{\sigma}_i)$.
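The alternation described in this subsection can be illustrated for a single tree edge (child i, parent j). The sketch below is an assumption-laden illustration, not the authors' code. Disparities are taken to be the integers 0..l-1, `diff[a]` is assumed to hold the precomputed intensity difference for candidate disparity a, the parent marginal `q_parent` is supplied by the caller, and `e_step` and `m_step` are hypothetical names.

```python
import math
from typing import List, Tuple


def e_step(diff: List[float], tau: float, sigma: float) -> List[List[float]]:
    """Eq. 13: table[b][a] = Q(d_i = a | d_j = b), normalized over a."""
    l = len(diff)
    table = []
    for b in range(l):
        row = [
            math.exp(-((a - b) ** 2) / (2 * tau ** 2) - diff[a] ** 2 / (2 * sigma ** 2))
            for a in range(l)
        ]
        z = sum(row)  # normalization so that each row sums to one
        table.append([x / z for x in row])
    return table


def m_step(
    q_cond: List[List[float]], q_parent: List[float], diff: List[float]
) -> Tuple[float, float, List[float]]:
    """Eq. 14: re-estimate tau^2 and sigma^2 from the current Q tables."""
    l = len(diff)
    # Marginal of the child: Q(d_i) = sum_{d_j} Q(d_j) Q(d_i | d_j).
    q_child = [sum(q_parent[b] * q_cond[b][a] for b in range(l)) for a in range(l)]
    tau_sq = sum(
        q_parent[b] * q_cond[b][a] * (a - b) ** 2 for b in range(l) for a in range(l)
    )
    sigma_sq = sum(q_child[a] * diff[a] ** 2 for a in range(l))
    return tau_sq, sigma_sq, q_child
```

For example, with `diff = [10.0, 0.0, 10.0]` every row of the E-step table peaks at disparity 1, and the M-step then returns a small intensity variance because almost all probability mass sits on the well-matched disparity. A pixel whose differences are large for every candidate would instead receive a large variance and hence a low weight, which is the occlusion behavior described above. Very small variances can make all exponentials underflow; a practical version would work in the log domain.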
2.5 Dynamic Programming on Tree
The energy of the Q function at node $d_i$ can be expressed as

$$
E_Q(d_i \mid d_j, \hat{\tau}_{i,j}) = Q(d_i \mid d_j, \hat{\tau}_{i,j}) \prod_{k \in C} Q(d_k \mid d_j, \hat{\tau}_{k,j}) \tag{15}
$$
The minimum energy is obtained by minimizing the logarithm of the above expression. The optimal disparity assignment for a node can be determined using

$$
E_Q(d_i \mid d_j, \hat{\tau}_{i,j}) = \arg\min_{d_i \in D} \Big( s(d_i, d_j) + m(d_i) + \sum_{k \in C_i} E_Q(d_k \mid d_j, \hat{\tau}_{i,j}) \Big) \tag{16}
$$
Here $s(d_i, d_j)$ is the disparity mismatch and $m(d_i)$ is the matching penalty for assigning disparity $d_i$ to pixel i. Eq. 16 is a standard expression for this optimization problem [14] and finds the minimum-energy assignment at each node. The computational cost of these terms is $O(l^2 n)$ each, where n is the number of nodes and l is the maximum possible disparity. Including the free energy, the total computational cost is $(2m + 1)\,O(l^2 n)$, where m is the number of iterations. In implementing this algorithm, the hardest problem is memory consumption: storing values for l candidate disparities at every pixel is demanding on memory.
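The leaf-to-root recursion of Eq. 16 followed by a root-to-leaf read-out can be sketched as below, in the style of tree dynamic programming from [14]. This is a generic illustration under stated assumptions: the tree is given as a parent array with -1 marking the root, the matching penalties m are passed in as a table, and the mismatch s is passed in as a function. How s and m are derived from the estimated variances is not spelled out here, so the names `tree_dp`, `match_cost`, and `smooth` are hypothetical.

```python
from typing import Callable, List


def tree_dp(
    parent: List[int],
    match_cost: List[List[float]],
    smooth: Callable[[int, int], float],
) -> List[int]:
    """Minimum-energy disparity per node on a tree, in O(l^2 n) time."""
    n, l = len(parent), len(match_cost[0])
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    for v in order:
        order.extend(children[v])

    # energy[v][dp]: best cost of the subtree at v if its parent takes dp.
    energy = [[0.0] * l for _ in range(n)]
    best = [[0] * l for _ in range(n)]
    root_cost: List[float] = []
    for v in reversed(order):  # leaves first
        own = [
            match_cost[v][d] + sum(energy[k][d] for k in children[v])
            for d in range(l)
        ]
        if v == root:
            root_cost = own
            continue
        for dp in range(l):
            d_best = min(range(l), key=lambda d: own[d] + smooth(d, dp))
            best[v][dp] = d_best
            energy[v][dp] = own[d_best] + smooth(d_best, dp)

    # Read the optimal assignment back from the root to the leaves.
    disparity = [0] * n
    disparity[root] = min(range(l), key=lambda d: root_cost[d])
    for v in order[1:]:
        disparity[v] = best[v][disparity[parent[v]]]
    return disparity
```

On a three-node chain where the middle node marginally prefers disparity 1 but both neighbors strongly prefer 0, a mismatch penalty of `abs(a - b)` yields the smooth labeling `[0, 0, 0]`, while a zero penalty yields `[0, 1, 0]`. The two tables `energy` and `best` each hold l entries per node, which is the memory pressure noted in the text.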
Fig. 2. Comparison results: occlusion results for "tsukuba". Panels: Ground Truth, Our Result, Results [2], Results [14], Results [15].
3 Experimental Results and Discussion
To check the performance in modeling and detecting occlusion, we test our algorithm on the Middlebury test bed [11]. We tested the Tsukuba, Sawtooth, Venus, and Map image pairs; the results are shown in Fig. 2. Compared with other approaches, our results rank between positions 8 and 10 out of 36 competitors for the various images. Although this ranking may look modest, a direct comparison with other state-of-the-art algorithms is somewhat unfair, since they employ 2D optimization, whereas we try to approach the performance of 2D optimization using a tree structure, which is neither 2D nor 1D. A direct comparison can be made either with 1D optimization algorithms or with the dynamic tree optimization algorithm [14]. There are four methods based on 1D optimization in the evaluation table, and by coincidence they occupy the consecutive ranks 25 to 28, roughly in the third quarter of the table. The tree optimization method [14] ranks slightly above these algorithms. Furthermore, our results include occluded pixels, while the other results are based only on the non-occluded pixels of the image pairs; those methods isolate all occluded pixels using a fixed threshold or a threshold based on the neighboring pixels.

To look deeper, we sought an equal footing on which to compare our occlusion results with several recent approaches: the "GC+occl" algorithm by Kolmogorov and Zabih [10], a pixel-based approach that uses a symmetric graph-cut framework to handle occlusion; the "Seg+GC" algorithm by Hong and Chen [12], a segment-based asymmetric graph-cut approach that does not explicitly detect occlusion; and the "Layer" algorithm by Lin and Tomasi [4], a combination of pixel-based and segment-based approaches. Results for two image pairs from the dataset of [11], "tsukuba" and "venus", are presented. The same parameters are used for both data sets. The occlusion result is computed by checking the visibility of each point in the non-occlusion result. The result of "Layer" is taken from the authors' website. The results are shown and compared in Fig. 3, and Table 1 gives the error statistics for "tsukuba" and "venus". They are evaluated quantitatively by three criteria, which are the percentages of false positives, false negatives, and bad points near the occlusion. A bad point is a point whose absolute disparity error is greater than one [11]. We build the near-occlusion region by dilating the occlusion area to 20 pixels and excluding the occlusion area itself. Figure 3 shows our results.
Table 1. Error statistics (%) of the two images with respect to our technique

Methods        False Pos   False Neg   Near Occl
Tsukuba
Our Results    2.21        31          8.75
GC+Occl        1.49        31.8        6.34
Seg+GC         1.23        30.4        8.12
Layered        2.25        24.2        9.01
Venus
Our Results    1.2         21          7
GC+Occl        1.91        32.88       13.12
Seg+GC         0.51        16          0.89
Layered        0.32        51          1.01
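The three criteria reported in Table 1 could be computed along the following lines. This is only a sketch: the text does not state the denominators, so it assumes that false positives are normalized by the number of visible ground-truth pixels, false negatives by the number of occluded ground-truth pixels, and bad points by the size of the near-occlusion band. It also interprets the dilation as a square neighborhood of the given radius. The names `dilate` and `occlusion_stats` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[bool]]
Disp = List[List[float]]


def dilate(mask: Mask, radius: int) -> Mask:
    """Binary dilation with a (2*radius+1) square window (assumed shape)."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out


def occlusion_stats(
    pred_occ: Mask, true_occ: Mask, pred_disp: Disp, true_disp: Disp, radius: int = 20
) -> Tuple[float, float, float]:
    """Return (false positive %, false negative %, near-occlusion bad-point %)."""
    h, w = len(true_occ), len(true_occ[0])
    band = dilate(true_occ, radius)
    fp = fn = n_occ = n_vis = bad = n_band = 0
    for y in range(h):
        for x in range(w):
            if true_occ[y][x]:
                n_occ += 1
                fn += not pred_occ[y][x]  # occluded pixel that was missed
            else:
                n_vis += 1
                fp += pred_occ[y][x]  # visible pixel flagged as occluded
                if band[y][x]:  # dilated area minus the occlusion itself
                    n_band += 1
                    bad += abs(pred_disp[y][x] - true_disp[y][x]) > 1

    def pct(a: int, b: int) -> float:
        return 100.0 * a / b if b else 0.0

    return pct(fp, n_vis), pct(fn, n_occ), pct(bad, n_band)
```

Other normalizations, for instance dividing every count by the total number of pixels, are equally plausible readings of the text and would change the reported percentages.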
4 Conclusion
In this work, we extract the mst from color information. We treat this mst as a Bayesian network and infer two random parameters, one for occlusion modeling and one for smoothness in disparity space. For inference, the Helmholtz free energy equations are reshaped to suit our framework, i.e. they are rewritten recursively to fit DP. The EM algorithm is used to approximate these energies, which are then minimized using DP to find the optimal disparity. The ultimate goal achieved is minimum occlusion.
Fig. 3. Comparison results: Middlebury datasets. The first row shows the left images, the second row the ground truth, and the third row our results.
Acknowledgement. This work is jointly sponsored by Higher Education Commission of Pakistan, under National University of Science and Technology, Pakistan.
References
1. P.N. Belhumeur. A bayesian-approach to binocular stereopsis. IJCV, 19:237–260, 1996.
2. A.F. Bobick and S.S. Intille. Large occlusion stereo. IJCV, 33(3):1–20, 1999.
3. Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. TPAMI, 11(23):1222–1239, 2001.
4. Myron Z. Brown, Darius Burschka, and Gregory D. Hager. Advances in computational stereo. TPAMI, 7, 2003.
5. T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
6. I.J. Cox, S.L. Hingorani, S.B. Rao, and B.M. Maggs. A max likelihood stereo algorithm. Computer Vision and Image Understanding, 63:542–567, 1996.
7. Brendan J. Frey and Nebojsa Jojic. A comparison of algorithms for inference and learning in probabilistic graphical models. TPAMI, 27(9), 2005.
8. D. Geiger, B. Ladendorf, and A. Yuille. Occlusions and binocular stereo. IJCV, 14(3):211–226, 1995.
9. D. Geiger, H. Ishikawa. Occlusions, discontinuities, and epipolar lines in stereo. In ECCV, 98.
10. V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In ECCV, 02.
11. D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 1-3(47):7–42, April 2002.
12. Milan Sonka, Vaclav Hlavac, and Roger Boyle. Image Processing Analysis and Machine Vision. USA, 2nd edition, 2002.
13. J. Sun, N. Zheng, and H. Shum. Stereo matching using belief propagation. TPAMI, 7, 2003.
14. Olga Veksler. Stereo correspondence by dynamic programming on a tree. In CVPR, 2005.