What is New in Computational Stereo Since 1989: A Survey on Current Stereo Papers

Andreas Koschan
Technische Universität Berlin
August 1993

Technischer Bericht 93-22
Recommended for publication by Prof. Dr. Klette
Abstract

Only a few problems in computer vision have been investigated more vigorously than stereo. Consequently, more than 200 different stereo vision methods have been published, and their number is increasing month by month. Although the principle of computational stereo has been known for more than 20 years, new directions in stereo research are still under development. The objective of this report is to show and to categorize what is new in stereo research since 1989. This overview is supplemented in an appendix by a survey of 86 current stereo papers.
Contents

1. Introduction
2. Area-Based and Feature-Based Stereo
3. Dynamic Stereo
4. Active Stereo
5. Dense Depth Maps
6. Camera-Based and Scene-Based Constraints
   6.1 Constraints from Image Geometry
   6.2 Constraints Making Use of Object Properties
7. Stereo Vision Using Color Information
8. Stereo Using More Than Two Images
9. Occlusions and Transparencies
10. Comparisons Between Different Stereo Techniques
11. Refinement of Precomputed Disparity Values
12. Real-time Stereo Using Special Hardware
13. Integration of Stereo and Other Visual Modules
14. Conclusion and Outlook
15. References

APPENDIX: A Survey on Stereo Matching Techniques
   Part 1: Binocular Monochromatic Stereo
   Part 2: Polynocular Monochromatic Stereo
   Part 3: Binocular Chromatic Stereo
1. Introduction
Stereo is a well-known technique for obtaining depth information from digital images. The key problem in stereo is how to find the corresponding points in the left and in the right image, referred to as the correspondence problem. Whenever a pair of corresponding points is found, the depth can be computed by triangulation. Worldwide, many research activities deal with stereo vision. These methods use different approaches to solve the correspondence problem and select different constraints imposed on the visibility of objects in the scene. Furthermore, the methods are applied to rather different tasks (e.g., mobile robots, photogrammetry, stereo microscopy, etc.), and a large number of distinguishable features is used in the solutions. This complicates a direct comparison of the methods.

In 1989, Dhond and Aggarwal [DhoAgg89] presented a review of stereo vision techniques developed until 1989. Their review covers categories of algorithms identified by differences in image geometry, matching primitives, and the computational structure used. Furthermore, the performance of these stereo techniques on various classes of test images was reviewed. This comprehensive review is highly recommended to all researchers dealing with stereo. Nevertheless, many new stereo approaches have been published during the past couple of years. Energy functions have been used to compute dense depth maps [Bar89], [ChaCha92], [Vle93], [NasCho92]; phase-based stereo has been introduced [JepJen89], [Fle et al91]; color information has been successfully utilized in stereo matching [BroYan89], [JorBov91], [JorBov92]; and techniques for verifying and refining the matching results have been presented [BasDer93], [Bru et al92]. Moreover, attempts have been made to solve the problems caused by occlusions [Bel93], [Gei et al92], [JonMal92], [Sha93] and transparencies [Shi92], [Shi93] in stereo images.

This report presents a new survey of current stereo techniques taking into account new directions in stereo research. The objective of this report is to show what is new in stereo research since 1989 and which new research directions are presently being followed. Stereo techniques can be distinguished by several attributes, e.g., whether they use area-based or feature-based techniques, whether they are applied to static or dynamic scenes, whether they use passive or active techniques, and whether they produce sparse or dense depth maps. A brief introduction to this classification concerning the research directions since 1989 is given in chapters 2 to 5. All stereo methods use constraints to decrease the ambiguities in the matching process. A survey of relevant constraints implemented in current stereo approaches is given in chapter 6. Current investigations have shown that the quality of stereo matching results can be improved by using color information. A review of these color stereo techniques is presented in chapter 7. Another approach to improve the results is the use of additional images in stereo matching. A survey of current polynocular stereo techniques is given in chapter 8. The
occurrence of occlusion and transparency is still a big problem in stereo matching. Chapter 9 presents some approaches to deal with these problems. Almost all stereo papers exclusively present their own stereo techniques without comparing them to the results of other methods. Some of the few exceptions are presented in chapter 10. Furthermore, matching results can be improved by applying refinement techniques to the precomputed disparity values. Three such techniques are presented in chapter 11. The excessively long computation time needed to match stereo images is still the main obstacle on the way to the practical application of stereo vision techniques. A selection of current hardware implementations designed to speed up stereo matching is presented in chapter 12. The experience gained with stereo vision indicates the necessity of integrating different vision modules to obtain more reliable results. There already exist some approaches dealing with this subject, and a summary is given in chapter 13. Moreover, a survey of 86 current stereo papers is presented in the appendix. The techniques treated in this text are listed in tabular form. A brief description of the methods is given, including the applied constraints and the techniques of feature extraction (if features are used). Furthermore, the primitives that are part of the correspondence search as well as the application tasks and the classes of images used in the evaluation process are listed.
2. Area-Based and Feature-Based Stereo
Area-based stereo techniques find corresponding points on the basis of correlation (similarity) between corresponding areas in the left and right images. First, a point of interest is chosen in one image. A correlation measure is then applied to search for a corresponding point with a matching neighborhood in the other image. Area-based techniques have the disadvantage of being sensitive to photometric variations during the image acquisition process and to distortions resulting from a change of the viewing position. This sensitivity is due to the direct comparison of intensity values in the images. Feature-based stereo techniques, on the other hand, match features in the left image to those in the right image. Features are selected as the most prominent parts of the image, such as, for example, edge points or edge segments. Feature-based techniques have the advantage of being less sensitive to photometric variations and of being faster than area-based stereo, because there are fewer candidates for matching corresponding points. Furthermore, features may be extracted with subpixel accuracy to increase the accuracy of the matching results. Whenever a feature-based technique is applied, features have to be extracted. Most approaches still use edges as features, but there are some exceptions using, for example, regions [Coh et al89], [Lee et al93] or topological structures [Fle91]. The techniques applied for edge detection have not changed since 1989. These are the interest operator by
Moravec [Mor77], the Marr-Hildreth operator [MarHil80], the Nevatia-Babu technique [NevBab80], the Canny operator [Can86], and its modification by Deriche [Der87]. These techniques were already reviewed with respect to stereo vision by Dhond and Aggarwal in 1989 [DhoAgg89]. Although feature-based stereo techniques solve the correspondence problem fast and accurately, the number of corresponding structures is low because of the small number of features. Correlation techniques seemed to be out of fashion in the late eighties due to the disadvantages mentioned above, but they have undergone a revival in recent years due to the aforementioned disadvantage of feature-based techniques. Kanade and Okutomi [KanOku90] and Geiger, Ladendorf, and Yuille [Gei et al92] use adaptive correlation between windows, while, among others, Chang and Chatterjee [ChaCha92] directly include a photometric constraint in a cost function to be minimized for stereo matching. A more detailed explanation of these minimization techniques is presented in chapter 5. The main difference between these methods and the feature-based methods is the direct use of intensity values instead of features in the matching process. Consequently, these techniques should rather be called intensity-based approaches than area-based approaches. Nevertheless, both techniques (intensity-based and feature-based) are still commonly applied.
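As an illustration of the area-based principle, the following minimal sketch finds the disparity of a single point by minimizing the sum of squared differences over a small neighborhood. It assumes standard geometry; the window size, search range, and function name are illustrative choices, not drawn from any cited system:

```python
import numpy as np

def match_point_ssd(left, right, x, y, win=5, max_disp=32):
    """Find the disparity of pixel (x, y) in the left image by comparing
    a (2*win+1)^2 neighborhood against candidates on the same scanline
    of the right image. Assumes (x, y) lies at least win pixels from
    the image border."""
    patch = left[y-win:y+win+1, x-win:x+win+1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):   # assumed sign convention: right point lies to the left
        xr = x - d
        if xr - win < 0:
            break
        cand = right[y-win:y+win+1, xr-win:xr+win+1].astype(float)
        cost = np.sum((patch - cand) ** 2)   # SSD correlation measure
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```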
3. Dynamic Stereo
Initially, images were taken by static cameras in a static environment. The main objective of all proposed stereo methods was and still is the automatic search for corresponding points in the two images. Recently, a new trend in stereo research is to use motion to obtain more reliable depth estimates. This research direction, called dynamic stereo, is mostly pursued by members of the robot vision community using mobile robots. Several methods have been presented to obtain depth and/or motion from dynamic stereo (see for example [Abd et al93], [Gam90], [GroTis93], [HayNeg90], [JiaWey89], [Mat89], [Nav et al90], [Neg et al92], [Tir et al90], [Tir et al92], [WalMer92], etc.). Different approaches have been published assuming one moving object, several moving objects, or a static scene and a moving camera. A comprehensive introduction to these problems and their solution can be found in the book by Zhang and Faugeras [ZhaFau92], while the book by Sonka, Hlavac, and Boyle [Son et al93] is recommended to those readers interested exclusively in a short but current presentation of this topic. The correspondence analysis in image sequences is outside the scope of this report. For further information see, for example, [Mat et al89] or [LeeJos93]. Although the use of (known) motion of the stereo system provides some additional information for depth computation, most stereo researchers still exclude motion information from their solutions because motion does not considerably improve the accuracy of the results, and it is not always available (for example in photogrammetry or stereo microscopy).
4. Active Stereo
Range finders are usually classified in the computer vision terminology as using either active or passive methods. Active methods transmit energy (for example ultrasound, radar, collimated light, etc.) into the scene to measure range, whereas passive methods do not. One example of active stereo is the approach of Gerhardt and Kwak [GerKwa86]. They used a laser beam to brightly illuminate one spot of the surface to be observed. Range is easily obtained by triangulation using the bright spots in both stereo images, since the solution of the correspondence analysis is trivial. Sasse [Sas93] illuminated a scene with a wide-range color spectrum. Each color in the spectrum arises only once along the scanline. He matched pixels based on the similarity between colors in both images, and thereby obtained a dense depth map. Maruyama and Abe [MarAbe93] projected line patterns with random cuts onto object surfaces. The correspondence is found by using an ordering constraint and adjacency relations between the line segments. Siebert and Urquhart [SieUrq90] projected random noise texture onto the scene to enhance stereo matching. They applied a Gaussian windowed first order moment calculation to find correspondences in multiscale LoG-filtered channels. During the past few years, some authors have slightly modified the definition of passive and active range techniques. They argue that perception is an active, not a passive process, e.g., we do not just see, we look [Baj88]. Moreover, new robot vision systems are able to verge cameras and/or to change the focus by motor. Therefore, Krotkov [Kro89] uses the term "active" for a passive sensor employed in an active fashion. In recent years, several members of the robot vision community have used this definition in a similar way (see for example [AbbAhu90], [DasAhu93], [GroTis93], [KroBaj93], [Mar et al93], [Ols93], [SwaStr93], etc.). A comprehensive overview of active vision techniques (including active stereo) can be found in [Alo93]. Furthermore, there exists a special issue of the International Journal of Pattern Recognition and Artificial Intelligence (vol. 7, no. 1, February 1993) devoted to active robot vision: camera heads, model based navigation, and reactive control. Active vision is also referred to as active perception or foveal vision.
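In idealized form, the matching step described for [Sas93] reduces to a nearest-color lookup along corresponding scanlines. The following toy sketch illustrates this under the assumption of a perfectly projected, noise-free spectrum; the interface and names are hypothetical:

```python
import numpy as np

def match_by_projected_color(left_row, right_row):
    """Toy sketch: with a projected spectrum in which every color occurs only
    once per scanline, correspondence reduces to a nearest-color lookup.
    left_row, right_row: (W, 3) color values of one scanline."""
    disparities = np.empty(len(left_row), dtype=int)
    for xl, color in enumerate(left_row):
        dist = np.sum((right_row.astype(float) - color.astype(float)) ** 2, axis=1)
        xr = int(np.argmin(dist))     # unique color -> unambiguous match
        disparities[xl] = xl - xr
    return disparities
```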
5. Dense Depth Maps
The correspondence search in stereo images is commonly reduced to significant features, as computing time is still an important criterion in stereo vision. Unfortunately, feature-based or edge-based stereo produces only sparse disparity maps, i.e., only scattered control points can be computed for a succeeding surface interpolation process. For a successful reconstruction of complex surfaces it is, however, essential to compute dense disparity maps defined for every pixel in the entire image.
Dense stereo correspondence relies considerably on the assumption that corresponding image pixels have similar intensities (photometric constraint). Several methods have been proposed to compute dense depth maps. The main idea of stochastic stereo as suggested by Barnard [Bar89] is the definition of stereo matching as a regularization or optimization process. An energy function is defined representing the constraints used for matching. The function measuring photometric similarity is defined, for example, as

P_I[D(x,y)] = | I_L(x,y) - I_R(x + D(x,y), y) | ,    (1)

where I_L(x,y) and I_R(x,y) are the intensity functions in the left and right images. The function ensuring disparity smoothness may be defined as

D_S[D(x,y)] = (1/4) Σ_{(u,v) ∈ N_4} ( D(x,y) - D(u,v) )^2 ,    (2)

where N_4 is the four-connected neighborhood of (x,y) (see [JorBov92]). The energy function representing both constraints is given by

E = Σ_x Σ_y { P_I[D(x,y)] + α D_S[D(x,y)] }    (3)
with a parameter α. Finding a disparity map D(x,y) that minimizes equation (3) constitutes a solution to the stereo correspondence problem. Several methods have been proposed to minimize energy functions. Barnard [Bar89] applied a variant of the simulated annealing technique [LaaAar87] to compute dense depth maps. He used the photometric constraint and the disparity smoothness constraint in his energy function. Furthermore, he applied his algorithm to a multi-resolution pyramid (Gaussian or Laplacian) of the stereo images. Shah [Sha93] defined a nonlinear system of diffusion equations derived by simultaneously applying gradient descent to these functionals. He used a straightforward finite-difference scheme in his implementation, while Belhumeur [Bel93] suggested applying a technique, which he called "iterated stochastic dynamic programming", to find a minimal solution for the energy function. Fua [Fua91] obtained a dense depth map by using an interpolation scheme that takes image gray levels into account to preserve image features, while Jones and Malik [JonMal92] used a bank of linear spatial filters at multiple scales and orientations to obtain a dense depth map. The latter applied a technique based on the pseudoinverse to characterize the information present in a vector of filter responses. A different approach to obtaining dense depth maps is to apply a Block Matching technique to stereo images [Kos91]. In contrast to the methods mentioned so far in this chapter, this scheme is fast and simple. Unfortunately, the precision of the matching results is not very high. The results improved considerably when color information was included in the
matching process [Kos93], but further improvements will be necessary to obtain more precise results. Jepson and Jenkin have demonstrated in several papers that the task of recovering local disparity measurements can be reduced to the task of measuring the local phase difference between bandpass signals extracted from the left and right cameras. This approach uses the output behavior of bandpass Gabor filters, and it is called phase-based stereo (see for example [JepJen89] and [Fle et al91] for further details). A dense depth map is easily obtained when this technique is applied to stereo images. Wilson and Knutsson [WilKnu89] also suggested using Gabor representations to match stereo images, while Weng [Wen90] proposed to match windows of Fourier phases, and Ludwig, Neumann and Neumann [Lud et al92] proposed to match cepstra (the power spectrum of the log of the power spectrum of a signal) to produce dense depth maps. However, dense depth maps are not always required for robotics applications, and their computation is very time consuming. Nasrabadi and Choo [NasCho92] suggested using an optimization approach to match interest points extracted by the Moravec operator [Mor77]. They mapped their cost function to a Hopfield neural network to achieve computationally efficient but sparse results. In summary, energy functions are widely used in current stereo matching techniques. They can be used to obtain dense depth maps as well as sparse depth maps. The decision for one of these depth maps is still a compromise between computing time and the quantity of results needed for a specific application.
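To illustrate how an energy function such as (3) can be minimized, the following sketch applies a bare-bones simulated annealing scheme in the spirit of [Bar89]. The cooling schedule, parameter values, and interface are illustrative assumptions, not Barnard's original settings:

```python
import numpy as np

def anneal_stereo(IL, IR, max_disp=16, alpha=1.0, T0=10.0, cooling=0.95, sweeps=50):
    """Minimize E = sum(PI + alpha*DS) over the disparity map by annealing."""
    h, w = IL.shape
    D = np.random.randint(0, max_disp + 1, size=(h, w))
    rng = np.random.default_rng()

    def local_energy(x, y, d):
        xr = min(max(x + d, 0), w - 1)
        pi = abs(float(IL[y, x]) - float(IR[y, xr]))       # photometric term (1)
        ds = 0.0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # smoothness term (2)
            u, v = x + dx, y + dy
            if 0 <= u < w and 0 <= v < h:
                ds += (d - D[v, u]) ** 2
        return pi + alpha * ds / 4.0

    T = T0
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                d_new = int(rng.integers(0, max_disp + 1))
                dE = local_energy(x, y, d_new) - local_energy(x, y, D[y, x])
                if dE < 0 or rng.random() < np.exp(-dE / T):  # Metropolis acceptance
                    D[y, x] = d_new
        T *= cooling                                          # geometric cooling schedule
    return D
```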
6. Camera-Based and Scene-Based Constraints
There exists no general solution to the correspondence problem due to the existence of ambiguous matching candidates. Consequently, every stereo method uses several assumptions about the image geometry and/or the objects in the scene to reduce the number of ambiguities. These constraints can be distinguished as camera-based and scene-based. Camera-based constraints have to do with image geometry, e.g., camera position, orientation and the physics of image acquisition. Scene-based constraints deal with hypotheses about the nature of allowable surfaces and their spatial relationships in the scene. Many constraints are ad hoc and used only by a single author. The selection presented below lists those constraints still relevant and in use in recent stereo approaches. Although most of the constraints were defined before 1989, they are still fundamental to current research activities. Nevertheless, most stereo papers merely mention them without explicitly defining them. Therefore, a survey of stereo constraints may help to improve the readability of stereo papers. An interesting overview of constraints used for the geometrical reconstruction of a scene was given by Dreschler-Fischer [Dre93] (in German).
6.1 Constraints from Image Geometry

Epipolar constraint: One of the most powerful constraints in stereo vision is the knowledge of epipolar geometry. Fig. 1 shows the imaging geometry of a pair of stereo images. OC1 and OC2 are the optical centers of the two camera systems. The distance between these optical centers is called the baseline. A point P(x,y,z) in the scene is mapped to the points p1 and p2, respectively, in the corresponding image planes. The baseline and the ray of projection from OC1 to P define the plane of projection of the 3-D scene point, called the epipolar plane. This epipolar plane intersects both image planes in lines called the corresponding epipolar lines. It follows from this epipolar geometry that for a given point p1 in the right image its corresponding match point p2 in the left image must lie on the corresponding epipolar line. Thus, the search space for a corresponding match point in the second image reduces to a 1-D search space given by the epipolar line. Moreover, all epipolar planes intersect the image planes along horizontal lines if the cameras are arranged with the optical axes in parallel and the image planes are shifted by a horizontal displacement. This camera arrangement is called standard geometry and, due to its simplicity, it is still used by most stereo methods.
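As a small illustration of this 1-D search space, the following sketch assumes a known fundamental matrix F relating the two views (compare the work on uncalibrated cameras cited below); the formulation is illustrative and not taken from any particular cited system:

```python
import numpy as np

def epipolar_line(F, p1):
    """Given a fundamental matrix F and a homogeneous point p1 in image 1,
    return the coefficients (a, b, c) of the epipolar line a*x + b*y + c = 0
    on which the corresponding point must lie in image 2."""
    return F @ p1

def satisfies_epipolar(F, p1, p2, tol=1e-6):
    """Corresponding points satisfy p2^T F p1 = 0 (up to noise)."""
    return abs(p2 @ F @ p1) < tol
```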
Fig. 1: Epipolar geometry and epipolar lines.

The main problem with the epipolar constraint is still that the epipolar lines have to be computed with high accuracy using calibration techniques. Among others, Boyer and his colleagues ([Boy et al91], [VaiBoy91]) presented a technique to compute the epipolar constraint from relative orientation without exact camera calibration. Faugeras [Fau92], Hartley [Har92], and Hartley, Gupta, and Chang [Har et al92] gave an overview of what can be seen in three dimensions with uncalibrated cameras.

Geometric Similarity Constraint: Corresponding line segments are assumed to have similar orientations (angle criterion) and similar lengths (length criterion) in both stereo images. Both criteria are combined into the geometric similarity constraint. The length criterion is rather unstable because the length of an edge depends directly on the edge detection scheme. A long edge in one image may easily have been broken into two edges in the other image. In contrast, the angle criterion is rather stable and, therefore, it is commonly used in recent stereo approaches.
Uniqueness Constraint: "Almost always, a black dot from one image can match no more than one black dot from the other image." [Mar82, p. 115]. This constraint does not hold in the unusual situation that two points lie on the same ray of projection in one image but are both visible in the other image.

Photometric Constraint: Corresponding pixels are required to have similar intensities (or similar values of some simple functions of intensity) in both images. This holds for "nearly Lambertian" surfaces and a camera arrangement ensuring that the surface orientation does not change radically and that the baseline distance is small compared to the distance to the surface [JorBov92]. This constraint is used in all stereo approaches using correlation techniques. Moreover, it is included in almost all energy functions defined to obtain dense depth maps (see chapter 5).

Chromatic Photometric Constraint: Jordan and Bovik [JorBov92] extended the photometric constraint to the chromatic photometric constraint for color images. They assume that corresponding pixels in color images have similar color values (or similar values in the corresponding color channels). The limitations of this constraint are identical to those of the photometric constraint, but with the additional problem of different amounts of noise in each color channel.

Perspective Projection Constraint: An image model with perspective projection is used in (almost) all stereo systems. Some exceptions using orthographic projections are, for example, [Ike87], [CheHua91], and [Che91].

6.2 Constraints Making Use of Object Properties

Feature Compatibility Constraint: "If the two descriptive elements could have arisen from the same physical marking, then they can match. If they could not have, then they cannot be matched." [Mar82, p. 114]. Although this requirement is obvious, a constraint for stereo matching can only be derived if the features in the images can be classified by their physical origin. The different types of edges are outlined in Fig. 2.
Fig. 2: Different types of edges (discontinuities).

Edges can be distinguished into the following categories (compare [Aya91, p. 45]):
- Orientation edges arise from a discontinuity of the surface normal of a continuous surface.
- Reflectance edges arise from a discontinuity of the reflectance of a surface, i.e., there is a change in the material.
- Illumination edges arise from a discontinuity of the intensity of the incident lighting.
- Specular edges arise from a special orientation between the light source, the object surface, and the observer, and from material properties.
- Occlusion edges are boundaries between an object and the background as seen by the observer. They do not represent a physical discontinuity in the scene; they exist due to a special viewing position.

Only orientation edges, reflectance edges, and illumination edges should be matched in stereo vision. Specular edges and occlusion edges should not be matched because their occurrence in the images depends on the viewing position of the two cameras, and they do not represent the identical physical locus in the scene. Furthermore, illumination edges should not be matched if dynamic stereo is applied. The main problem in edge classification is that a qualitative 3-D reconstruction of the scene is needed to separate the different classes of edges. Nevertheless, edge matching is a common technique for finding a 3-D reconstruction of the scene, but this process seems to depend on the classification of the edges (see [Dre93]). Some encouraging results have already been obtained by the use of color information for edge classification (see [Ger et al87], [Kli et al88], and [LeeBaj92] for highlight detection, [FunBaj92] for shadow recognition, and [RubRic82] for the determination of reflectance edges).

Continuity Constraint: "The disparity of the matches varies smoothly almost everywhere over the image." [Mar82, p. 115]. Pixels on object boundaries are the exceptions to this constraint, but they represent only a small area of the entire image.

Figural Continuity: A region does not have to represent a single surface in the stereo image. Consequently, disparities do not always have to vary smoothly inside an image area. Mayhew and Frisby [MayFri81] assume that the disparities between corresponding points vary continuously along the contours in smoothed stereo images. This is a slight modification of the constraint proposed by Arnold and Binford [ArnBin80].

Coherence Principle: "From the cohesivity of matter follows directly the coherence principle: the world is not made of points chaotically varying in depth but of (not necessarily opaque) objects each occupying a well defined 3D volume." [Pra85, p. 94]. This principle differs from the continuity constraint, which determines that neighboring points in the image are projections of neighboring points in the 3-D world. The latter holds neither for transparent surfaces nor for points along object boundaries. The
coherence principle assumes similar disparities for neighboring elements that are part of the identical 3-D object. Consequently, a 3-D segmentation of the object surfaces is necessary to apply this constraint. Unfortunately, this is a difficult task not yet solved in general.

Disparity Limit: From the continuity constraint, Marr and Poggio [MarPog76] conclude that the disparity of two corresponding points has to be less than a predefined threshold. This threshold is called the disparity limit, and it is used to reduce the search space. The limit is motivated by investigations of the human visual system (Panum's fusional area). The definition of a threshold fixes the minimum distance between the camera and the objects in the scene. Conversely, a disparity limit is easily defined if the minimum distance between the camera and the objects (the working space) is known.

Disparity Gradient Limit: Burt and Julesz [BurJul80] found from physiological investigations that binocular fusion of random dot stereograms by the human visual system is only possible if the disparity gradient does not exceed 1.
Fig. 3: Disparity gradient and cyclopean separation (after [Pol et al85]).

Let Al, Ar and Bl, Br be corresponding points in the left and right images, and let Ac and Bc be their cyclopean images. Pollard, Mayhew, and Frisby [Pol et al85] define the disparity gradient Γ_d between Ac and Bc as (see Fig. 3 for illustration):
Γ_d = | Δx' - Δx | / √( (Δx + Δx')^2/4 + Δy^2 ) .    (4)
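In code, equation (4) translates directly as follows; the point representation is an assumption for illustration:

```python
import numpy as np

def disparity_gradient(al, ar, bl, br):
    """Disparity gradient after [Pol et al85], eq. (4): difference of the
    disparities of two matches divided by their cyclopean separation.
    al, ar, bl, br are (x, y) points; under standard geometry the
    y-separation is the same in both images."""
    al, ar, bl, br = map(np.asarray, (al, ar, bl, br))
    dx, dx_p = bl[0] - al[0], br[0] - ar[0]        # Δx and Δx'
    dy = bl[1] - al[1]                             # Δy
    separation = np.hypot((dx + dx_p) / 2.0, dy)   # cyclopean separation
    return abs(dx_p - dx) / separation
```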
A constraint for stereo matching is established if a maximum limit is defined for the disparity gradient. Although this constraint was defined in the mid-eighties, it is still used, for example by [Pol et al89] and [Ols90].

Ordering Constraint: A further reduction of the ambiguities in stereo matching can be achieved by assuming that object points in the scene are mapped in the identical order onto the corresponding epipolar lines in both images. The distance between the camera and all objects in the scene has to be approximately identical if this ordering constraint is to hold. If one object lies much closer to the camera than the other objects, the ordering constraint does not hold. An illustration of this limitation is outlined in Fig. 4.
Fig. 4: Order of the points mapped onto the corresponding epipolar lines in the images. Left: Both objects have approximately the same distance to the image planes. Right: One object is much nearer to the image plane than the other.

The definition of an upper bound for the disparity gradient has a stronger influence on the properties of the objects in the scene than the assumptions on ordering. The ordering constraint is still used by several authors (see [BriBra90], [CocMed89], [HuaDub89], [TuDub90], etc.).

Connectivity Constraint: A strong global constraint is proposed by Baker and Binford: "In the absence of other information, a connected sequence of edges in one image should be seen as a connected sequence of edges in the other, and ... the structure in the scene underlying these observations may be inferred to be a continuous surface detail or a continuous surface contour." [BakBin81, p. 633]. This constraint holds exclusively for restricted object scenes. If a general scene is viewed, edges that are connected in one image may be broken in the other image and, moreover, edges from different surfaces may be connected in one image due to occlusions.

Planarity Constraint: For a limited number of applications, the exclusive existence of polyhedral objects in a scene can be assumed. In these applications, the planarity constraint guarantees the existence of an affine transformation between the coordinates of the corresponding points in the left (Xl, Yl) and right (Xr, Yr) images which were generated by 3-D scene points (X, Y, Z) all belonging to the same plane [Bon et al92]. This constraint is given by the equations Xr = αXl + βYl + γ and Yr = Yl, where the coefficients α, β, γ depend on the parameters a, b, c and d of the plane equation aX + bY + cZ + d = 0.

Occlusion Constraint: "A discontinuity in one eye corresponds to an occlusion in the other eye and vice-versa." [Gei et al92, p. 428]. This constraint does not hold in the unusual situation of a small cuboid lying in front of a big cuboid. There is no region of occlusion, but a depth discontinuity occurs if this scene is viewed by a stereo camera.

In summary, 16 constraints were presented in this chapter. This survey is not intended to be complete, but it represents a comprehensive selection of constraints. The objective of this chapter was to provide helpful information to readers of stereo papers and to users implementing stereo algorithms.
7. Stereo Vision Using Color Information
There are several motivations for using chromatic information. Firstly, chromatic information is easily obtained with high precision when using a 3-chip CCD camera. Secondly, color plays an important role in human perception. Livingstone and Hubel [LivHub87] showed that humans cannot perceive depth in color stereograms when the colors are at equiluminance. Although the role of color in stereo vision is still an unsolved question, there is little doubt that color attributes influence human stereopsis (see [BroYan89]). Thirdly, it is obvious that red pixels cannot match blue pixels even though their intensities are equal or similar. Accordingly, the existing computational approaches to color stereo correspondence have shown that the matching results can be considerably improved by using color information.

Drumheller and Poggio [DruPog86] presented one of the first stereo approaches using color. They used the sign of convolution as a matching primitive in each of the three color channels. They found only little improvement in the results for typical natural scenes, but they encouraged the interest in the use of color in stereo. In 1988, Jordan and Bovik [JorBov88] improved their edge matching results by using the chromatic gradients of three normalized difference spectra (red minus green, green minus blue, and blue minus red). In 1989, together with Geisler [Jor et al89], they introduced the chromatic gradient matching constraint. This is an extension of the disparity gradient limit [Pol et al85] from gray level images to color images. The quality of the matching results with the edge-based PMF algorithm [Pol et al85] can be greatly improved when color images and the chromatic gradient matching constraint are used. The number of correct matches increases by 100% to 500% depending on the color test images [JorBov91].

Brockelbank and Yang [BroYan89] suggested an approach based on physiological and psychophysical investigations of human perception. They used the opponent color model for image representation. Each potential match between edges is assigned a probability with regard to edge orientation, edge contrast, and a maximum disparity value defined by Panum's fusional area. Assuming figural continuity and global surface smoothness, relaxation labeling is used to increase the probabilities of possible matches that agree in disparity with the majority of possible matches within the local neighborhood.

All color methods mentioned so far are feature-based, i.e., they produce sparse disparity maps. In the past few years, there has been an essential need for algorithms that compute dense disparity maps. Nguyen and Cohen [NguCoh92] suggested matching regions with similar curvature characteristics in both images. The matching is performed separately for the R, G and B channels, and the results are combined by averaging. A dense disparity map is computed by a succeeding spline interpolation. This is not a dense stereo algorithm in the sense that the correspondence problem is
solved for all pixels in the image simultaneously, but, nevertheless, a dense depth map is computed. In 1992, Jordan and Bovik [JorBov92] introduced the chromatic photometric constraint requiring similar colors for corresponding pixels in both images. The similarity of potential matches is measured by computing the absolute differences between the intensity values in the R, G and B components. Together with a disparity smoothness constraint, the assumption that disparity values are similar in the four-connected neighborhood, they define the sum of these constraints as an energy function. A dense disparity map was computed by using the simulated annealing technique to minimize this energy function over all possible disparities and all pixels in the left image.

Okutomi, Yoshizaki and Tomita [Oku et al92] suggested an area-based technique. They used the sum of squared differences (SSD) of the intensity values in the three color channels R, G and B for correspondence. They showed (for one-dimensional signals) that the variance of the estimated disparity values is smaller, i.e., the precision is higher, when using color images than when using any single gray image. The precision of the estimated disparities increases when the original images contain various colors. In their investigation they used the RGB color space and the SSD criterion for stereo matching.

A different approach is to extend the Block Matching technique to color stereo [Kos93]. Three different color models (RGB, I1I2I3, HSI) and five different color metrics were investigated concerning their suitability for stereo matching. All combinations of color spaces and color measures were examined by applying them to several test images. Furthermore, the results were compared to the results of the gray value algorithm. The precision of the matching results was consistently improved by 20 to 30% when color information was used instead of gray value information. Moreover, it was found that the I1I2I3 color space (introduced by [Oht et al80]) provided the best information in this comparison.

In summary, color information provides a powerful cue in stereo matching. The results of all methods mentioned in this chapter (feature-based or area-based) could be considerably improved by including color information in the matching process. Thus, our group will carry out further investigations on color stereo to obtain more reliable results in less computational time.
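As an illustration of the color SSD measure in the spirit of [Oku et al92], the following sketch computes the cost of one candidate disparity by summing the squared differences over the R, G and B channels. The window size and interface are illustrative assumptions:

```python
import numpy as np

def color_ssd(left, right, x, y, d, win=3):
    """SSD over all three color channels for candidate disparity d.
    left, right: (H, W, 3) RGB images; summing over channels makes
    matches that agree in intensity but differ in color expensive.
    Assumes both windows lie inside the images."""
    pl = left[y-win:y+win+1, x-win:x+win+1, :].astype(float)
    pr = right[y-win:y+win+1, x-d-win:x-d+win+1, :].astype(float)
    return np.sum((pl - pr) ** 2)   # implicit sum over R, G and B
```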
8. Stereo Using More Than Two Images
A different approach to deal with the ambiguities in stereo is the use of more than two monochromatic images in the correspondence search. Stereo approaches using three images (trinocular stereo), four images (tetranocular stereo) or up to eight images (e.g. [Tsa83]) are called polynocular stereo approaches. The most common technique is
15
trinocular stereo, which provides the advantage of an extra epipolar geometry constraint. Three different camera arrangements are common in trinocular stereo (see Fig. 5).
Fig. 5: Trinocular camera arrangements: (a) rectangular coplanar arrangement, (b) collinear arrangement, (c) "free" noncollinear arrangement.

The rectangular (a) and the collinear (b) arrangements use coplanar image planes and parallel optical axes. Thus, the epipolar lines are congruent with the horizontal scanlines (configuration b) or with the horizontal and the vertical scanlines in the images (configuration a). The main disadvantage of the collinear arrangement is that horizontal edges in the scene are mapped to the identical epipolar line in all three images. The resulting ambiguities in the correspondence analysis cannot be resolved in general. In contrast, the rectangular arrangement provides horizontal epipolar lines as well as vertical epipolar lines. Therefore, edges can be matched independently of their orientation in the images. Simple stereo matching, based exclusively on the image geometry, is made possible when a "free" camera arrangement is used. The true corresponding points in the three images satisfy the condition that they must lie on the conjugate epipolar lines of the other two cameras. Consequently, no further constraints are necessary. The main disadvantage of this camera arrangement is that the epipolar lines do not correspond with the image scanlines. They have to be computed for every distinct configuration.

In principle, two strategies in correspondence analysis can be distinguished when a trinocular approach is used. The first strategy is to match pixels only if a correspondence is found in all three images, i.e., the third image is used to verify possible matches (a sketch of this strategy is given at the end of this chapter). The second strategy is to match pixels whenever a correspondence is found in at least two images, i.e., the third image is used to increase the number of matches. The number of false positive matches can be greatly reduced when the first principle is applied. Conversely, the number of false negative matches increases at the same time. This holds especially for edges occluded in one of the three images. The second principle, in contrast, is selected if edges occluded in one of the three images shall be matched. In this way the number of matches increases, and consequently more control points are obtained to reconstruct the objects in the scene. The analysis of corresponding features has to be carried out very carefully to
avoid an increase of false positive matches. Current trinocular stereo techniques still use both strategies mentioned above. Dhond and Aggarwal [DhoAgg91] presented a cost-benefit analysis of a third camera for stereo correspondence when employing the first strategy. They generated test images by applying a Lambertian reflectance model to real Digital Elevation Maps (DEMs). As a result of their investigation, Dhond and Aggarwal observed that trinocular local matching reduced the percentage of mismatches having large disparity errors by more than half when compared to binocular matching. On the other hand, trinocular stereopsis increased the computational cost of local matching over binocular matching by only about one-fourth in this investigation.

A new direction in polynocular stereo research is established by applying dynamic polynocular techniques to stereo images (for example [EnsLi93], [Mat89], [SkeLiu92], [Tir et al90], [Tir et al92], etc.). Furthermore, multiple-baseline approaches have been presented, for example, by Kanade and his group [OkuKan91], [OkuKan93], and [NakKan92]. Yoshida and Hirose [YosHir92] followed this principle by using five cameras and special hardware for real-time stereo. A comprehensive review of trinocular stereo algorithms can be found in [DhoAgg89]. A summarizing survey of polynocular stereo techniques developed since 1989 is presented in part 2 of the appendix.
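The first (verification) strategy under a "free" camera arrangement can be sketched as follows: a binocular candidate match is accepted only if a feature lies near the intersection of the two epipolar lines it predicts in the third image. The fundamental matrices F13 and F23 and the feature test are assumptions for illustration, not part of any cited system:

```python
import numpy as np

def verify_trinocular(p1, p2, F13, F23, features3, tol=1.5):
    """Accept the candidate match (p1, p2) only if a feature in image 3 lies
    close to the intersection of the epipolar lines predicted by images 1 and 2.
    p1, p2: homogeneous points; features3: list of homogeneous points."""
    l1 = F13 @ p1                 # epipolar line of p1 in image 3
    l2 = F23 @ p2                 # epipolar line of p2 in image 3
    p3 = np.cross(l1, l2)         # intersection of both lines (homogeneous)
    if abs(p3[2]) < 1e-12:
        return False              # lines (nearly) parallel: no prediction
    p3 = p3 / p3[2]
    return any(np.hypot(*(f[:2] / f[2] - p3[:2])) < tol for f in features3)
```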
9. Occlusions and Transparencies
For a real scene, the occurrence of occlusion is the norm rather than the exception. This has a direct influence on the matching results in stereo. Therefore, the ability to detect occlusions is essential to three-dimensional machine vision. Among others, Toh and Forrest [TohFor90], and Williams [Wil90] presented some constraints for occlusion detection, but they did not incorporate their methods into a stereo system. Belhumeur and Mumford [BelMum92] proposed an approach to deal with half-occlusions based on the minimization of an energy functional. They defined their functional as being composed of two functions, one defined for half-occluded parts of the image and another defined for non-occluded parts of the image. The following example outlines the definition given by Belhumeur and Mumford to distinguish between half-occluded and non-occluded regions. Let x be the coordinate of an image point along an epipolar line in an imaginary cyclopean image plane. Its corresponding points in the left and right image are denoted as xl and xr, respectively. Then the relations between the corresponding coordinates in all image planes can be given using a positive disparity function d(x) via xl = x + d(x) and xr = x - d(x). To see when a patch is visible from both eyes, Belhumeur and Mumford introduced a filtered version d*(x) of d(x) as

d*(x) = max_a ( d(x+a) - |a| ) .
A point p visible to the cyclopean eye in direction x is mutually visible to the left and right eyes if and only if d*(x) = d(x) (see [BelMum92]). Thus, Belhumeur and Mumford defined the half-occluded points O in the cyclopean plane to be the closure of the set of x such that d*(x) > d(x). The energy function was defined as a constant value for half-occluded points. In 1993, Belhumeur [Bel93] extended the definition of the energy functional to incorporate surface orientation and creases. His resulting functional did not meet the requirements needed to apply the simulated annealing technique. Therefore, he proposed to use a technique called "iterated stochastic dynamic programming" to find a minimal solution for the energy function. This scheme does not guarantee an optimum solution, but the results obtained when applying it to two indoor test images were rather acceptable.

Shah [Sha93] also suggested using energy functionals designed to deal with half-occlusions. He defined a nonlinear system of diffusion equations derived by simultaneously applying gradient descent to these functionals. He used a straightforward finite-difference scheme in his implementation.

Geiger, Ladendorf, and Yuille [Gei et al92] presented an intensity-based technique using a Bayesian approach to compute a dense depth map. They used adaptive correlation between windows together with an occlusion constraint which assumes that a discontinuity in one eye (always) corresponds to an occlusion in the other eye and vice-versa. This occlusion constraint requires the monotonicity of the disparity function inside a mutually visible region. When this constraint is applied to stereo matching, the search space can be considerably reduced to subintervals limited by the occluding regions.

Dhond and Aggarwal [DhoAgg92] presented a feature-based hierarchical stereo approach to deal with narrow occluding objects. They used a dynamic disparity search at different levels of resolution. The correspondence process took place in two disparity pools (a background and a foreground pool). At each level of the hierarchy, all matched pixels were assumed to have their estimated disparities within an allowable disparity range of their respective pool. Instead of occluded and non-occluded regions, foreground and background regions were distinguished in the images.

A simple technique to deal with occlusions was presented by Jones and Malik [JonMal92]. They defined a binocular visibility map for one view as being 1 at each image position that is visible in the other view, and 0 otherwise. The visibility map for the right image was initially set to all zeros. For each position in the left image that matches a position in the right image, the corresponding position in the right visibility map was set to 1. Those positions that remained zero at the end of the matching procedure were defined to be the occluded parts of the image. Jones and Malik suggested assigning to an occluded region the same disparity value as was assigned in the neighborhood of the region.

A completely different approach to avoiding the occlusion problem is based on collinear polynocular stereo. The basic idea of this approach is that the more images are
viewed, the fewer parts of the scene are occluded in all of the images. A survey of polynocular techniques is presented in chapter 8.

Another problem to deal with is the occurrence of transparency in stereo vision. Transparency perception arises when we see scenes with complex occlusions such as, for example, fences or bushes, and with physically transparent objects such as, for example, fluids or glass. Conventional stereo techniques cannot properly handle these complex situations, since transparency is beyond their assumptions. Fortunately, some investigations on the perception of transparency in motion and stereo have been carried out (see [Wei89], [Sto et al90], [Ber et al90], etc.). Shizawa [Shi92] suggested using a mathematical technique, called the principle of superposition (PoS), to find constraints for transparent stereo. If L(x) denotes the intensity value in the left image and R(x) denotes the intensity value in the right image along the epipolar line, the photometric constraint in intensity-based stereo can be written using an amplitude operator a(.) as

a(D) f(x) = 0 ,    (5)

where a(D) is the 2×2 operator matrix with rows ( 1, -Ψ(D) ) and ( -Ψ(-D), 1 ), and f(x) ≡ ( L(x), R(x) )^T. D is the disparity and Ψ(D) is a shift operator which transforms L(x) into L(x-D) and R(x) into R(x-D). According to the principle of superposition, Shizawa suggested the constraint of transparency in stereo hypothesized as

a(D_n) ··· a(D_2) a(D_1) f(x) = 0 ,    (6)

where f(x) = Σ_{i=1}^{n} f_i(x), and each f_i(x) is constrained by a(D_i) f_i(x) = 0. In 1993, Shizawa [Shi93] included a first-order approximation of the constraint mentioned above in an energy function which is intended to be minimized by regularization. The correlation between intensities is based on the sum of squared differences in this functional. Shizawa found a good behavior of the energy function for 1-D signals [Shi93]. There exist no practical results with 2-D stereo images so far, but the 1-D results encourage the use of this constraint in a stereo implementation.

Several techniques have been presented to deal with the occurrence of transparency and occlusion in stereo images. Most of them use energy functions to be minimized for solving the correspondence problem. These techniques provide rather good results, but they are very slow. Thus, more efficient techniques are needed to deal with transparency and occlusion, and/or specific hardware implementations have to be developed to speed up the matching process.
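The Belhumeur-Mumford visibility criterion given above translates directly into a few lines of code: compute the filtered disparity d*(x) along one cyclopean scanline and mark as half-occluded every position where it exceeds d(x). The discrete shift bound is an implementation assumption:

```python
import numpy as np

def half_occluded(d, max_shift=None):
    """Mark half-occluded positions along one cyclopean scanline:
    d*(x) = max_a(d(x+a) - |a|); a point is mutually visible iff d*(x) = d(x),
    half-occluded where d*(x) > d(x) (see [BelMum92])."""
    n = len(d)
    if max_shift is None:
        max_shift = n - 1
    d_star = np.full(n, -np.inf)
    for a in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -a), min(n, n - a)        # valid x range for shift a
        shifted = d[lo + a : hi + a] - abs(a)
        d_star[lo:hi] = np.maximum(d_star[lo:hi], shifted)
    return d_star > d
```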
10. Comparisons Between Different Stereo Techniques

Although more than 200 stereo papers have been published so far, there still does not exist a standardized way of evaluating the algorithms. The known methods differ in their solution to the correspondence problem as well as in the selection of constraints
assumed for the visible objects in the scene. Almost all publications exclusively present their own stereo technique without comparing it to the results of the other (200 already existing) methods. Moreover, the large number of distinguishable features in the solutions complicates a direct comparison of the methods. Therefore, it is nearly impossible for the reader or user to evaluate the suitability of a method for a selected application. The neglect of methodical experimental evaluation has been criticized for years by members of the computer vision community (see for example [AloRos91], [JaiBin91], [Sny91], [Pav92], [Pan92], etc.). Nevertheless, there still exist only very few publications on experimental investigations and comparisons.

Day and Muller [DayMul89] compared the results of three different algorithms ([Pol et al85], [BarTho80], [OttCha88]) applied to SPOT satellite images. As a result, they favor the algorithm by Otto and Chau [OttCha88] due to its ability to create arbitrarily dense disparity maps and to sheet-grow around features such as clouds and areas of low texture. O'Neill and Denos [O'NeDen92] applied several stereo techniques to large scale aerial images and tried to refine the results to obtain dense depth maps. They compared the results of their own stereo matcher with the results of four well-known stereo algorithms ([Nis84], [OhtKan85], [OttCha88], [Pol et al85]). In summary, they obtained rather acceptable data with their own approach, but further work will be necessary to obtain more satisfactory results for large scale images.

Faugeras and his team [Fau et al92] presented a very interesting comparative study of stereo algorithms developed at INRIA. They investigated a correlation scheme as well as binocular and trinocular feature-based approaches matching line segments in two or three images, respectively. They applied the methods to synthetic images as well as to real indoor images. No "optimal" stereo algorithm emerged from this comparison; the precision of the results of all algorithms depends on free parameters such as the size of the correlation window or the disparity gradient limit. Unfortunately, the results were not compared to results obtained by methods of other authors to show their competitiveness.

A comparison of eight well-known ([Gri83], [Kas86], [KimBov88], [OhtKan85], [Oht et al86], [Shi87], [ShiNis85], [Yac et al86]) and two of our own ([Kos91]) solutions to the correspondence problem was carried out by our group [Kos92]. The ten methods were evaluated with regard to their suitability within a stereo system for the automatic registration of the geometry of an a priori unknown 3-D object at near distance to the cameras (approximately 1 m). The methods were selected with regard to their methodical distinctions in solving the correspondence problem (area-based and feature-based, binocular and trinocular, statistical and physiology-based, etc.). All methods were applied to a series of indoor images. The results of this comparison can be summarized as follows.
A strong dependency was found between the choice of the thresholds and the quality of the computed results for the area-based method [Shi87]. Moreover, no regularity between the optimal values to be chosen for the parameters and the employed image functions was found during the investigations.

The Marr-Poggio-Grimson approach [Gri83] required a great amount of computational time. Difficulties occurred most of all with the matching of pixels with large disparity values (> 35). The total number of matches was rather acceptable compared to the results of the other investigated methods, but it was not larger than the number of matches determined by methods requiring less computational time.

The statistical method of Kass [Kas86] showed the strongest dependency (in comparison to the other investigated methods) between the choice of the thresholds and the results obtained when evaluating this method. Furthermore, some pixels representing no object points in the scene were also matched. These pixels correspond in both images, but the assignments are not suitable for the determination of control points to be used in a succeeding interpolation process.

The quality of the results determined by the method of [KimBov88] depended directly on the exact detection and matching of the "extreme points" (high curvature points). In general, a false match of the entire contour was created if only one "extreme point" was matched falsely. Moreover, the results depended on the number of existing extreme points, i.e., this method is rather suitable for scenes containing many surfaces with high curvature, while only a small number of matches will be detected if the scene contains objects with planar surfaces.

The method based on disparity histograms [ShiNis85] was found to be more robust than the area-based method [Shi87] and the statistical method [Kas86]. The quality of the results was substantially high if suitable thresholds were selected. Moreover, rather good results were obtained for the investigated test images even when the thresholds were fixed.

The method using Dynamic Programming [OhtKan85] was very efficient with regard to computational time, and only a few edges were matched as false positives. Moreover, the efficiency of this method improved when matching intervals between edges instead of the edges themselves.

The investigation of the two trinocular methods mainly showed an improvement of the results of the Dynamic Programming approach [Oht et al86] as compared to the binocular method [OhtKan85]. The influence of the relaxation technique on the quality of the computed depth values has to be classified as very low. The practical applicability of the second trinocular method [Yac et al86] is restricted because of the necessity to compute the epipolar lines after each image acquisition process. The essential disadvantage of this method, however, follows from its basic principle: only pixels visible in all three images can be matched when this technique is applied.
Good results were reached using the Block Matching technique adapted from the HDTV domain. Within the image areas that are significant for a feature-based method, i.e., along the edges, the disparity values generally differed by only 1 to 2 pixels from the exact values when this technique was applied to the test images selected in these investigations. The computed results do not, in fact, fulfill the accuracy requirements imposed on stereo techniques, but they are very suitable for initializing the values of a succeeding method. Consequently, very good results were reached for all selected test images when applying the feature-based method [Kos91], which uses the values determined by the Block Matching technique to initialize the disparity values. The influence of the thresholds on the processing results was smaller than for most of the other methods. Computationally expensive operations, such as the determination of disparity histograms or disparity gradients, were deliberately not employed. Therefore, rather good results can easily be obtained when this technique is applied.
Nevertheless, no current stereo method (except [Kos91]) has been included in this comparison so far. Thus, further investigations and comparisons will be necessary to obtain more precise depth maps and to show the recent competitiveness of the methods.
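As an illustration of the Block Matching initialization described above, the following sketch computes a coarse disparity map with a sum-of-absolute-differences (SAD) criterion on rectified images. Block size, search range, and function names are assumptions made for this sketch, not parameters taken from [Kos91].

```python
import numpy as np

def block_match(left, right, max_disp=32, block=8):
    """Coarse disparity estimation by block matching (SAD criterion).

    A minimal sketch of an HDTV-style Block Matching step: the left
    image is tiled into blocks, and each block is compared against
    horizontally shifted blocks in the right image.  The resulting
    coarse map can initialize a succeeding feature-based matcher.
    """
    h, w = left.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y+block, x:x+block].astype(float)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x) + 1):
                cand = right[y:y+block, x-d:x-d+block].astype(float)
                sad = np.abs(ref - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[by, bx] = best_d
    return disp
```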
11. Refinement of Precomputed Disparity Values
The emphasis here is on techniques that obtain more precise results from already precomputed disparity values (sparse or dense). Several approaches have been suggested to yield dense depth maps from sparse depth maps by using interpolation techniques ([Fua91], [NguCoh92], etc.). These techniques are neither the subject of this chapter (compare chapter 5) nor are they methods using consistency checks for the verification of disparity values.
Luo and Maitre [LuoMai90] used planar surface models to refine the disparity values computed for stereo images representing urban scenes. Assuming that urban scenes consist of polyhedral surfaces, uniform intensity regions in the image are hypothesized to correspond to single surfaces in the scene. The hypothesis is either verified or refuted by testing whether the matches in each region fit the model. This approach is limited to scenes containing objects with polyhedral surfaces.
Bruzzone, Cazzanti, De Floriani, and Mangili [Bru et al92] suggested a more general approach. They applied a constrained Delaunay triangulation technique to line segments matched using the Ayache algorithm [Aya91]. The resulting two-dimensional triangulation in one of the images was then backprojected into 3-D space, generating a surface description in terms of triangular faces. In this way, a surface reconstruction was obtained using just a small number of matched line segments.
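The planar refinement idea of [LuoMai90] can be illustrated by a small least-squares sketch: for a region hypothesized to be one planar surface, the disparities should satisfy a linear model d = a*x + b*y + c, and matches that deviate too far from the fitted plane are corrected. The interface and the tolerance value are illustrative assumptions, not details from the paper.

```python
import numpy as np

def refine_with_plane(points, inlier_tol=1.0):
    """Refine disparities in one uniform-intensity region by a plane fit.

    A hedged sketch of the idea in [LuoMai90]: if a region corresponds
    to a single planar surface, its disparities satisfy d = a*x + b*y + c.
    The plane is fitted by least squares, and disparities deviating by
    more than `inlier_tol` are replaced by the fitted value.
    `points` is an (N, 3) array of (x, y, d) samples from the region.
    """
    x, y, d = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, d, rcond=None)
    fitted = a * x + b * y + c
    refined = np.where(np.abs(d - fitted) > inlier_tol, fitted, d)
    return refined, (a, b, c)
```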
Another interesting approach was presented by Bascle and Deriche [BasDer93]. They suggested approximating a set of 3-D linked points (provided by a stereo matcher) by a B-spline curve. Using a pinhole model for the cameras, they mapped the 3-D B-splines onto both image planes. These projections do not, in general, superimpose onto the edges in the images, and the difference is given by an energy function defined for deformable contours. The objective is to search for a deformable model minimizing this energy function by using Lagrangian dynamics. In the end, a refined parametric 3-D curve representation is found which can be used directly in a succeeding processing step. The latter two techniques can be applied as a postprocessing step to almost all edge-based stereo techniques.
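A much simplified 2-D sketch of such an energy-minimizing refinement is given below. It replaces the B-spline representation and the Lagrangian dynamics of [BasDer93] by a polygonal closed curve and plain gradient descent; all parameter names and values are our own assumptions.

```python
import numpy as np

def refine_contour(curve, grad_mag, alpha=0.1, beta=0.1, step=0.5, iters=100):
    """Refine a projected curve toward image edges by energy minimization.

    A minimal 2-D sketch of the deformable-contour idea: the internal
    energy penalizes stretching (first derivative) and bending (second
    derivative), while the external energy pulls the curve toward high
    gradient magnitude.  `curve` is an (N, 2) array of (x, y) points of
    a closed polygon; `grad_mag` is the gradient-magnitude image.
    """
    curve = np.asarray(curve, dtype=float)
    gy, gx = np.gradient(grad_mag.astype(float))  # ascent direction on |grad I|
    for _ in range(iters):
        # Discrete second and fourth differences (negative energy gradients).
        d1 = np.roll(curve, -1, axis=0) - 2 * curve + np.roll(curve, 1, axis=0)
        d2 = (np.roll(curve, -2, axis=0) - 4 * np.roll(curve, -1, axis=0)
              + 6 * curve - 4 * np.roll(curve, 1, axis=0)
              + np.roll(curve, 2, axis=0))
        xi = np.clip(curve[:, 0].astype(int), 0, grad_mag.shape[1] - 1)
        yi = np.clip(curve[:, 1].astype(int), 0, grad_mag.shape[0] - 1)
        ext = np.column_stack([gx[yi, xi], gy[yi, xi]])  # pull toward edges
        curve = curve + step * (alpha * d1 - beta * d2 + ext)
    return curve
```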
12. Real-time Stereo Using Special Hardware
Computationally fast stereo techniques are required for real-time applications, especially for mobile robots and autonomous vehicles. General purpose computers are not fast enough to meet real-time requirements because of the algorithmic complexity of stereo vision techniques. Consequently, the use and/or development of special hardware is inevitable to achieve real-time execution. Several hardware implementations have been presented during the past couple of years, and only some will be mentioned here.
Neural networks and transputers have been used successfully for stereo (see for example [Mul et al88], [ZhoChe88], etc.). A parallel stereo algorithm implemented on the TMC Connection Machine was presented by Chen and Medioni [CheMed90]. Choudhary, Das, Ahuja, and Patel [Cho et al90] implemented a feature-based algorithm [HofAhu89] on a multi-processor architecture called NETRA. Their architecture consists of a large number (1000 - 10000) of processing elements, organized into clusters of 16 to 64 elements each, a tree of distributing-and-scheduling processors, and a parallel pipelined shared global memory. Laine and Roman [LaiRom90] implemented a feature-based stereo algorithm on a Gould/DeAnza IP-8500 image processor, equipped with a digital video processor consisting of a network of four pipelines. Chakrapani, Khokhar, and Prasanna [Cha et al92] presented a parallel stereo algorithm designed for a fixed-size mesh array of n x n SIMD processors. Ens and Li [EnsLi93] matched edge pixels for motion stereo using a pyramid with 512 SIMD processors at the bottom and 63 transputers at the top. Ross [Ros93] mapped the SSSD algorithm (sum of sum of squared differences) developed by [OkuKan91] to parallel machines including an iWarp, a 5-cell i860, and a 4096-processor MasPar machine (a sketch of the SSSD measure is given at the end of this chapter).
A completely different approach to real-time stereo vision has been introduced by Yoshida and Hirose [YosHir92] for the task of obstacle avoidance. They use a multiple arrayed camera (MAC) system with five collinear cameras in the prototype. The principle of feature-based matching is based on [Tsa83] and [OkuKan91], but special hardware is used with a single RAM for every (camera) image. Real-time processing is
achieved by fast access to the memory planes and an assembler program implemented on a NEC PC-9801 microcomputer using look-up tables.
Although many real-time realizations already exist, there is still a need for more reliable results computed in less time. Almost all hardware realizations implemented so far emphasize processing speed exclusively. They simplify and/or approximate the underlying algorithms to speed up the system, but most of them do not take the quality of the matching results into account. Therefore, more reliable hardware realizations are needed for stereo vision systems, and we are sure that almost every development of new chips or processors will immediately be followed by a new hardware realization for stereo vision.
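For illustration, the following sketch implements the core of the SSSD measure of [OkuKan91] mentioned above: the SSD curves of several image pairs are expressed over a common inverse-distance axis and summed before minimization. Rectified images and purely horizontal camera displacement are assumed, border wrap-around effects are ignored, and the interface is our own.

```python
import numpy as np

def sssd_inverse_depth(ref, others, baselines, focal, zetas, win=2):
    """SSSD-in-inverse-distance for multiple-baseline stereo.

    A hedged sketch of the SSSD idea of [OkuKan91]: for each candidate
    inverse distance zeta, the disparity in image i is
    d_i = baselines[i] * focal * zeta, so the SSD curves of all image
    pairs share a common zeta axis and can simply be summed.
    Returns the index into `zetas` minimizing the SSSD at each pixel.
    """
    h, w = ref.shape
    sssd = np.zeros((len(zetas), h, w))
    kernel = np.ones(2 * win + 1)
    for img, B in zip(others, baselines):
        for k, zeta in enumerate(zetas):
            d = int(round(B * focal * zeta))
            shifted = np.roll(img, d, axis=1)       # shift by the disparity
            ssd = (ref.astype(float) - shifted) ** 2
            # Aggregate squared differences over a (2*win+1)^2 window.
            ssd = np.apply_along_axis(np.convolve, 1, ssd, kernel, mode='same')
            ssd = np.apply_along_axis(np.convolve, 0, ssd, kernel, mode='same')
            sssd[k] += ssd
    return np.argmin(sssd, axis=0)
```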
13. Integration of Stereo and Other Visual Modules
Although the results of the stereo techniques mentioned so far are rather acceptable, they still lack accuracy. One possibility to improve stereo matching is to combine multiple stereo techniques. Area-based and feature-based stereo have been combined, for example, by Cochran and Medioni [CocMed89, CocMed92], Tu and Dubuisson [TuDub90], and by our group [Kos91], while Watanabe and Ohta [WatOht90] suggested combining area-based, interval-based, and segment-based matching modules.
A different direction in stereo research is to integrate stereo and other visual modules to obtain richer information on the shapes of the objects in the scene. Several approaches have been suggested dealing with the combination of different visual cues. In this chapter only a selection is presented because some of the papers were published before 1989. In 1987, Ikeuchi [Ike87] proposed combining binocular and photometric stereo by using three light sources and two cameras. A depth map is determined from a pair of surface-orientation maps obtained by dual photometric stereo. Assuming orthographic projection, the regions are matched according to their surface orientation and area. Krotkov and Bajcsy combined stereo vision, vergence, and depth-from-focus (see [Kro89] and [KroBaj93]), while Moerdler and Boult [MoeBou88] suggested combining stereo vision and several shape-from-texture techniques. The integration of stereo vision and shape-from-shading has been investigated by several researchers (see for example [Bla et al85], [Sha et al88], [Chi et al89], etc.). One of the newer papers was presented by Cryer, Tsai, and Shah [Cry et al93]. They suggested combining the low frequency information from stereo with the high frequency information from shape-from-shading (see the sketch below).
A more recent direction in stereo research deals with the combination of stereo and motion. While Huang and Blostein [HuaBlo85] first used image motion to aid stereopsis, Gambotto [Gam90] suggested incorporating egomotion information into the matching process. He predicted current stereo matches from previous ones using a trinocular stereo algorithm proposed for robot vision. These research activities deal with dynamic stereo and have already been explained in chapter 3.
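A minimal sketch of the frequency-domain combination suggested by [Cry et al93] is given below: the low-frequency content of the stereo depth map is merged with the high-frequency content of the shape-from-shading depth map. The Gaussian spectrum split and its cutoff parameter are our own assumptions, not values from the paper.

```python
import numpy as np

def fuse_depth_maps(z_stereo, z_sfs, sigma=8.0):
    """Fuse stereo and shape-from-shading depth in the frequency domain.

    Keeps the low-frequency content of the stereo depth map and the
    high-frequency content of the shape-from-shading depth map.
    A Gaussian transfer function splits the spectrum; `sigma` is an
    assumed cutoff parameter (in pixels of the equivalent spatial blur).
    """
    h, w = z_stereo.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    lowpass = np.exp(-(fx ** 2 + fy ** 2) * (2 * np.pi * sigma) ** 2 / 2)
    Z = (np.fft.fft2(z_stereo) * lowpass
         + np.fft.fft2(z_sfs) * (1.0 - lowpass))
    return np.real(np.fft.ifft2(Z))
```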
Abbott and Ahuja [AbbAhu90] proposed integrating the three visual cues stereo, vergence, and focus as sources of depth. In 1993, Das and Ahuja [DasAhu93] presented a performance analysis of these three cues. They pointed out that the limitations of the individual cues are complementary to the strengths of the other cues. A theoretical investigation of the compatibility of different constraints in different visual modules was presented by Jepson and Richards [JepRic92]. They stressed the need to use world knowledge to reason about the plausibility and consistency of interpretations of the image data. Gamble and his colleagues [Gam et al89] used stereo information (together with color, texture, motion, and edge detection) as one cue for labeling surface discontinuities.
The main problem in all approaches mentioned so far is the integration of the results from different visual cues into a joint representation that is suitable for a succeeding surface interpolation process. The results that have been reached so far are rather encouraging, but further research will be necessary to find more reliable techniques.
14. Conclusion and Outlook
Much research dealing with stereo vision has been carried out during the past couple of years. Nevertheless, no "optimal" stereo algorithm has been found yet, and perhaps it cannot be found because of the complexity of the different requirements arising from varying applications. Following Pavlidis [Pav92], we should blame ourselves for using the wrong tool rather than blaming a stereo technique for poor results. Consequently, the question remains how to find the right tool. This tool has to fulfill all the requirements of a specific application. We believe that such a tool can only be found within an experimental framework, because the method has to produce precise results in a short time.
One new direction in stereo research is to minimize energy functionals in order to solve the stereo correspondence problem. Some of the functionals deal with the problem of occlusions, while others deal with the problem of transparency occurring in stereo images. These techniques produce (among others) dense depth maps that are essentially needed for improving the quantity as well as the quality of the results obtainable by any stereo technique. Unfortunately, these techniques are very slow and some of them are non-deterministic. Thus, more efficient techniques are needed to deal with transparency and occlusion in stereo images, and/or specific hardware implementations are required to allow their practical application in several task domains.
A different direction in stereo research is to integrate motion and stereo or to compute relative depth from uncalibrated images. These techniques are easy to apply in the robotics and autonomous vehicle domain but, unfortunately, exclusively in this
domain. This is not really a disadvantage because this task domain is very large. Nevertheless, we believe that robotics applications would not neglect dense depth information if this information could be obtained quickly.
Color could be one interesting cue for computing denser results, because the results that have been reached so far with color stereo methods are rather encouraging. Therefore, we believe that color information is very helpful for the correspondence analysis. The disadvantage of these techniques is that three times as much data has to be handled and processed when using color instead of gray values. Fortunately, this is not really a disadvantage in non-real-time applications. Nevertheless, further investigations will be necessary to obtain more reliable results in less computational time.
The experience gained with stereo vision indicates the necessity of integrating different vision modules to obtain more reliable results. The main problem with these approaches is still how to integrate results from different visual cues into a joint representation. Further research is also needed in this field; nevertheless, the single methods that are selected for integration have to show their competitiveness separately and in cooperation with each other.
A few comparative studies on stereo algorithms have been presented, but none of them included new techniques such as phase-based stereo, optimization techniques applied to find the minimum of energy functionals, or techniques using color information. Consequently, there is still an essential need for such studies, as opposed to the development of further new algorithms that only show good behavior on two or three test images. Unfortunately, this is a difficult task requiring much time and personnel. In our opinion, the only way to overcome this problem is to intensify the interchange of programs and to design an appropriate testbed that will be accepted by the majority of the stereo vision community. This, however, is another difficult task, but it should be started now.
The first thing we urgently need is a Stereo Image Database available to everyone in the computer vision community. This database should contain a large variety of images representing many different scenes, for example
• random dot stereograms,
• images representing indoor scenes, e.g., polyhedral block scenes, mixed polyhedral and non-polyhedral scenes, and faces,
• images representing outdoor scenes, e.g., urban scenes (almost polyhedral, representing, e.g., houses), landscape scenes (representing roads, trees, bushes, etc.), and aerial images (for example the Pentagon image and a mountain view),
• microscopy images, and
• image sequences representing indoor and outdoor scenes as mentioned above.
The images should be available in binocular and trinocular arrangements (collinear and rectangular), both in gray value representation and in color representation (RGB). We should like to emphasize that this database cannot and shall not replace all the databases collected at many institutes for the evaluation of specific methods concerning specific requirements. On the contrary, we stress that experimental evaluation is inevitable to prove the competitiveness of a method (with regard to a specific application). This evaluation would be greatly enhanced if the computer vision community could agree on common sets of test images. This, however, includes a consensus on stereo task domains and on the classes of test images to be used within these domains.
15. References
The following abbreviations are used in the references:
CAIP    Computer Analysis of Images and Patterns
CGIP    Computer Graphics and Image Processing
CVGIP   Computer Vision, Graphics and Image Processing
CVPR    Computer Vision and Pattern Recognition
DARPA   Defence Advanced Research Projects Agency
ECCV    European Conference on Computer Vision
GWAI    German Workshop on Artificial Intelligence
ICIAP   International Conference on Image Analysis and Processing
ICPR    International Conference on Pattern Recognition
ICCV    International Conference on Computer Vision
IEE     The Institution of Electrical Engineers
IEEE    The Institute of Electrical and Electronics Engineers
IJCAI   International Joint Conference on Artificial Intelligence
IUW     Image Understanding Workshop
MIT     Massachusetts Institute of Technology
PRL     Pattern Recognition Letters
PRIP    Pattern Recognition in Practice
SPIE    Society of Photo-Optical Instrumentation Engineers
Trans. on PAMI   Transactions on Pattern Analysis and Machine Intelligence
Trans. on SMC    Transactions on Systems, Man and Cybernetics
[AbbAhu90] Abbott, A.L., and Ahuja, N. 1990. Active Surface Reconstruction by Integrating Focus, Vergence, Stereo, and Camera Calibration. Proc. 3rd ICCV´90, Osaka, Japan, pp. 489-492.
[Abd et al93] Abdel-Mottaleb, M., Chellappa, R., and Rosenfeld, A. 1993. Binocular Motion Stereo using MAP Estimation. Proc. CVPR´93, New York, USA, pp. 321-327.
[Alo93] Aloimonos, Y. (Ed.). 1993. Active Perception. Lawrence Erlbaum: Hillsdale, New Jersey, USA.
[AloHer90] Aloimonos, J., and Hervé, J.-Y. 1990. Correspondenceless Stereo and Motion: Planar Surfaces. IEEE Trans. on PAMI 12 (5): 504-510.
[AloRos91] Aloimonos, J., and Rosenfeld, A. 1991. A response to "Ignorance, myopia, and naiveté in computer vision systems" by R.C. Jain and T.O. Binford. CVGIP: Image Understanding 53 (1): 120-124.
[AloShu89] Aloimonos, J., and Shulman, D. 1989. Integration of Visual Modules - An Extension of the Marr Paradigm. Academic Press: Boston, USA.
[Alv et al89] Alvertos, N., Brzakovic, D., and Gonzalez, R.C. 1989. Camera Geometries for Image Matching in 3-D Machine Vision. IEEE Trans. on PAMI 11 (9): 897-915.
[ArnBin80] Arnold, R.D., and Binford, T.O. 1980. Geometric Constraints in Stereo Vision. Proc. SPIE Vol. 238 Image Processing for Missile Guidance, San Diego, California, USA, pp. 281-292.
[Aud et al91] Audette, M., Cohen, P., and Weng, J. 1991. Shading-based two-view matching. Proc. IJCAI´91, Sydney, Australia, pp. 1286-1291.
[AyaLus91] Ayache, N., and Lustman, F. 1991. Trinocular Stereovision for Robotics. IEEE Trans. on PAMI 13 (1): 73-85.
[Aya91] Ayache, N. 1991. Artificial Vision for Mobile Robots: Stereo Vision and Multisensory Perception. MIT Press: Cambridge, Massachusetts, USA.
[Baj88] Bajcsy, R. 1988. Active Perception. Proc. of the IEEE 76 (8): 996-1004.
[BakBin81] Baker, H.H., and Binford, T.O. 1981. Depth from Edge and Intensity Based Stereo. Proc. IJCAI'81, Vancouver, Canada, pp. 631-636.
[BarTho80] Barnard, S.T., and Thompson, W.B. 1980. Disparity Analysis of Images. IEEE Trans. on PAMI 2 (4): 333-340.
[Bar89] Barnard, S.T. 1989. Stochastic Stereo Matching over Scale. Int. J. of Comp. Vision 3: 17-32.
[BasDer93] Bascle, B., and Deriche, R. 1993. Stereo Matching, Reconstruction and Refinement of 3D Curves Using Deformable Contours. Proc. 4th ICCV, Berlin, Germany, pp. 421-430.
[BelMum92] Belhumeur, P.N., and Mumford, D. 1992. A Bayesian Treatment of the Stereo Correspondence Problem Using Half-Occluded Regions. Proc. CVPR´92, Champaign, Illinois, USA, pp. 506-512.
[Bel93] Belhumeur, P.N. 1993. A Binocular Stereo Algorithm for Reconstructing Sloping, Creased, and Broken Surfaces in the Presence of Half-Occlusion. Proc. 4th ICCV, Berlin, Germany, pp. 431-438.
[Ber et al90] Bergen, J.R., Burt, P., Hingorani, R., and Peleg, S. 1990. Transparent-Motion Analysis. Proc. 1st ECCV, Antibes, France, pp. 566-569.
[Bla et al85] Blake, A., Zisserman, A., and Knowles, G. 1985. Surface description from stereo and shading. image and vision computing 3 (4): 183-191.
[Bon et al92] Bonnin, P., Fortunel, C., and Zavidovique, B. 1992. The Planarity Constraints: Definition, Interest and Applications. Proc. Int. Conf. on Image Processing and its Applications, Maastricht, the Netherlands, pp. 613-616.
[Boy et al86] Boyer, K.L., Vayda, A.J., and Kak, A.C. 1986. Robotic Manipulation Experiments Using Structural Stereopsis for 3D Vision. IEEE Expert Fall: 73-94.
[Boy et al90] Boyer, K.L., Wuescher, D.M., and Sarkar, S. 1990. Dynamic Edge Warping: Experiments in Disparity Estimation Under Weak Constraints. Proc. 3rd ICCV, Osaka, Japan, pp. 471-475.
[Boy et al91] Boyer, K.L., Wuescher, D.M., and Sarkar, S. 1991. Dynamic Edge Warping: An Experimental System for Recovering Disparity Maps in Weakly Constrained Systems. IEEE Trans. on SMC 21 (1): 143-158.
[BriBra90] Brint, A.T., and Brady, M. 1990. Stereo matching of curves. image and vision computing 8 (1): 50-56.
[BroYan89] Brockelbank, D.C., and Yang, Y.H. 1989. An Experimental Investigation in the Use of Color in Computational Stereopsis. IEEE Trans. on SMC 19: 1365-1383.
[Bro et al90] Brookshire, G., Nadler, M., and Lee, C. 1990. Automated Stereophotogrammetry. CVGIP 52: 276-296.
[Bru et al92] Bruzzone, E., Cazzanti, M., De Floriani, L., and Mangili, F. 1992. Applying Two-dimensional Delaunay Triangulation to Stereo Data Interpolation. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 368-372.
[Bur et al86] Burns, J.B., Hanson, A.R., and Riseman, E.M. 1986. Extracting straight lines. IEEE Trans. on PAMI 8 (4): 425-455.
[BurJul80] Burt, P., and Julesz, B. 1980. Modifications of the classical notion of Panum's fusional area. Perception 9: 671-682.
[Buu92] Buurman, J. 1992. Ellipse based stereo vision. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 363-368.
[Can86] Canny, J. 1986. A Computational Approach to Edge Detection. IEEE Trans. on PAMI 8 (6): 679-698.
[ChaBer93] Chabbi, H., and Berger, M.-O. 1993. Recovering Planar Surfaces by Stereovision Based on Projective Geometry. Proc. CVPR´93, New York, USA, pp. 649-650.
[ChaCha90] Chang, C., and Chatterjee, S. 1990. Multiresolution Stereo - A Bayesian Approach. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. I, pp. 908-912.
[Cha et al91] Chang, C., Chatterjee, S., and Kube, P.R. 1991. On an Analysis of Static Occlusion in Stereo Vision. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 722-723.
[ChaCha92] Chang, C., and Chatterjee, S. 1992. A Deterministic Approach for Stereo Disparity Calculation. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 420-424.
[Cha et al92] Chakrapani, P.N., Khokhar, A.A., and Prasanna, V.K. 1992. Parallel Stereo on Fixed Size Arrays using Zero Crossings. Proc. 11th IAPR Int. Conf. on Pattern Recognition, The Hague, the Netherlands, Vol. IV, pp. 79-82.
[Che91] Chen, H.H. 1991. Determining Motion and Depth from Binocular Orthographic Views. CVGIP: Image Understanding 54 (1): 47-55.
[CheHua91] Chen, H.H., and Huang, T.S. 1991. Using Motion from Orthographic Views to Verify 3-D Point Matches. IEEE Trans. on PAMI 13 (9): 872-878.
[CheMed90] Chen, J.-S., and Medioni, G. 1990. Parallel Multiscale Stereo Matching Using Adaptive Smoothing. Proc. 1st ECCV, Antibes, France, pp. 99-103.
[Chi et al89] Chiaradia, M.T., Distante, A., and Stella, E. 1989. Three-dimensional surface reconstruction integrating shading and sparse stereo data. Optical Engineering 28 (9): 935-942.
[Cho et al90] Choudhary, A.N., Das, S., Ahuja, N., and Patel, J.H. 1990. A Reconfigurable and Hierarchical Parallel Processing Architecture: Performance Results for Stereo Vision. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. II, pp. 389-393.
[ChuNev91] Chung, R.C.K., and Nevatia, R. 1991. Use of Monocular Groupings and Occlusion Analysis in a Hierarchical Stereo System. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 50-56.
[CocMed89] Cochran, S.D., and Medioni, G. 1989. Accurate Surface Description from Binocular Stereo. Proc. DARPA Image Understanding Workshop, Palo Alto, California, USA, pp. 857-869.
[CocMed92] Cochran, S.D., and Medioni, G. 1992. 3-D Surface Description from Binocular Stereo. IEEE Trans. on PAMI 14 (10): 981-994.
[Coh et al89] Cohen, L., Vinet, L., Sander, P.T., and Gagalowicz, A. 1989. Hierarchical Region Based Stereo Matching. Proc. CVPR'89, San Diego, California, USA, pp. 416-421.
[Cry et al93] Cryer, J.E., Tsai, P.-S., and Shah, M. 1993. Integration of Shape from X modules: Combining Stereo and Shading. Proc. CVPR´93, New York, USA, pp. 720-721.
[DasAhu93] Das, S., and Ahuja, N. 1993. A Comparative Study of Stereo, Vergence, and Focus as Depth Cues for Active Vision. Proc. CVPR´93, New York, USA, pp. 194-199.
[DayMul89] Day, T., and Muller, J.-P. 1989. Digital elevation model production by stereo-matching spot image-pairs: a comparison of algorithms. image and vision computing 7 (2): 95-101.
[Der87] Deriche, R. 1987. Using Canny´s criteria to derive a recursively implemented optimal edge detector. Int. J. of Comp. Vision 1 (4): 167-187.
[DerFau90] Deriche, R., and Faugeras, O. 1990. 2-D Curve Matching Using High Curvature Points: Application to Stereo Vision. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. I, pp. 240-242.
[DhoAgg89] Dhond, U.R., and Aggarwal, J.K. 1989. Structure from Stereo - A Review. IEEE Trans. on SMC 19 (6): 1489-1510.
[DhoAgg91] Dhond, U.R., and Aggarwal, J.K. 1991. A Cost-Benefit Analysis of a Third Camera for Stereo Correspondence. Int. J. of Comp. Vision 6 (1): 39-58.
[DhoAgg92] Dhond, U.R., and Aggarwal, J.K. 1992. Computing Stereo Correspondences in the Presence of Narrow Occluding Objects. Proc. CVPR´92, Champaign, Illinois, USA, pp. 758-760.
[Dre93] Dreschler-Fischer, L. 1993. Geometrische Szenenrekonstruktion. In: Görz, G. (Ed.): Einführung in die künstliche Intelligenz. Addison-Wesley: Bonn, Germany, pp. 681-711 (in German).
[DruPog86] Drumheller, M., and Poggio, T. 1986. On Parallel Stereo. Proc. IEEE Conf. on Robotics and Automation, pp. 1439-1448.
[EnsLi93] Ens, J., and Li, Z.-N. 1993. Real-time Motion Stereo. Proc. CVPR´93, New York, USA, pp. 130-135.
[Fau92] Faugeras, O. 1992. What can be seen in three dimensions with an uncalibrated stereo rig? Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 563-578.
[Fau et al92] Faugeras, O., Fua, P., Hotz, B., Ma, R., Robert, L., Thonnat, M., and Zhang, Z. 1992. Quantitative and qualitative comparison of some area and feature-based stereo algorithms. Proc. Workshop on Robust Computer Vision, Bonn, Germany, pp. 1-26.
[Fle91] Fleck, M.M. 1991. A Topological Stereo Matcher. Int. J. of Comp. Vision 6 (3): 197-226.
[Fle et al91] Fleet, D.J., Jepson, A.D., and Jenkin, M.R.M. 1991. Phase-Based Disparity Measurement. CVGIP: Image Understanding 53 (2): 198-210.
[Fua91] Fua, P. 1991. Combining Stereo and Monocular Information to Compute Dense Depth Maps that Preserve Depth Discontinuities. Proc. IJCAI´91, Sydney, Australia, Vol. 2, pp. 1292-1298.
[FunBaj92] Funka-Lea, G., and Bajcsy, R. 1992. Active Color Image Analysis for Recognizing Shadows. Technical Report GRASP LAB 336 MS-CIS-92-82, University of Pennsylvania, Dept. of Computer and Info. Science, USA.
[Gam et al89] Gamble, E.B., Geiger, D., Poggio, T., and Weinshall, D. 1989. Integration of Vision Modules and Labeling of Surface Discontinuities. IEEE Trans. on SMC 19 (6): 1576-1581.
[Gam90] Gambotto, J.-P. 1990. Determining Stereo Correspondences and Egomotion from a Sequence of Stereo Images. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. I, pp. 259-262.
[GazMed89] Gazit, S.L., and Medioni, G. 1989. Multi-Scale Contour Matching in a Motion Sequence. Proc. DARPA Image Understanding Workshop, Palo Alto, California, USA, pp. 934-943.
[Gei et al92] Geiger, D., Ladendorf, B., and Yuille, A. 1992. Occlusions and Binocular Stereo. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 425-433.
[Ger et al87] Gershon, R., Jepson, A.D., and Tsotsos, J.K. 1987. Highlight Identification Using Chromatic Information. Proc. 1st ICCV, London, GB, pp. 161-170.
[GerKwa86] Gerhardt, L.A., and Kwak, W.I. 1986. An Improved Adaptive Stereo Ranging Method for Three-Dimensional Measurements. Proc. CVPR'86, Miami Beach, Florida, USA, pp. 21-26.
[Gri83] Grimson, E. 1983. An Implementation of a Computational Theory of Visual Surface Interpolation. CVGIP 22: 39-69.
[GroTis93] Grosso, E., and Tistarelli, M. 1993. Active / Dynamic Stereo: a General Framework. Proc. CVPR´93, New York, USA, pp. 732-734.
[Har92] Hartley, R.I. 1992. Estimation of Relative Camera Positions for Uncalibrated Cameras. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 579-587.
[Har et al92] Hartley, R., Gupta, R., and Chang, T. 1992. Stereo from uncalibrated cameras. Proc. CVPR´92, Champaign, Illinois, USA, pp. 761-764.
[HayNeg90] Hayashi, B.Y., and Negahdaripour, S. 1990. Direct Motion Stereo: Recovery of Observer Motion and Scene Structure. Proc. 3rd ICCV, Osaka, Japan, pp. 446-449.
[HofAhu89] Hoff, W., and Ahuja, N. 1989. Surfaces from Stereo: Integrating Feature Matching, Disparity Estimation, and Contour Detection. IEEE Trans. on PAMI 11 (2): 121-136.
[HorSko89] Horaud, R., and Skordas, T. 1989. Stereo Correspondence Through Feature Grouping and Maximal Cliques. IEEE Trans. on PAMI 11 (11): 1168-1180.
[HuaBlo85] Huang, T.S., and Blostein, S.D. 1985. Robust algorithms for motion estimation based on two sequential stereo image pairs. Proc. CVPR´85, San Francisco, California, USA, pp. 518-525.
[HuaDub89] Hua, Z.D., and Dubuisson, B. 1989. String Matching for Stereo Vision. Pattern Recognition Letters 9: 117-126.
[Ike87] Ikeuchi, K. 1987. Determining a Depth Map Using a Dual Photometric Stereo. Int. J. of Robotics Research 6 (1): 15-31.
[Ish et al90] Ishiguro, H., Yamamoto, M., and Tsuji, S. 1990. Omni-directional Stereo. Proc. 3rd ICCV, Osaka, Japan, pp. 540-547.
[Ish et al92] Ishiguro, H., Yamamoto, M., and Tsuji, S. 1992. Omni-directional Stereo. IEEE Trans. on PAMI 14 (2): 257-262.
[JaiBin91] Jain, R.C., and Binford, T.O. 1991. Ignorance, myopia, and naiveté in computer vision systems. CVGIP: Image Understanding 53 (1): 112-117.
[JiaWey89] Jiang, F., and Weymouth, T.E. 1989. Depth From Dynamic Stereo Images. Proc. CVPR´89, San Diego, California, USA, pp. 250-255.
[JepJen89] Jepson, A.D., and Jenkin, M.R.M. 1989. The Fast Computation of Disparity from Phase Differences. Proc. CVPR'89, San Diego, California, USA, pp. 398-403.
[JepRic92] Jepson, A., and Richards, W. 1992. A Lattice Framework for Integrating Vision Modules. IEEE Trans. on SMC 22 (5): 1087-1096.
[JonMal92] Jones, D.G., and Malik, J. 1992. A Computational Framework for Determining Stereo Correspondence from a Set of Linear Spatial Filters. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 395-410.
[JorBov88] Jordan III, J.R., and Bovik, A.C. 1988. Computational Stereo Vision Using Color. IEEE Control Systems Magazine June: 31-36.
[JorBov91] Jordan III, J.R., and Bovik, A.C. 1991. Using chromatic information in edge-based stereo correspondence. CVGIP: Image Understanding 54 (1): 98-118.
[JorBov92] Jordan III, J.R., and Bovik, A.C. 1992. Using Chromatic Information in Dense Stereo Correspondence. Pattern Recognition 25 (4): 367-383.
[Jor et al89] Jordan III, J.R., Bovik, A.C., and Geisler, W.S. 1989. Chromatic Stereopsis. Proc. IJCAI 1989, pp. 1649-1654.
[Jul71] Julesz, B. 1971. Foundations of Cyclopean Perception. The University of Chicago Press: Chicago, USA.
[Kak et al86] Kak, A.C., Boyer, K.L., Safranek, R.J., and Yang, H.S. 1986. Knowledge-Based Stereo and Structured Light for 3-D Robot Vision. In: Rosenfeld, A. (Ed.): Techniques for 3-D Machine Perception. Elsevier Science Publ.: North-Holland, Amsterdam, the Netherlands, pp. 185-218.
[KanOku90] Kanade, T., and Okutomi, M. 1990. A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment. Technical Report CMU-CS-90-120, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
[Kas86] Kass, M. 1986. Computing Visual Correspondence. In: Pentland, A. (Ed.): From Pixels to Predicates: Recent Advances in Computational and Robotic Vision. Ablex Publ.: Norwood, New Jersey, USA, pp. 78-92.
[KimBov88] Kim, N.H., and Bovik, A. 1988. A Contour-Based Stereo Matching Algorithm using Disparity Continuity. Pattern Recognition 21 (5): 505-514.
[Kim et al92] Kim, D.H., Choi, W.Y., and Park, R.-H. 1992. Stereo matching technique based on the theory of possibility. Pattern Recognition Letters 13: 735-744.
[Kli et al88] Klinker, G.J., Shafer, S.A., and Kanade, T. 1988. The Measurement of Highlights in Color Images. Int. J. of Comp. Vision 2: 7-32.
[Kos91] Koschan, A. 1991. Stereo Matching using a new Local Disparity Limit. Proc. IVth Int. Conf. on Computer Analysis of Images and Patterns CAIP'91, Dresden, Germany, pp. 48-53.
[Kos92] Koschan, A. 1992. Methodic Evaluation of Stereo Algorithms. Proc. 5th Workshop 1992 on Theoretical Foundations of Computer Vision, Buckow, Germany. R. Klette, W.G. Kropatsch (Eds.), Akademie Verlag, Berlin, Germany, Mathematical Research, Vol. 69, 1992, pp. 155-166.
[Kos93] Koschan, A. 1993. Chromatic block matching for dense stereo correspondence. Proc. 7th Int. Conf. on Image Analysis and Processing 7ICIAP, Bari, Italy, Sept. 1993 (to appear).
[Kro89] Krotkov, E. 1989. Active Computer Vision by Cooperative Focus and Stereo. Springer-Verlag: New York, USA.
[KroBaj93] Krotkov, E., and Bajcsy, R. 1993. Active Vision for Reliable Ranging: Cooperating Focus, Stereo, and Vergence. Int. J. of Comp. Vision 11 (2): 187-203.
[Krot et al90] Krotkov, E., Henriksen, K., and Kories, R. 1990. Stereo Ranging with Verging Cameras. IEEE Trans. on PAMI 12 (12): 1200-1205.
[LaiRom90] Laine, A.F., and Roman, G.-C. 1990. A Parallel Algorithm for Incremental Stereo Matching on SIMD Machines. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. II, pp. 484-490.
[LaaAar87] van Laarhoven, P.J.M., and Aarts, E.H.L. 1987. Simulated Annealing: Theory and Applications. D. Riedel Publ.: Dordrecht, the Netherlands.
[Lee et al93] Lee, C.-Y., Cooper, D.B., and Keren, D. 1993. Computing Correspondence Based on Region and Invariants without Feature Extraction and Segmentation. Proc. CVPR´93, New York, USA, pp. 655-656.
[LeeBaj92] Lee, S.W., and Bajcsy, R. 1992. Detection of Specularity Using Color and Multiple Views. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 99-114.
[LeeJos93] Lee, C.-H., and Joshi, A. 1993. Correspondence Problems in Image Sequence Analysis. Pattern Recognition 26 (1): 47-61.
[LiuHua93] Liu, J., and Huang, S. 1993. Using Topological Information of Images to Improve Stereo Matching. Proc. CVPR´93, New York, USA, pp. 653-654.
[Lud et al92] Ludwig, K.-O., Neumann, H., and Neumann, B. 1992. Local Stereoscopic Depth Estimation Using Ocular Stripe Maps. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 373-377.
[LuoMai90] Luo, W., and Maitre, H. 1990. Using Surface Model to Correct and Fit Disparity Data in Stereo Vision. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. I, pp. 60-64.
[Mar et al93] Maru, N., Nishikawa, A., Miyazaki, F., and Arimoto, S. 1993. Active Binocular Stereo. Proc. CVPR´93, New York, USA, pp. 724-725.
[MarAbe93] Maruyama, M., and Abe, S. 1993. Range Sensing by Projecting Multiple Slits with Random Cuts. IEEE Trans. on PAMI 15 (6): 647-651.
[MarHil80] Marr, D., and Hildreth, E. 1980. Theory of Edge Detection. Proc. of the Royal Soc. of London 207 (B): 187-217.
[MarPog76] Marr, D., and Poggio, T. 1976. Cooperative Computations of Stereo Disparity. Science 194: 283-287.
[MarTri92] Marapane, S.B., and Trivedi, M.M. 1992. Multi-Primitive Hierarchical (MPH) Stereo System. Proc. CVPR´92, Champaign, Illinois, USA, pp. 499-505.
[Mar82] Marr, D. 1982. Vision - A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman & Co.: New York, USA.
[Mar89] March, R. 1989. A regularization model for stereo vision with controlled continuity. Pattern Recognition Letters 10: 259-263.
[Mat et al89] Matthies, L., Szeliski, R., and Kanade, T. 1989. Kalman filter-based algorithms for estimating depth from image sequences. Int. J. of Comp. Vision 3: 209-236.
[Mat89] Matthies, L. 1989. Dynamic Stereo Vision. Ph.D. Thesis, Technical Report CMU-CS-89-195, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
[MayFri81] Mayhew, J., and Frisby, J. 1981. Psychophysical and Computational Studies towards a Theory of Human Stereopsis. Artificial Intelligence 17: 349-385.
[McKHsi92] McKeown Jr., D.M., and Hsieh, Y.C. 1992. Hierarchical Waveform Matching: A New Feature-Based Stereo Technique. Proc. CVPR´92, Champaign, Illinois, USA, pp. 513-519.
[McL et al91] McLauchlin, P.F., Mayhew, J.E.W., and Frisby, J.P. 1991. Stereoscopic recovery and description of smooth textured surfaces. image and vision computing 9 (1): 20-26.
[Mey et al90] Meygret, A., Thonnat, M., and Berthod, M. 1990. A Pyramidal Stereovision Algorithm based on Contour Chain Points. Proc. 1st ECCV 90, Antibes, France, pp. 83-88.
[MoeBou88] Moerdler, M.L., and Boult, T.E. 1988. The Integration of Information from Stereo and Multiple Shape-From-Texture Cues. Proc. CVPR'88, Ann Arbor, Michigan, USA, pp. 524-529.
[Mor77] Moravec, H.P. 1977. Towards Automatic Visual Obstacle Avoidance. Proc. IJCAI'77, Cambridge, Massachusetts, USA, p. 584.
[Mul et al88] Muller, J.P., Otto, G.P., Chau, K.W., Collins, K.A., Dalton, N.M., Day, T., Dowman, I.J., Gugan, D., Morris, A.C., O´Neill, M.A., Robert, J.G.B., Stevens, A., and Upton, M. 1988. Real-time stereo matching using transputer arrays. Proc. IGARSS 88, Edinburgh, GB, pp. 1185-1186.
[Nak et al92] Nakayama, O., Yamaguchi, A., Shirai, Y., and Asada, M. 1992. A Multistage Stereo Method Giving Priority to Reliable Matching. Proc. Int. Conf. on Robotics and Automation, Nice, France, pp. 1753-1758.
[NakKan92] Nakahara, T., and Kanade, T. 1992. Experiments in Multiple-Baseline Stereo. Technical Report CMU-CS-93-102, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
[NasLiu89] Nasrabadi, N.M., and Liu, Y. 1989. Stereo vision correspondence using a multichannel graph matching technique. image and vision computing 7 (4): 237-245.
[NasCho92] Nasrabadi, N.M., and Choo, C.Y. 1992. Hopfield Network for Stereo Vision Correspondence. IEEE Trans. on Neural Networks 3 (1): 5-13.
[Nas92] Nasrabadi, N.M. 1992. A Stereo Vision Technique Using Curve-Segments and Relaxation Matching. IEEE Trans. on PAMI 14 (5): 566-579.
[Nav et al90] Navab, N., Deriche, R., and Faugeras, O.D. 1990. Recovering 3D motion and structure from stereo and 2D token tracking cooperation. Proc. 3rd ICCV, Osaka, Japan, pp. 513-516.
[Neg et al92] Negahdaripour, S., Kolgani, N., and Hayashi, B. 1992. Direct Motion Stereo for Passive Navigation. Proc. CVPR´92, Champaign, Illinois, USA, pp. 425-431.
[NevBab80] Nevatia, R., and Babu, K.R. 1980. Linear Feature Extraction and Description. CGIP 13: 257-269.
[NguCoh92] Nguyen, H.H., and Cohen, P. 1992. Correspondence from Color Shading. Proc. 11th IAPR Int. Conf. on Pattern Recognition, The Hague, the Netherlands, Vol. I, pp. 113-144.
[Nis84] Nishihara, H.K. 1984. Practical real-time imaging stereo matcher. Optical Engineering 23 (5): 536-545.
[Oht et al80] Ohta, Y.-I., Kanade, T., and Sakai, T. 1980. Color Information for Region Segmentation. CGIP 13: 222-241.
[Oht et al86] Ohta, Y., Watanabe, M., and Ikeda, K. 1986. Improving Depth Map by Right-Angled Trinocular Stereo. Proc. 8th ICPR, Paris, France, pp. 519-521.
[Oht et al88] Ohta, Y., Yamamoto, T., and Ikeda, K. 1988. Collinear Trinocular Stereo Using Two-Level Dynamic Programming. Proc. 9th ICPR, Rome, Italy, pp. 658-662.
[OhtKan85] Ohta, Y., and Kanade, T. 1985. Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming. IEEE Trans. on PAMI 7 (2): 139-154.
[Oku et al92] Okutomi, M., Yoshizaki, O., and Tomita, G. 1992. Color Stereo Matching and its Application to 3-D Measurement of Optic Nerve Head. Proc. 11th IAPR Int. Conf. on Pattern Recognition, The Hague, the Netherlands, Vol. I, pp. 509-513.
[OkuKan91] Okutomi, M., and Kanade, T. 1991. A Multiple-Baseline Stereo. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 63-69.
[OkuKan93] Okutomi, M., and Kanade, T. 1993. A Multiple-Baseline Stereo. IEEE Trans. on PAMI 15 (4): 353-362.
[Ols90] Olsen, S.I. 1990. Stereo Correspondence by Surface Reconstruction. IEEE Trans. on PAMI 12 (3): 309-315.
[Ols93] Olson, T.J. 1993. Stereopsis for Verging Systems. Proc. CVPR´93, New York, USA, pp. 55-60.
[OttCha88] Otto, G.P., and Chau, T.K.W. 1988. A "Region-Growing" Algorithm for Matching of Terrain Images. UCL-CS Research Note RN/88/14, University College London, Department of Computer Science, GB.
[Pan92] Pan, X. 1992. What we need is an integrated test system - A response to the letter of Pavlidis. Pattern Recognition Letters 13: 541-543.
[Pav92] Pavlidis, T. 1992. Why progress in machine vision is so slow. Pattern Recognition Letters 13: 221-225.
[Pol et al85] Pollard, S.B., Mayhew, J.E.W., and Frisby, J.P. 1985. PMF: A stereo correspondence algorithm using a disparity gradient limit. Perception 14: 449-470.
[Pol et al89] Pollard, S.B., Pridmore, T.P., Porrill, J., Mayhew, J.E.W., and Frisby, J.P. 1989. Geometrical Modeling from Multiple Stereo Views. Int. J. of Robotics Research 8 (4): 3-32.
[Pon et al89] Pong, T.-C., Haralick, R.M., and Shapiro, L.G. 1989. Matching topographic structures in stereo vision. Pattern Recognition Letters 9: 127-136.
[Pra85] Prazdny, K. 1985. Detection of Binocular Disparities. Biological Cybernetics 52: 93-99.
[RobFau91] Robert, L., and Faugeras, O.D. 1991. Curve-Based Stereo: Figural Continuity and Curvature. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 57-62.
[RobFau93] Robert, L., and Faugeras, O.D. 1993. Relative 3D Positioning and 3D Convex Hull Computation from a Weakly Calibrated Stereo Pair. Proc. 4th ICCV, Berlin, Germany, pp. 540-544.
[Rom et al88] Roman, G.-C., Laine, A.F., and Cox, K.C. 1988. Interactive Complexity Control and High-Speed Stereo Matching. Proc. CVPR'88, Ann Arbor, Michigan, USA, pp. 171-176.
[Ros93] Ross, B. 1993. A Practical Stereo Vision System. Proc. CVPR´93, New York, USA, pp. 148-153.
[RubRic82] Rubin, J.M., and Richards, W.A. 1982. Color Vision and Image Intensities: When are Changes Material? Biol. Cybern. 45: 215-226.
[Sas93] Sasse, R. 1993. Bestimmung von Entfernungsbildern durch aktive stereoskopische Verfahren. Ph.D. Thesis, Dept. of Computer Science, Technical University Berlin, Germany (in German).
[Sha93] Shah, J. 1993. A Nonlinear Diffusion Model for Discontinuous Disparity and Half-Occlusions in Stereo. Proc. CVPR´93, New York, USA, pp. 34-40.
[Sha et al88] Shao, M., Simchony, T., and Chellappa, R. 1988. New Algorithms for Reconstruction of a 3-D Depth Map from one or more Images. Technical Report USC-SIPI No. 113, Signal and Image Processing Institute, University of Southern California, Los Angeles, USA.
[ShePel90] Sherman, D., and Peleg, S. 1990. Stereo by Incremental Matching of Contours. IEEE Trans. on PAMI 12 (11): 1102-1106.
[Shi87] Shirai, Y. 1987. Three-Dimensional Computer Vision. Springer-Verlag: Berlin, Germany.
[ShiNis85] Shirai, Y., and Nishimoto, Y. 1985. A Stereo Method Using Disparity Histograms of Multi-Resolution Channels. 3rd Int. Symp. on Robotics Research, Gouvieux, France, pp. 27-32.
[Shi92] Shizawa, M. 1992. On Visual Ambiguities Due to Transparency in Motion and Stereo. Proc. 2nd ECCV´92, Santa Margherita Ligure, Italy, pp. 411-419.
[Shi93] Shizawa, M. 1993. Direct Estimation of Multiple Disparities for Transparent Multiple Surfaces in Binocular Stereo. Proc. 4th ICCV, Berlin, Germany, pp. 447-454.
[SieUrq90] Siebert, J.P., and Urquhart, C.W. 1990. Active stereo: texture enhanced reconstruction. Electronics Letters 26 (7): 427-430.
[SkeLiu92] Skerjanc, R., and Liu, J. 1992. Computation of Intermediate Views for 3DTV. Proc. 5th Workshop 1992 on Theoretical Foundations of Computer Vision, Buckow, Germany. R. Klette, W.G. Kropatsch (Eds.), Akademie Verlag, Berlin, Germany, Mathematical Research, Vol. 69, 1992, pp. 191-202.
[SmiNan93] Smith, P.W., and Nandhakumar, N. 1993. An Accurate Stereo Correspondence Method for Textured Scenes Using Improved Power Cepstrum Techniques. Proc. CVPR´93, New York, USA, pp. 651-652.
[Sny91] Snyder, M.A. 1991. A commentary on the paper by Jain and Binford. CVGIP: Image Understanding 53 (1): 118-119.
[Son et al93] Sonka, M., Hlavac, V., and Boyle, R. 1993. Image Processing, Analysis and Machine Vision. Chapman & Hall: London, UK.
[SteMac90] Stewart, C.V., and MacCrone, J.K. 1990. Experimental Analysis of a Number of Stereo Matching Components using LMA. Proc. 10th ICPR, Atlantic City, New Jersey, USA, Vol. I, pp. 254-258.
[Sto et al90] Stoner, G.R., Albright, T.D., and Ramachandran, V.S. 1990. Transparency and coherence in human motion perception. Nature 344: 153-155.
[SwaStr93] Swain, M.J., and Stricker, M.A. (Eds.). 1993. Promising Directions in Active Vision. Int. J. of Comp. Vision 11 (2): 109-126.
[Sug et al91] Sugimoto, K., Takahashi, H., and Tomita, F. 1991. Integration and Interpretation of Incomplete Stereo Scene Data. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 683-685.
[TakTom88] Takahashi, H., and Tomita, F. 1988. Planarity Constraint in Stereo Matching. Proc. 9th ICPR, Rome, Italy, pp. 446-449.
[Tir et al90] Tirumalai, A.P., Schunck, B.G., and Jain, R.C. 1990. Dynamic Stereo with Self-Calibration. Proc. 3rd ICCV, Osaka, Japan, pp. 466-470.
[Tir et al92] Tirumalai, A.P., Schunck, B.G., and Jain, R.C. 1992. Dynamic Stereo with Self-Calibration. IEEE Trans. on PAMI 14 (12): 1184-1189.
[TohFor90] Toh, P.-S., and Forrest, A.K. 1990. Occlusion Detection in Early Vision. Proc. 3rd ICCV, Osaka, Japan, pp. 126-132.
[Tsa83] Tsai, R.Y. 1983. Multiframe Image Point Matching and 3-D Surface Reconstruction. IEEE Trans. on PAMI 5 (2): 159-174.
[TuDub90] Tu, X.-W., and Dubuisson, B. 1990. 3-D Information Derivation from a Pair of Binocular Images. Pattern Recognition 23 (3-4): 223-235.
[VaiBoy91] Vaidya, N.M., and Boyer, K.L. 1991. Stereopsis and Image Registration from Extended Edge Features in the Absence of Camera Pose Information. Proc. CVPR´91, Lahaina, Maui, Hawaii, USA, pp. 76-81.
[Vle93] de Vleeschauwer, D. 1993. An Intensity-Based, Coarse-to-Fine Approach to Reliably Measure Binocular Disparity. CVGIP: Image Understanding 57 (2): 204-218.
[WalMer92] Waldmann, J., and Merhav, S. 1992. Fusion of Stereo and Motion Vision for 3-D Reconstruction. Proc. 11th IAPR Int. Conf. on Pattern Recognition, The Hague, the Netherlands, Vol. I, pp. 5-8.
[WatOht90] Watanabe, M., and Ohta, Y. 1990. Cooperative Integration of Multiple Stereo Algorithms. Proc. 3rd ICCV´90, Osaka, Japan, pp. 476-480.
[Wei89] Weinshall, D. 1989. Perception of multiple transparent planes in stereo vision. Nature 341: 737-739.
[Wen90] Weng, J. 1990. A Theory of Image Matching. Proc. 3rd ICCV, Osaka, Japan, pp. 200-209.
[WeyMoe88] Weymouth, T.E., and Moezzi, S. 1988. Wide Base-Line Dynamic Stereo: Approximation and Refinement. Proc. CVPR'88, Ann Arbor, Michigan, USA, pp. 183-188.
[WilKnu89] Wilson, R., and Knutsson, H. 1989. A Multiresolution Stereopsis Algorithm Based on the Gabor Representation. Proc. 3rd Int. Conf. on Image Processing and its Applications, University of Warwick, GB, pp. 19-22.
[Wil90] Williams, L.R. 1990. Perceptual Organization of Occluding Contours. Proc. 3rd ICCV, Osaka, Japan, pp. 133-137.
[Yac et al86] Yachida, M., Kitamura, Y., and Kimachi, M. 1986. Trinocular Vision: New Approach for Correspondence Problem. Proc. 8th ICPR, Paris, France, pp. 1041-1044.
[Yan et al93] Yang, Y., Yuille, A., and Lu, J. 1993. Local, Global, and Multilevel Stereo Matching. Proc. CVPR´93, New York, USA, pp. 274-279.
[YosHir92] Yoshida, K., and Hirose, S. 1992. Real-Time Stereo Vision with Multiple Arrayed Camera. Proc. Int. Conf. on Robotics and Automation, Nice, France, pp. 1765-1770.
[ZhaFau92] Zhang, Z., and Faugeras, O. 1992. 3D Dynamic Scene Analysis. Springer-Verlag: Berlin, Germany.
[ZhoChe88] Zhou, Y.T., and Chellappa, R. 1988. Stereo Matching Using a Neural Network. Technical Report USC-SIPI No. 124, Signal and Image Processing Institute, University of Southern California, Los Angeles, USA.
APPENDIX: A Survey on Stereo Matching Techniques
This survey contains 87 selected stereo papers published since 1989: 66 papers on binocular monochromatic stereo (part 1), 15 papers on polynocular monochromatic stereo (part 2), and 7 selected papers on binocular chromatic stereo (part 3). This survey is not intended to be complete, owing to the great number of journals, conferences, and workshops held all over the world on this topic. All papers in this survey are written in English, and they were chosen for their importance, novelty, and originality. Nevertheless, some important papers may have escaped our notice, and we should like to apologize to all authors who do not find their paper in this survey.
The approaches are listed in tabular form. A brief description of each method is given, including the applied constraints and the techniques of feature extraction (if features are used). Moreover, the primitives used in the correspondence search are listed, as well as the application tasks and the classes of images used in the evaluation process.
The following abbreviations are used in this survey. The reference of the stereo paper is given in the column "Publication"; more than one reference is listed if multiple publications have been found for one method. An abstract of the suggested technique is presented in the column "Method". The constraints are listed there (behind "C:") if they are mentioned in the papers. The primitives used in the correspondence analysis are listed behind "F:", where "ZCs" abbreviates "zero-crossings of the LoG-filtered image". The matched features are classified in the column "Ma." into curves (C), cepstra (CA), edges (E), ellipses (EL), line segments (LS), normal vectors (NV), pixels (P), regions (R), supersegments (SS), and topographic or topological structures (TS). If the stereo approach has been designed for a selected application task and this task is mentioned in the paper, a corresponding entry appears in the column "Appl."; otherwise, the term "no information" (no info.) is entered in this column. The classes of image data used in the experimental evaluation of the methods are listed in the column "Exp.". Images taken by a real camera and representing physically existing scenes are distinguished as indoor or outdoor images. Line drawings and images generated by applying computer graphics techniques are called "synthetic" (synth.) images. Random Dot Stereograms (RDS) (see [Jul71]) are explicitly mentioned because of their relevance to psychological investigations. All entries in the following tables are based exclusively on the details given in the papers; no personal statements or conjectures have been added by the author.
Some selected statistics are presented in the following to illustrate the distribution of task domains, matching primitives, and classes of test images mentioned in the 87 stereo papers.

Application field      No.
no information          50
mobile robot            28
sorting robot            1
motion analysis          1
photogrammetry           1
aerial inspection        2
cartography              1
medicine                 1

Tab. A.1: Number of papers giving details about a specific field of application.
Primitive   C   CA    E   EL   LS   NV    P    R   SS   TS
No.         8    1   13    1    9    2   44    4    1    3

Tab. A.2: Number of papers matching a specific primitive.
Class combination        No.
only id                   42
only od                    7
only syn                   3
only mi                    1
id + od                    7
id + RDS                   4
od + RDS                   2
syn + id                   6
syn + od                   1
syn + mi                   1
id + od + syn              1
id + od + RDS              4
id + syn + RDS             2
od + syn + RDS             3
id + od + syn + RDS        2

Tab. A.3: Number of papers giving details about specific classes or combinations of different classes of test images used in the evaluation process. The following abbreviations are used in this table: indoor image (id), outdoor image (od), synthetic image (syn), Random Dot Stereogram (RDS), and medical image (mi).

Number of indoor images      No.
1 image                       12
2 images                       7
3 images                       5
4 images                       5
6 images                       2
55 images                      1
series / seq. of images       30

Tab. A.4: Number of papers giving details about the number of indoor images used in the evaluation process.
Number of outdoor images         No.
1 image                           11
2 images                           5
4 images                           1
series / sequence of images       11

Tab. A.5: Number of papers giving details about the number of outdoor images used in the evaluation process.
Part 1: Binocular Monochromatic Stereo

[Abd et al93]
Method: Dynamic stereo approach. The algorithm starts by calculating the instantaneous FOE (focus of expansion). Knowing the FOE, a MAP estimate of the displacement and an associated confidence measure are calculated for every pixel. C: epipolar geometry, surface smoothness. F: no feature extraction.
Ma.: P. Appl.: mobile robot. Exp.: 1 indoor image sequence.

[AloHer90], [AloShu89]
Method: Aggregate stereo approach computing surface normals without point-to-point correspondence by geometric equations. All point correspondences are considered at once. C: all objects in the scene must have planar surfaces. F: no feature extraction.
Ma.: NV. Appl.: no info. Exp.: series of indoor images.

[Alv et al89]
Method: Dynamic stereo using axial motion of one camera moving in the direction of the optical axis. Matching is established using constraints on the luminance distribution in the scene ("scene radiance - image irradiance model"). C: epipolar geometry, surface smoothness. F: no feature extraction.
Ma.: P. Appl.: mobile robot. Exp.: 2 indoor images.

[Bar89]
Method: A stochastic stereo approach using a variant of the simulated annealing technique to compute dense depth maps. A hierarchical coarse-to-fine control structure employing Gaussian or Laplacian pyramids is used for multi-resolution matching. C: epipolar geometry, photometric similarity, disparity smoothness. F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: RDS, 2 outdoor images.

[BelMum92], [Bel93]
Method: A dense depth map is obtained by minimizing an energy functional using a Bayesian approach. The energy function was defined as a combination of two functions, one defined for half-occluded parts of the image and another defined for non-occluded parts [BelMum92]. Moreover, the energy functional was extended to incorporate surface orientation and creases [Bel93], and "iterated stochastic dynamic programming" was applied. C: epipolar geometry, ordering, photometric similarity. F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: 4 indoor images.
[Bon et al92]
Method: Regions representing planar surfaces in the scene are matched using a planarity constraint guaranteeing the existence of an affine transform between the coordinates of the corresponding points in the left and the right images. C: epipolar geometry, planarity constraint, only polyhedral objects. F: segmented regions.
Ma.: R. Appl.: mobile robot. Exp.: 1 indoor image.

[Boy et al90], [Boy et al91]
Method: Feature-based stereo called dynamic edge warping (DEW). The approach is based on structural stereopsis [Boy et al86] using Dynamic Programming and works with uncalibrated stereo image pairs. C: epipolar geometry, figural continuity, photometric consistency, global consistency, ordering, disparity smoothness. F: ZCs.
Ma.: E. Appl.: aerial inspection. Exp.: 2 indoor + 1 aerial image.

[Bro et al90]
Method: Feature-based stereo using a structural pattern recognition approach. Oriented-edge graphs are obtained with edge vectors and filtered to obtain feature points. At each stage of a resolution pyramid, pseudo-hexagonal gray scale arrays are used to bypass a four-connectivity paradox. Finally, a normalized gray-scale correlation is used to refine the results. C: epipolar geometry, similarity constraint. F: edge vectors.
Ma.: E. Appl.: photogrammetry. Exp.: 1 aerial image.

[Buu92]
Method: Feature-based stereo. Ellipses are matched due to their position in a specific volume in space. C: epipolar geometry, similarity constraint. F: feature detection is not specified.
Ma.: EL. Appl.: mobile robot. Exp.: series of indoor images.

[ChaCha90], [Cha et al91], [ChaCha92]
Method: A dense depth map is computed by minimizing a cost function using stochastic relaxation [ChaCha90], [Cha et al91] or a deterministic version of the simulated annealing algorithm (mean field approximation) [ChaCha92]. C: epipolar geometry, photometric similarity, surface smoothness. F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: RDS, synth. + series of outdoor images.

[CheMed90]
Method: Feature-based multi-scale (coarse-to-fine) hierarchical stereo approach. The features are extracted by adaptive smoothing and the matching is established in accordance with [DruPog86] on a Connection Machine. C: epipolar geometry, continuity, uniqueness, compatibility. F: edges from multi-scale adaptive smoothing.
Ma.: P. Appl.: no info. Exp.: 1 outdoor + 1 indoor image.
Publication
Method
Ma. Appl.
[ChuNev91]
Feature-based hierarchical approach that uses structural descriptions up to the surface level to deal with occlusions. Surface descriptions are computed from monocular images by using perceptual grouping techniques.
C: epipolar geometry, uniqueness, mutual constraints.
F: edges [Can86], curves, contours, ribbons.
Ma.: T, S. Appl.: no info. Exp.: 2 indoor images.

[CocMed89], [CocMed92]
Integration of area-based and feature-based stereo. A dense depth map computed by cross correlation is refined by matching edges in a multi-resolution pyramid.
C: epipolar geometry, ordering constraint, surface smoothness, no isolated pixels.
F: intensity values + edges (Canny [Can86] or Nevatia-Babu [NevBab80]).
Ma.: P. Appl.: no info. Exp.: RDS, 1 outdoor + 4 indoor images.

[Coh et al89]
Hierarchical region-based stereo approach using a combination of region segmentation and correspondence analysis. A hierarchical tree is established for every image by applying "split and merge" segmentation. Matching is established by checking similarity at several levels in the hierarchical tree.
C: no statement.
F: regions segmented using split-and-merge techniques whose maximum area is bounded by contours from [Der87].
Ma.: R. Appl.: mobile robot. Exp.: 1 synth. + 1 indoor image.

[DerFau90]
Feature points corresponding to points with high curvature are extracted from each image and matched. A correspondence between curves is then established using figural continuity.
C: epipolar geometry, disparity limit, figural continuity.
F: edges by [Der87].
Ma.: C. Appl.: mobile robot. Exp.: series of indoor images.

[DhoAgg92]
Feature-based hierarchical stereo using a dynamic disparity search at different levels of resolution (see the coarse-to-fine sketch below). The correspondence process takes place in a background and a foreground disparity pool to deal with narrow occluding objects.
C: epipolar geometry, hierarchical disparity limit.
F: "ZCs".
Ma.: E. Appl.: no info. Exp.: 3 indoor images.

[Fle91]
Coarse-to-fine approach to obtain a dense subpixel depth map by comparing topological structures. At each scale, the two images are compared at a variety of possible disparities. The candidate data include disparities found at the previous (next coarser) scale, plus a range of similar values.
C: uniqueness, smoothness, topological similarity.
F: no explicit feature extraction.
Ma.: T, S. Appl.: no info. Exp.: RDS, synth., 3 indoor + 2 outdoor images.
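Several of the entries above ([CheMed90], [DhoAgg92], [Fle91]) share a coarse-to-fine control strategy. The following sketch shows the generic scheme, assuming gray-value numpy images whose sides are divisible by 2**levels and a simple SAD window cost; it is a schematic stand-in, not any of the published algorithms.

    import numpy as np

    def downsample(img):
        """2x2 block averaging; image sides must be even."""
        h, w = img.shape
        return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def match_with_prior(left, right, prior, search, win=3):
        """SAD window matching restricted to a band around a disparity prior."""
        h, w = left.shape
        r = win // 2
        d = prior.copy()
        for y in range(r, h - r):
            for x in range(r, w - r):
                best_c = np.inf
                for dv in range(prior[y, x] - search, prior[y, x] + search + 1):
                    xr = x - dv
                    if xr - r < 0 or xr + r >= w:
                        continue
                    c = np.abs(left[y - r:y + r + 1, x - r:x + r + 1] -
                               right[y - r:y + r + 1, xr - r:xr + r + 1]).sum()
                    if c < best_c:
                        best_c, d[y, x] = c, dv
        return d

    def coarse_to_fine(left, right, levels=3, dmax=16):
        pyramid = [(left.astype(float), right.astype(float))]
        for _ in range(levels - 1):
            pyramid.append(tuple(downsample(i) for i in pyramid[-1]))
        lc, rc = pyramid[-1]
        d = match_with_prior(lc, rc, np.zeros(lc.shape, int),
                             search=max(dmax >> (levels - 1), 1))
        for l, r in reversed(pyramid[:-1]):
            prior = np.kron(d, np.ones((2, 2), int)) * 2  # upsample, double disparity
            d = match_with_prior(l, r, prior, search=2)
        return d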
[Fua91]
Area-based correlation approach followed by interpolation to obtain a dense depth map. During the correlation phase the pixels in the two images have to show mutual consistency to be matched. The information is then interpolated across the featureless areas, but not across depth discontinuities, taking image gray levels into account to preserve image features.
C: epipolar geometry, uniqueness.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: 1 indoor + 1 outdoor image.

[GazMed89]
Feature-based multi-resolution (coarse-to-fine) approach to match supersegments (lists of line segments belonging to one contour). Line segments are matched first according to their contrast and orientation. Adjacent line segments are then matched according to similarity.
C: disparity limit, figural continuity.
F: line segments by applying [Can86] to adaptively filtered images.
Ma.: S. Appl.: motion analysis. Exp.: 2 series of outdoor + indoor images.

[Gei et al92]
Area-based stereo using the Bayesian approach to compute a dense depth map. Adaptive correlation between windows is used with an occlusion constraint and Dynamic Programming (see the scanline sketch below).
C: epipolar geometry, disparity smoothness, occlusion constraint, uniqueness.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: 1 aerial image.

[HofAhu89]
Integration of correspondence analysis and surface reconstruction. A coarse-to-fine strategy (starting with a planar surface) is used for surface reconstruction. Pixels are matched based on least-squared differences between the 3D points and the reconstructed surface.
C: epipolar geometry, surface smoothness.
F: "ZCs".
Ma.: P. Appl.: no info. Exp.: series of synth. outdoor + indoor images.

[HorSko89]
Feature-based approach matching straight lines according to their neighborhood relations in a graph.
C: geometric constraints on epipolar lines, the position of image points, and the orientation of lines in the image.
F: linear line segments of connected edges from [Can86].
Ma.: L, S. Appl.: mobile robot. Exp.: series of indoor images.

[HuaDub89]
Feature-based approach comparing and matching strings. The strings are zero-crossing patterns coded according to the direction of the zero crossing along an epipolar line.
C: epipolar geometry, continuity, disparity limit, ordering constraint.
F: "ZCs".
Ma.: P. Appl.: no info. Exp.: series of indoor images.
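The Dynamic Programming step used, e.g., by [Gei et al92], and the string matching of [HuaDub89], can be illustrated for one pair of epipolar scanlines. The squared-difference match cost and the occlusion penalty occ below are illustrative assumptions; the adaptive windows and Bayesian cost of [Gei et al92] are omitted.

    import numpy as np

    def dp_scanline(left_row, right_row, occ=5.0):
        """Ordering-constrained scanline matching with an occlusion penalty."""
        n, m = len(left_row), len(right_row)
        C = np.zeros((n + 1, m + 1))
        C[0, :] = occ * np.arange(m + 1)
        C[:, 0] = occ * np.arange(n + 1)
        back = np.zeros((n + 1, m + 1), dtype=int)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = C[i - 1, j - 1] + (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
                skip_l = C[i - 1, j] + occ  # pixel visible only in the left image
                skip_r = C[i, j - 1] + occ  # pixel visible only in the right image
                costs = (match, skip_l, skip_r)
                back[i, j] = int(np.argmin(costs))
                C[i, j] = costs[back[i, j]]
        disparities = {}  # left x -> disparity, for matched pixels only
        i, j = n, m
        while i > 0 and j > 0:
            if back[i, j] == 0:
                disparities[i - 1] = (i - 1) - (j - 1)
                i, j = i - 1, j - 1
            elif back[i, j] == 1:
                i -= 1
            else:
                j -= 1
        return disparities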
[JepJen89], [Fle et al91]
Disparity is expressed in terms of phase differences in the output of local, bandpass (Gabor) filters applied to the left and right image (see the sketch below). Dense depth maps can be obtained with subpixel accuracy without subpixel feature detection.
C: epipolar geometry.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: RDS, outdoor + indoor images.

[JonMal92]
A dense depth map is obtained using a bank of linear spatial filters at multiple scales and orientations. A technique based on using the pseudo-inverse is applied to characterize the information present in a vector of filter responses. Occlusion is handled by defining a visibility map from non-corresponding points.
C: epipolar geometry, piecewise smoothness, depth discontinuity.
F: linear spatial filter responses.
Ma.: P. Appl.: no info. Exp.: RDS, series of synth. + indoor images.

[KanOku90]
Intensity-based stereo using correlation techniques. An optimal search window is estimated for every pixel using a statistical measure.
C: epipolar geometry.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: series of synth. + indoor images.

[Kim et al92]
Feature-based fuzzy stereo approach matching line segments according to a probability (possibility) given for a match by edge orientation, edge length, and maximum disparity.
C: epipolar geometry, disparity limit, geometric similarity.
F: edges from [NevBab80].
Ma.: L, S. Appl.: mobile robot. Exp.: 2 indoor images.

[Kro et al90]
Feature-based stereo with verging cameras. The matcher proceeds by recursive prediction and verification: first, hypothetical matches are generated according to the orientation, length, and gradient magnitude of the line segments; second, the hypotheses are pruned according to local and global constraints.
C: epipolar geometry, disparity limit, uniqueness, geometric similarity, continuity.
F: straight lines from [Bur et al86].
Ma.: L, S. Appl.: mobile robot, grab. Exp.: series of indoor images.

[Lee et al93]
Correspondence analysis by affine moment invariants.
C: corresponding regions are assumed to be related by affine transformations.
F: no feature extraction or segmentation is needed.
Ma.: R. Appl.: no info. Exp.: 2 indoor images.

[LiuHua93]
Feature-based stereo using Dynamic Programming with additional topological characteristics in the cost function (a modification of [OhtKan85]).
C: epipolar geometry, global consistency.
F: unspecified edge detection.
Ma.: E. Appl.: no info. Exp.: indoor images.
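A minimal sketch of the phase-difference principle behind [JepJen89]: both scanlines are filtered with a complex Gabor kernel and the disparity follows from the local phase difference as d ~ dphi / (2*pi*freq). The filter frequency freq, the envelope width sigma, and the kernel support are illustrative assumptions; the sign convention depends on how disparity is defined.

    import numpy as np

    def phase_disparity(left_row, right_row, freq=0.25, sigma=8.0):
        """Subpixel disparity from the phase difference of Gabor responses."""
        x = np.arange(-24, 25)
        gabor = np.exp(-x**2 / (2 * sigma**2)) * np.exp(2j * np.pi * freq * x)
        rl = np.convolve(left_row.astype(float), gabor, mode='same')
        rr = np.convolve(right_row.astype(float), gabor, mode='same')
        dphi = np.angle(rl * np.conj(rr))  # wrapped phase difference per pixel
        return dphi / (2 * np.pi * freq)   # disparity estimate per pixel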
[Lud et al92]
Physiology-based approach. A dense depth map is obtained by simple maximum detection in the cepstral plane (see the cepstral sketch below).
C: epipolar geometry, Panum's fusional area.
F: cepstrum of the image.
Ma.: C, A. Appl.: no info. Exp.: synth. + 2 indoor images.

[MarAbel93]
Feature-based active stereo projecting line patterns with random cuts on object surfaces. The correspondence is found by using adjacency relations between the line segments and an ordering constraint.
C: epipolar geometry, ordering.
F: feature extraction is not specified.
Ma.: L, S. Appl.: no info. Exp.: 1 indoor image.

[Mar et al93]
Feature-based "active" stereo using multiple baselines. The second camera is laterally shifted with known motion and the edgels in the images are matched according to their position in the shifted image.
C: epipolar geometry, uniqueness, static scene.
F: "ZCs".
Ma.: P. Appl.: no info. Exp.: 1 indoor image.

[MarTri92]
Feature-based multi-primitive hierarchical stereo. Stereo analysis is performed at multiple levels utilizing a hierarchical control strategy. The results at higher levels are used for guidance at the lower levels.
C: hierarchical (spatial, relational, interval, and disparity constraints).
F: regions, edge segments, and edgels.
Ma.: P. Appl.: no info. Exp.: series of indoor + outdoor images.

[Mar89]
Regularization of the disparity estimation based on the solution of an Euler-Lagrange equation.
C: "controlled continuity" (piecewise continuous regularization).
F: "ZCs".
Ma.: P. Appl.: no info. Exp.: 1 synth. image.

[McKHsi92]
Feature-based stereo which matches waveforms along epipolar lines according to the similarities between waveforms and intensities at multiple resolutions (coarse-to-fine).
C: epipolar geometry.
F: waveforms (approximations of the intensity profiles).
Ma.: P. Appl.: cartography. Exp.: 1 aerial image.

[McL et al91]
Feature-based approach using a disparity histogram for potential edge matches, the Hough transform for above-threshold peaks in the histogram, and hypotheses in adjacent overlapping patches on matching candidates. A region-growing procedure locates large areas of mutually connected hypotheses.
C: epipolar geometry, continuity, smoothness, textured surfaces.
F: edges from [Can86].
Ma.: E. Appl.: mobile robot. Exp.: 2 indoor images.
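The cepstral idea behind [Lud et al92] (and behind [SmiNan93] below) can be sketched as follows: concatenating corresponding left and right scanlines yields a signal containing an echo whose lag is the row length plus the disparity, and this lag shows up as a peak in the power cepstrum. Filtering, windowing, and the exact sign convention of the published methods are omitted; this is an illustration of the principle only.

    import numpy as np

    def cepstral_disparity(left_row, right_row):
        """Disparity estimate from the power cepstrum of a composite signal."""
        s = np.concatenate([left_row, right_row]).astype(float)
        spectrum = np.abs(np.fft.fft(s)) ** 2
        cepstrum = np.abs(np.fft.fft(np.log(spectrum + 1e-9))) ** 2
        n = len(left_row)
        lags = np.arange(n - n // 4, n + n // 4)  # search for the echo near lag n
        return int(lags[np.argmax(cepstrum[lags])]) - n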
[Mey et al90]
Feature-based pyramidal approach matching contours at 4 resolutions (4 image sizes) according to the orientation and magnitude of the intensity gradient. Validation of the matches assumes figural continuity.
C: epipolar geometry, uniqueness, figural continuity.
F: contour chains based on the local maxima in the first derivative.
Ma.: C. Appl.: mobile robot (obstacle avoidance). Exp.: 2 outdoor images.

[Nak et al92]
Feature-based stereo. In a multi-stage process, starting from high-contrast ZCs (with high gradient magnitude) and proceeding to low-contrast ZCs, matching is established according to peaks in local disparity histograms (cp. [ShiNis85]).
C: epipolar geometry, figural continuity.
F: "ZCs".
Ma.: E. Appl.: mobile robot. Exp.: indoor images.

[NasLiu89], [Nas92]
3-channel approach for matching curve segments in a relational graph using the Hough transform and the length of the curves.
C: epipolar geometry, figural continuity.
F: contour segments being linked "ZCs".
Ma.: C. Appl.: mobile robot. Exp.: series of indoor images.

[NasCho92]
An optimization approach is used to solve the correspondence problem for a set of features in the images. A cost function representing the matching constraints is mapped onto a two-dimensional Hopfield neural network.
C: surface smoothness.
F: interest points from [Mor77].
Ma.: P. Appl.: mobile robot. Exp.: series of indoor images.

[Ols90]
Integration of surface reconstruction and correspondence analysis. Matching at multiple resolutions (coarse-to-fine) using edge orientation and the disparity gradient.
C: epipolar geometry, disparity limit, disparity gradient limit, local smoothness.
F: "ZCs".
Ma.: P. Appl.: no info. Exp.: RDS, synth. outdoor + indoor images.

[Ols93]
Multi-resolution area-based stereo using a Laplacian pyramid and an active vision system with verging cameras. Matching candidates are classified by normalized correlation in a 5x5 neighborhood at every resolution (see the correlation sketch below).
C: epipolar geometry, disparity limit (< 7 values).
F: no feature extraction.
Ma.: P. Appl.: mobile robot. Exp.: series of synth. + 1 indoor image.

[Pol et al89]
Feature-based approach using the disparity gradient limit [Pol et al85], matching according to the maximum of a weighting function.
C: epipolar geometry, uniqueness, disparity gradient limit.
F: edges from the Canny operator [Can86].
Ma.: L, S. Appl.: mobile robot. Exp.: RDS + series of indoor images.
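The normalized correlation score with which, e.g., [Ols93] classifies matching candidates in a 5x5 neighborhood can be written generically as below (not the published code); a candidate whose score falls under some threshold, say 0.7, would be rejected.

    import numpy as np

    def ncc(left, right, y, x, xr, win=5):
        """Normalized cross-correlation of two win x win neighborhoods
        centered at (y, x) in the left and (y, xr) in the right image;
        the centers are assumed to lie in the image interior."""
        r = win // 2
        a = left[y - r:y + r + 1, x - r:x + r + 1].astype(float).ravel()
        b = right[y - r:y + r + 1, xr - r:xr + r + 1].astype(float).ravel()
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0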
[Pon et al89]
Matching of topographic structures (arc and region segments) concerning their similarity in orientation, contrast, and size.
C: epipolar geometry, disparity limit.
F: topographic structures (arc and region segments).
Ma.: T, S. Appl.: mobile robot. Exp.: 1 medical image.

[RobFau93]
Approach to compute relative position and convex hull from weakly calibrated images with exclusively known epipolar geometry.
C: only epipolar geometry.
F: no feature extraction.
Ma.: P. Appl.: mobile robot. Exp.: 2 indoor images.

[Sha93]
A dense depth map is obtained by using energy functionals designed to deal with half-occlusions. A nonlinear system of diffusion equations is derived by simultaneously applying gradient descent to these functionals.
C: epipolar geometry, ordering.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: 1 RDS + 1 indoor image.

[ShePel90]
Feature-based stereo matching contours by similar orientation and contrast sign. Best-matched contours are paired first, constraining their neighboring contours through neighborhood support.
C: epipolar geometry, figural continuity, disparity limit, ordering.
F: contours from [Der87].
Ma.: C. Appl.: no info. Exp.: 4 aerial images.

[SieUrq90]
Active stereo projecting random noise texture onto the scene to enhance stereo matching. A Gaussian-windowed first-order moment calculation is used to find correspondences in multiscale LoG-filtered channels.
C: epipolar geometry, textured scenes.
F: no explicit feature extraction.
Ma.: P. Appl.: no info. Exp.: 1 indoor image.

[SmiNan93]
Dense stereo maps are obtained by matching power cepstra.
C: epipolar geometry, textured scenes.
F: power cepstrum.
Ma.: P. Appl.: no info. Exp.: 1 indoor image.

[SteMac90]
Feature-based stereo using a modified form of disparity gradient support (see the disparity gradient sketch below) and a multi-resolution support method that bases support on similarities between matches at different resolution levels. A final consistency check helps to filter out incorrect matches.
C: epipolar geometry, disparity gradient, disparity limit.
F: edges from [Can86].
Ma.: E. Appl.: no info. Exp.: 1 synth. + 1 aerial image.
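The disparity gradient to which [Pol et al89] and [SteMac90] refer is, following [Pol et al85], the difference of the disparities of two candidate matches divided by their cyclopean separation; matches supporting each other must stay under a limit (typically below 1). A sketch of the computation:

    import numpy as np

    def disparity_gradient(m1, m2):
        """Disparity gradient of two matches m = ((xl, y), (xr, y)),
        rows being epipolar lines of a rectified pair."""
        (xl1, y1), (xr1, _) = m1
        (xl2, y2), (xr2, _) = m2
        d1, d2 = xl1 - xr1, xl2 - xr2
        # cyclopean image: average the left and right x coordinates
        cx1, cx2 = (xl1 + xr1) / 2.0, (xl2 + xr2) / 2.0
        sep = np.hypot(cx2 - cx1, y2 - y1)
        return abs(d2 - d1) / sep if sep > 0 else np.inf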
[TuDub90]
Combination of feature-based and area-based stereo. Using neighborhood relations, connected contours are recursively matched first for every epipolar line. The matching results are then verified point by point using correlation between intensity values in local windows in both images.
C: epipolar geometry, ordering constraint, global consistency.
F: "ZCs" + intensity values.
Ma.: C. Appl.: no info. Exp.: 3 indoor images.

[VaiBoy91]
Feature-based approach in uncalibrated domains using extended edge contours as a source of primitives. The method is a new implementation of structural stereopsis [Boy et al86].
C: geometric constraints.
F: "ZCs".
Ma.: C. Appl.: no info. Exp.: 2 aerial images.

[Vle93]
Intensity-based coarse-to-fine approach to obtain a dense depth map by minimizing a cost function and interpolation based on smoothness constraints. The technique is based on the simplex algorithm and its associated sensitivity analysis.
C: epipolar geometry, surface smoothness.
F: gradient of the intensity difference between the left and right image.
Ma.: P. Appl.: no info. Exp.: 1 RDS, 1 synth. + 1 indoor image.

[Wen90]
A dense depth map is obtained by matching windows of Fourier phases (see the phase-correlation sketch below).
C: epipolar geometry.
F: Fourier phase.
Ma.: P. Appl.: no info. Exp.: 3 RDS + 2 outdoor images.

[WilKnu89]
A dense depth map is obtained by matching power spectra in the frequency domain using the Gabor representation at 4 resolutions (coarse-to-fine). A proof is given for the 1-D case only.
C: disparity limit.
F: power spectra in the frequency domain.
Ma.: P. Appl.: no info. Exp.: 2 images (white noise + stripes).

[Yan et al93]
A dense depth map is obtained by a multi-level (fine-to-coarse instead of coarse-to-fine) approach. A local matching module cascaded with a global matching module is used. Local matching outputs a 3D gray-scale image in which every point has an intensity measuring the goodness of a possible match. Global matching is reduced to (disparity) surface fitting.
C: epipolar geometry, smoothness of disparity surface.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: series of outdoor + aerial images.
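Matching windows of Fourier phases, as [Wen90] does, is closely related to phase correlation; the sketch below recovers the displacement between two windows from the phase-correlation peak. It is a stand-in illustration under the assumption of equally sized square windows, not the published algorithm.

    import numpy as np

    def phase_correlation_shift(a, b):
        """Integer shift between two equally sized windows a and b."""
        A, B = np.fft.fft2(a), np.fft.fft2(b)
        R = A * np.conj(B)
        R /= np.abs(R) + 1e-9            # keep only the phase
        corr = np.abs(np.fft.ifft2(R))
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        h, w = a.shape                   # map wrapped indices to signed shifts
        if dy > h // 2:
            dy -= h
        if dx > w // 2:
            dx -= w
        return int(dy), int(dx)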
Part 2: Polynocular Monochromatic Stereo
[AyaLus91], [Aya91]
Trinocular approach with "free" camera arrangement. Candidates for correspondence are found according to the orientation and magnitude of linearly approximated line segments. Validation of the matches uses geometric and neighborhood relations (see the verification sketch below). The images are rectified by reprojection to an image plane with the epipolar lines being identical to the horizontal scanlines.
C: epipolar geometry, continuity constraint, geometric similarity, disparity gradient limit.
F: linearly approximated line segments extracted with [Can86].
Ma.: L, S. Appl.: mobile robot. Exp.: series of synth. and indoor image triples.

[BriBra90]
Trinocular approach to match curves representing parametric elastic strings / snakes. Correspondence is established according to a disparity gradient and the minimum energy needed to map a curve in one image to the other image. First, two images are compared, then the third image is used for a consistency check.
C: uniqueness, ordering constraint, epipolar lines should "nearly" match the image scanlines.
F: curves defined by connected edge elements extracted with [Can86].
Ma.: C. Appl.: mobile robot. Exp.: 1 indoor image triple.

[ChaBer93]
Planar surfaces are recovered based on projective geometry.
C: polyhedral scene, 2D faces are already extracted and matched.
F: no specification.
Ma.: L, S. Appl.: no info. Exp.: 1 synth. image triple.

[EnsLi93]
Real-time realization of feature-based motion stereo assuming that the incremental disparity is less than the minimum distance between edges in a search window. A multi-scale algorithm with a Gaussian pyramid is used to relax that constraint.
C: epipolar geometry, uniqueness, compatibility, ordering.
F: edge pixels from the Sobel operator.
Ma.: P. Appl.: sorting robot. Exp.: 2 indoor image sequences.

[Ish et al90], [Ish et al92]
Feature-based omni-directional stereo using one rotating camera. Matching is established in two panoramic views by Dynamic Programming. Multiple local maps are combined to build a more reliable global map.
C: "epipolar geometry".
F: edges by the Sobel operator with subpixel accuracy.
Ma.: E. Appl.: mobile robot. Exp.: indoor image sequences.
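The verification step that distinguishes trinocular methods such as [AyaLus91] and [BriBra90] reduces, in an idealized rectified rig, to a simple prediction test: with the second camera at horizontal baseline b and the third camera at vertical baseline k*b, a match with horizontal disparity d must reappear in the third image with a vertical disparity of about k*d. This simplified geometry is an assumption for illustration; the cited methods handle general camera arrangements.

    def verify_trinocular(d_horizontal, baseline_ratio, observed_vertical, tol=1.0):
        """Keep a binocular match hypothesis only if the third (vertically
        displaced) camera confirms the predicted vertical disparity."""
        predicted = baseline_ratio * d_horizontal
        return abs(observed_vertical - predicted) <= tol

With baseline_ratio = 1 and a one-pixel tolerance, a hypothesis from the horizontal pair survives only if the vertical pair confirms it.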
[Mat89]
Dynamic stereo approach which matches pixels based on statistical correlation. Matching is established between static binocular images as well as between dynamic image sequences.
C: smoothness, disparity limit.
F: "interest points" from the Moravec operator [Mor77].
Ma.: P. Appl.: mobile robot. Exp.: 55 indoor images.

[OkuKan91], [OkuKan93]
Dense area-based stereo using multiple images with different baselines obtained by a lateral displacement of a camera. Matching is performed by computing the sum of squared differences in the DOG-filtered images (see the multi-baseline sketch below).
C: epipolar geometry.
F: no explicit feature extraction.
Ma.: P. Appl.: no info. Exp.: 2 sets of indoor images.

[RobFau91]
Feature-based trinocular approach based on [Aya91]. The primitives it works on are cubic B-spline approximations of the 2D edges.
C: epipolar geometry, continuity constraint, geometric similarity, disparity gradient limit.
F: B-spline approximations of edges from [Der87].
Ma.: L, S. Appl.: mobile robot. Exp.: 3 indoor image triples.

[Sug et al91]
Feature-based approach for a mobile robot in an unstructured environment to generate boundary representations of the large environment by incrementally integrating local and incomplete stereo data.
C: no specification.
F: no feature extraction.
Appl.: mobile robot. Exp.: 3 views of an indoor scene.

[Tir et al90], [Tir et al92]
Dense feature-based approach for incremental refinement of disparity maps obtained from dynamic stereo sequences of a static scene (cp. [Mat89]). Interest points are matched along the epipolar lines and the motion is estimated by linear least-squares regression. Disparity fusion is established by Kalman filtering.
C: epipolar geometry, uniqueness.
F: "interest points" from the Moravec operator [Mor77] and edges from the Canny operator [Can86].
Ma.: P. Appl.: mobile robot. Exp.: 2 indoor image sequences.

[YosHir92]
Real-time stereo using a multiple-arrayed camera system named MAC with 5 collinear cameras. Correspondence is established by using geometrical relations along the epipolar lines in all images.
C: epipolar geometry, ordering.
F: no feature extraction.
Ma.: P. Appl.: mobile robot. Exp.: series of outdoor + indoor images.
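A sketch of the SSSD-in-inverse-distance idea of [OkuKan91], [OkuKan93]: for every inverse-depth candidate the SSD is accumulated over all baselines, since the expected disparity grows linearly with the baseline, and the minimum of the summed curve is selected. The DOG prefiltering and subpixel sampling of the published method are omitted; zinv is assumed to be scaled so that disparity = baseline * zinv.

    import numpy as np

    def multibaseline_sssd(ref, others, baselines, zinv_candidates, y, x, win=5):
        """Pick the inverse depth minimizing the SSD summed over all baselines;
        (y, x) is assumed to be an interior pixel of the reference image."""
        r = win // 2
        patch = ref[y - r:y + r + 1, x - r:x + r + 1].astype(float)
        best_cost, best_zinv = np.inf, None
        for zinv in zinv_candidates:
            total = 0.0
            for img, b in zip(others, baselines):
                xs = int(round(x - b * zinv))  # disparity grows with the baseline
                if xs - r < 0 or xs + r >= img.shape[1]:
                    total = np.inf
                    break
                q = img[y - r:y + r + 1, xs - r:xs + r + 1].astype(float)
                total += ((patch - q) ** 2).sum()
            if total < best_cost:
                best_cost, best_zinv = total, zinv
        return best_zinv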
Part 3: Binocular Chromatic Stereo
[BroYan89]
Feature-based approach using the opponent color model for image representation. Each potential match between edges is given a probability with regard to edge orientation, edge contrast, and a maximum disparity value defined by Panum's fusional area. Assuming figural continuity and global surface smoothness, a relaxation labeling is used to increase the probabilities of possible matches that agree in disparity with the majority of possible matches within the local neighborhood.
C: epipolar geometry, continuity, geometric similarity, figural continuity, surface smoothness, disparity limit.
F: edges from zero crossings of second directional derivatives.
Ma.: E. Appl.: no info. Exp.: RDS + 2 indoor images.

[Jor et al89], [JorBov91]
Feature-based stereo approach matching edges by similarity of orientation, contrast sign, and the chromatic gradient in every color. In addition, the chromatic gradient limit has been added to the PMF algorithm [Pol et al85].
C: uniqueness, epipolar geometry, figural similarity, chromatic gradient limit.
F: "ZCs" in every color channel.
Ma.: E. Appl.: no info. Exp.: 4 indoor images.

[JorBov92]
A dense disparity map is computed by using the simulated annealing technique to minimize an energy function. The similarity of potential matches is measured by computing the absolute differences between the intensity values in the R, G, and B components (see the color matching sketch below). Together with a disparity smoothness constraint, that is, the disparity values are similar in the four-connected neighborhood, the energy function is defined.
C: epipolar geometry, uniqueness, disparity smoothness, chromatic photometric similarity.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: RDS, 6 indoor images.

[Kos93]
An area-based approach providing a dense depth map by applying a refined Block Matching technique in different color spaces with different color measures.
C: epipolar geometry, disparity limit, chromatic photometric similarity.
F: no feature extraction.
Ma.: P. Appl.: no info. Exp.: 6 indoor images.
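The chromatic similarity measure of [JorBov92], and the Block Matching refined in [Kos93], rest on window costs accumulated over the color channels. A simplified sketch (RGB only, absolute differences, winner-take-all instead of the published energy minimization or refined matching):

    import numpy as np

    def color_sad(left, right, y, x, dmax, win=5):
        """Best disparity for an interior pixel of an RGB image pair by
        summing absolute differences over all three color channels."""
        r = win // 2
        patch = left[y - r:y + r + 1, x - r:x + r + 1, :].astype(float)
        best_cost, best_d = np.inf, 0
        for d in range(dmax + 1):
            if x - d - r < 0:
                break
            q = right[y - r:y + r + 1, x - d - r:x - d + r + 1, :].astype(float)
            cost = np.abs(patch - q).sum()  # summed over R, G, and B
            if cost < best_cost:
                best_cost, best_d = cost, d
        return best_d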
[NguCoh92]
Feature-based stereo matching regions with similar curvature characteristics in both images. The matching is performed separately for the R, G, and B channels and the results are concatenated by averaging. A dense disparity map is computed by a succeeding spline interpolation.
C: disparity continuity, curvature consistency.
F: region detection as in [Aud et al91].
Ma.: R. Appl.: no info. Exp.: series of indoor images.

[Oku et al92]
Area-based technique using the sum of squared differences (SSD) of the intensity values in the three color channels R, G, and B for correspondence. This technique was applied to the 3D measurement of optic nerve heads.
C: no statement.
F: no feature extraction.
Ma.: P. Appl.: medicine. Exp.: synth. + medical images.