Extended view interpolation by parallel use of the GPU and the CPU

Indra Geys(a) and Luc Van Gool(a,b)

(a) ESAT / PSI-VISICS, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
(b) D-ITET / BIWI, Swiss Federal Institute of Technology (ETH), Gloriastrasse 35, 8092 Zürich, Switzerland

ABSTRACT

This paper presents an algorithm for efficient image synthesis. The main goal is to generate realistic virtual views of a dynamic scene from a new camera viewpoint. The algorithm works online on two or more incoming video streams from calibrated cameras. A reasonably large distance between the cameras is allowed. The main focus is on video-conferencing applications. The background is assumed to be static, as is often the case in such environments. By performing a foreground segmentation, the foreground and the background can be handled separately. For the background a slower, more accurate algorithm can be used. Reaching a high throughput is most crucial for the foreground, as this is the dynamic part of the scene. We use a combined approach of CPU and GPU processing. Performing depth calculations on the GPU is very efficient, thanks to the capabilities of the latest graphics boards. However, the result tends to be rather noisy, so we apply a regularisation algorithm on the CPU to improve it. The final interpolation is again obtained by rendering on the graphics board. The big advantage of using both the CPU and the GPU is that they can run completely in parallel. This is realised by an implementation using multiple threads. As such, different algorithms can be applied to two frames simultaneously and the total throughput is increased.

Keywords: view synthesis, GPU, multi-threading, multi-camera

1. INTRODUCTION

We present an algorithm to create realistic view interpolations and extrapolations using the video streams from two or more calibrated cameras. A depth map is calculated on the fly and textures are rendered onto this map to create the virtual view. Image synthesis can be performed by methods such as light field and lumigraph rendering.1,2 However, these methods require a large number of input images, which restricts them to a conditioned environment. View dependent texture mapping, as presented in the work of Debevec et al.,3 also allows the scene to be visualised from all directions. The textures are projected onto the geometry for rendering, using a 'view map' for every polygon. It is based on a relatively accurate 3D model of the scene, which is calculated beforehand. As such this method is only applicable to static scenes. Recently Zitnick et al.4 developed a system to create high-quality interpolations. They can process recorded video streams of moving objects. However, the depth calculation remains off-line; only the rendering and visualisation of the intermediate views are done on-line. Only recently, real-time calculation of depth without the use of special hardware became feasible.5-7 Ansar et al.6 use bilateral filtering, while an MMX-optimised implementation of a three-dimensional similarity accumulator is used by Schmidt et al.7 A recent example of view interpolation is the work of Criminisi et al.,8 where an interpolation of a stereo pair of an upper body is made. Their system is based on a dynamic programming algorithm for the generation of the depth map, and the baseline has to be small.

We want to create virtual views of moving, deformable objects. As such, the time to create an interpolation has to be very limited.

Further author information: (Send correspondence to I.G.)
I.G.: E-mail: [email protected], Telephone: +32-(0)16-32.10.61
L.V.G.: E-mail: [email protected], [email protected], Telephone: +32-(0)16-32.17.05

Figure 1. Overview of the system: First a background subtraction is performed. The foreground and the background are processed in a separate pipeline, although the same algorithm can be used. A plane sweep and a graph-cut algorithm are applied to obtain a dense depth map. A triangle mesh is built and the input textures are blended onto it. An interpolated view is composed of a combination of this foreground and background interpolation, seen from the desired intermediate position.

Besides that, the system should be able to deal with reasonably large baselines. The background is assumed to be static. As such a foreground-background separation is possible, which makes handling wider-baseline situations more feasible. Occlusions of the background by the foreground object are catered for. Also, there is no need to calculate the depth of the background for every frame, as is the case for the foreground. A depth map is calculated in very limited time by using both the CPU and the GPU in parallel. This is accomplished by an implementation using multiple threads. From this depth map, a triangle mesh is sampled to generate the interpolated views. The input textures are blended on this mesh using OpenGL. In this way extrapolations are also possible. The next section explains the different steps of the algorithm. In section 3 the specific implementation details concerning the multi-threaded implementation are described. Section 4 describes the changes to the algorithm necessary to allow inputs from more than two cameras. Some results are shown in section 5.

2. THE TWO CAMERA SYSTEM

One of the major challenges is to compute a reasonably accurate depth map in the time available between consecutive frames. To accomplish this we make a distinction between the foreground and the background. A segmentation algorithm separates both. For the foreground, calculation time is critical. A more accurate, but slower depth calculation process can be applied to the background, since it is assumed to be static. The problem of generating a dense depth map in limited time is addressed by a combined use of a plane sweep algorithm implemented on the GPU, followed by a graph-cut based regularisation on the CPU. Unlike most stereo algorithms, the depth map is not calculated with respect to one of the input cameras. The calculations are rather performed relative to the desired 'virtual' camera. Once the depth map is determined, a triangle mesh is sampled and rendered in OpenGL. Interpolations or extrapolations are accomplished by view dependent blending of the textures on this triangle mesh. These different processing steps are outlined in figure 1. For every step, it is also indicated whether it runs on the CPU or the GPU.

2.1. Foreground segmentation

The goal of background subtraction is to separate the foreground from the background, for every frame that is processed. A fast algorithm that is resistant to local illumination changes is required. A reference background image is initialised by taking an image of the scene without a foreground object present. This reference is updated every frame by use of the segmentation result.

The implementation of the segmentation algorithm is based on a technique for illumination-invariant change detection in a sequence of images, using an appropriate model and statistical decision criterion.9 We have extended it to colour images, and have added a compensation to handle partial darkness in the background and the objects. A neighbourhood is defined for every image pixel, typically a 3 × 3 window. A descriptor vector x for this neighbourhood is defined by concatenating the 27 colour values (3 channels for the 9 pixels). During on-line segmentation, for every pixel $p_i$ this vector $x_i$ in the current frame is compared to the corresponding vector $b_i$ of the reference background by checking for their collinearity. Figure 2 shows this principle for two dimensions. The decision whether a pixel is foreground or background depends on the angle between these vectors. However, it was shown that minimisation of $d_1^2 + d_2^2$ is a more robust test for collinearity than evaluating the angle.9

Figure 2. Geometrical interpretation of testing the collinearity of vectors $x_i$ and $b_i$.

This problem corresponds to solving a 2 × 2 eigenvalue problem, which can be implemented efficiently. But since the distance d also depends on the length of the vectors, it becomes more difficult to segment darker objects. To compensate for this, the distance measure is divided by the distance that would be obtained if the vectors were 45° apart. As a result, a foreground object can be robustly extracted under varying illumination conditions, almost without spurious spot-like decision errors. This separation is represented in fig. 3 [middle] by a black and white image, where the white pixels are assigned to the foreground and the black pixels to the background. Depending on the difficulty of the scene, it can be necessary to further remove erroneous spots. This is accomplished with a recursive floodfill algorithm: small white blobs surrounded by a black area turn black, and black blobs within a white area turn white. The resulting mask is used to separate the foreground from the input image, as shown in fig. 3 [right].
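A minimal C++ sketch of this per-pixel test is given below. It follows the description above: $d_1^2 + d_2^2$ equals the smallest eigenvalue of the 2 × 2 Gram matrix of the two descriptor vectors, and the result is normalised by the value obtained for vectors of the same lengths at 45°. The normalisation details and the threshold value are our reading of the text, not code from the paper.

#include <cmath>

// Per-pixel collinearity test for change detection (a sketch, not the
// authors' code). x and b are the 27-dimensional neighbourhood descriptors
// (3x3 window, 3 colour channels) of the current frame and the reference
// background.
bool isForeground(const float x[27], const float b[27], float threshold = 0.1f)
{
    // Entries of the 2x2 Gram matrix of the two descriptor vectors.
    float xx = 0.f, bb = 0.f, xb = 0.f;
    for (int i = 0; i < 27; ++i) {
        xx += x[i] * x[i];
        bb += b[i] * b[i];
        xb += x[i] * b[i];
    }

    // d1^2 + d2^2 equals the smallest eigenvalue of that 2x2 matrix.
    float residual = 0.5f * (xx + bb - std::sqrt((xx - bb) * (xx - bb) + 4.f * xb * xb));

    // Reference: the same quantity if the vectors kept their lengths but were
    // 45 degrees apart, i.e. x.b = |x||b| cos(45), so 4*(x.b)^2 = 2*xx*bb.
    float ref = 0.5f * (xx + bb - std::sqrt((xx - bb) * (xx - bb) + 2.f * xx * bb));

    if (ref <= 0.f)                 // degenerate (near-black) neighbourhood
        return false;

    return residual / ref > threshold;   // large residual => not collinear => foreground
}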

Figure 3. [left] Camera image. [middle] The foreground mask before applying the floodfill algorithm. [right] The segmented foreground image.

The algorithm is speeded up by exploiting the MMX-functionalities of the CPU, so that it works in real-time. Further implementation details can be found in Ref. 10.

2.2. Radial distortion correction and sparse correspondence search

We work with cheap cameras, like webcams. Radial distortions are common for the low-end lenses of such cameras. Compensation for this is necessary before determining the image correspondences. The radial distortion parameters are calculated during camera calibration and the computations to obtain the inverse deformation function are carried out off-line. The correction is performed by orthographic projection of the images onto a flat surface, which is deformed with the inverse distortion, as can be seen in figure 4. This surface is implemented as a highly accurate NURBS interpolant, which is tied to the input image with a texture coordinate per control point. As the radial distortion stays unchanged during a sequence, the unwarping procedure can be packed in a display list and carried out completely on the GPU. This requires no overhead to load the images into or out of texture memory, since the plane sweeping algorithm will also be performed on the GPU. An example of such a radial distortion correction on a real image is shown in figure 5.
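The following sketch illustrates the general idea of GPU undistortion through a textured surface. It builds the lookup mesh on the CPU with a simple two-coefficient radial model instead of the NURBS interpolant of the inverse deformation used in the paper, so it is an illustrative substitute rather than the authors' exact method; the grid size and coefficients are placeholders.

#include <cmath>
#include <vector>

// One mesh vertex: a position on the flat output surface plus a texture
// coordinate into the distorted input image.
struct Vertex { float x, y; float u, v; };

std::vector<Vertex> buildUndistortMesh(int gridW, int gridH,
                                       float cx, float cy,   // principal point (normalised)
                                       float k1, float k2)   // radial coefficients (assumed model)
{
    std::vector<Vertex> mesh;
    mesh.reserve(gridW * gridH);
    for (int j = 0; j < gridH; ++j) {
        for (int i = 0; i < gridW; ++i) {
            float x = float(i) / (gridW - 1);   // undistorted output position
            float y = float(j) / (gridH - 1);
            float dx = x - cx, dy = y - cy;
            float r2 = dx * dx + dy * dy;
            float f  = 1.f + k1 * r2 + k2 * r2 * r2;  // forward distortion factor
            // Sample the distorted input at the distorted location, so that
            // rendering this mesh with the input as texture undistorts it.
            mesh.push_back({x, y, cx + f * dx, cy + f * dy});
        }
    }
    return mesh;   // built once per sequence; render it (e.g. from a display list) every frame
}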

Figure 4. The input images suffer from serious ’barrel distortion’. By projecting on a surface with exactly the same amount of ’pincushion distortion’, the image becomes undistorted.

Figure 5. [left] Image of a planar surface, suffering from large radial distortions. [right] The same image after radial distortion correction.

After the images are corrected for radial distortion, we search for sparse, but certainly valid, correspondences between the left and right background image. Since the cameras are calibrated, the fundamental matrix $F_{im1,im2}$ between the images is known, and the knowledge of the epipolar lines can be exploited. Features, more exactly corners, are detected in the two corresponding images with the SUSAN detector.11 This detector gives good performance for only limited processing time. The outline of the rest of the point matching algorithm is as follows (a sketch of the per-candidate tests is given after this list):

• For every feature ($p_{im1}$) in one image, all possible corresponding features ($p_{im2}$) in the other image are collected. Selection is based on the criterion that $p_{im2}$ has to lie close to the corresponding epipolar line $l_{epi(im2)} = F_{im1,im2}\, p_{im1}$.

• To remove erroneous correspondences, we apply the following selection criteria to the $n$ candidate correspondences:

  – Similarity: the normalised cross-correlation between the image patches around the features must be higher than a predefined threshold $Th_S$:

    $\frac{\sum p_{im1}\, p_{im2}}{\sqrt{\sum p_{im1}^2 \, \sum p_{im2}^2}} > Th_S$

  – Monotonicity: the disparities of nearby corresponding points must be similar in direction and magnitude:

    $\angle\, \overrightarrow{p_{im1} p_{im2}} \cong \frac{\sum \text{directions}}{n} \quad \text{and} \quad \| p_{im1} - p_{im2} \| \cong \frac{\sum \text{lengths}}{n}$
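A C++ sketch of the two per-candidate tests follows. Only the cross-correlation follows the formula above; the epipolar-distance tolerance, the patch size and the threshold values are illustrative assumptions.

#include <cmath>

const int   PATCH  = 11 * 11;   // 11x11 grey-value window, assumed
const float TH_EPI = 1.5f;      // max distance to the epipolar line (pixels), assumed
const float TH_S   = 0.8f;      // similarity threshold Th_S, assumed

// Distance of point p2 = (x2, y2) to the epipolar line l = F * p1.
float epipolarDistance(const float F[3][3], float x1, float y1, float x2, float y2)
{
    float a = F[0][0] * x1 + F[0][1] * y1 + F[0][2];
    float b = F[1][0] * x1 + F[1][1] * y1 + F[1][2];
    float c = F[2][0] * x1 + F[2][1] * y1 + F[2][2];
    return std::fabs(a * x2 + b * y2 + c) / std::sqrt(a * a + b * b);
}

// Normalised cross-correlation of the two patches, as in the similarity test.
float ncc(const float p1[PATCH], const float p2[PATCH])
{
    float s11 = 0.f, s22 = 0.f, s12 = 0.f;
    for (int i = 0; i < PATCH; ++i) {
        s11 += p1[i] * p1[i];
        s22 += p2[i] * p2[i];
        s12 += p1[i] * p2[i];
    }
    return s12 / std::sqrt(s11 * s22 + 1e-6f);
}

// Candidate acceptance; the monotonicity check against nearby matches would
// follow as a second pass and is omitted here.
bool isCandidateMatch(const float F[3][3], float x1, float y1, float x2, float y2,
                      const float patch1[PATCH], const float patch2[PATCH])
{
    return epipolarDistance(F, x1, y1, x2, y2) < TH_EPI
        && ncc(patch1, patch2) > TH_S;
}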

By applying these criteria, some sparse, good correspondences are obtained, without the need for a full image rectification.

Next a triangulation yields the 3D points $\tilde{X}$ corresponding to the matched features $\tilde{x}_L$ and $\tilde{x}_R$. The triangulation algorithm is outlined in Table 1. By combining the projection equations from the two cameras, it is possible to obtain a linear expression between the disparity values and the corresponding 3D points.

Table 1. The triangulation algorithm.

$\rho\, \tilde{x}_L = P_L \tilde{X} \quad (1) \qquad \rho'\, \tilde{x}_R = P_R \tilde{X} \quad (2)$

with $P_{R/L}$ the projection matrices and $\tilde{x}_{R/L} = [x_{R/L}, y_{R/L}, 1]^T$.

After rewriting equation 2 we obtain

$[\, P_R(0,0) - x_R\, P_R(2,0) \;\; \dots \;\; P_R(0,3) - x_R\, P_R(2,3) \,]\, \tilde{X} = W \tilde{X} = 0$

and in combination with equation 1:

$\begin{bmatrix} \tilde{x}_L \\ 0 \end{bmatrix} = \begin{bmatrix} P_L \\ W \end{bmatrix} \tilde{X} = M \tilde{X} \;\Rightarrow\; \tilde{X} = M^{-1} \begin{bmatrix} \tilde{x}_L \\ 0 \end{bmatrix}$

A fast implementation of this algorithm is possible. The matrices $M$ can be pre-calculated for every possible disparity value $x$. When the inverse matrices $M^{-1}$ are stored in a lookup table, the triangulation only requires one matrix multiplication per feature.

The depth range covered by these points is used to initialise the search range for the plane sweep algorithm12 for the background. The initial search range for the foreground begins at the position where the camera frusta intersect each other. The maximum possible depth for the foreground is set at the fore-bound of the depth range for the background. Since the cameras are fixed and the background is assumed to be static, the background range will stay the same throughout the sequence. Hence, the correspondence search and triangulation only have to be performed on the first frame. The depth range for the foreground is refined and adjusted every frame by use of the depth map.
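A C++ sketch of this triangulation step is given below. It solves the 4 × 4 system directly with Gaussian elimination rather than using the pre-calculated lookup table of inverses described above; PR(i, j) denotes row i, column j of the right projection matrix, as in Table 1.

#include <cmath>
#include <utility>

// PL and PR are the 3x4 projection matrices, (xL, yL) and xR the matched
// image coordinates. Returns false for degenerate configurations.
bool triangulate(const float PL[3][4], const float PR[3][4],
                 float xL, float yL, float xR, float X[3])
{
    // M = [ PL ; W ], with W the x-equation of the right camera:
    // W(j) = PR(0,j) - xR * PR(2,j). Augmented with the right-hand side.
    float M[4][5];
    for (int j = 0; j < 4; ++j) {
        M[0][j] = PL[0][j];
        M[1][j] = PL[1][j];
        M[2][j] = PL[2][j];
        M[3][j] = PR[0][j] - xR * PR[2][j];
    }
    M[0][4] = xL; M[1][4] = yL; M[2][4] = 1.f; M[3][4] = 0.f;

    // Gauss-Jordan elimination with partial pivoting on the 4x4 system.
    for (int c = 0; c < 4; ++c) {
        int piv = c;
        for (int r = c + 1; r < 4; ++r)
            if (std::fabs(M[r][c]) > std::fabs(M[piv][c])) piv = r;
        for (int j = 0; j < 5; ++j) std::swap(M[c][j], M[piv][j]);
        if (std::fabs(M[c][c]) < 1e-9f) return false;
        for (int r = 0; r < 4; ++r) {
            if (r == c) continue;
            float f = M[r][c] / M[c][c];
            for (int j = c; j < 5; ++j) M[r][j] -= f * M[c][j];
        }
    }
    float Xh[4];
    for (int c = 0; c < 4; ++c) Xh[c] = M[c][4] / M[c][c];

    if (std::fabs(Xh[3]) < 1e-9f) return false;       // point at infinity
    X[0] = Xh[0] / Xh[3];                              // dehomogenise
    X[1] = Xh[1] / Xh[3];
    X[2] = Xh[2] / Xh[3];
    return true;
}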

2.3. Plane sweep

The plane sweeping algorithm is completely implemented on the GPU. A plane is swept through the predicted search range in 3D space. The input images are simultaneously projected onto this plane. Pixel-wise correlation of these images determines the correspondences for every plane position. The best correspondences obtained in the search range determine the depth values. The calculation of the correlations is done during the rendering and the texture projection. It is programmed in a dedicated fragment program on the graphics board. The comparison of these correlation measures for the different depth levels is also done by the GPU. A depth map and a colour interpolation image, at the position of the required intermediate camera position, result from this plane sweep algorithm. Only approximately 100 ms are needed to check 40 possible depth values for a set of 640 × 480 images. A histogram is calculated from the resulting depth values. The search range in 3D space can be adjusted based on the mean position and the width of this histogram. Further explanation of the plane sweeping algorithm and range update can be found in Ref. 13.
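A CPU-side sketch of the sweep loop is given below for clarity; in the paper this runs in a fragment program on the GPU. The projection of an input image onto the plane at a given depth (a per-depth homography) is left to a caller-supplied function 'warpToPlane', and a plain squared difference replaces the correlation measure; images are assumed to share the virtual camera's resolution.

#include <functional>
#include <limits>
#include <vector>

struct Image { int w = 0, h = 0; std::vector<float> pix; };  // single channel, for brevity

// Projects an image onto the sweeping plane at the given depth, as seen from
// the virtual camera. Its implementation is not shown here.
using WarpFn = std::function<Image(const Image& src, float depth)>;

void planeSweep(const Image& left, const Image& right, const WarpFn& warpToPlane,
                float zNear, float zFar, int numPlanes, Image& depthMap)
{
    const int n = depthMap.w * depthMap.h;   // depthMap is pre-sized by the caller
    std::vector<float> bestCost(n, std::numeric_limits<float>::max());

    for (int k = 0; k < numPlanes; ++k) {
        // Position of the sweeping plane inside the current search range.
        float d = zNear + (zFar - zNear) * k / float(numPlanes - 1);
        Image wl = warpToPlane(left,  d);    // both inputs projected onto the
        Image wr = warpToPlane(right, d);    // same plane in 3D space

        for (int i = 0; i < n; ++i) {
            float diff = wl.pix[i] - wr.pix[i];
            float cost = diff * diff;        // pixel-wise dissimilarity
            if (cost < bestCost[i]) {        // winner-takes-all over the sweep
                bestCost[i] = cost;
                depthMap.pix[i] = d;
            }
        }
    }
}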

2.4. Min-cut regularisation: Graph cut Since the GPU operations are pixel-wise, the depth map resulting from the plane sweep tends to be rather noisy. To overcome this we use a graph-cut based regularisation. This algorithm can take a wider spatial environment into account and yields a global result. We determine an energy function that has to be minimised. The four energy terms contained in this function are:

• a data term that expresses the correlation measure
• a temporal continuity term to take continuity with the previous frame into account
• an occlusion term
• a spatial continuity term to express connectivity between neighbouring pixels

Furthermore, we can deal with multiple resolutions by adjusting the sampling distance based on the image information (e.g. a lower sampling density in homogeneous regions). To determine the global minimum of this energy function a graph is set up. The nodes of the graph represent the different samples at their possible depth values. The edges that connect these nodes carry weights that represent the strength of the connection according to the energy function. The minimum cut of this graph14 yields the minimum energy and as such also the optimal depth map. A considerable speed-up of the graph cut algorithm is achieved by using the result of the sweep as an initialisation. We refer to Ref. 13 for a broader explanation of the energy function, the graph setup and the minimum cut search.
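For concreteness, the four terms can be combined into an energy of the following generic form; the notation and the weighting factors $\lambda$ are ours for illustration, the exact definitions are those of Ref. 13:

$E(D) = \sum_{p} E_{data}\bigl(D(p)\bigr) + \lambda_t \sum_{p} E_{temp}\bigl(D(p), D_{t-1}(p)\bigr) + \lambda_o \sum_{p} E_{occl}\bigl(D(p)\bigr) + \lambda_s \sum_{(p,q)\in\mathcal{N}} E_{smooth}\bigl(D(p), D(q)\bigr)$

Here $D$ assigns a depth label to every sample $p$, $D_{t-1}$ is the labelling of the previous frame and $\mathcal{N}$ is the set of neighbouring sample pairs.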

2.5. Generation of new views

For the generation of the new views, the resulting depth map is sampled. A triangle mesh is fitted through the samples, using a Delaunay triangulation, and rendered using OpenGL (see figure 6 [left]). Both input images are simultaneously projected onto this mesh. This projection from their camera positions to the triangle mesh is performed in a geometrically correct way. The textures are blended together to obtain a realistic view. Figure 6 [right] shows the result after blending two input images on the mesh in fig. 6 [left]. View dependent texture mapping is used to enhance the degree of realism. To realise this, we specify a third image which contains the view dependent blending factors in its alpha values. The two input textures and this 'alpha' texture are rendered simultaneously on the triangle mesh, with the correct blending realised by specifying the GL_TEXTURE_ENV mode. The desired interpolations and extrapolations are obtained by specifying the viewing position and the direction of the 'virtual' camera in OpenGL.
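The paper does not give the exact formula for the blending factors; a plausible choice, used in the sketch below, is inverse-distance weighting between the two input camera centres, clamped so that extrapolated positions still yield valid weights.

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dist(const Vec3& a, const Vec3& b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Weight of the left texture; the right texture gets 1 - wLeft. The weights
// are written into the alpha channel of the third ('alpha') texture.
float leftBlendWeight(const Vec3& camLeft, const Vec3& camRight, const Vec3& camVirtual)
{
    float dL = dist(camVirtual, camLeft);
    float dR = dist(camVirtual, camRight);
    float wLeft = dR / (dL + dR);                  // the closer camera dominates
    return std::min(1.f, std::max(0.f, wLeft));    // clamp for extrapolations
}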

Figure 6. [left] A triangle mesh is generated from the depth map and rendered using OpenGL. [right] The input images are blended on the mesh to obtain a virtual view.

The blending factors are adjusted depending on the position of the 'virtual' camera. If this position is more to the right, the right texture has more influence, and the opposite holds for the other side. The difference in illumination and colour between the left and the right camera image is also catered for in this way. By specifying the blending factors in the alpha values of a third texture, instead of using the alpha values of the original images, the blending factors can be adjusted during the camera movement, without the need to reload the original images into texture memory. Since the view interpolation is based on matching between two input images, parts that are only visible in one of the inputs will be missing in the interpolation. This occurs mainly where the object is at the border of one of the images. To compensate for this, we implemented an automatic zooming function. For the two input images, we check if and where the foreground object reaches the image border. At these sides, the object is not completely visible in one or in both input images. The depth map and the 3D model will only contain the parts of the object that are visible in both inputs.

The projection of the 3D model in the interpolated view is determined. If this projection doesn’t reach the image borders at the sides where it does in the inputs, then the distance (D) between the projection and the border is measured (see figure 7).

Figure 7. An illustration on a dummy person of how automatic zooming can be applied to avoid ’floating’ objects.

This distance is used to calculate the zooming factor. The focal length of the 'virtual' camera is adjusted so that only the middle part of the intermediate image remains visible. To minimise the necessary zooming, we first translate the principal point over a distance D/2. Then only a zoom reducing the image height by D/2 is needed to make the missing foreground part invisible. When the 'virtual' camera is moved and the projection of the 3D foreground object appears larger in this view, we zoom out again. The resulting views show as much information as possible, as can be seen in figure 8.
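A small sketch of this adjustment follows, under one possible reading of the description above: the measured gap D (in pixels) is absorbed by a principal-point shift of D/2 plus a zoom that removes the remaining D/2 of image height. The sign of the shift depends on which border the gap is at (the bottom border is assumed here).

struct Intrinsics { float fx, fy, cx, cy; };

// Adjust the intrinsics of the 'virtual' camera so the gap of height D
// between the foreground projection and the image border disappears.
Intrinsics autoZoom(Intrinsics K, float D, float imageHeight)
{
    if (D <= 0.f) return K;                // projection already reaches the border
    K.cy += 0.5f * D;                      // translate the principal point by D/2
    float s = imageHeight / (imageHeight - 0.5f * D);   // zoom the remaining D/2 away
    K.fx *= s;
    K.fy *= s;
    return K;
}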

Figure 8. Automatic zooming applied to a synthetic view of a scene.

3. IMPLEMENTATION USING MULTIPLE THREADS

To improve the performance of the program we use parallel instead of serial programming techniques. The parallel programming facilities are based on the concept of threads. Traditionally, multiple single-threaded processes have been used to achieve parallelism, but here we can benefit from a finer level of parallelism. Multi-threaded processes offer parallelism within a process and share many of the concepts involved in programming multiple single-threaded processes. In single-threaded systems, a process has a set of properties. In multi-threaded systems, these properties are divided between processes and threads. A process in a multi-threaded system is the changeable entity and must be considered as an execution frame. A thread is the schedulable entity: an independent flow of control that operates within the same address space as the other independent flows of control within the process. It has only those properties that are required to ensure its independent flow of control. This means that in such multi-threaded processes, the different subparts of an algorithm can be scheduled and executed independently.

Our program is modelled as a number of distinct parts interacting with each other to produce the desired result. These parts can be implemented as several entities, each entity performing a part of the program and sharing resources with the other entities. By using multiple entities, a program can be separated according to its distinct activities, each having an associated entity. These entities do not have to know anything about the other parts of the program except when they exchange information. In those cases, they must synchronise with each other to ensure data integrity. When multiple processing units are available, computation can be spread out over those units. In our case, both the CPU and the GPU are considered as independent units executing a part of the algorithm. Threads are well suited entities for modular programming because they provide simple data sharing (all threads within a process share the same address space) and powerful synchronisation facilities. Threads must synchronise their activities to interact effectively. This synchronisation includes:

• implicit communication through the modification of shared data (fig. 9, bold: activities, italic: locking by mutexes)
• explicit communication by informing each other of events that have occurred (fig. 9, arrows)

We opted for the parallel execution of the GPU and CPU based algorithms. Two threads are specified: one thread contains all CPU based algorithms, while the other runs the GPU algorithms and performs the rendering. In figure 9 the outline of both threads is given, with the main algorithms performed by each thread shown in boldface.

Figure 9. Data flow and synchronisation between two threads.

The data flow between the two threads has to be scheduled in such a way that the correct order of operations is maintained for each processed frame. Explicit communication between the threads takes care of this, by sending and waiting for conditional signals. Sometimes a thread has to wait for data from the other thread. The thread stops processing at such a point and waits for the corresponding signal from the other thread. Once the signal is received, processing continues. Figure 9 illustrates this inter-thread synchronisation. The waiting time should be minimal. This is accomplished if both threads are synchronised in such a way that the data from one step is available (triggering is done) before the other thread needs it (has to wait for it). This can only be accomplished by carefully designing the threads, taking all calculation times into account. Not only the data flow but also the memory access has to be carefully controlled when using multiple threads. A block of shared memory should only be accessed by one thread at a time. To prevent multiple threads from accessing the same memory at the same moment, it is necessary to lock the memory for all other threads when one thread accesses it. Figure 9 indicates in italics when certain blocks of memory are locked and unlocked again.

Such a mutual exclusion lock, which protects data or other resources from concurrent access, is called a mutex. It is important to keep the number of shared resources (memory) between the two threads as limited as possible to achieve the highest possible performance. The implementation using multiple threads allows multiple frames to be processed at the same time. For example, the next frame can already be swept while the previous one is still being processed by the graph cut. The consequence of such a two-stage parallel execution, however, is that the output lags one frame behind. As such, throughput is gained at the cost of a single frame delay.
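The skeleton below sketches this two-stage pipeline: a "GPU" thread that produces the raw plane-sweep depth map and a "CPU" thread that regularises it, with the shared buffer protected by a mutex and the ordering enforced by a condition variable. It uses modern C++ threading as a stand-in for the platform API of the time, exchanges only a single placeholder buffer, and drops frames if the consumer falls behind; a real implementation would also block or double-buffer the producer.

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct DepthMap { std::vector<float> depth; };

std::mutex              mtx;
std::condition_variable cv;
bool                    sweepReady = false;
bool                    finished   = false;
DepthMap                shared;            // result of the sweep, input of the graph cut

DepthMap runPlaneSweep(int /*frame*/) { return DepthMap{}; }   // placeholder
void     runGraphCut(DepthMap&)       {}                       // placeholder

void gpuThread(int numFrames)
{
    for (int f = 0; f < numFrames; ++f) {
        DepthMap d = runPlaneSweep(f);            // sweep frame f on the GPU
        {
            std::lock_guard<std::mutex> lock(mtx);   // lock the shared buffer
            shared = std::move(d);
            sweepReady = true;
        }
        cv.notify_one();                           // signal the CPU thread
    }
    { std::lock_guard<std::mutex> lock(mtx); finished = true; }
    cv.notify_one();
}

void cpuThread()
{
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return sweepReady || finished; });   // wait for data
        if (!sweepReady && finished) break;
        DepthMap d = std::move(shared);            // take ownership, release early
        sweepReady = false;
        lock.unlock();
        runGraphCut(d);                            // regularise frame f while the
    }                                              // GPU already sweeps frame f+1
}

int main()
{
    std::thread gpu(gpuThread, 100), cpu(cpuThread);
    gpu.join();
    cpu.join();
}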

4. USE OF MULTIPLE CAMERAS

An extension to the algorithm is made for a multi-camera setup. Using multiple cameras allows the scene to be viewed from more directions. As such, virtual views can be generated from more sides. The robustness of the matching also increases with the use of more cameras. The foreground segmentation and the radial distortion correction are performed on every input image. Sparse feature matching and triangulation are done between every two images, as explained in subsection 2.2. The two images used for this are the inputs closest to the position of the desired 'virtual' camera.

The graphics card can load and render four textures simultaneously. Thanks to this, our plane sweep algorithm can process up to four inputs at once. These input images are projected onto a plane that is swept through 3D space. The correlation measure between those images is calculated while rendering. For the foreground we use the knowledge of the foreground segmentation to outline the valid image region. The processing time only increases slightly compared to the two camera case. Mainly the loading to the GPU has an increased complexity, while the actual algorithm on the GPU remains of the same complexity. The same number of rendering cycles is performed, only more texture access operations are required. Since more than two images are used for determining the image correlation, the correlation measure shows a more pronounced minimum. As a result, the depth map obtained from the plane sweep is more accurate. As such, the initialisation for the graph cut regularisation algorithm is improved compared to the two camera case. The graph cut algorithm can also be implemented taking the knowledge of multiple cameras into account. The edge weights become a function of all of these image values. This is already implicitly done by using the correlation measures and the result from the plane sweep for the graph setup. Since the initial depth estimate and the edge weights of the graph are defined more accurately, the final depth map will also gain accuracy compared to the two camera case.

The depth map is sampled and a triangle mesh is fitted through the points. This mesh is rendered with OpenGL. The input textures are projected onto this mesh to obtain valid interpolations. We do not project all input images simultaneously: only the two inputs closest to the 'virtual' camera position are blended on the mesh. While moving the 'virtual' camera the blending factors are adjusted and the pair of inputs to be blended is selected automatically.
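Selecting the blend pair can be as simple as taking the two inputs whose camera centres lie closest to the virtual position, as sketched below; the distance measure is an assumption consistent with the description above.

#include <algorithm>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Returns the indices of the two input cameras closest to the virtual camera.
// Assumes at least two cameras are available.
std::pair<int, int> selectBlendPair(const std::vector<Vec3>& camCentres, const Vec3& virt)
{
    std::vector<std::pair<float, int>> byDist;
    for (int i = 0; i < (int)camCentres.size(); ++i) {
        float dx = camCentres[i].x - virt.x;
        float dy = camCentres[i].y - virt.y;
        float dz = camCentres[i].z - virt.z;
        byDist.push_back({dx * dx + dy * dy + dz * dz, i});   // squared distance
    }
    std::sort(byDist.begin(), byDist.end());
    return {byDist[0].second, byDist[1].second};
}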

5. RESULTS

In figure 10, several results are shown from recordings of a real scene with two or three cameras. In this experiment, the cameras are positioned approximately one meter apart, and the person is standing at 2-4 meters from the cameras. All input images as well as several interpolations are shown. The advantage of using three cameras also becomes clear. The results obtained with three cameras are slightly better, as can be seen by comparing image 10(h) of the 3 camera case with image 10(e) obtained with 2 cameras. Especially in the face (which is rather difficult to reconstruct) the difference becomes clear. Another advantage of using three cameras is that a larger range of viewing positions can be covered, without the need for extrapolations. As such, images 10(j) and 10(k) can be obtained by also using image 10(i) as input, besides input images 10(a) and 10(c). Figure 11 shows a sequence of synthesised views of the foreground only, generated from three input images. The views are taken at different intermediate positions in between the inputs. At the left, the left-most input is shown, with the synthesised view at this position next to it. The images more to the right and on the next row

(a) The input from camera 0, standing at the left.

(c) The input from camera 1, standing in the middle.

(d) A virtual view generated with 2 input images, at the position of camera 0.

(e) A virtual view generated with 2 input images, at the position in between camera 0 and 1.

(f) A virtual view generated with 2 input images, at the position of camera 1.

(g) A virtual view generated with 3 input images, at the position of camera 0.

(h) A virtual view generated with 3 input images, at the position in between camera 0 and 1.

(i) A virtual view generated with 3 input images, at the position of camera 1.

(j) A virtual view generated with 3 input images, at the position in between camera 1 and 2.

(k) A virtual view generated with 3 input images, at the position of camera 2.

(l) The input from camera 2, standing at the right.

Figure 10. Results from recordings of a real scene with two or three cameras.

are generated at positions moving progressively towards the position of the right input camera, which is shown as the last image.

Figure 11. A sequence of synthesised views of the foreground, at intermediate positions in between the input cameras.

6. CONCLUSION AND FUTURE WORK

We developed an algorithm that creates a 'virtual' camera based on two or more video streams. Both interpolations and extrapolations are possible in a limited amount of time. The major challenge in this view synthesis process is the calculation of a reasonably accurate depth map in very limited time. We chose to solve this problem with a combined approach of a GPU based plane sweep and a graph cut regularisation on the CPU. These two algorithms can run in parallel when using a carefully designed multi-threaded implementation. As such, throughput is increased at the cost of a single frame delay. At the moment we are working on an algorithm for automatic viewpoint selection. It is based on face tracking, and seeks to always select a fronto-parallel view.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge support from the Research Fund K.U.Leuven (GOA).

REFERENCES

1. M. Levoy and P. Hanrahan, "Light field rendering," in SIGGRAPH'96, pp. 31-42, 1996.
2. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, "The lumigraph," SIGGRAPH'96 30 (Annual Conference Series), pp. 43-54, 1996.
3. P. Debevec, Y. Yu, and G. Borshukov, "Efficient view-dependent image-based rendering with projective texture-mapping," in Eurographics Rendering Workshop (EGRW'98), pp. 105-116, June 1998.
4. C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality video view interpolation using a layered representation," in ACM SIGGRAPH and ACM Trans. on Graphics, Vol. 23, Issue 3, pp. 600-608, August 2004.
5. L. Di Stefano and S. Mattoccia, "Real-time stereo within the VIDET project," Real-Time Imaging 8, pp. 439-453, Elsevier Science Ltd., 2002.
6. A. Ansar, A. Castano, and L. Matthies, "Enhanced real-time stereo using bilateral filtering," in 3D Data Processing, Visualization, and Transmission, 2nd International Symposium on (3DPVT'04), pp. 455-462, September 2004.
7. J. Schmidt, H. Niemann, and S. Vogt, "Dense disparity maps in real-time with an application to augmented reality," in WACV'02, pp. 225-230, December 3-4, 2002.
8. A. Criminisi, J. Shotton, A. Blake, and P. Torr, "Gaze manipulation for one-to-one teleconferencing," in ICCV'03, pp. 191-198, October 13-16, 2003.
9. T. Aach and A. Kaup, "Illumination-invariant change detection using a statistical colinearity criterion," in Pattern Recognition: Proceedings 23rd DAGM Symposium, Lecture Notes in Computer Science 2191, B. Radig and S. Florczyk, eds., pp. 170-177, 2001.
10. N. Galoppo von Borries, T. Svoboda, and S. De Roeck, "Real-time segmentation of color images - implementation and practical issues in the blue-c project," Technical Report 261, Computer Vision Laboratory, Swiss Federal Institute of Technology, March 2003.
11. S. Smith and J. Brady, "SUSAN - a new approach to low level image processing," Int. Journal of Computer Vision 23(1), pp. 45-78, May 1997.
12. S. Baker, R. Szeliski, and P. Anandan, "A layered approach to stereo reconstruction," in Proc. of CVPR'98, pp. 434-441, June 1998.
13. I. Geys, T. P. Koninckx, and L. Van Gool, "Fast interpolated cameras by combining a GPU based plane sweep with a max-flow regularisation algorithm," in Proceedings of the Second International Symposium on 3D Data Processing, Visualization & Transmission (3DPVT'04), Sept 2004.
14. Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," in Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 359-374, 2001.
