Fast Depth-Image Meshing and Warping

Renato Pajarola, Yu Meng, Miguel Sainz

UCI-ECE Technical Report No. 02-02
The Henry Samueli School of Engineering
University of California, Irvine

July 2002

Fast Depth-Image Meshing and Warping

Renato Pajarola*, Yu Meng*, Miguel Sainz†

*Computer Graphics Lab, Information & Computer Science Department, University of California Irvine, [email protected], [email protected]
†Image Based Modeling and Rendering Lab, Electrical and Computer Engineering Department, University of California Irvine, [email protected]

Abstract

In this paper we present a novel and efficient depth-image warping technique based on a piece-wise linear approximation of the depth-image as a textured and simplified triangle mesh. We describe the application of a hierarchical triangulation method to generate triangulated depth-meshes efficiently from depth-images, discuss alternative depth-mesh segmentation methods to avoid occlusion artifacts, and propose a new technique to reduce exposure artifacts that exploits hardware accelerated warping and weighted blending of multiple depth-images in real-time. Applications of our technique include image-based object representations, large scale walk-through visualization systems, as well as terrain rendering.

Keywords: image based rendering, depth-image warping, multiresolution triangulation, level-of-detail, hardware accelerated blending

1. Introduction

In recent years a new rendering paradigm called Image Based Rendering (IBR) [DBC+99], which is based on the reuse of image data rather than geometry to synthesize arbitrary views, has attracted growing interest. Since IBR works on sampled image data, and not on geometric scene descriptions, the rendering cost is independent of the scene complexity, and depends only on the resolution of the sampled data. In fact, one of the goals of IBR is to de-couple 3D rendering cost from geometric scene complexity to achieve better display performance in terms of interactivity, frame-rate, and image quality. Target applications include interactive rendering of highly complex scenes, display of captured natural environments, and rendering on time-budgets.

There exist many approaches to accelerate rendering for interactive navigation, such as reducing the scene complexity using different levels-of-detail (LODs) and exploiting frame-to-frame coherence. Although these approaches achieve significant improvements, they are still limited when the complexity of the scene has increased far beyond the image-space resolution. In this case, IBR techniques seem to have advantages over traditional geometric rendering.

In this paper we expand on the technique of depth-image warping [Mil95]. Images with depth(s) per pixel have been used to represent individual objects [Max96, SGHS98, OB99] or to approximate parts of a large scene in interactive walk-through applications [RP94, SLS+96, SS96, AL99, ACW+99, QWQK00].


We present an improved depth-image warping technique based on adaptive triangulation and simplification of the depth-buffer, and rendering this depth-mesh with the color texture of the depth-image (see also [MMB97]). In addition to reducing the rendering cost from the geometric scene complexity to the resolution of the depth-image, adaptively triangulating the depth-map further reduces the rendering cost down to the complexity of the depth-variation within this image.

1.1 Main contributions

Our method offers several improvements and alternatives compared to previous depth-image warping techniques:
• The main contribution of our approach is that the simplification of the triangulated depth-buffer is performed at interactive frame-rates for high-resolution depth-images (i.e. with 250,000 pixels or more), in contrast to previous methods such as [DCV97].
• We also propose an efficient and simple segmentation method that improves on previous methods such as [MMB97].
• Depth-image warping is efficiently performed by rendering a comparatively small and bounded set of textured triangle-strips instead of warping a large number of individual pixels.
• Furthermore, we propose a novel technique for positional weighted blending of multiple reference depth-meshes in real-time that exploits hardware graphics acceleration.

Due to its efficiency, our approach supports rendering systems such as [RP94, SLS+96, SS96, ACW+99] which update image-based scene representations frequently at runtime.

1.2 Organization

The remainder of the paper is organized as follows. In Section 2 we briefly review the most related methods in depth-image warping. Section 3 describes our depth-meshing, segmentation and rendering approach, and in Section 4 we provide experimental results supporting our claims. Finally, Section 5 concludes the paper.

2. Related work

The notion of depth per pixel has been introduced as disparity between images in [CW93] and used for image synthesis by interpolation between pairs of input images. The depth information – the distance from the center of projection along the view direction to the corresponding surface point – allows pixels of a depth-image to be re-projected to arbitrary new views. In [Mil95], a unique evaluation order is presented

to guarantee back-to-front drawing order when (forward) warping pixels from the input depth-image to the frame buffer of a new view. An extension to depth-images, called a layered depth-image (LDI), is presented in [SGHS98]; an LDI can store multiple depth and color values per pixel. The use of an LDI allows improved depth-image warping with fewer exposure artifacts – exposure of regions not visible in the reference image. The use of precalculated, multi-layer depth-images has previously been discussed in [Max96] for rendering of complex trees. LDIs are also used in [AL99] together with an automatic preprocess image placement method to support interactive rendering at guaranteed frame rates, and in [OB99] to represent objects using a bounding box of LDIs. The idea of LDIs has further been extended in [CBL99] to LDI-trees for improved control over the sampling rate.

Point based approaches such as [PZvBG00, RL00, ZPvBG01, CN01] eliminate exposure artifacts due to undersampling and zooming in on a fixed set of discrete samples by rendering points as disks (surface elements) with nonzero extent. When rendered, the tightly packed surface elements appear to represent a smooth surface.

Another approach to cope with exposure artifacts is to represent depth-images as triangle meshes [MMB97]. The triangulated depth-buffer provides a connected surface approximating the 3D scene and supports automatic pixel interpolation for exposed or stretched regions as well as hardware acceleration by rendering the textured triangle mesh. The approach presented in [DCV97, DCV98] creates simplified irregular triangulations of depth-images used as cubical environment maps. Both approaches use multiple reference depth-images to limit exposure artifacts for new views. In [DSSD99] multiple layers of triangulated depth-images are proposed to solve exposure problems.

3. Depth-image meshing and warping

3.1 Meshing

The depth values of a depth-image can be considered to be a 2.5-dimensional height-field data set similar to terrain elevation models. Given the depth values in the z-buffer for a particular reference image, we calculate for each pixel (i, j) its corresponding 3D coordinate P_i,j ∈ R³ in the viewing coordinate system. Using the coordinates of the points P_i,j, a quadtree based multiresolution triangulation [Paj02] is constructed on the grid of pixels of the reference depth-image. We use the restricted quadtree triangulation method presented in [Paj98a] to generate a simplified triangulation of the z-buffer, called a depth-mesh. For details on this hierarchical triangulation method we refer to [Paj98a] and only discuss modifications in the context of applying it to depth-images here. Figure 1 shows the basic refinement steps for this hierarchical triangulation method.
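The per-pixel 3D coordinates P_i,j can be recovered by inverting the reference view's projection. The following sketch is a minimal illustration of this unprojection for a symmetric perspective frustum with an OpenGL-style depth buffer; the parameter names (fovY, aspect, zNear, zFar) are our assumptions and are not prescribed by the paper.

    #include <cmath>

    struct Vec3 { float x, y, z; };

    // Unproject pixel (i, j) with depth-buffer value d (in [0,1]) to a 3D point
    // P_i,j in the viewing coordinate system. Assumes a symmetric perspective
    // frustum given by fovY (radians), aspect ratio and near/far planes.
    Vec3 unprojectPixel(int i, int j, float d, int width, int height,
                        float fovY, float aspect, float zNear, float zFar)
    {
        // Normalized device coordinates of the pixel center.
        float xNdc = 2.0f * (i + 0.5f) / width  - 1.0f;
        float yNdc = 2.0f * (j + 0.5f) / height - 1.0f;

        // Invert the standard OpenGL depth mapping; here z grows along the
        // viewing direction (flip the sign for a right-handed eye space).
        float z = (zFar * zNear) / (zFar - d * (zFar - zNear));

        float tanHalfFov = std::tan(0.5f * fovY);
        float x = xNdc * z * aspect * tanHalfFov;
        float y = yNdc * z * tanHalfFov;
        return { x, y, z };
    }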


FIGURE 1. Recursive quadtree subdivision and triangulation. Refinement points are shown as grey circles in b) for a diagonal edge bisection and c) for vertical and horizontal edge bisections.

As error metric we use point-to-line distance functions. For performance reasons, for the vertical and horizontal refinement points shown in Figure 1 c) we use an approximate 2D distance function instead of the more complicated 3D point-to-line distance. The approximation error d of a refinement point P, bisecting a vertical (horizontal) edge between two points A and B of a coarser LOD, is calculated as the 2D distance of P to the line AB in the projection onto the y,z-plane (x,z-plane). Figure 2 shows an example configuration of projecting points A, B and P from pixels within the same column onto the y,z-plane. Thus using the line equation z = m·y + b for AB, with m = (z_B − z_A) / (y_B − y_A), the approximation error for vertical refinement points (and analogously for horizontal refinement points) can be evaluated by:

d = |m·(y_P − y_A) − (z_P − z_A)| / sqrt(1 + m²)        (EQ 1)


FIGURE 2. Approximation error of vertical refinement point P is calculated as point-to-line distance in the projection on the y,z-plane.

For refinement points P bisecting a diagonal edge between points A and B as in Figure 1 b), the approximation error is calculated as the actual 3D distance:

d = |(B − A) × (B − P)| / |B − A|        (EQ 2)

For field-of-view angles of less than 50°, Equation 1 introduces an error of less than 1.0 − cos 25° = 9.4%.¹ However, because only 20% of the points are diagonal refinement points, we achieve a significant speed-up by using Equation 1 instead of Equation 2 for the vertical and horizontal refinement points.

¹ This error could conservatively be taken into account and added to the result of Equation 1 if desired.
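A minimal sketch of the two error measures follows; the squared error is returned directly to avoid the square root, and the last helper anticipates the image-space normalization described in the next paragraph. The Vec3 type and the function names are ours.

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // Squared approximation error (Equation 1) of a vertical refinement point P
    // bisecting the edge between A and B: 2D point-to-line distance in the
    // projection onto the y,z-plane. For horizontal refinement points use x
    // instead of y.
    double verticalError2(const Vec3& A, const Vec3& B, const Vec3& P)
    {
        double m   = (B.z - A.z) / (B.y - A.y);           // slope of line AB
        double num = m * (P.y - A.y) - (P.z - A.z);
        return (num * num) / (1.0 + m * m);               // d^2
    }

    // Squared approximation error (Equation 2) of a diagonal refinement point P:
    // true 3D point-to-line distance to the line through A and B.
    double diagonalError2(const Vec3& A, const Vec3& B, const Vec3& P)
    {
        Vec3 ab { B.x - A.x, B.y - A.y, B.z - A.z };
        Vec3 bp { B.x - P.x, B.y - P.y, B.z - P.z };
        Vec3 c  { ab.y * bp.z - ab.z * bp.y,              // (B - A) x (B - P)
                  ab.z * bp.x - ab.x * bp.z,
                  ab.x * bp.y - ab.y * bp.x };
        double c2  = c.x * c.x + c.y * c.y + c.z * c.z;
        double ab2 = ab.x * ab.x + ab.y * ab.y + ab.z * ab.z;
        return c2 / ab2;                                  // d^2
    }

    // Projected image-space error: d^2 normalized by the squared depth of the
    // refinement point (see the following paragraph).
    double projectedError2(double d2, const Vec3& P) { return d2 / (P.z * P.z); }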


To avoid expensive square root evaluations the distance is used as the squared value d². Furthermore, to correct for perspective distortion, the error d² is normalized by division through the squared depth z_P² of the refinement point P, yielding the projected error in image-space. To allow efficient top-down vertex selection to generate a simplified depth-mesh, the approximation error of each pixel or point is saturated according to the restricted quadtree triangulation dependencies as proposed in [Paj98a]. Given an image-space error tolerance ε, an adaptively simplified depth-mesh is extracted from the restricted quadtree triangulation hierarchy defined on the reference depth-image by selecting all points with d_P² / z_P² > ε². See Section 4 for example depth-mesh triangulations.

3.2 Segmentation

The triangulation of the z-buffer introduces surface interpolation between different object surfaces and the background, as shown in Figure 3, which causes artificial occlusion or rubber sheet [MMB97] artifacts when warped to new views. Therefore, it is important to determine whether triangles represent actual object surfaces or not.

FIGURE 3. Rubber sheet artifacts showing in a) are removed in b) by appropriate segmentation.

If performed at run-time (i.e. if used in systems such as [RP94, SLS+96, SS96, ACW+99]), the segmentation of the triangulated depth-mesh has to be done very quickly. Therefore, sophisticated range-image segmentation methods such as [MW99] cannot be applied, since most of these approaches, as reported in [HJBJ+96], are far too slow for our purpose.

In [MMB97], the notion of connectedness is defined to disambiguate the connected and disconnected mesh regions. Each triangle in the mesh is designated as either low-connectedness or high-connectedness, indicating whether or not the triangle is believed to represent part of a single actual surface. Given a surface normal for each pixel by interpolation, the connectedness between vertical or horizontal pixel neighbors is calculated by comparing the real z-difference with an estimated value from curve fitting. Because we were not able to get any reasonable segmentation results with a straightforward implementation of the explanations in [MMB97], we present a slightly modified version of connectedness here that we used in our experiments.

Figure 4 a) illustrates the basic fitting of a quadratic curve f(x) = A + B·x + C·x² using the row pixels P_1 = (x_1, y_1, z_1) and P_2 = (x_2, y_2, z_2), and given their corresponding normal vectors N_1 = (nx_1, ny_1, nz_1) and N_2 = (nx_2, ny_2, nz_2). Given the viewpoint e, the two points (row pixels) and normals are first transformed into the coordinate system with the z'-axis aligned with the vector from e to P_1 and the view-up vector being (0,1,0). Using the normals to define the first derivatives f'(x), the curve parameters can be evaluated to:

A = f(0) = z_1
B = f'(0) = nx_1 / (−nz_1)                                                  (EQ 3)
C = f''(0) / 2 = (f'(x_2) − f'(0)) / (2·x_2) = (nx_2 / (−nz_2) − B) / (2·x_2)

We observed that endpoints of stretched edges have almost identical normals – steep rubber sheet triangles – if the normal is estimated from the z-buffer. This makes it easy to fit a quadratic curve according to Equation 3 such that the points are on or very close to the curve. To prevent the quadratic curve f(x) from perfectly fitting vertices of rubber sheet triangles we use a sheared coordinate system as illustrated in Figure 4 b), with the x̂-axis parallel to the edge from P_1 to P_2. Thus x_2 changes to x̂_2 = |P_2 − P_1|.


FIGURE 4. Basic quadratic curve fitting in a), and modified sheared coordinate system in b).

The connectedness is then computed as the ratio of the absolute error in the range estimation to the parameter range x̂_2:

r = |f(x̂_2) − z_2| / x̂_2        (EQ 4)

If the ratio of Equation 4 is greater than some threshold τ then the two pixels are considered to belong to different objects or surfaces. Triangles with such low connectedness r > τ are removed from the depth-mesh. For each right-triangle in our depth-mesh we check the connectedness according to Equation 4 for the endpoints of the vertical and horizontal edges, and if one has low connectedness the triangle is removed. This connectedness measure is more computing intensive than the other methods discussed below, due to the fact that normals have to be computed for the points of the depth-image and due to the complexity of evaluating Equation 4.
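A rough sketch of this connectedness ratio is given below. It assumes the two neighboring vertices and their normals have already been transformed into the (sheared) local coordinate system described above, so that P_1 lies at x = 0 with depth z_1 and P_2 at x = x̂_2 with depth z_2; all names are illustrative.

    #include <cmath>

    // Connectedness ratio r (Equation 4) for two neighboring depth-mesh vertices.
    // Assumes points and normals are already expressed in the sheared local frame:
    // P_1 at x = 0 with depth z1, P_2 at x = x2hat with depth z2, and (nx1, nz1),
    // (nx2, nz2) the relevant normal components in that frame.
    double connectedness(double z1, double nx1, double nz1,
                         double z2, double nx2, double nz2, double x2hat)
    {
        // Quadratic f(x) = A + B*x + C*x^2 fitted from P_1 and the two normals (EQ 3).
        double A = z1;
        double B = nx1 / -nz1;                        // f'(0) from normal N_1
        double C = (nx2 / -nz2 - B) / (2.0 * x2hat);  // f''(0)/2 from normal N_2

        // Ratio of the range-estimation error to the parameter range (EQ 4).
        double f = A + B * x2hat + C * x2hat * x2hat;
        return std::fabs(f - z2) / x2hat;
    }

    // Pixel pairs with r > tau are considered to belong to different surfaces and
    // the incident right-triangles are removed from the depth-mesh.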

In [OB98] changes in the disparity values associated with pixels of the depth-image are used to segment the image. Given the distance d of the image plane from the viewpoint (focal length), the disparity δ of a pixel with depth z is given by δ = d / z. For three consecutive pixels (in a row or column of the depth-image) with disparities δ_1, δ_2, δ_3 and depths z_1, z_2, z_3 that are on a line or plane in 3D we have δ_2 − δ_1 = δ_3 − δ_2. As pointed out in [OB98], for curved surfaces, pixels form a piecewise bilinear approximation and are considered to represent a discontinuity if

|(δ_2 − δ_1) − (δ_3 − δ_2)| > σ        (EQ 5)

for a given disparity threshold σ. In that case the corresponding triangles are removed from the depth-mesh. Using δ_i − δ_j = d·(z_j − z_i) / (z_i·z_j), Equation 5 can be rewritten to:

(d / z_2) · |(z_1 − z_2)/z_1 − (z_2 − z_3)/z_3| > σ        (EQ 6)

As pointed out in [OB98], the threshold parameter σ in Equation 5 must vary with the z-distance of the pixels. As can be seen from Equation 6, this adjustment is necessary because of the d/z_2 factor. In our implementation of this segmentation measure we drop this factor to get a projection-normalized decision relation:

|(z_1 − z_2)/z_1 − (z_2 − z_3)/z_3| > σ        (EQ 7)

This modified disparity measure of Equation 7 is an extremely simple measure: it only compares the differences in z-values between adjacent pixels. For each right-triangle in our depth-mesh we evaluate Equation 7 for all three corners, and if two or more exceed the disparity threshold σ we remove the corresponding triangle from the depth-mesh.

As an alternative to the connectedness and disparity measures discussed above, we propose an orthogonality test on triangles. We can observe that the artificial, or rubber sheet, triangles introduced during the triangulation of the depth-image have the following property: the triangle normal is almost perpendicular to the vector from the viewpoint to the center of the triangle, as shown in Figure 5. Let v be the vector from the viewpoint to the center of triangle t and n_t be the normal of t. The following inequality using a threshold ω can be used to determine if a triangle is a rubber sheet triangle that connects different objects or surfaces:

(v / |v|) · n_t < cos(90° − ω)        (EQ 8)

Compared to the disparity test of Equation 7, which only compares the z-ranges of adjacent points, the orthogonality test of Equation 8 is more powerful since it considers the orientation of a plane in 3D space with respect to the viewpoint.

FIGURE 5. Rubber sheet triangles in the depth-mesh.

In all three segmentation methods, additional care should be taken not to remove small triangles which represent rough surface features and do not constitute a discontinuity. Therefore, in addition to Equation 4, Equation 7 or Equation 8, we also consider the depth-range ∆z of triangles at distance z_t and only remove triangles which span a depth-range larger than some threshold λ:

∆z / z_t > λ        (EQ 9)
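The sketch below illustrates how the disparity test (Equation 7), the orthogonality test (Equation 8) and the depth-range guard (Equation 9) can be evaluated per triangle; the vector helpers and parameter names are ours, and we take the absolute value of the dot product in Equation 8 so the test does not depend on the normal orientation.

    #include <cmath>

    struct Vec3 { double x, y, z; };

    static Vec3   sub  (const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    static Vec3   cross(const Vec3& a, const Vec3& b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
    static double dot  (const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static double len  (const Vec3& a)                { return std::sqrt(dot(a, a)); }

    // Disparity test (EQ 7) for three consecutive pixels with depths z1, z2, z3.
    bool disparityDiscontinuity(double z1, double z2, double z3, double sigma)
    {
        return std::fabs((z1 - z2) / z1 - (z2 - z3) / z3) > sigma;
    }

    // Orthogonality test (EQ 8): triangle (a, b, c) is treated as a rubber sheet
    // triangle if its normal is nearly perpendicular to the vector v from the
    // viewpoint e to the triangle center. omega is given in degrees.
    bool orthogonalityDiscontinuity(const Vec3& a, const Vec3& b, const Vec3& c,
                                    const Vec3& e, double omega)
    {
        const double kDegToRad = 3.14159265358979323846 / 180.0;
        Vec3 center { (a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0, (a.z + b.z + c.z) / 3.0 };
        Vec3 n = cross(sub(b, a), sub(c, a));          // (unnormalized) triangle normal
        Vec3 v = sub(center, e);
        double cosAngle = std::fabs(dot(v, n)) / (len(v) * len(n));
        return cosAngle < std::cos((90.0 - omega) * kDegToRad);
    }

    // Depth-range guard (EQ 9): only triangles spanning a depth-range larger than
    // lambda, relative to their distance zt from the viewpoint, are removed.
    bool largeDepthRange(double zMin, double zMax, double zt, double lambda)
    {
        return (zMax - zMin) / zt > lambda;
    }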

3.3 Rendering

Depth-image warping is efficiently performed by hardware supported rendering of textured polygons instead of projecting every single pixel from a reference depth-image to new views. The approximate depth-image, consisting of a segmented triangulation of the depth-buffer as outlined in the previous sections, is rendered using the color values of the reference image as texture. Rendering a depth-image with a resolution of 2^k × 2^k involves warping of 2^(2k) pixels with traditional depth-image warping techniques (or rendering about 2^(2k+1) triangles with [MMB97]). With the proposed technique, instead of warping the 2^18 = 262144 pixels of a 512 × 512 reference image, a textured depth-mesh with only a few thousand triangles can be rendered using hardware accelerated 3D graphics at a fraction of the cost. We also want to point out that the (unsegmented) depth-mesh generated by our mesh simplification algorithm can be represented by one single triangle strip, as shown in [Paj98a], to take further advantage of hardware graphics acceleration. After segmentation the depth-mesh consists of multiple triangle strips.

The depth-mesh generation and segmentation is performed in the reference view coordinate system. Whenever the depth-mesh of a reference view has to be rendered, the coordinate system transformation of that view is used as model-view transformation to place the depth-mesh correctly in the world coordinate system.

To reduce exposure artifacts, most image warping techniques use multiple reference images that have to be merged to synthesize new views. We present a novel, efficient blending technique that exploits graphics hardware acceleration and that supports positional weighting of reference images. Blending of k reference depth-meshes to synthesize a new view consists of the following basic steps:
1. Calculate the blending weights w_i for all depth-meshes M_i (i = 1…k).
2. Generate an occupancy image O for the new view that specifies for each pixel p ∈ O the set S_p of reference depth-meshes that are used.
3. Render the visible part of each textured depth-mesh M_i separately and store its image I_i.
4. Synthesize the new image I by iteratively blending images I_i according to occupancy O. For each set of pixels P ⊆ O with equal occupancy S_P generate the partial image I_P ⊆ I:


I_P = ( Σ_{j ∈ S_P} w_j · I_j ) / ( Σ_{j ∈ S_P} w_j )

The final image is I = Σ_P I_P.
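As a reference for what the hardware-accelerated passes described below compute, this is a minimal CPU-side sketch of step 4, where the occupancy of a pixel is stored as a bitmask with bit i set if depth-mesh M_i covers the pixel; the types and names are placeholders of ours.

    #include <cstdint>
    #include <vector>

    struct RGB { float r, g, b; };

    // CPU reference for step 4: blend the captured depth-mesh images I_i into the
    // output image I according to the per-pixel occupancy bitmasks and weights w_i.
    void blendImages(const std::vector<std::vector<RGB>>& images,    // I_i, one per depth-mesh
                     const std::vector<std::uint8_t>&     occupancy, // O, one bitmask per pixel
                     const std::vector<float>&            weights,   // w_i, one per depth-mesh
                     std::vector<RGB>&                     out)      // I, same size as occupancy
    {
        out.assign(occupancy.size(), RGB{0.0f, 0.0f, 0.0f});
        for (std::size_t p = 0; p < occupancy.size(); ++p) {
            RGB   sum  {0.0f, 0.0f, 0.0f};
            float wSum = 0.0f;
            for (std::size_t i = 0; i < images.size(); ++i) {
                if (occupancy[p] & (1u << i)) {          // depth-mesh M_i covers pixel p
                    sum.r += weights[i] * images[i][p].r;
                    sum.g += weights[i] * images[i][p].g;
                    sum.b += weights[i] * images[i][p].b;
                    wSum  += weights[i];
                }
            }
            if (wSum > 0.0f)                             // uncovered pixels keep the background
                out[p] = { sum.r / wSum, sum.g / wSum, sum.b / wSum };
        }
    }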

Steps 2 and 3 can be implemented in OpenGL using stencil- and z-buffer techniques. First, the depth-meshes are rendered at a slight negative offset, without any shading, illumination or texturing, to initialize the z-buffer for the desired view. Second, the textured depth-meshes are rendered and captured with the z-buffer set to read-only, and the stencil buffer is activated and updated to reflect which pixels are covered by which depth-meshes. For k depth-meshes the stencil buffer will contain values between 0 and 2^k − 1, denoting the combination of depth-meshes covering that pixel. Step 4 can be implemented by successively rendering quadrilaterals textured with the captured depth-mesh images and alpha-blended according to the stencil-buffer and blending weights.

The proposed rendering algorithm can be combined with the depth-meshes to create Image Based Objects (IBO). In [OB99] the IBOs are constructed by generating six layered depth images taken from the six sides of a cubic bounding box surrounding a single object. Similarly, we extract six textured depth-meshes around the axis-oriented bounding box of the object and store them as the reference views. During rendering we first select three out of the six reference views based on the relative position of the novel viewpoint with respect to the reference depth-meshes. The blending weights are calculated based on the Euclidean distances between the depth-mesh viewpoints and the novel viewpoint, and they are normalized to add up to one. The pseudocode for the rendering algorithm, where the number of depth-meshes k is three, is presented in Figure 6.

In order to generate the proper alpha blending effect we calculate for each view i a corresponding weight mask by rendering a set of stenciled quadrilaterals with the appropriate weights into the alpha channel; before switching to process the next view, the color channels of the image I_i are modulated with the calculated alpha values. Every pass accumulates the new weighted colors with the previous values. The final result is a weighted sum of the colors of the three reference images.

When the modeled object presents local concavities, using only three reference views in the rendering can generate artifacts where there is no mesh coverage. This affects the overall quality of the final rendering and becomes noticeable when switching between different sets of reference views. To overcome this problem we can add a final pass to the rendering algorithm that renders the three remaining textured reference views, using standard z-buffer visibility, in non-occupied areas by selecting the appropriate stencil test function on the currently calculated stencil buffer.

The complete rendering algorithm for IBOs renders the appropriate reference views twice (once without illumination and texturing) and then a total of eighteen non-illuminated quadrilaterals, three of them with textures.

Additionally it renders the three not active reference views only in uncovered areas. Since the depth-meshes have a low polygon count, the total number of rendered primitives is lower than for the original complex object, and higher frame rates can be achieved.

    determine_visible_meshes(Viewer, meshID[]);
    calculate_weights(Viewer, meshID[]);
    clear_buffers();
    N = 3;                                  // number of depth-meshes
    // Calculate zBuffer
    for (i = 0; i < N; i++)
        render_mesh(meshID[i]);
    set_zbuffer_read(LESS_OR_EQUAL);
    set_polygon_offset();
    // Calculate Stencil: each depth-mesh writes its own stencil bit (1, 2, 4)
    mask = 1;
    for (i = 0; i < N; i++) {
        set_stencil_ref_value(mask);
        set_stencil_to_write_ref_value_iff_z_passes();
        set_stencil_mask(mask);
        clear_screen();
        render_mesh(meshID[i]);
        capture_frame_to(colorPlate[i]);
        mask *= 2;
    }
    set Orthographic 2D projection matrices
    // Render quads for first mesh (stencil values with bit 1 set)
    set_RGBA_mask(FALSE, FALSE, FALSE, TRUE);
    draw_quad_with_color(0, 0, 0, 0);
    for (mask = 1; mask < 8; mask += 2) {
        set_stencil_to_pass_for(mask);
        draw_quad_with_color(0, 0, 0, weight[mask]);
    }
    set_RGBA_mask(TRUE, TRUE, TRUE, FALSE);
    set_blending_function(src*dst_alpha + 0*dst);
    draw_quad_with_texture(colorPlate[0]);
    disable textures and blending
    // Render quads for second mesh (stencil values with bit 2 set)
    set_RGBA_mask(FALSE, FALSE, FALSE, TRUE);
    draw_quad_with_color(0, 0, 0, 0);
    for (mask = 2; mask < 4; mask++) {
        set_stencil_to_pass_for(mask);
        draw_quad_with_color(0, 0, 0, weight[mask]);
    }
    for (mask = 6; mask < 8; mask++) {
        set_stencil_to_pass_for(mask);
        draw_quad_with_color(0, 0, 0, weight[mask]);
    }
    set_RGBA_mask(TRUE, TRUE, TRUE, FALSE);
    set_blending_function(src*dst_alpha + 1.0*dst);
    draw_quad_with_texture(colorPlate[1]);
    disable textures and blending
    // Render quads for third mesh (stencil values with bit 4 set)
    set_RGBA_mask(FALSE, FALSE, FALSE, TRUE);
    draw_quad_with_color(0, 0, 0, 0);
    for (mask = 4; mask < 8; mask++) {
        set_stencil_to_pass_for(mask);
        draw_quad_with_color(0, 0, 0, weight[mask]);
    }
    set_RGBA_mask(TRUE, TRUE, TRUE, FALSE);
    set_blending_function(src*dst_alpha + 1.0*dst);
    draw_quad_with_texture(colorPlate[2]);
    disable textures and blending
    restore state machine status
    // Render quad for background
    set_stencil_to_pass_for(0);
    draw_quad_with_RGBA(1, 1, 1, 1);

FIGURE 6. Depth-mesh blending algorithm of an image based object.
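For completeness, a small sketch of the weight calculation in step 1 follows. The text above only states that the weights are derived from the Euclidean distances between the reference viewpoints and the novel viewpoint and are normalized to sum to one; the inverse-distance weighting used here is our assumption of one plausible choice.

    #include <cmath>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // Blending weights for the selected reference depth-meshes (step 1), based on
    // the Euclidean distances between their viewpoints and the novel viewpoint.
    // Inverse-distance weighting is an assumption; the paper does not prescribe
    // the exact weighting function.
    std::vector<double> blendWeights(const std::vector<Vec3>& refViewpoints,
                                     const Vec3& novelViewpoint)
    {
        std::vector<double> w(refViewpoints.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            double dx = refViewpoints[i].x - novelViewpoint.x;
            double dy = refViewpoints[i].y - novelViewpoint.y;
            double dz = refViewpoints[i].z - novelViewpoint.z;
            double dist = std::sqrt(dx*dx + dy*dy + dz*dz);
            w[i] = 1.0 / (dist + 1e-9);   // closer reference views get larger weights
            sum += w[i];
        }
        for (double& wi : w)
            wi /= sum;                    // normalize so the weights add up to one
        return w;
    }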


FIGURE 7. Depth-meshing of different objects and scenes for different image-space error tolerances ε. From left to right: rendering from polygonal model, depth-mesh for ε = 0.00001 (wire frame and textured) and for ε = 0.00005 (wire frame and textured).

4. Experiments

Table 1 shows timing results for the generation of depth-meshes with our approach, given a depth-image size of 513² = 263169 pixels. Figure 7 provides the corresponding screenshots of the polygonal models and the depth-mesh representations. The triangulation cost includes the image-space error calculation outlined in Section 3.1, saturation of the error measure, selection of points with error > ε and triangle strip generation. The cost for triangulation is mainly determined by the reference image resolution and the cost to compute the error metric. The segmentation cost for removing rubber sheet triangles is mainly determined by the size of the depth-mesh. The last column in Table 1 denotes the number of triangles in the unsegmented depth-mesh and not the number of triangles that are actually used for display. In addition to the triangulation and segmentation time, the capturing and transformation of the z-buffer has to be taken into account when generating the depth-mesh representation, which was about 80 ms for all experiments.

Model           ε         Triangulation   Segmentation   Depth-Mesh ∆s
Scene           0.00001   294 ms          48 ms          28937
                0.00005   237 ms          16 ms          10073
Mindego Hills   0.00001   295 ms          46 ms          32961
                0.00005   230 ms          15 ms          10121
Bell            0.00001   234 ms          17 ms          15271
                0.00005   209 ms          5.8 ms         5699
Blob            0.00001   261 ms          19 ms          18637
                0.00005   222 ms          6 ms           6447

TABLE 1. Depth-mesh construction cost and size for different polygonal models and geometric image-space error tolerances ε. The reference depth-image size is 513 × 513. Timing is performed on a Sun 450 MHz UltraSPARC-II.

In Table 2 we compare the timing of the different segmentation methods described in Section 3.2, with threshold parameters adjusted to generate similar visual results as shown in Figure 8. As can be expected, the connectedness measure is the most expensive method among the considered ones, and the reported time in Table 2 does not include the calculation of normals per point that is required by this segmentation method. Moreover, despite experimenting with the threshold parameters, the visual results from the connectedness measure are not very good. The much faster disparity method was able to achieve superior segmentation results, as shown in Figure 8. Similar results were achieved using the orthogonality measure. While its time cost is slightly higher than that of the disparity test, the orthogonality test produced slightly improved segmentation results. In particular, the segmentation boundaries are cleaner and have fewer artifacts.

FIGURE 8. Comparison of the different segmentation methods (connectedness, disparity, orthogonality). For each model and segmentation method the front view of the segmented depth-mesh is shown as well as a view from a zoomed-in or rotated position.

Model      Connectedness       Disparity          Orthogonality
           τ      time         δ       time       ω      time
Scene      40     128 ms       0.15    5 ms       1.4    27 ms
Mindego    13     249 ms       0.06    10 ms      2.8    52 ms
Bell       11     66 ms        0.08    4 ms       3.7    15 ms
Blob       8.7    101 ms       0.02    5 ms       6.0    23 ms

TABLE 2. Depth-mesh segmentation cost for different segmentation methods.

Figure 9 illustrates results from our positional weighted blending algorithm presented in Section 3.3. For each model we create an image-based object (IBO) similar to [OB99] using a bounding box of six depth-meshes around the object. The images show the polygonal object, the occupancy image, and the textured and blended depth-meshes for particular views. Each color in the occupancy image specifies how many and which depth-meshes are used to render a particular pixel. The colors red, green and blue specify the individual depth-meshes, and any combination thereof determines which depth-meshes are blended together. For each depth-mesh rendering we also provide the mean squared error (MSE) in color difference with respect to the original polygonal rendering for the same view (only considering non-background pixels in the image based rendering). We measure the color difference between two corresponding pixels as the 3D distance in RGB color space. The reported MSEs are normalized and denote the percentage of the maximum error in 24-bit RGB color space (which is the diagonal of a 255 × 255 × 255 cube).

Table 3 shows the size and rendering performance of image based objects using our depth-image meshing and warping approach. Despite the fact that the depth-mesh representation may be larger than the polygonal representation,

our warping and blending technique provides stable frame rates for arbitrarily complex objects, and the rendering performance is completely independent of the size of the input polygon mesh.

Model     Polygons    FPS    IBO size   FPS (IBO)
Happy     100,000     15     219019     33
Dragon    871,414     1.8    192689     >33
David     8,254,150   0.2    254383     33

TABLE 3. Depth-mesh size and rendering performance of image based objects.

A different reference view setup is shown for a terrain fly-through experiment in Figure 10. Along a flight path a number of reference depth-meshes are recorded for key frames. Furthermore, a top-view that overlooks the entire data set is also recorded. An arbitrary view is rendered by blending the two closest reference depth-meshes with our technique. In addition, the top-view is rendered where the stencil buffer does not show coverage by the other two depth-meshes. In the top row of Figure 10, 16 key frames are computed in 15 seconds and our depth-image warping and blending achieves a stable frame rate of 20 FPS. The other images show results from using 32 key frames that are precomputed in 28 seconds, and the frame rate is 16 FPS. Rendering the polygonal terrain representation (leftmost images in Figure 10) achieves only 3 frames per second.

5. Discussion

We presented an efficient depth-image meshing and segmentation method as well as a novel real-time depth-image blending algorithm that exploits hardware accelerated graphics cards.

The proposed depth-buffer triangulation approach reduces the complexity of depth-image warping by simplifying the triangulated depth-mesh and rendering textured triangle strips. Theoretically our novel blending technique is scalable to any number of meshes as long as the pixel fill rate of the graphics card is not exceeded. However, due to limitations of the OpenGL API and the size of the depth-meshes, we have limited the blending to three meshes to obtain interactive rendering speeds.

Our approach improves over previous depth-image warping methods [Mil95, MMB97, DCV97, DCV98] in particular in the following aspects: efficient generation of image based object representations, and weighted blending of multiple reference views at interactive frame rates. The presented approach can be used as a rendering component in visualization systems such as large scale walk-through applications [RP94, SLS+96, SS96, ACW+99]. The main limitations of the presented method are the limited ability to cover exposure artifacts in comparison to layered approaches [SGHS98, DSSD99], and the exponential increase in rendering cost with respect to the number of blended reference views.

Future work in this context will include the extension of the proposed approach to a multi-layer depth-mesh representation, the application of our blending algorithm to other IBR methods, and taking advantage of per-pixel and per-vertex programming features of modern graphics hardware accelerators.

Acknowledgements This work was partially supported by NSF grant CCR0119053.

References

[ACW+99] Daniel Aliaga, Jon Cohen, Andrew Wilson, Eric Baker, Hansong Zhang, Carl Erikson, Kenneth Hoff, Tom Hudson, Wolfgang Stuerzlinger, Rui Bastos, Mary Whitton, Fred Brooks, and Dinesh Manocha. MMR: An interactive massive model rendering system using geometric and image-based acceleration. In Proceedings Symposium on Interactive 3D Graphics, pages 199–206. ACM SIGGRAPH, 1999.
[AL99] Daniel Aliaga and Anselmo Lastra. Automatic image placement to provide a guaranteed frame rate. In Proceedings SIGGRAPH 99, pages 307–316. ACM SIGGRAPH, 1999.
[CBL99] Chun-Fa Chang, Gary Bishop, and Anselmo Lastra. LDI tree: A hierarchical representation for image-based rendering. In Proceedings SIGGRAPH 99, pages 291–298. ACM SIGGRAPH, 1999.
[CN01] Baoquan Chen and Minh Xuan Nguyen. POP: A hybrid point and polygon rendering system for large data. In Proceedings IEEE Visualization 2001, pages 45–52, 2001.
[CW93] Shenchang Eric Chen and Lance Williams. View interpolation for image synthesis. In Proceedings SIGGRAPH 93, pages 279–288. ACM SIGGRAPH, 1993.
[DBC+99] Paul Debevec, Christoph Bregler, Michael Cohen, Leonard McMillan, Francois Sillion, and Richard Szeliski. Image-based modeling and rendering. SIGGRAPH 99 Course Notes 39, 1999.
[DCV97] Lucia Darsa, Bruno Costa Silva, and Amitabh Varshney. Navigating static environments using image-space simplification and morphing. In Proceedings Symposium on Interactive 3D Graphics, pages 25–34. ACM SIGGRAPH, 1997.
[DCV98] Lucia Darsa, Bruno Costa, and Amitabh Varshney. Walkthroughs of complex environments using image-based simplification. Computers & Graphics, 22(1):25–34, 1998.
[DSSD99] Xavier Decoret, Gernot Schaufler, Francois Sillion, and Julie Dorsey. Multi-layer impostors for accelerated rendering. In Proceedings EUROGRAPHICS 99, pages 61–72, 1999.
[HJBJ+96] Adam Hoover, Gillian Jean-Baptiste, Xiaoyi Jiang, Patrick J. Flynn, Horst Bunke, Dmitry B. Goldgof, Kevin Bowyer, David W. Eggert, Andrew Fitzgibbon, and Robert B. Fisher. An experimental comparison of range image segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):673–689, July 1996.
[Max96] Nelson Max. Hierarchical rendering of trees from precomputed multi-layer z-buffers. In Proceedings Eurographics Rendering Workshop 96, pages 165–174. Eurographics, 1996.
[Mil95] Leonard McMillan. A list-priority rendering algorithm for redisplaying projected surfaces. Technical Report UNC-95-005, University of North Carolina, 1995.
[MMB97] William R. Mark, Leonard McMillan, and Gary Bishop. Post-rendering 3D warping. In Proceedings Symposium on Interactive 3D Graphics, pages 7–16. ACM SIGGRAPH, 1997.
[MW99] A. Mangan and R.T. Whitaker. Partitioning 3D surface meshes using watershed segmentation. IEEE Transactions on Visualization and Computer Graphics, 5(4):308–321, 1999.
[OB98] Manuel M. Oliveira and Gary Bishop. Dynamic shading in image-based rendering. Technical Report UNC-98-023, University of North Carolina, 1998.
[OB99] Manuel M. Oliveira and Gary Bishop. Image-based objects. In Proceedings Symposium on Interactive 3D Graphics, pages 191–198. ACM SIGGRAPH, 1999.
[Paj98a] Renato Pajarola. Large scale terrain visualization using the restricted quadtree triangulation. In Proceedings IEEE Visualization 98, pages 19–26, 515, 1998.
[Paj02] Renato Pajarola. Overview of quadtree-based terrain triangulation and visualization. Technical Report UCI-ICS-02-01, Information & Computer Science, University of California Irvine, 2002. Submitted for publication.
[PZvBG00] Hanspeter Pfister, Matthias Zwicker, Jeroen van Baar, and Markus Gross. Surfels: Surface elements as rendering primitives. In Proceedings SIGGRAPH 2000, pages 335–342. ACM SIGGRAPH, 2000.
[QWQK00] Huamin Qu, Ming Wan, Jiafa Qin, and Arie Kaufman. Image based rendering with stable frame rates. In Proceedings IEEE Visualization 2000, pages 251–258. Computer Society Press, 2000.
[RL00] Szymon Rusinkiewicz and Marc Levoy. QSplat: A multiresolution point rendering system for large meshes. In Proceedings SIGGRAPH 2000, pages 343–352. ACM SIGGRAPH, 2000.
[RP94] Matthew Regan and Ronald Pose. Priority rendering with a virtual reality address recalculation pipeline. In Proceedings SIGGRAPH 94, pages 155–162. ACM SIGGRAPH, 1994.
[SGHS98] Jonathan Shade, Steven Gortler, Li-wei He, and Richard Szeliski. Layered depth images. In Proceedings SIGGRAPH 98, pages 231–242. ACM SIGGRAPH, 1998.
[SLS+96] Jonathan Shade, Dani Lischinski, David H. Salesin, Tony DeRose, and John Snyder. Hierarchical image caching for accelerated walkthroughs of complex environments. In Proceedings SIGGRAPH 96, pages 75–82. ACM SIGGRAPH, 1996.
[SS96] Gernot Schaufler and Wolfgang Stürzlinger. A 3D image cache for virtual reality. In Proceedings EUROGRAPHICS 96, pages 227–236, 1996.
[ZPvBG01] Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus Gross. Surface splatting. In Proceedings SIGGRAPH 2001, pages 371–378. ACM SIGGRAPH, 2001.


MSE values for the individual renderings: 3.5%, 4.8%, 3.0%, 7.0%, 3.8%, 3.1%.

FIGURE 9. Renderings of complex polygonal objects from six reference depth-meshes, aligned with the bounding box, using our proposed positional weighted blending algorithm. The leftmost of any triple of images shows the actual polygonal object, the center image shows the occupancy image generated and used by our blending algorithm, and the rightmost image is the weighted blending of the textured depth-meshes.

MSE values for the individual renderings: 3.2%, 3.5%, 3.8%.

FIGURE 10. Weighted blending of depth-mesh reference frames for terrain visualization. Reference depth-meshes are recorded for key frames along a flight trajectory. The left column shows geometric renderings of the terrain. The top row shows one view from a 16 key frame sequence, the other two rows use 32 key frames.

