International Journal of Image and Graphics © World Scientific Publishing Company
Real-Time Out-of-Core Rendering
Michael Guthe
Institute of Computer Science II, Bonn University, Römerstraße 164, 53117 Bonn, Germany
[email protected]

Pavel Borodin
[email protected]

Reinhard Klein
[email protected]

Received 25 October 2003; Revised 20 February 2004; Accepted 23 September 2004

Hierarchical levels of detail (HLODs) have proven to be an efficient way to visualize complex environments and models even in an out-of-core system. Large objects are partitioned into a spatial hierarchy and for each node a level of detail is generated for efficient view-dependent rendering. To ensure correct matching between adjacent nodes in the hierarchy, care has to be taken to prevent cracks along the cuts. This leads either to severe simplification constraints at the cuts, and thus to a significantly higher number of triangles, or to a costly runtime stitching of these nodes. In this paper we present an out-of-core visualization algorithm that overcomes this problem by filling the cracks generated by the simplification algorithm with appropriately shaded Fat Borders. Furthermore, several minor yet important improvements over previous approaches are made. This way we arrive at a simple yet efficient view-dependent rendering technique which allows for the natural incorporation of state-of-the-art culling, simplification, compression and prefetching techniques, leading to real-time rendering performance of the overall system. Several examples demonstrate the efficiency of our approach.

Keywords: Real-time rendering; complex models/environments; out-of-core algorithms.
1. Introduction

Polygon models generated by scientific or engineering visualization, and especially by 3D scanners, exceed the capabilities of interactive rendering on standard graphics hardware. Furthermore, it is not uncommon that these models are several gigabytes in size and thus also exceed the amount of main memory available in a common PC. Many acceleration techniques have been developed to render complex models at interactive frame rates. They include visibility culling, simplification and
image-based representations. Visibility culling works well only for models with high depth complexity, and image-based methods tend to produce more popping artifacts than simplification. Therefore, simplification approaches are the most promising for general complex datasets.

Early simplification techniques can generally be divided into two approaches, static and view-dependent22, where both have their advantages and drawbacks. View-dependent simplification represents the geometry as split/collapse operations that are only locally dependent. Therefore, they can be applied depending on the position of the viewer to generate an approximation that results in the same screen space error everywhere on the model. Furthermore, the view-dependent level of detail (LOD) allows a smooth transition between frames, since split/collapse operations approximate geomorphing22. In contrast to this, static levels of detail are approximations that have the same geometric error on the whole model. Since no split/collapse operations are stored, smooth transitions between different LODs are not possible. A recent extension of static levels of detail are the hierarchical LODs (HLODs), which are built upon a bounding box hierarchy for more efficient simplification. To generate coarser levels of detail, whole subbranches of the bounding box hierarchy are combined into a single object and simplified. If spatially large objects are partitioned hierarchically, HLODs are an approximation of view-dependent simplification.

When it comes to out-of-core simplification, static HLODs are most commonly used, because view-dependent simplification is too costly for scenes containing several hundred million triangles. However, at the cuts through spatially large objects the subparts cannot be simplified without introducing cracks in the model. This restricts the simplification of these objects to produce at least O(√n) triangles, where n is the original number of triangles in the subpart. On the other hand, runtime stitching requires much CPU power and therefore a multiprocessor system or a cluster is needed.

To overcome this problem we use a crack filling algorithm that runs entirely on recent graphics hardware and thus leaves the CPU power for decompression and culling. It reduces the number of triangles to approximately O(k), where k depends only on the simplification error and thus on the desired size of the LOD on screen. Since the cracks have to be closed by the simplifier during the construction of HLODs, we use a simplifier for inner nodes that efficiently performs topological simplification. Additionally, we present an efficient and simple motion prediction algorithm for more efficient prefetching and thus a reduction of cache misses during rendering.

The main features of our algorithm are:

• Real-time out-of-core visualization of complex scenes containing hundreds of millions of triangles.
• Screen space error of less than 1/2 pixel compared to the original model.
• Filling of cracks introduced due to independent simplification of subparts entirely on graphics hardware.
• Priority-driven prefetching with a simple motion prediction based on natural movement to reduce the number of misses.
• Reduced number of triangles in each node compared to previous HLOD rendering algorithms like Ref. 5 and Ref. 16.

2. Related Work

In this section we give a brief overview of related work in the area of large model and out-of-core visualization.

2.1. Large model visualization

Many methods have been developed for the visualization of large models. They are based on three main approaches: geometric simplification, visibility culling and image-based representations.

Geometric simplification algorithms generate an approximation of the original model with a reduced number of primitives. A recent survey of simplification algorithms can be found in Ref. 29. They can be classified as either dynamic (view-dependent) or static (view-independent). As shown by Erikson and Manocha16, static HLODs generated by partitioning approximate view-dependent simplification.

Visibility culling algorithms try to quickly determine possibly visible and definitely invisible objects. There are three general methods to determine and remove invisible parts of the scene. The first is view frustum culling, which removes all objects outside the view frustum. The second is backface culling, which removes all polygons facing away from the current view position. If normal cones37 are used, whole objects or subtrees of a scene graph can be removed. The third approach is occlusion culling, where objects that are hidden behind other objects are removed from the scene. Ref. 11 gives a good overview of different occlusion culling approaches.

Image-based representations use impostors to replace distant objects with previously rendered images1,2,31,34,35. In cell based approaches these can be combined with geometric simplification (e.g. Ref. 1).

2.2. Out-of-core algorithms

In computational geometry and related areas a lot of effort has been spent on so-called out-of-core or external memory algorithms8,42.

Based on the spatial subdivision of architectural models, a real-time memory management algorithm to swap objects in and out of memory was developed by Funkhouser et al.18,19. Aliaga et al.1 developed a system for interactive rendering of complex models that can easily be partitioned into cells. However, there is no good
algorithm known for partitioning a general model into such cells. Furthermore, the image-based methods used in this work tend to produce severe popping artifacts.

A number of out-of-core simplification algorithms for large models have been developed6,10,27,28,36 that can be used to generate static levels of detail. An out-of-core simplification algorithm for view-dependent simplification was developed by El-Sana and Chiang14 and tested for models with a few hundred thousand triangles. Although view-dependent simplification15,23,30,43 generates fewer triangles and works well for spatially large objects, it has a significant overhead during rendering. Instead of performing calculations at nodes inside a scene graph, it has to query every active vertex or edge of all visible objects. Furthermore, since the geometry of the objects changes frequently, it has to be sent to the graphics card at almost every frame, shifting the bottleneck from fill rate limitations to bus speed.

A few out-of-core visualization methods including rendering of large unstructured grids17 and isosurface extraction3,9 have been developed. In Ref. 12 and Ref. 40 application controlled segmentation and paging methods for the out-of-core visualization of computational fluid dynamics data were presented. A number of techniques like indexing, caching and prefetching13,38 were developed to increase the performance of large environment walkthrough applications. Recently an out-of-core rendering algorithm41 was presented using hierarchical levels of detail16.
3. Overview

In this section we give an overview of our rendering algorithm, which is composed of HLOD generation, crack filling, culling and memory management.

Since a general scene graph (if provided with the scene) may not be optimized for rendering, culling and HLOD generation, we build an additional axis aligned bounding box hierarchy. In contrast to previous approaches we subdivide the whole scene with an octree, which is one of the most efficient spatial subdivisions for numerous culling techniques, to create this hierarchy. The spatial subdivision can only be used this efficiently because our algorithm, in contrast to Ref. 41, does not require the geometry of two adjacent HLODs to fit together exactly. Therefore, no constraints that would prevent efficient simplification are needed. Details of the HLOD generation are discussed in Sec. 4 and a detailed description of the culling techniques used is given in Sec. 6.2.

To fill the cracks between adjacent HLODs, fat borders are rendered along the cuts of each inner HLOD node. When rendering an HLOD object, these fat borders are drawn using an algorithm we developed for view-dependent NURBS rendering4, which runs entirely on the GPU to conceal the cracks. Section 5 briefly describes this technique and details the modifications necessary for its adaptation to HLODs.

To reduce latency when new geometry needs to be loaded from disk, we use a prefetching algorithm that runs parallel to the rendering thread. In contrast to previous prefetching techniques we estimate the likelihood of the geometry being rendered not only by its approximation error, but also by the angle by which the viewer would have to rotate to see it.
The geometry file format is described in Sec. 4.4 and a detailed description of the prefetching algorithm can be found in Sec. 6.3.

4. HLOD Generation

In this section we describe the details of our method to generate HLODs. To prevent material changes inside a node, which are very costly especially when textures are used, first all objects with the same material are clustered into a single super-object. Then they are partitioned using an octree, hierarchically grouping subparts and simplifying them together. Finally, an HLOD (or LOD at leaf nodes) is generated for every node of the octree and stored on disk.

4.1. Overall algorithm

Since the super-objects are treated separately, the subsequent algorithm is described for one such object only. The partitioning algorithm starts with the whole super-object in the root node of an octree. The object is partitioned by cutting the geometry of each inner node into eight subparts and storing them in its children. If its geometry contains at most T_max triangles, the node is not split further. If a node contains no geometry, it is marked and not partitioned further. The out-of-core partitioning algorithm is described in detail in Sec. 4.2.

After the partitioning, the geometry contained in the leaves of the octree is stored on disk. Starting from the geometry of these nodes, the HLOD hierarchy is built recursively from bottom to top with the following algorithm:

• Gather the simplified geometry from all child nodes that are two levels below the current node (or the original geometry if there is no HLOD at this depth). Its approximation error ε_prev is then the maximum error of the child nodes.
• Simplify the resulting geometry as long as the Hausdorff distance ε_h to the gathered geometry is less than Δε = e_node/res − ε_prev, where e_node is the edge length of the current node's bounding cube and res is the desired resolution in fractions of e_node, depending on the screen space error ε_screen and the desired edge length of a node on screen e_screen. This constant is defined as res = e_screen/ε_screen, where e_screen can be optimized for a specific PC system.
• Store ε = ε_h + ε_prev as the approximation error of the current node.

By using the children two levels below the current node instead of its direct children, the simplified geometry contains fewer triangles, since the approximation of the real geometric error is better. This is due to the fact that the difference between the estimated geometric error ε and the real geometric error ε_real is small, since
ε_real ≥ Δε = e_node/res − ε_prev ≥ e_node/res − e_node/(4·res) = (3/4) · e_node/res = (3/4) · ε    (1)
and thus (3/4)·ε ≤ ε_real ≤ ε, where we use the fact that the nodes two levels below have bounding cubes of edge length e_node/4 and therefore ε_prev ≤ e_node/(4·res).

Theoretically it would also be possible to use an out-of-core simplification algorithm to build the HLODs of inner nodes directly from the original geometry during the partitioning process. However, starting with the combination of already simplified geometry greatly reduces the computation cost and still leads to high quality drastic simplification. This is due to the fact that the number of triangles in the gathered geometry remains almost constant, independent of the depth of the actual node and therefore of the number of triangles in the base geometry of this part of the object. This means that the total simplification time depends only linearly on the number of leaf nodes and thus linearly on the number of triangles in the base geometry.

During the simplification process the geometric error of each node is stored along with the bounding box of the contained simplified geometry. Finally, the geometry of each node is compressed and stored on disk. Additionally, the skeleton of the scene graph containing the bounding boxes and the simplification errors, but not the geometry itself, is stored. The size of such a node in the scene graph skeleton is 32 bytes (5 bytes geometry file info, 16 bytes normal cone, 6 bytes octree structure, 4 bytes simplification error and one status byte) and a leaf node should contain several thousand triangles on average for efficient rendering. Therefore, the size of this skeleton is only about 3.2 MB for a scene containing 100 million triangles. Nevertheless, in contrast to previous algorithms we do not keep the whole skeleton in memory. At the point where the depth of the skeleton reaches 5 (≤ 1.14 MB), the subtrees are stored on disk and only loaded on demand. For faster loading these subtrees have a maximum depth of three before they branch into different files again. During runtime the root skeleton and the 100 most recently used subtrees (≤ 18.3 KB each) are kept in memory.
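For illustration, the following C++ sketch shows one possible layout of such a 32-byte skeleton node. The field names and the exact packing are our own assumptions; only the per-component byte budget is taken from the text above.

#include <cstdint>

// Hypothetical 32-byte scene graph skeleton node (field names are assumptions).
#pragma pack(push, 1)
struct SkeletonNode {
    uint8_t  geometryFile[5];   // 5 bytes: id/offset of the compressed geometry file
    int16_t  coneAxis[3];       // 16 bytes normal cone: quantized axis,
    int16_t  coneAngle;         //                       half angle,
    int16_t  coneCenter[3];     //                       quantized apex position,
    int16_t  coneRadius;        //                       radius
    uint8_t  octree[6];         // 6 bytes: child mask, subtree file index, depth, ...
    float    error;             // 4 bytes: accumulated simplification error
    uint8_t  status;            // 1 byte: in memory / loading / empty flags
};
#pragma pack(pop)
static_assert(sizeof(SkeletonNode) == 32, "node must stay at 32 bytes");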
4.2. Out-of-core partitioning

Since the unsimplified geometry of a node does not in general fit into main memory, the vertices and normals of the mesh are stored in blocks and swapped in and out from disk using a least-recently-used (LRU) algorithm. The indices of the triangles need not be stored in memory and can therefore be streamed from the geometry file and cut to the current octree leaves.

At each traversal step every triangle is cut by the three planes dividing the node into its children and the resulting triangles are stored in the appropriate geometry files. When a triangle edge is cut, the normal of the new point is calculated by linear interpolation. Note that new vertices may have the same coordinates as existing vertices, but this is resolved after the whole tree is built. If the number of triangles contained in a node exceeds T_max during this process, the node is split and its triangles are cut into the child nodes. When the partitioning is complete, new
indices for the leaf node triangles are calculated and duplicate points are removed. The total complexity of the partitioning algorithm is O(n log n), since on each level of the octree all triangles need to be processed once.
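A minimal sketch of this streaming cutting step is given below; it assumes a simple indexed-triangle representation and Sutherland-Hodgman style clipping, and omits the block-wise vertex cache and the file handling of the actual system. All names are our own.

#include <array>
#include <vector>

struct Vec3 { float x, y, z; };
struct Vertex { Vec3 position; Vec3 normal; };
using Polygon = std::vector<Vertex>;   // convex polygon resulting from clipping

// Linear interpolation of position and normal when an edge is cut.
static Vertex lerp(const Vertex& a, const Vertex& b, float t) {
    auto mix = [t](float u, float v) { return u + t * (v - u); };
    return { { mix(a.position.x, b.position.x), mix(a.position.y, b.position.y), mix(a.position.z, b.position.z) },
             { mix(a.normal.x,   b.normal.x),   mix(a.normal.y,   b.normal.y),   mix(a.normal.z,   b.normal.z) } };
}

// Clip a convex polygon against one axis-aligned splitting plane, keeping the
// requested side (Sutherland-Hodgman).
static Polygon clipAgainstPlane(const Polygon& poly, int axis, float split, bool keepLower) {
    auto coord = [axis](const Vertex& v) {
        return axis == 0 ? v.position.x : axis == 1 ? v.position.y : v.position.z;
    };
    auto inside = [&](const Vertex& v) { return keepLower ? coord(v) <= split : coord(v) >= split; };

    Polygon out;
    for (size_t i = 0; i < poly.size(); ++i) {
        const Vertex& a = poly[i];
        const Vertex& b = poly[(i + 1) % poly.size()];
        if (inside(a)) out.push_back(a);
        if (inside(a) != inside(b)) {
            float t = (split - coord(a)) / (coord(b) - coord(a));
            out.push_back(lerp(a, b, t));   // new vertex on the cutting plane
        }
    }
    return out;
}

// Cut one streamed triangle by the three mid-planes of the node and collect the
// pieces per child octant (to be triangulated and written to the child's file).
void cutTriangleIntoChildren(const std::array<Vertex, 3>& tri, const Vec3& center,
                             std::array<std::vector<Polygon>, 8>& childGeometry) {
    for (int octant = 0; octant < 8; ++octant) {
        Polygon piece(tri.begin(), tri.end());
        piece = clipAgainstPlane(piece, 0, center.x, (octant & 1) == 0);
        piece = clipAgainstPlane(piece, 1, center.y, (octant & 2) == 0);
        piece = clipAgainstPlane(piece, 2, center.z, (octant & 4) == 0);
        if (piece.size() >= 3)
            childGeometry[octant].push_back(piece);
    }
}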
4.3. Simplification of a node

Since we need to combine disjoint parts of the super-object during simplification, the simplifier needs to be capable of performing topological simplification in order to close the cracks introduced by the partitioning and independent simplification in previous stages of the recursion. Furthermore, it has to be capable of simplifying the geometry up to a given geometric error (Hausdorff distance).
Fig. 1. Hierarchical simplification using only vertex pair contractions (left) and generalized pair contractions (right).
Standard vertex pair contraction simplification on such geometry can have undesirable results (Fig. 1, left). For this reason we use the generalized pair contractions algorithm described by Borodin et al.7. This approach completely generalizes the vertex pair contraction operation introduced by Popović and Hoppe32 and Garland and Heckbert20 by allowing contractions between a vertex and any other mesh simplex (vertex, edge or triangle) or between two edges. This technique makes it possible to connect disjoint parts of the mesh already at the early stages of simplification and therefore resolves this problem by sewing disconnected parts together (Fig. 1, right).
4.3.1. Generalized pair contractions

The three generalized pair contraction operations7 are shown in Fig. 2. In the case of vertex-edge and vertex-triangle contractions, the contraction vertex is contracted onto an intermediate vertex, which is created on the contraction edge or triangle. In the case of an edge-edge contraction, an intermediate vertex is created on each of the two contraction edges and the two are then contracted together. These three operations perform no reduction by themselves, but increase the connectedness of the mesh.
Fig. 2. Vertex-edge (top), vertex-triangle (middle) and edge-edge (bottom) contractions.
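To make the vertex-edge case concrete, the following C++ sketch performs such a contraction on a plain indexed triangle mesh. It is only an illustration under our own assumptions (a triangle-soup representation, no priority queue, no removal of degenerate triangles) and not the data structure or code used by the generalized pair contraction implementation of Ref. 7.

#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::array<int, 3>> triangles;   // vertex indices
};

// Contract vertex v onto edge (a, b): an intermediate vertex w is created on the
// edge at parameter t, every triangle containing the edge is split at w, and v is
// then merged into w (an ordinary vertex pair contraction).
void vertexEdgeContraction(Mesh& mesh, int v, int a, int b, float t, const Vec3& newPos)
{
    // 1. Create the intermediate vertex on the contraction edge.
    const Vec3& pa = mesh.positions[a];
    const Vec3& pb = mesh.positions[b];
    int w = static_cast<int>(mesh.positions.size());
    mesh.positions.push_back({ pa.x + t * (pb.x - pa.x),
                               pa.y + t * (pb.y - pa.y),
                               pa.z + t * (pb.z - pa.z) });

    // 2. Split every triangle that uses edge (a, b) at the intermediate vertex.
    std::vector<std::array<int, 3>> newTris;
    for (auto& tri : mesh.triangles) {
        for (int i = 0; i < 3; ++i) {
            int u0 = tri[i], u1 = tri[(i + 1) % 3];
            if ((u0 == a && u1 == b) || (u0 == b && u1 == a)) {
                std::array<int, 3> second = tri;
                tri[(i + 1) % 3] = w;      // first half keeps u0 and gets w
                second[i] = w;             // second half gets w and keeps u1
                newTris.push_back(second);
                break;
            }
        }
    }
    mesh.triangles.insert(mesh.triangles.end(), newTris.begin(), newTris.end());

    // 3. Contract v and w: place the merged vertex at the (error-minimizing)
    //    position and redirect all references from v to w.
    mesh.positions[w] = newPos;
    for (auto& tri : mesh.triangles)
        for (int& idx : tri)
            if (idx == v) idx = w;
}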
4.3.2. Error metrics

As a criterion for the choice of the next contraction operation we use the quadric error metric presented by Garland and Heckbert20. All candidate contraction pairs are sorted in a priority queue according to the error that will arise after contracting them. The new position of a contraction vertex is chosen so as to minimize this error.

4.3.3. Control over geometric error

Although the quadric error metric is a fast technique which provides good results, it does not deliver the precise geometric error. In our case this is a necessary requirement. Furthermore, we need the possibility to simplify up to a given geometric error. Therefore, in addition to the quadric error metric we use the Hausdorff distance between the original and the simplified meshes. This is implemented in the same way as described by Hoppe23 and Klein et al.24. This combination of different error metrics leads to higher speed without loss of accuracy.

Since the number of input and output triangles per node generally remains constant, the complexity of the simplification algorithm depends linearly on the number of nodes in the octree and is therefore O(n), and the HLOD generation algorithm has a complexity of O(n log n) in total. The hierarchical simplification algorithm is parallelized; in our examples we used eleven PCs in a network for these preprocessing steps.
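To make the error metric of Sec. 4.3.2 concrete, here is a compact C++ sketch of the quadric error metric (our own illustration, not the authors' code): each vertex accumulates the plane quadrics of its incident triangles, and the cost of contracting a pair is the summed quadric evaluated at the chosen target position.

#include <array>

// Symmetric 4x4 quadric Q stored as 10 coefficients; the error of placing a
// vertex at homogeneous position p = (x, y, z, 1) is  p^T Q p.
struct Quadric {
    std::array<double, 10> q{};   // upper triangle: a b c d, e f g, h i, j

    // Quadric of the plane n·x + d = 0 (n must be unit length).
    static Quadric fromPlane(double nx, double ny, double nz, double d) {
        Quadric Q;
        Q.q = { nx*nx, nx*ny, nx*nz, nx*d,
                       ny*ny, ny*nz, ny*d,
                              nz*nz, nz*d,
                                     d*d };
        return Q;
    }

    Quadric& operator+=(const Quadric& o) {
        for (size_t i = 0; i < q.size(); ++i) q[i] += o.q[i];
        return *this;
    }

    // Error p^T Q p for p = (x, y, z, 1).
    double evaluate(double x, double y, double z) const {
        return q[0]*x*x + 2*q[1]*x*y + 2*q[2]*x*z + 2*q[3]*x
                        +   q[4]*y*y + 2*q[5]*y*z + 2*q[6]*y
                                     +   q[7]*z*z + 2*q[8]*z
                                                  +   q[9];
    }
};

// Cost of contracting a pair whose vertices carry quadrics q1 and q2 to the
// position 'target'; the pair with the smallest cost is contracted next.
double contractionCost(const Quadric& q1, const Quadric& q2,
                       double tx, double ty, double tz) {
    Quadric sum = q1;
    sum += q2;
    return sum.evaluate(tx, ty, tz);
}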
4.4. Compression of connectivity and geometry

In order to minimize the loading latency we employ sophisticated compression schemes capable of real-time decompression of the out-of-core data. It would bring no benefit to store the geometry of a node as the sequence of simplification operations that constructs it from its children two levels below, since it contains only approximately 1/256 of their vertices and so the number of operations exceeds the number of vertices in the geometry by a factor of approximately 200. Therefore, we apply the Cut-Border algorithm for non-manifold meshes21 to compress the connectivity information.

For the compression of the vertex positions the bounding cube of the octree node is stored and the vertex coordinates of its geometry are discretized relative to this bounding cube. Since the geometry of inner nodes is an approximation and already associated with a geometric error ε, it does not make sense to store the vertex positions with a much higher precision. We spend one bit more than needed to encode with an error of ε for inner nodes (e.g. 8 bit for ε = e_node/128). When the geometry of such a node is rendered, an error of ε is projected to at most the desired screen space error ε_screen. Therefore, storing the vertices with an error of ε/2 projects on screen to at most ε_screen/2. Since we use a screen space error of 1/2 pixel, this is at most 1/4 pixel and can thus be neglected. The normals are discretized to 8 bit per coordinate, which in general does not lead to a loss of image quality.

If we choose for example res = 128, the vertex coordinates as well as the normals require 3 × 8 bit each and therefore 48 bit = 6 bytes, instead of 24 bytes when using float or even 48 bytes with double precision values. Since the coordinates generally need much more space than the connectivity, this alone reduces the file size to almost a third of the original size, even without additional compression techniques.

5. Crack Filling

The gap filling algorithm conceals the cracks between adjacent HLODs introduced during the simplification by rendering appropriately shaded Fat Borders4. This has the big advantage that the HLOD of a node can be replaced without changing anything in the geometry of adjacent nodes and therefore only information for the new node has to be sent down the AGP bus to fill the cracks. The width, displacement and color of the fat borders are calculated every frame using the vertex program shown in Fig. 3.

The input data for rendering the fat borders, produced by the cutting and simplification stage, consists of all edges along the cuts that need to be filled and the approximation error of the node. Since the Fat Border algorithm needs edge loops or at least edge strips, they are reconstructed from the Cut-Border edge loops. They are constructed by starting an edge strip with the first boundary edge and adding adjacent boundary edges at the start and end of this strip until no adjacent edge is left. Then the next strip is started, again with the first remaining boundary edge. If all boundary edges
!!ARBvp1.0
PARAM mvp[]  = { state.matrix.mvp };
PARAM mv[]   = { state.matrix.modelview[0] };
PARAM param  = program.local[0];
TEMP  view, length, pos;

# compute view direction
MUL view, {-1, -1, -1, 0}, mv[2];
DP3 length, view, view;
RSQ length, length.x;
MUL view, length, view;

# move in view plane
XPD pos, view, vertex.texcoord[0];
DP3 length, pos, pos;
RSQ length, length.x;
MUL pos, length, pos;
MAD pos, pos, param, vertex.position;

# move away from viewer
MAD pos, view, param, pos;

# compute per vertex lighting
...

# transform and project vertex
...
END
Fig. 3. Vertex program to render fat borders. The tangent vectors are stored as texture coordinates and the approximation error is given as a local program parameter.
are concatenated to edge strips, the tangent vectors are calculated as described in Ref. 4, with the exception that if an edge strip does not start and end in the same vertex, only one tangent is generated at the start and end of the strip. To speed up rendering, we only use the middle two of the generated fat border vertices, as shown in Fig. 4, instead of all six, and thus add only two triangles to the node for each boundary edge at a cut.
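The strip construction just described can be sketched as follows. This is a simplified illustration under our own assumptions (an unordered list of boundary edges given as vertex index pairs), not the actual implementation.

#include <list>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;      // boundary edge as a pair of vertex indices
using EdgeStrip = std::vector<int>;    // ordered vertex indices along a strip

// Greedily concatenate boundary edges into strips: start with the first edge and
// grow the strip at both ends until no adjacent boundary edge is left, then start
// the next strip with the first remaining edge.
std::vector<EdgeStrip> buildEdgeStrips(std::list<Edge> edges) {
    std::vector<EdgeStrip> strips;
    while (!edges.empty()) {
        Edge first = edges.front();
        edges.pop_front();
        EdgeStrip strip = { first.first, first.second };

        bool grew = true;
        while (grew) {
            grew = false;
            for (auto it = edges.begin(); it != edges.end(); ++it) {
                if      (it->first  == strip.back())  { strip.push_back(it->second); }
                else if (it->second == strip.back())  { strip.push_back(it->first); }
                else if (it->first  == strip.front()) { strip.insert(strip.begin(), it->second); }
                else if (it->second == strip.front()) { strip.insert(strip.begin(), it->first); }
                else continue;
                edges.erase(it);
                grew = true;
                break;                 // restart the scan after each extension
            }
        }
        strips.push_back(strip);       // closed loops end with the start vertex repeated
    }
    return strips;
}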
Fig. 4. Fat border accuracy against speed tradeoff. Using only two vertices instead of all six increases the rendering performance at the cost of lower accuracy of the fat borders.
Since the edges along a cut between adjacent HLODs do not need to be preserved, the approximation of a view-dependent level of detail is better and fewer triangles need to be rendered. Figure 5 demonstrates this approximation and how the fat borders fill the cracks between adjacent HLODs.

6. Rendering

For rendering we use the OpenSG33 scene graph, which has built-in support for view frustum and occlusion culling. Since disk access and rendering are able to run
Fig. 5. HLODs rendered for a specific view point (near the head). Note how the HLODs approximate the continuous view-dependent level of detail and how the fat borders allow better simplification along the cuts (left, fat borders not rendered) compared to Ref. 16 (right).
almost in parallel even on a single processor system, two threads are used. The first thread traverses the octree and then renders the scene. During rendering the prefetching thread loads geometry using a priority queue described in Sec. 6.3.

6.1. Scene representation

For scene graph traversal and culling a scene graph skeleton is kept in memory. This skeleton stores the parent-child relationships, the bounding box and the accumulated error for each node. Furthermore, the normal cone, a pointer to the geometry (if in memory) and the status of the node are kept in this skeleton. The geometry is loaded into main memory on the fly when it is needed for rendering. A memory footprint size can be given to the visualization algorithm and we ensure that no more memory than specified is used by our method. The only restriction on this footprint size is that the root of the scene graph skeleton and of course the rendering buffers have to fit into this amount of memory.

6.2. Culling techniques

In addition to the built-in view frustum and occlusion culling39, we implemented backface culling using normal cones37. For this the normal cones of the child nodes are combined during the HLOD generation to calculate the normal cone of an inner node. To update the current scene graph the skeleton is traversed and the following operations are performed (a traversal sketch follows below):

Backface culling: The normal cone of the node is checked against the view direction. If the normal cone is completely facing away from the viewer, scene traversal is stopped.

View frustum culling: The bounding box of the node is checked against the view frustum and traversal is stopped when it is outside.
Occlusion culling: The node is checked for occlusion using its bounding box. Again traversal is stopped if the node is occluded.

HLOD selection: If the accumulated error ε of the node is at most the desired error ε_des, the geometry is rendered and traversal is stopped. If the node is not already in memory, it has to be loaded. The desired error is calculated by projecting the screen space error ε_screen to the point on the bounding box of the node which is closest to the viewer.

Note that the backface and occlusion culling are optional, because they do not make sense for all scenes (e.g. backface culling for scenes with double sided triangles and occlusion culling for scenes with low depth complexity).
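The following C++ sketch summarizes this traversal. The node interface (isBackfacing(), isOutsideFrustum(), and so on) is hypothetical and only mirrors the four steps listed above; it is not the OpenSG or paper code.

// Hypothetical skeleton node interface; the member functions stand in for the
// normal cone test, frustum test, occlusion query and the stored HLOD data.
struct Camera;

struct Node {
    float accumulatedError;                       // ε stored with the node
    bool  geometryInMemory;

    bool  isBackfacing(const Camera& cam) const;  // normal cone completely facing away?
    bool  isOutsideFrustum(const Camera& cam) const;
    bool  isOccluded(const Camera& cam) const;
    float desiredError(const Camera& cam) const;  // ε_des: ε_screen projected to the
                                                  // closest point of the bounding box
    void  requestLoad();                          // hand over to the prefetching thread
    void  render() const;
    int   childCount() const;
    Node& child(int i);
};

// Recursive traversal with backface, view frustum and occlusion culling and
// HLOD selection, following Sec. 6.2.
void traverse(Node& node, const Camera& cam,
              bool useBackfaceCulling, bool useOcclusionCulling) {
    if (useBackfaceCulling && node.isBackfacing(cam)) return;   // backface culling
    if (node.isOutsideFrustum(cam)) return;                     // view frustum culling
    if (useOcclusionCulling && node.isOccluded(cam)) return;    // occlusion culling

    // HLOD selection: render this node if its error is small enough on screen.
    if (node.accumulatedError <= node.desiredError(cam)) {
        if (!node.geometryInMemory)
            node.requestLoad();    // missed by the prefetcher, must be loaded now
        node.render();
        return;
    }

    // Otherwise descend to the children for a finer level of detail.
    for (int i = 0; i < node.childCount(); ++i)
        traverse(node.child(i), cam, useBackfaceCulling, useOcclusionCulling);
}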
6.3. Memory management

We use a mutex lock to synchronize the rendering and prefetching threads when the view changes. Additionally, this lock is used to prevent the prefetching thread from consuming CPU power during the scene graph update and culling. The workflow of the two threads is shown in Fig. 6.
Fig. 6. Workflow of the parallel rendering and prefetching threads for a single processor system.
Fig. 7. Angles θ and γ used for visibility prefetching: V represents the view frustum, B the bounding box and N the normal cone of the current node.
We use two sets of priority bins, Q_load and Q_remove, to determine which nodes should be loaded from disk and which of the currently unused geometry nodes are not needed in consecutive frames and thus can be removed from memory. The priority p of a node in these queues represents the likelihood that the node will be rendered in the next frame. In contrast to previous approaches18,41 we do not simply use the projected error of a node as the basis for this likelihood and cut off all nodes outside an extended view frustum. Instead, we calculate the minimum angle the viewer would need to rotate to be able to see the node and then use this angle to weight the likelihood depending on the movement of the viewer. To calculate the priority of a node, the scene graph is traversed and the priority p of all inactive nodes is calculated by the following formulas:
p_detail = ε_des / ε_node            if ε_des < ε_node
           ε_parent / ε_des,parent   otherwise                     (2)

p = p_detail                 if the node is visible
    p_detail · arccos α      if the node is culled                 (3)

α = max(θ, γ)                                                      (4)
where θ is the angle between the view frustum and the bounding box of the node, and γ is the angle between the normal cone and the view plane if the node is backfacing, and zero otherwise. The 2D case is shown in Fig. 7.

As long as enough memory is available, the node with the highest priority N_load is loaded from Q_load. If not enough memory is available to load N_load, the node with the lowest priority N_remove from Q_remove is removed, if its priority is lower than that of N_load. If no more geometry can be loaded without violating this condition, the prefetching is stopped.

The priority queues have to be updated each time the scene graph traversal starts. Since the number of nodes is very high for complex scenes, the priority is calculated by traversing the octree, and traversal is stopped as soon as p < p_stop for a given p_stop (we use 0.5) and ε_des < ε_node. All nodes of the hierarchy not reached by this traversal are then assumed to have a priority of zero. Note that we do not use occlusion culling to calculate the priority of a node, because occlusion can change very abruptly between frames.

To predict the motion of the viewer we use the fact that natural movements are continuous up to at least the second derivative. We approximate this movement by quadratic functions using the view position and direction of the last three frames. Based on this movement we predict the view position and direction of the next frame to perform a more efficient prefetching. This leads to the following formulas to estimate the position P_i+1 and direction d_i+1 in the next frame:

v_i − v_i−1 = v_i+1 − v_i   with   v_j = (P_j − P_j−1) / (t_j − t_j−1)    (5)

v_i+1 = 2·v_i − v_i−1    (6)

(P_i+1 − P_i) / (t_i+1 − t_i) = 2·(P_i − P_i−1) / (t_i − t_i−1) − (P_i−1 − P_i−2) / (t_i−1 − t_i−2)    (7)

P_i+1 = P_i + (t_i+1 − t_i) · ( 2·(P_i − P_i−1) / (t_i − t_i−1) − (P_i−1 − P_i−2) / (t_i−1 − t_i−2) )    (8)

d*_i+1 = d_i + (t_i+1 − t_i) · ( 2·(d_i − d_i−1) / (t_i − t_i−1) − (d_i−1 − d_i−2) / (t_i−1 − t_i−2) )    (9)

d_i+1 = d*_i+1 / ||d*_i+1||    (10)
Since we cannot predict the time t_i+1 when the next frame will be displayed, we assume that t_i+1 − t_i = (t_i − t_i−2) / 2.
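A small C++ sketch of this prediction step is given below; it is our own illustration of Eqs. (8)-(10) under the stated assumption for t_i+1, not the code of the described system.

#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return { x + o.x, y + o.y, z + o.z }; }
    Vec3 operator-(const Vec3& o) const { return { x - o.x, y - o.y, z - o.z }; }
    Vec3 operator*(float s) const { return { x * s, y * s, z * s }; }
};

struct ViewSample { Vec3 position; Vec3 direction; float time; };

// Predict the view position and direction of the next frame from the last three
// frames, assuming quadratic (constant second derivative) motion.
ViewSample predictNextView(const ViewSample& s0,   // frame i-2
                           const ViewSample& s1,   // frame i-1
                           const ViewSample& s2)   // frame i
{
    // Assumed frame time: t_i+1 - t_i = (t_i - t_i-2) / 2.
    float dt  = 0.5f * (s2.time - s0.time);
    float dt1 = s2.time - s1.time;   // t_i   - t_i-1
    float dt0 = s1.time - s0.time;   // t_i-1 - t_i-2

    // Eq. (8): extrapolated position.
    Vec3 position = s2.position +
        ((s2.position - s1.position) * (2.0f / dt1) -
         (s1.position - s0.position) * (1.0f / dt0)) * dt;

    // Eq. (9) and (10): same extrapolation for the view direction, then normalize.
    Vec3 dir = s2.direction +
        ((s2.direction - s1.direction) * (2.0f / dt1) -
         (s1.direction - s0.direction) * (1.0f / dt0)) * dt;
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    Vec3 direction = dir * (1.0f / len);

    return { position, direction, s2.time + dt };
}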
This prediction works well as long as the user does not release the mouse button, and thus performs an abrupt stop, or starts to press a button while moving the mouse. The first case is no problem for our algorithm, since the geometry for the actual view position is already in memory. The second case cannot be solved by any movement prediction algorithm, since there is no data to predict the movement from while the viewer is not moving. It is not necessary to perform prefetching for the subtrees of the scene graph skeleton, since they are already loaded during the geometry prefetching.

7. Results

We tested our system with different models (see Table 1) from the Stanford Scanning Repository26 (Armadillo, Happy Buddha, Dragon and Lucy) and the Digital Michelangelo Project25 (David and St. Matthew) on an Athlon 2800+ PC with 512 MB memory and an ATI Radeon 9800 Pro. As parameters for the partitioning algorithm we chose a maximum number of triangles per node of T_max = 1024 and a ratio of error to node size of res = 256 (max. 128 pixels per node in image space). For rendering we set p_stop = 0.5, ε_screen = 1/2 pixel and a memory footprint size of 256 megabytes.
Table 1. Triangle numbers of the models used for testing, their size on disk and frame rates (average/minimum) for the different rendering algorithms.

Model          # Triangles   Original (raw data)   HLODs (comp.)   Fat Borders   Constrained simplification
Armadillo          345 944        11.9 MB               1.1 MB       128 / 71        123 / 72
Dragon             871 414        29.9 MB               2.4 MB       122 / 86        114 / 80
Happy Buddha     1 087 716        37.3 MB               3.5 MB       117 / 69        101 / 62
David 2mm        8 254 150       283.4 MB              20.2 MB       155 / 73         80 / 49
Lucy            28 055 742       963.2 MB              66.7 MB       110 / 51         29 /  8
David 1mm       56 230 343         1.9 GB             138.6 MB       114 / 61         13 /  1
St. Matthew    372 422 615        12.5 GB             649.6 MB       100 / 46          5 /  1
Since the depth complexity of the models is low and occlusion culling has a significant overhead, we only used view frustum and backface culling in our tests. The camera paths used for testing are rotations of the object and zooms to interesting parts. To evaluate the performance of our out-of-core rendering system we compare it to in-core rendering (where possible for the model) and to out-of-core rendering without crack filling. For the in-core rendering we load all geometry when the scene graph skeleton is loaded and do not start the memory management thread.

Table 1 shows the frame rates for the tested models at a resolution of 1024 × 768. Especially interesting here is the comparison between the two David models at different reconstruction resolutions, since we used the same camera path for both to show how well our method scales with the size of the input model. For relatively small models the frame rates of our new algorithm and the constrained
simplification41 are almost identical. But as the number of triangles increases, the performance drop due to the O(√n) complexity of the constrained simplification compared to the O(log n) complexity of our approach becomes noticeable.

Of course the achieved frame rates depend on the velocity of the viewer. For the recorded natural movement of the object almost no misses occurred. Therefore, a reduction of the velocity does not lead to higher frame rates, as confirmed by different tests. Figure 8 shows the frame rates and memory management statistics of the St. Matthew statue in detail for a freehand movement of the model. The memory management statistics show the memory needed for the actually rendered geometry (visible bytes: black), the size of the prefetched geometry (bytes prefetched: dark grey) and the size of the geometry that was loaded during octree traversal (bytes missed: light grey).
Fig. 8. Frame rates for the out-of-core rendering (left) and memory management statistics (right) of the 1/4 mm St. Matthew statue.
The memory management statistics demonstrate that the prefetching works very well and only few misses occurred during the movement. The algorithm provides a good balance between GPU and CPU load, since the GPU load is almost 100% and the CPU load is 95% on average. Since only few nodes are replaced between frames, the AGP bus load is low, leaving enough bandwidth for textures.

Videos showing the camera paths for some models can be downloaded at: http://cg.cs.uni-bonn.de/project-pages/opensg-plus/videos/.

8. Conclusion

In this paper we presented a simple but efficient out-of-core visualization algorithm using hierarchical levels of detail and a crack filling algorithm running on graphics hardware. An intelligent out-of-core simplification technique, using vertex-vertex, vertex-edge, vertex-triangle and edge-edge collapses, makes it possible to merge parts of different nodes in the hierarchy in such a way that gaps between adjacent parts are closed in a controlled manner. The number of triangles in each HLOD node generated by partitioning can be reduced further compared to previous approaches, since the triangles at the cuts do not need to be preserved. Additionally, we optimize
the geometry for rendering and AGP bus transfer. The algorithm has been tested on various models ranging from a few megabytes to more than one gigabyte, showing its ability to render highly complex scenes at real-time frame rates on consumer level hardware. Furthermore, the results show its excellent scalability.

Acknowledgments

This work was partially funded by the BMBF under the OpenSG PLUS project. We thank Marc Levoy and the Digital Michelangelo Project for providing us with the models.

References

1. D. G. Aliaga, J. Cohen, A. Wilson, E. Baker, H. Zhang, C. Erikson, K. E. Hoff, T. Hudson, W. Stürzlinger, R. Bastos, M. C. Whitton, F. P. Brooks, and D. Manocha, “MMR: An interactive massive model rendering system using geometric and image-based acceleration”, in Symposium on Interactive 3D Graphics, (1999), p. 199.
2. D. G. Aliaga and A. Lastra, “Automatic image placement to provide a guaranteed frame rate”, in Siggraph 1999, Computer Graphics Proceedings, ed. A. Rockwood, (Addison Wesley Longman, 1999), p. 307.
3. C. Bajaj, V. Pascucci, D. Thomson, and X. Y. Zhang, “Parallel accelerated isocontouring for out-of-core visualization”, in IEEE Parallel Visualization and Graphics Symposium, (1999), p. 87.
4. Á. Balázs, M. Guthe, and R. Klein, “Fat borders: Gap filling for efficient view-dependent LOD rendering”, Computers & Graphics, 28(1), (2004), p. 79.
5. W. V. Baxter, A. Sud, N. K. Govindaraju, and D. Manocha, “Gigawalk: Interactive walkthrough of complex environments”, (2002).
6. F. Bernardini, J. Mittleman, and H. Rushmeier, “Case study: Scanning Michelangelo's Florentine Pietà”, (1999).
7. P. Borodin, S. Gumhold, M. Guthe, and R. Klein, “High-quality simplification with generalized pair contractions”, in Proceedings of GraphiCon'2003, (2003), p. 147.
8. Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter, “External-memory graph algorithms”, in Symposium on Discrete Algorithms, (1995), p. 139.
9. Y.-J. Chiang and C. T. Silva, “External memory techniques for isosurface extraction in scientific visualization”, in AMS/DIMACS Workshop on External Memory Algorithms and Visualization, (1998).
10. P. Cignoni, C. Montani, C. Rocchini, and R. Scopigno, “External memory simplification of huge meshes”, in IEEE Trans. on Visualization and Comp. Graph., (2003).
11. D. Cohen-Or, Y. Chrysanthou, and C. Silva, “A survey of visibility for walkthrough applications”, (2000).
12. M. Cox and D. Ellsworth, “Application-controlled demand paging for out-of-core visualization”, in IEEE Visualization, (1997), p. 235.
13. C. DeCoro and R. Pajarola, “XFastMesh: Fast view-dependent meshing from external memory”, in IEEE Visualization, (2002), p. 363.
14. J. El-Sana and Y.-J. Chiang, “External memory view-dependent simplification”, Computer Graphics Forum, 19(3), (2000).
15. J. El-Sana and A. Varshney, “Generalized view-dependent simplification”, Computer Graphics Forum, 18(3), (1999), p. 83.
16. C. Erikson and D. Manocha, “HLODs for faster display of large static and dynamic environments”, in ACM Symposium on Interactive 3D Graphics, (2000).
17. R. Farias and C. T. Silva, “Out-of-core rendering of large unstructured grids”, in IEEE Computer Graphics and Applications, (2001).
18. T. A. Funkhouser, “Database management for interactive display of large architectural models”, in Graphics Interface '96, ed. W. A. Davis and R. Bartels, (Canadian Human-Computer Communication Society, 1996), p. 1.
19. T. A. Funkhouser and C. H. Séquin, “Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments”, Computer Graphics, 27, (1993), p. 247.
20. M. Garland and P. S. Heckbert, “Surface simplification using quadric error metrics”, Computer Graphics, 31, (1997), p. 209.
21. S. Gumhold, “Improved cut-border machine for triangle mesh compression”, in Erlangen Workshop '99 on Vision, Modeling and Visualization, (IEEE Signal Processing Society, 1999).
22. H. Hoppe, “Progressive meshes”, Computer Graphics, 30, (1996), p. 99.
23. H. Hoppe, “View-dependent refinement of progressive meshes”, Computer Graphics, 31, (1997), p. 189.
24. R. Klein, G. Liebich, and W. Straßer, “Mesh reduction with error control”, in IEEE Visualization '96, ed. R. Yagel and G. M. Nielson, (1996), p. 311.
25. M. Levoy, “The Digital Michelangelo Project”, http://www-graphics.stanford.edu/projects/mich.
26. M. Levoy, “The Stanford 3D Scanning Repository”, http://www-graphics.stanford.edu/data/3dscanrep.
27. P. Lindstrom, “Out-of-core simplification of large polygonal models”, in ACM Siggraph, (2000).
28. P. Lindstrom and C. T. Silva, “A memory insensitive technique for large model simplification”, in IEEE Visualization, (2001).
29. D. Luebke, “A developer's survey of polygon simplification algorithms”, in IEEE CG & A, (2001), p. 24.
30. D. Luebke and C. Erikson, “View-dependent simplification of arbitrary polygonal environments”, Computer Graphics, 31, (1997), p. 199.
31. P. W. C. Maciel and P. Shirley, “Visual navigation of large environments using textured clusters”, in Symposium on Interactive 3D Graphics, (1995), p. 95.
32. J. Popović and H. Hoppe, “Progressive simplicial complexes”, in SIGGRAPH (1997), p. 217.
33. D. Reiners, G. Voss, J. Behr, and M. Roth, “OpenSG”, http://www.opensg.org.
34. G. Schaufler and W. Stürzlinger, “A three dimensional image cache for virtual reality”, Computer Graphics Forum, 15(3), (1996), p. 227.
35. J. Shade, D. Lischinski, D. H. Salesin, T. DeRose, and J. Snyder, “Hierarchical image caching for accelerated walkthroughs of complex environments”, Computer Graphics, 30, (1996), p. 75.
36. E. Shaffer and M. Garland, “Efficient adaptive simplification of massive meshes”, in IEEE Visualization, (2001).
37. L. A. Shirman and S. S. Abi-Ezzi, “The cone of normals technique for fast processing of curved patches”, Computer Graphics Forum, 12(3), (1993), p. 261.
38. L. Shou, J. Chionh, Z. Huang, R. Ruan, and K. L. Tan, “Walking through a very large virtual environment in real-time”, in Proceedings International Conference on Very Large Data Bases, (2001), p. 401.
39. D. Staneker, “A first step towards occlusion culling in OpenSG PLUS”, in 1st OpenSG
Symposium, (2002).
40. S.-K. Ueng, C. Sikorski, and K.-L. Ma, “Out-of-core streamline visualization on large unstructured meshes”, IEEE Transactions on Visualization and Computer Graphics, 3, (1997), p. 370.
41. G. Varadhan and D. Manocha, “Out-of-core rendering of massive geometric environments”.
42. J. S. Vitter, “External memory algorithms and data structures”, in External Memory Algorithms and Visualization, ed. J. Abello and J. S. Vitter, (American Mathematical Society Press, 1999), p. 1.
43. J. C. Xia, J. El-Sana, and A. Varshney, “Adaptive real-time level-of-detail based rendering for polygonal meshes”, IEEE Transactions on Visualization and Computer Graphics, 3(2), (1997), p. 171.
Photo and Biography

Michael Guthe received his Diploma in Computer Science from the University of Tübingen, Germany, in June 2000. Since November 2001 he has been working in the Computer Graphics Group at the University of Bonn, Germany, as a Ph.D. student. His main research area is real-time rendering of complex objects. Until December 2003 he worked in the OpenSG Plus project funded by the German ministry of research and education (BMBF).

Pavel Borodin received his Diploma in Mathematics from the Lomonosov Moscow State University, Russia, in June 1996. Since October 2000 he has been working in the Computer Graphics Group at the University of Bonn, Germany, as a Ph.D. student. His main research areas are mesh simplification and mesh repair.
Reinhard Klein studied Mathematics and Physics at the University of Tübingen, Germany, from where he received his Diploma in Mathematics (Dipl.-Math.) in 1989 and his Ph.D. in computer science in 1995. In 1999 he received an appointment as lecturer ("Habilitation") in computer science, also from the University of Tübingen, with a thesis in computer graphics. In September 1999 he became an Associate Professor at the University of Darmstadt, Germany, and head of the research group Animation and Image Communication at the Fraunhofer Institute for Computer Graphics. Since October 2000 he has been a professor at the University of Bonn and director of the Institute of Computer Science II (Computer Graphics Group).