3D Graphics Rendering Time Modeling and Control for Mobile Terminals

Nicolaas Tack (IMEC, Leuven, Belgium; also PhD student at KU Leuven)
Francisco Morán (Universidad Politécnica de Madrid)
Gauthier Lafruit (IMEC, Leuven, Belgium)
Rudy Lauwereins (IMEC, Leuven, Belgium; also professor at KU Leuven)
Abstract

3D graphics has found its way to mobile devices such as Personal Digital Assistants (PDAs) and cellular phones. Given their limited battery capacity, these devices typically have fewer computational resources available than their counterparts connected to a power supply. Additionally, the workload of 3D graphics applications changes very drastically over time. These different and changing conditions make the creation of 3D content a real challenge for content creators. To allow arbitrary content to be rendered on a mobile device without ad-hoc content creation, we present a framework to adapt the resolution of 3D objects to the available processing resources. An MPEG-4 scalable geometry decoder is used to change the resolution, and an analytical model of the workload of a mobile renderer is presented for controlling the scalable decoder. Because of the scarce computational resources, a good balance between accuracy and complexity is needed. The presented approach has an error and a complexity overhead of less than 10% for most practical cases.

CR Categories: I.3.8 [Computer Graphics]: Applications.

Keywords: Mobile terminals, rendering time modeling, MPEG-4 WSS, rendering time control.
1. Introduction

3D graphics has found its way to mobile devices such as Personal Digital Assistants (PDAs) and cellular phones. Given their limited battery capacity, these devices typically have fewer computational resources available than their counterparts connected to a power supply. For example, online games distribute their 3D content to users playing at home (on a graphics PC) or on the road (on a mobile device). This challenges content creators to design 3D content suited for every possible terminal.
Progressive, multiresolution 3D coding formats support low complexity decoding (albeit at lower quality) on low performance terminals, without jeopardizing high quality decoding on more powerful devices. The selection of a suitable Level Of Detail (LOD) for controlling the average workload on the terminal is made using simple benchmarks, e.g., triangle and pixel fill rates. Unfortunately, 3D graphics applications are often very dynamic: with a constant LOD, the workload can vary over one order of magnitude [Lafruit et al. 2000]. For example, a virtual house walkthrough from an empty room to a room filled with furniture changes the instantaneous processing requirements very drastically. Therefore, in order to guarantee an acceptable frame rate at all times, multiresolution decoders should adapt the resolution of the content to the instantaneous workload. Reactive rendering time control engines, which monitor the instantaneous processing load and then modify the resolution, might be appropriate in applications with slowly and/or consistently varying scenes (e.g., terrain rendering and flight simulation), but such engines cannot deal well with abrupt changes as in virtual walkthroughs. These can only be tackled by adaptive techniques that actively predict the rendering time for proper adaptation [Funkhouser et al. 1993].

This paper contributes to the work on mobile 3D graphics by proposing an analytical model of the execution time of a triangle-based 3D rendering engine. The analytical model is exploited to appropriately adapt the 3D geometry to the terminal's resources. An additional layer of control to support a large number of objects has been reported in [Pham et al. 2002; Van Raemdonck et al. 2002] and is a topic of ongoing research. Such decision mechanisms for distributing the workload over a multitude of objects are based on constrained optimization techniques, which are recognized to be practically solvable only by heuristics. These aspects are not studied in the current paper, whose main contribution is the accurate workload modeling for enabling rendering time control on a mobile device. Section 2 reviews related work, Section 3 gives an overview of the framework, Section 4 explains the workload model of our mobile renderer, and Section 5 describes the dynamic adaptation mechanisms, exploiting the workload models and the unique, view-dependent multiresolution features of the MPEG-4 Wavelet Subdivision Surfaces.
2. Related work

2.1 PDA rendering

With the advent of 3D graphics applications for mobile devices (e.g., community gaming), graphical 3D rendering chips for mobiles are rapidly gaining interest. The issue there is to keep good performance at low cost (i.e., price of the end product) and low power consumption. Woo et al. [2003] used conventional triangle-based algorithms to design a 210 mW rendering engine. The ARM MBX core [Stevens] uses the PowerVR rendering architecture [PowerVR], which applies tile-based rendering to limit the number of external memory accesses. This limits memory bandwidth and power consumption, since external memory accesses are typically among the most energy consuming operations. Akenine-Möller and Ström [2003], who also acknowledged the latter, proposed a new hardware architecture for rasterizing textured triangles. Although this research into hardware accelerated mobile devices is evolving rapidly, currently few mobile devices support 3D hardware acceleration. For these mobiles, software rendering engines such as Swerve3D [Swerve3D] and PocketGL [Leroy] are used. The latter is considered within the scope of the current paper.
2.2 Rendering time control
Funkhouser et al. [1993] and Gobbetti and Bouvier [1999] did pioneering work on rendering time control for 3D graphics. Both implement a benefit/cost model to deliver the best quality while minimizing the cost. Gobbetti and Bouvier extended the work of Funkhouser by using multiresolution representations of the geometry instead of discrete LODs. Wimmer and Wonka [2003] investigated a number of algorithms for estimating an upper limit for rendering times on consumer hardware. Unfortunately, all these approaches rely on experimentally determined cost heuristics for estimating the rendering time. To model all possible changes of working parameters, such as screen size, number of light sources, etc., experimental data must be gathered under every possible situation (including all possible system architectures), which leads to long and only approximate calibration processes. Our proposal circumvents this problem by extracting, from the source code specification of the 3D renderer, an analytical model requiring only a limited set of calibrations.

2.3 Multiresolution representations

The field of multiresolution object representation has been very fertile so far. We only refer to the work that directly influenced the MPEG-4 Wavelet Subdivision Surfaces (WSS) used in the current paper: MPEG-4 WSS directly builds upon the work of Khodakovsky et al. [2000] and Morán [2001].
3. Adaptation framework overview

Because of the limited availability of hardware acceleration for mobiles, we have chosen to analyze two 3D software rendering engines: Mesa-3D [Paul] and PocketGL [Leroy]. Mesa is a well-known implementation of the OpenGL specification, but because of the lack of floating point hardware on our experimental PDA (with an Intel 80200 processor), it only gives reasonable performance for low-resolution 3D objects without texture rendering. PocketGL addresses this problem by implementing a complete fixed-point texture-rendering engine.

Figure 1: The 3D rendering framework.

Figure 1 shows our 3D rendering framework, which includes a scalable MPEG-4 Wavelet Subdivision Surfaces (WSS) decoder. This decoder operates in three different modes:

1. Uniform decoding/rendering. The 3D content is adapted uniformly: all mesh triangles are treated equally, without any distinction in importance.

2. Static non-uniform decoding/rendering. In this mode, the curvature of the base mesh is used to statically select which triangles are more important than others.

3. View-dependent decoding/rendering. As stated by Benichou and Elber [1999], silhouette preservation is important for the visual quality of 3D geometry. In this third mode, the WSS decoder therefore shows the silhouette at a higher resolution and eliminates back facing triangles.
The scalable decoder introduces overhead, which increases the execution time for the complete chain. To limit this overhead, the curvature and silhouette detections are done on the base mesh (at the lowest resolution), accompanied by a so-called subdivision code (see Section 5) for controlling the non-uniform decoding/rendering. When the available memory is not an issue, the bit streams of the MPEG-4 encoded 3D objects are uniformly decoded and adapting the mesh resolution is then only a matter of selecting the right triangles using the aforementioned subdivision code. However, with limited memory, it may be necessary to decode only the visible portions of the mesh. In the latter case, an update of the mesh is needed every time the viewpoint changes significantly. The influence on the execution time is discussed in Section 5. Finally, the performance estimation block of Figure 1 estimates the decoder parameters for regulating the execution time for the decoding/rendering chain. This estimator also contributes to the execution time and a proper trade-off should be found between accuracy and complexity. Section 4 describes the performance estimator and discusses how the base mesh can be used to estimate the parameters with limited complexity.
4. 3D Rendering performance model

This section describes the performance model for the 3D rendering engine. The first subsection discusses all parameters influencing the performance, the second shows how they can be derived with little overhead from the MPEG-4 base mesh, and the last explains the calibration procedure for initializing the model.

4.1 The 3D rendering parameters

As is typical in embedded systems, the rendering pipeline of PocketGL is a reduced version of those used on desktops, e.g., Mesa-3D. PocketGL's optimizations and constraints would hide a lot of information that is interesting for performance modeling. Therefore, both rendering engines are discussed, and it is shown where PocketGL differs from Mesa-3D.
Figure 2: Mesa-3D rendering pipeline
Figure 2 shows the different stages for the Mesa-3D rendering pipeline. The parameters determining the execution time of this rendering pipeline can be derived through a careful analysis of the source code. These parameters are found by looking for the important loop bounds and the if-conditions changing program path and execution time. The more parameters taken into account, the higher the accuracy but also the higher the complexity. In a software renderer, the different pipeline steps of Figure 2 are executed sequentially. The total execution time T is thus equal to the sum of the execution times for the different pipeline stages. The resulting model is given in Equation (1).
$$T = T_{fixed} + a \cdot \Big[ (V + V_C) \cdot \big( T_{MP} + (1 - p_{clipped}) \cdot (L \cdot p_{shaded} \cdot T_L + T_{VL}) \big) + (1 - p_{culled}) \cdot (F + F_C) \cdot T_{RF} + S \cdot T_{RS} + P \cdot T_{RP} \Big] \quad (1)$$
where:

- The parameter a equals 0 if the system detects that an object is completely outside the viewing frustum and prevents it from being rendered; a equals 1 when the object is partially or completely inside the viewing frustum.

- T_fixed is the time needed to execute parameter independent code, e.g., clearing the color and z-buffers and initialization code.

- T_MP is the time needed for the modelview transformation, the projection, the perspective division and the clip tests for one vertex. This execution time is multiplied by the number V of vertices plus the number V_C of vertices introduced by clipping: clipping a triangle against the canonical viewing volume possibly introduces new vertices and triangles, and we approximate the real situation by adding these vertices V_C and triangles F_C to the original numbers V and F. The body of the clip test loop (iterating over all vertices) contains some branch instructions, but the complexity of the different branch paths is very low and similar, so these branches are hidden in the performance model. PocketGL implements the same loops as Mesa-3D, but simplified; it also uses fixed-point calculation instead of floating point operations, which is important for performance on mobile devices missing a floating-point unit. Clipping in PocketGL is done in screen space and only the near plane clips triangles.

- T_L and T_VL are, respectively, the time needed to shade one vertex and the time to initialize this shading plus the time for the viewport mapping of one vertex. The shading and viewport mapping of a vertex falling outside the canonical viewing volume is prevented; this is taken into account with p_clipped, the probability that a vertex falls outside the canonical viewing volume. The lighting stage computes the contribution of every light source to the color of every vertex. Some if-conditions check whether the light source is close enough to the object and whether the vertex is actually seen by the light source; these branches significantly change the code complexity for every vertex and are modeled by p_shaded. PocketGL only supports texture rendering, and its lighting step is limited to changing the intensity of the different vertices (pixels in the rasterization) instead of the full color computation of Mesa-3D.

- T_RF, T_RS and T_RP are the rasterization times per triangle (F), line span (S) and pixel (P), respectively.
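As an illustration, the following minimal sketch (Python used here purely for illustration, not the paper's implementation) evaluates Equation (1) from monitored parameters and calibrated constants. It assumes the (V + V_C) factor also multiplies the per-vertex shading and viewport terms, which is consistent with the calibration equations (2) and (3) below.

```python
from dataclasses import dataclass

@dataclass
class CalibratedTimes:
    """Weighting constants of Equation (1), obtained by calibration (Sec. 4.3)."""
    t_fixed: float  # parameter independent code (buffer clears, initialization)
    t_mp: float     # modelview/projection/division/clip tests, per vertex
    t_l: float      # shading, per vertex and per light source
    t_vl: float     # shading initialization + viewport mapping, per vertex
    t_rf: float     # rasterization setup, per triangle
    t_rs: float     # rasterization, per scan-line span
    t_rp: float     # rasterization, per pixel

def rendering_time(c, a, V, Vc, F, Fc, S, P, L, p_clipped, p_culled, p_shaded):
    """Evaluate Equation (1) for one object."""
    if a == 0:  # object entirely outside the viewing frustum: nothing rendered
        return c.t_fixed
    vertex_time = (V + Vc) * (c.t_mp + (1 - p_clipped)
                              * (L * p_shaded * c.t_l + c.t_vl))
    raster_time = ((1 - p_culled) * (F + Fc) * c.t_rf
                   + S * c.t_rs + P * c.t_rp)
    return c.t_fixed + vertex_time + raster_time
```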
Figure 3: The rasterization for Mesa-3D and PocketGL.

As shown in Figure 3, both PocketGL and Mesa-3D first interpolate the vertex attributes for all endpoints of the line spans (Ei). For every span, the endpoints are then interpolated to calculate the pixel attributes (Pi). These parameters are the same for all rendering modes (flat shaded, smooth shaded and textured).

4.2 Parameter estimation

The rendering time estimation relies on accurately estimating the parameter values and weighting constants in Equation (1). The parameters V, F and L are easily monitored. Approximately half the objects in our scenes are lit, so we have chosen the fixed value 0.5 for p_shaded. Monitoring the other parameters is quite cumbersome, since they are only known at late stages of the rendering pipeline, yet a correct estimation of the rendering time requires a correct estimation of these parameters for every frame of the 3D graphics animation. Lafruit et al. [2000] proposed to calculate P in a preprocessing step for a number of viewpoints, from which information for other viewpoints is determined by interpolation. In the present paper, we use the following algorithm for calculating the parameters (see the sketch after this list):

1. Transform the vertices from object to world coordinates.

2. Iterate over all triangles:
   a. Update p_clipped with the number of clipped vertices; when all vertices are clipped, continue to the next triangle.
   b. Calculate the number of introduced vertices and facets, based on the number of vertices outside the frustum.
   c. Calculate the projected area of the triangle.
   d. If the area is negative, the triangle is culled; continue to the next triangle.
   e. Calculate the number of spans S for the base mesh triangle.
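A minimal sketch of this estimation loop follows (Python; the helpers out_code and projected_area_and_height, for the per-vertex frustum test and the screen-space area/height, are hypothetical, as the paper does not spell out its implementation):

```python
import numpy as np

def estimate_parameters(vertices, triangles, mvp, frustum_planes):
    """Sketch of the per-frame parameter estimation on the base mesh.

    vertices: (N, 3) object-space positions; triangles: (M, 3) vertex indices;
    mvp: 4x4 modelview-projection matrix (folded into one for brevity).
    """
    # Step 1: transform the vertices.
    v_h = np.hstack([vertices, np.ones((len(vertices), 1))])
    clip = v_h @ mvp.T
    outside = np.array([out_code(p, frustum_planes) for p in clip])  # bool/vertex

    Vc = Fc = culled = 0
    pixels = spans = 0.0
    for tri in triangles:                        # Step 2: iterate over triangles
        n_out = int(outside[tri].sum())
        if n_out == 3:                           # (a) fully clipped: skip
            continue
        Vc += n_out                              # (b) rough clipper estimate
        Fc += max(0, n_out - 1)
        area, height = projected_area_and_height(clip[tri])  # (c) screen space
        if area < 0:                             # (d) back facing: culled
            culled += 1
            continue
        pixels += area                           # (e) pixel and span counts
        spans += height
    return dict(p_clipped=outside.mean(), Vc=Vc, Fc=Fc,
                p_culled=culled / len(triangles), P=pixels, S=spans)
```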
This algorithm is more accurate and estimates all parameters instead of only the number of projected pixels P, but it is also more computationally expensive. However, if it is only applied to the base mesh (which places a constraint on the base mesh: it has to preserve the shape of the original object), the complexity is limited and the challenge shifts to finding an easy relation between the parameters for the base mesh and those for the higher resolution meshes. These relations depend on the technique used for multiresolution modeling, which is therefore explained in Section 5. When rendering the base mesh itself, which only occurs in a few cases, the execution time overhead for estimating the parameters is around 25% for Mesa-3D, while restricted to around 10% for PocketGL. Every successive resolution level (the most frequent situation) multiplies the number of rendered triangles by four, so the relative estimation overhead decreases rapidly, dropping below 6% from LOD 1 onwards. The high 25% overhead for rendering the base mesh with Mesa-3D is again due to the extra floating point operations introduced by the estimation.
4.3 Calibration

This subsection explains how a limited set of calibrations determines the weighting constants T in Equation (1).

A first calibration is done with the test mesh completely clipped. Normally, sending a completely clipped mesh to the pipeline is prevented, but for calibration purposes this safeguard is disabled. The pipeline then stops rendering after the modelview and projection steps, and Equation (1) reduces to:

$$T_1 = T_{fixed} + V \cdot T_{MP} \quad (2)$$

For this situation, T is measured for different values of V, and a linear regression yields T_fixed and T_MP. With T_fixed and T_MP known, the test mesh is placed completely inside the viewing frustum, while culling is enabled for both front and back facing triangles, preventing the rasterization itself. Lighting is enabled but all lights remain disabled. In this case, only the initialization of the lighting is done and Equation (1) reduces to:

$$T_2 = T_1 + V \cdot T_{VL} \quad (3)$$

Linear regression on multiple measurements calibrates T_VL. The same procedure is then repeated for calibrating T_L with the light sources enabled.

The calibration of the times T_R is a little different, because the number S of spans and the number P of pixels depend on each other.

Figure 4: Calibration of the rasterization times with perspective correct texture rendering for PocketGL.

A rectangle consisting of 512 triangles (512 so that T_RF is large enough to be measurable) is rendered in different positions, yielding different values for P and S. The rendering time is measured (the dots in Figure 4) and a second order linear regression (the plane) is used for calibrating the times T_R; Figure 4 shows how the parameters T_R can be derived from the plane equation. For Mesa-3D, the same procedure is applied for the different rendering modes: flat shading, smooth shading and texture rendering with different texture parameters. For PocketGL (Figure 4), the choices are limited to perspective correct texture rendering, perspective incorrect texture rendering and flat shaded rendering. Thanks to the analytical model and the knowledge of the algorithms, the number of calibrations is limited to 6 for PocketGL and to 18 for Mesa-3D (which has many more possible combinations of texture parameters than PocketGL). The sketch below illustrates the regression steps.
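A sketch of these regressions (the measure_* harnesses are hypothetical; np.linalg.lstsq is the standard NumPy least-squares solver):

```python
import numpy as np

# Equation (2): T1 = T_fixed + V*T_MP, measured with a completely clipped mesh.
V = np.array([100.0, 500.0, 1000.0, 5000.0, 10000.0])  # hypothetical counts
T1 = measure_clipped_times(V)                           # hypothetical harness
(t_fixed, t_mp), *_ = np.linalg.lstsq(
    np.column_stack([np.ones_like(V), V]), T1, rcond=None)

# Equation (3): T2 = T1 + V*T_VL, measured with all triangles culled and
# lights disabled; T_VL is the slope of (T2 - T1) over V.
T2 = measure_culled_times(V)
(t_vl,), *_ = np.linalg.lstsq(V[:, None], T2 - T1, rcond=None)

# Rasterization times: the 512-triangle rectangle rendered in different
# positions yields (S, P, T) samples; fit the plane of Figure 4. The
# intercept t0 absorbs the fixed and per-triangle terms (512 * T_RF).
S, P, TR = measure_rasterization_times()                # hypothetical harness
(t0, t_rs, t_rp), *_ = np.linalg.lstsq(
    np.column_stack([np.ones_like(S), S, P]), TR, rcond=None)
```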
4.4 Deficiencies of the model

Profiling Mesa-3D, we noticed three situations in which the model is not accurate (and which were hidden by the optimizations of PocketGL):

1. When an object is partially or completely hidden behind another object, texturing, alpha blending, etc. are prevented by the z-buffering (when rendering from front to back). This is not noticeable in our simple PDA renderer, because its pixel processing is very limited, but as pixel shaders become more elaborate, both the processing time and the errors will increase.

2. Estimating the number of vertices and triangles added by the clipper is inaccurate for Mesa-3D, because no easy relationships exist for deriving these numbers for higher LODs from the numbers for the base mesh. For PocketGL, however, clipping is done in screen space, which only influences the number of projected pixels and scan-line spans.

3. Texture size: this parameter is not directly visible in the source code, but influences the execution time through the cache performance of the processor. The effect depends on the texture size and on the angle from which a triangle is seen: with increasing texture size, the distance between texture samples grows and the number of cache misses increases.

Figure 5: A scan-line span (black line) in a textured object seen from two different orientations.

Figure 5 shows an object with a simple texture, rendered with two different orientations. With the orientation of Figure 5.a, a scan-line span (bold black line) sweeps horizontally through the texture. With a different orientation, such as in Figure 5.b, the scan-line span remains horizontal, but the orientation of the texture changes. The span therefore samples the texture on different lines, resulting in a higher cache miss rate. This is illustrated in Figure 6, which shows the number of cache misses for Mesa-3D on an AMD processor for different texture sizes and orientations.
Figure 6: Cache misses for different texture sizes and different orientations (in degrees), measured on an AMD processor with PAPI [Browne et al. 1999].

Because of the limited processing power available on our experimental PDA, texture mapping with Mesa-3D is not feasible. PocketGL implements perspective correct texture mapping, but only supports nearest sampling, again for performance reasons. The maximum texture size in PocketGL is limited to 128x128. As suggested by Figure 6, PocketGL's small texture sizes do not incur cache effects, whereas Mesa-3D's larger texture sizes could cause annoying cache effects that slow down the processing.

4.5 Results

Figure 7 shows an example of a prediction and a measurement of the rendering time on our PDA with an Intel 80200 processor. To show that the model is also applicable to other types of processors, a measurement and prediction are also shown for a Pentium IV PC. The average error on the PC platform is 3% and the maximum error is 20%; the mean error on the PDA platform is 1.5% and the maximum error is 22%. The maximum errors occur when one of the objects is partially clipped. This can be solved by a better estimation (at the cost of higher overhead) of the number of vertices and triangles introduced by clipping.

Figure 7 starts with a Venus mesh at LOD 1, i.e., the base mesh subdivided once. For the PC platform, the resolution is increased to LOD 3, which results in a rendering time of 25 ms. On the PDA, the LOD is kept at 1, resulting in a rendering time of 500 ms. Figure 7 clearly shows the effect of user interaction (rotation, zoom in/out in frames 3 to 120) on the rendering time. At frame 120, the Stanford bunny is no longer completely clipped and the execution time changes very abruptly. For illustrative purposes, this situation is maintained for a number of successive frames; at frame 127, the LOD is decreased to control the rendering time. The Stanford bunny then disappears behind the Venus mesh, illustrating that for the simple pixel processing allowed by the PDA, the deviations caused by hidden objects are indeed negligible.

A rendering time of 500 ms yields 2 frames per second, which is clearly not enough for smooth animation and user interaction. A simplified fixed-point renderer such as PocketGL decreases the rendering time, so that the textured Bunny and Venus meshes can be drawn at LOD 2 in a rendering time of 20 ms. The mean error of the execution time estimation for PocketGL, measured on different test sets, is also below 10%.
Figure 7: Measurement (black line) and model (grey line) of the rendering time for Mesa-3D on the PDA and PC.
5. Adaptation Framework

For controlling the rendering time, an adaptation framework reading MPEG-4 Wavelet Subdivision Surfaces (WSS) bit streams was developed. The framework is based on the work of Morán [2001], to which adaptive subdivision has been added for finer control: with uniform subdivision, the mesh size changes in steps of a factor of four, which only allows a coarse regulation of the rendering time.

5.1 Uniform Adaptation

For uniform adaptation, the base mesh is recursively and systematically subdivided to increase the LOD (a topological sketch of one such step is given below). Since subdivision only smoothes the base mesh, MPEG-4 WSS encoded detail information is added to the "predicted" mesh to make it match the surface of the original/target high-resolution mesh. The following paragraphs explain how the different parameters of Equation (1) relate between LODs.
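The following sketch performs one uniform midpoint subdivision step; it is topological only, whereas the actual MPEG-4 WSS decoder additionally displaces the new midpoints with decoded wavelet detail coefficients.

```python
def subdivide_midpoint(vertices, triangles):
    """One uniform midpoint subdivision step: each triangle becomes four.

    vertices: list of (x, y, z) tuples; triangles: list of (i, j, k) indices.
    """
    midpoints = {}          # edge (min, max) -> index of its midpoint vertex
    new_tris = []
    verts = list(vertices)

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoints:
            a, b = verts[i], verts[j]
            verts.append(tuple((ac + bc) / 2.0 for ac, bc in zip(a, b)))
            midpoints[key] = len(verts) - 1
        return midpoints[key]

    for i, j, k in triangles:
        ij, jk, ki = midpoint(i, j), midpoint(j, k), midpoint(k, i)
        # Three corner children plus one center child (the 1-to-4 split).
        new_tris += [(i, ij, ki), (ij, j, jk), (ki, jk, k), (ij, jk, ki)]
    return verts, new_tris
```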
Figure 8: Midpoint subdivision of triangle (a) doubles the number of spans S in (b).

The contribution of a particular triangle to the number of scan-line spans S is mainly determined by its height in screen coordinates, which is halved by midpoint subdivision. Therefore, the number of scan-line spans of any of the four triangles of Figure 8.b is half that of their mother triangle, depicted in Figure 8.a. Bearing in mind that between successive mesh resolutions the number of triangles is multiplied by four, the net result is that S doubles at each subdivision step. Of course, this reasoning is not completely accurate for more general subdivision schemes where vertices may be displaced at each subdivision step. However, important as these displacements might be for the final appearance of the rendered surface, they have a negligible impact on its size on screen, so it is reasonable to assume that each subdivision step doubles S. For exactly the same reason, it can be argued that P, the total number of pixels covered by the mesh, hardly changes between LODs.
If a top-level triangle of the base mesh is culled or completely clipped, all triangles originating from it will most likely be culled or completely clipped as well. The ratios p_culled of culled triangles and p_clipped of clipped vertices are therefore also approximately constant across LODs (while still varying with the viewpoint).
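These scaling relations suggest a simple extrapolation of the base-mesh estimates to higher LODs, sketched below under the stated assumptions (the dictionary layout matching the estimator above is hypothetical):

```python
def extrapolate(base, lod):
    """Extrapolate the Equation (1) parameters from the base mesh to a LOD.

    Assumptions from the text: F quadruples per subdivision step (and so,
    roughly, does V, since V ~ F/2 for semi-regular WSS meshes), S doubles,
    while P, p_culled and p_clipped stay approximately constant.
    """
    return dict(
        F=base["F"] * 4 ** lod,
        V=base["V"] * 4 ** lod,   # grows like F for large meshes
        S=base["S"] * 2 ** lod,
        P=base["P"],              # screen coverage hardly changes
        p_culled=base["p_culled"],
        p_clipped=base["p_clipped"],
    )
```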
5.2 Non-Uniform Adaptation

Figure 9: Different types of refinement of a base mesh (a): static, based on curvature (b); dynamic, based on silhouette (c); b and c combined (d); and uniform (e).

For non-uniform adaptation (as shown in Figure 9), the important regions of the mesh are subdivided to a higher LOD than the others. Such regions can be determined according to geometric criteria intrinsic to the target surface, e.g., its curvature. We call the adaptation resulting from such criteria static, as opposed to the dynamic adaptation resulting from taking user navigation into account, e.g., silhouette refinement, which is important for subjective quality [Benichou and Elber 1999]. Obviously, every time the viewpoint changes, the silhouette needs to be detected and the vertex buffer adapted accordingly. But similarly to what is done for the number of triangles, scan-line spans, etc., silhouette detection can be performed on the base mesh to reduce the complexity of the adaptation control.

5.2.1 Non-uniform Adaptation criteria

Figure 10: Changing the dot product criterion threshold: εt ranges from 0 (base mesh) to 1 (two steps of uniform subdivision).

The static surface curvature criterion has been implemented by checking whether the angle between two neighboring face normals is larger than a certain threshold. More specifically, if the dot product of these (normalized) normals is smaller than a value εt (set by the rendering time controller), the common edge is split. Figure 10 shows how the resolution (number of triangles) of the mesh changes with εt. For the dynamic non-uniform subdivision based on silhouette detection, we use an approach similar to that of Benichou and Elber [1999]: if one of two neighboring triangles is visible and the other is not, their common edge is split and the child triangles sharing it (or any of its endpoints) are created. Both criteria are combined in the sketch below.
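Both criteria reduce to a per-edge test, sketched as follows (the data layout is hypothetical; the face normals are assumed normalized, so the dot product directly encodes the angle between them):

```python
import numpy as np

def split_edges(normals, visible, neighbors, eps_t):
    """Decide which shared edges of the base mesh must be split.

    normals: (M, 3) unit face normals; visible: (M,) bool (front facing);
    neighbors: list of (tri_a, tri_b) pairs sharing an edge;
    eps_t: dot-product threshold set by the rendering time controller.
    """
    split = []
    for a, b in neighbors:
        curvature = np.dot(normals[a], normals[b]) < eps_t   # static criterion
        silhouette = visible[a] != visible[b]                # dynamic criterion
        if curvature or silhouette:
            split.append((a, b))
    return split
```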
5.2.2 Non-uniform Adaptation control

For non-uniform subdivision, the criteria explained above are used to decide which edges must be split. If an edge is split, all triangles sharing a vertex with the edge are added to the data structure holding the relationship between mother and child triangles (a quadtree). For complexity reasons, we detect the important regions on the base mesh and store the result in a subdivision code, which controls the selection and computation of triangles at subsequent LODs.

Figure 11: The subdivision codes for non-uniform subdivision.

Figure 11.1 shows the relationship between the subdivision code and the numbering of the children: bit i is set to 1 if child i is selected for rendering by the above-mentioned criteria, and to 0 otherwise. Three cases now occur:

1. A triangle shares a vertex with a selected edge (Figure 11.{2,3,5}). In Figure 11.2, vertex a is part of a selected edge, and the child triangle adf in the mother triangle abc is selected for rendering. The (possibly non-planar) quadrilateral dbcf must be decomposed into triangles dbc and dcf, which are temporarily created for rendering.

2. One edge of the mother triangle is a selected edge (Figure 11.{4,6,7}). In Figure 11.4, the children of the neighboring mother triangles sharing edge ab are selected for rendering by setting the appropriate bits in the subdivision codes of the two neighboring triangles. To avoid cracks, triangle fec is created for rendering. For higher LODs, still only the child triangles adjacent to edge ab are selected for rendering, leaving, e.g., triangle fec in Figure 11.4 unchanged.

3. All edges of the mother triangle are selected (Figure 11.8). All children are selected for rendering and the subdivision code is set to '1111'. The appropriate bits of the neighboring triangles are also updated.
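As a small illustration, such a 4-bit code can be stored per mother triangle and tested directly (the bit-to-child mapping follows Figure 11.1, which is not reproduced here, so the concrete encoding below is a hypothetical stand-in):

```python
def select_children(code):
    """Return the child indices whose bits are set in a 4-bit subdivision code."""
    return [i for i in range(4) if code & (1 << i)]

# Case 3 of Figure 11.8: all edges selected, code '1111', all children drawn.
assert select_children(0b1111) == [0, 1, 2, 3]
# A partial code such as 1011 (cf. Figure 11.6) selects three children; the
# uncovered region is triangulated on the fly to avoid cracks.
assert select_children(0b1011) == [0, 1, 3]
```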
Figure 12: Non-uniform subdivision around the bold line.

Figure 12 shows a small mesh of triangles for which the bold edges are selected to be split. The triangles and vertices in grey are added following the guidelines of Figure 11, showing that the resolution is locally increased without introducing cracks.
Figure 13: Inheritance of the subdivision code.

Because further splits happen next to the same base mesh edge, the subdivision code of the children is directly derived from the mother's subdivision code. For example, if the triangle of Figure 11.6 is further subdivided, the edges af and fc shown in Figure 13 must be split, resulting in the same subdivision code 1011 for the triangles adf and fec. For the child triangle fed, only the child triangle near vertex f must be created, and it derives the code 0010. This procedure can be recursively repeated until the desired subdivision level is reached. Once the subdivision code is derived for the base mesh, we know exactly which vertices and triangles are needed at all LODs without checking neighboring triangles. Consequently, the curvature and silhouette detection can be performed on the base mesh with low overhead. The subdivision code can then be used to select vertices and triangles for rendering from a uniformly decoded WSS mesh, but it can also be used to select the vertices and triangles that need to be computed with non-uniform decoding. The latter is computationally more expensive, but may be needed because of limited memory on the mobile device.

5.2.3 Rendering time model

This subsection explains how the parameters of Equation (1) depend on the non-uniform adaptation. For complexity reasons, all these parameters (see Subsection 4.2) were calculated on the base mesh. Unfortunately, some of them change with the LOD, and therefore good approximations must be determined at the current resolution level.

Figure 14: The effect on the rendering time of varying εt for the bunny mesh.

When dynamic adaptation is enabled, invisible triangles are detected and not sent to the rendering pipeline, thereby avoiding culled triangles altogether (p_culled = 0). The number P of projected pixels is still more or less the same as for the base mesh. Experimental results show that as long as εt is below a certain value, the parameters of the base mesh can be used; otherwise, those of the uniformly subdivided mesh are taken (e.g., a threshold of 0.5 gives good results for the Venus mesh).

5.2.4 Rendering time control

Figure 15: The number of triangles as a function of εt and the LOD for the Venus mesh.

Given a time budget, Equation (1) must be used to estimate the affordable number of triangles. Figure 15 shows how such a triangle budget is translated into the WSS decoder parameters. This is used in the following algorithm (see the sketch after this list):

1. Compute the base mesh parameters.
2. From Equation (1), compute the triangle budget F, using the base mesh parameters. The number of vertices V is more or less half the number of triangles, because WSS meshes are semi-regular.
3. Using F and Figure 15, select the decoder parameters εt and LOD. A high LOD with a low εt gives a lot of detail in the most important (very local) regions, while a high εt takes more regions into account, but with less detail. Multiple solutions are possible, but experimental results show that the one with the highest εt gives the best quality.
4. With these decoder parameters, the number of spans can be estimated; go to step 2 using the new number of spans.
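The loop below sketches this control algorithm; the lookup table lut standing in for Figure 15, and the inversion of Equation (1) (which ignores the lighting term for brevity), are hypothetical:

```python
def control_decoder(budget, base, calib, lut, iterations=3):
    """Sketch of the iterative control loop of Section 5.2.4.

    budget: rendering time budget; base: base-mesh parameter estimates;
    calib: calibrated times of Equation (1); lut: hypothetical lookup table
    mapping a triangle budget to (eps_t, lod) and giving span estimates.
    """
    spans, eps_t, lod = base["S"], None, None
    for _ in range(iterations):
        # Step 2: invert Equation (1) for the triangle budget F, using
        # V ~ F/2 (semi-regular WSS meshes); lighting ignored for brevity.
        per_tri = ((1 - base["p_culled"]) * calib.t_rf
                   + 0.5 * (calib.t_mp + (1 - base["p_clipped"]) * calib.t_vl))
        f_budget = (budget - calib.t_fixed - spans * calib.t_rs
                    - base["P"] * calib.t_rp) / per_tri
        # Step 3: of all (eps_t, LOD) pairs fitting the budget, prefer the
        # highest eps_t (experimentally the best quality).
        eps_t, lod = lut.lookup(f_budget)
        # Step 4: re-estimate the number of spans and iterate from step 2.
        spans = lut.spans(eps_t, lod)
    return eps_t, lod
```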
We do not take the estimation overhead time into account in our performance model. We have made this choice because accounting for it entails a larger overhead complexity, further reducing the rendering time available on our mobile device (the time budget is divided between overhead and rendering). For example, estimating the time needed to decode extra triangles requires the subdivision code, forcing the computation of this code over multiple iterations. Moreover, with static non-uniform subdivision, the largest part of the overhead occurs only when the mesh is really adapted, i.e., when the performance model computes a significant change of the decoder parameters. This results in a frame rate that is a few fps lower than the estimated one in the frames with active adaptation.

With dynamic non-uniform adaptation, the situation is more complex, because the viewpoint becomes an extra decoder parameter, which forces a redistribution of the triangles every time the viewpoint changes. With view-dependent adaptation, one can detect the back facing triangles and exclude them from rendering, which increases the performance. On the other hand, performance decreases because silhouette preservation increases the number of triangles at the silhouette. Figure 16 shows two situations in which the performance increases (a) or decreases (b) with dynamic view-dependent adaptation. In Figure 16.a, many back facing triangles are eliminated while few are added; in Figure 16.b, the situation is reversed, as shown in Table 1 (the depicted times include the extra overhead).
Figure 16: Pros (a) and cons (b) of dynamic adaptation
Table 1: Pros and cons of dynamic adaptation. Entries give the non-uniform rendering time as a percentage of the uniform rendering time.

LOD   Figure 16.a   Figure 16.b
1     87            135
2     75            172
3     42            142
In our current framework, view-dependent adaptation is disabled when it decreases the performance. Future research is needed to better exploit the possibilities of view-dependent adaptation.
6. Conclusion
We have presented an adaptation framework that allows fine-grained control of the time needed to decode and render multiresolution 3D surfaces through adaptive wavelet subdivision. This allows high quality 3D content, originally designed for high performance terminals, to be accessed and adapted to a mobile device's processing capabilities. To control this adaptation, we have derived an analytical model of the rendering times of the Mesa-3D and PocketGL pipelines. A good trade-off between estimation overhead and accuracy has been found: parameters measured on the base mesh are extrapolated to the higher resolution meshes for accurate execution time estimation. To further enhance the subjective quality of the rendered objects, a non-uniform subdivision of the base mesh is performed. This reduces the triangle cost but preserves (i) intrinsic shape properties of the 3D objects, e.g., creases and curvature, and (ii) dynamic shape properties, e.g., the silhouette. To limit the overhead of the non-uniform subdivision control, we have developed a control scheme based on a subdivision code, which allows the detection of important regions on the base mesh and drives their selective refinement to obtain the higher resolution meshes. With all aforementioned trade-offs, a high rendering quality is obtained with a control overhead limited to 10%, while the difference between the workload model estimations and the measured execution times is also below 10%.

The Mesa-3D pipeline offers very high flexibility, but because of its high complexity, its rendering time is only acceptable for low-resolution objects without texture rendering. Therefore, renderers such as PocketGL optimize the pipeline by implementing it completely in fixed point and by introducing many constraints. This illustrates the need for hardware accelerated 3D on mobile terminals, and opens a topic of future research: validating and updating the workload model and control for upcoming hardware accelerated 3D graphics on mobile terminals.

7. Acknowledgements

Part of this work was funded by the IWT (Instituut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen) and the European project OZONE (IST-2000-30026).

References

AKENINE-MÖLLER, T., AND STRÖM, J. 2003. Graphics for the masses: a hardware rasterization architecture for mobile phones. In Proceedings of ACM SIGGRAPH 2003, ACM Press/ACM SIGGRAPH.

BENICHOU, F., AND ELBER, G. 1999. Output sensitive extraction of silhouettes from polygonal geometry. In Proceedings of the Seventh Pacific Conference on Computer Graphics and Applications.

BROWNE, S., DEANE, C., HO, G., AND MUCCI, P. 1999. PAPI: a portable interface to hardware performance counters. In Proceedings of the Department of Defense HPCMP Users Group Conference.

FUNKHOUSER, T. A., AND SÉQUIN, C. H. 1993. Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments. In Proceedings of ACM SIGGRAPH 1993, 247-254.

GOBBETTI, E., AND BOUVIER, E. 1999. Time-critical multiresolution scene rendering. In Proceedings of IEEE Visualization 1999, 123-130.

KHODAKOVSKY, A., SCHRÖDER, P., AND SWELDENS, W. 2000. Progressive geometry compression. In Proceedings of ACM SIGGRAPH 2000, 271-278.

LAFRUIT, G., NACHTERGAELE, L., DENOLF, K., AND BORMANS, J. 2000. 3D computational graceful degradation. In Proceedings of ISCAS - Workshop and Exhibition on MPEG-4, III-547 - III-550.

LEROY, P. PocketGL. www.sundialsoft.freeserve.co.uk/

MORÁN, F. 2001. Hierarchical Modelling of 3D Objects with Subdivision Surfaces. PhD thesis, Technical University of Madrid.

PAUL, B. Mesa-3D. www.mesa3D.org

PHAM NGOC, N., VAN RAEMDONCK, W., LAFRUIT, G., DECONINCK, G., AND LAUWEREINS, R. 2002. A QoS framework for interactive 3D applications. In Proceedings of the 10th International Conference on Computer Graphics and Visualization (WSCG 2002), 317-324.

POWERVR. www.powervr.com

STEVENS, A. ARM 3D Graphics Solutions White Paper. Available at www.arm.com

SWERVE3D. www.swerve3D.com

VAN RAEMDONCK, W., LAFRUIT, G., STEFFENS, E. F. M., OTERO PÉREZ, C. M., AND BRIL, R. J. 2002. Scalable 3D graphics processing in consumer terminals. In Proceedings of ICME 2002.

WIMMER, M., AND WONKA, P. 2003. Rendering time estimation for real-time rendering. In Proceedings of the Eurographics Symposium on Rendering, 118-129.

WOO, R., CHOI, S., SOHN, J. H., SONG, S. J., BAE, Y. D., YOON, C. W., NAM, B. G., WOO, J. H., KIM, S. E., PARK, I. C., SHIN, S., YOO, K. D., CHUNG, J. Y., AND YOO, H. J. 2003. A 210 mW graphics LSI implementing full 3D pipeline with 264 Mtexels/s texturing for mobile multimedia applications. In Proceedings of ISSCC 2003, 44-45.