ICIP'97, IEEE International Conference on Image Processing, Vol. I
EMBEDDED CODING OF 3D GRAPHIC MODELS Jiankun Li and C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering-Systems University of Southern California
ABSTRACT
A novel progressive compression method, which encodes a 3D graphic model into an embedded bit stream, is investigated in this research. The coder first encodes the coarsest resolution of the model, and then includes the information of finer details gradually. A rate-distortion model is used to integrate the different types of bit streams into a single embedded bit stream so as to optimize the overall coding performance. The embedding property can be exploited for progressive transmission, multi-resolution editing and level-of-detail control. Numerical experiments are provided to demonstrate the excellent rate-distortion performance of the proposed method.
1. INTRODUCTION

Since the advent of 3D laser scanning systems and the boom of the virtual reality modeling language (VRML) for graphic description, 3D graphic models have become more accessible to general end users. In most existing commercial hardware and software, the polygonal mesh remains the most popular building block of object description. Since a polygon is a planar primitive, a complex object with many fine details can only be faithfully represented with thousands or even millions of polygons. Thus, good 3D graphic compression methods are critical. Another important consideration in the design of 3D graphic coders is to allow a multiresolution representation of the object. That is, the information is coded in the order of importance: the most important information is encoded first, and then finer details are added gradually. As a result, the decoder can construct models of different resolutions, from the coarsest approximation to the finest replica, depending on the application requirement. This progressive coding property, also referred to as the embedding property, finds wide application in robust error control, progressive transmission/display, level-of-detail control, etc.

In this work, we propose a new embedded 3D graphic coding scheme, which combines state-of-the-art techniques in graphic simplification and progressive coding into a new compression method. This scheme progressively compresses an arbitrary polygonal mesh into a single bit stream. Along the encoding process, every output bit contributes to reducing the distortion, and the contribution of bits decreases according to their position in the bit stream. At the receiver end, the decoder decodes the most important information from the bit stream first and then finer details. The decoder can stop at any point while giving a reconstruction of the original model with the best rate-distortion tradeoff. A series of models of continuous resolution can thus be constructed from that single bit stream. This property, referred to as the embedding property since the coding of a coarser model is embedded in the coding of a finer model, can be widely used in robust error control, progressive transmission and display, and level-of-detail control.
2. PROPOSED ALGORITHM

In this section, we describe the embedded 3D graphic coding algorithm in detail. First, a procedure to construct a hierarchical mesh is presented in Section 2.1. Then, we examine the coding of the structure data and the attribute data in Sections 2.2 and 2.3, respectively. Finally, the coded bit streams of the two types of data are multiplexed into a single bit stream by using a rate-distortion model in Section 2.4.
2.1. Construction of Hierarchical Mesh
Even though the developed compression scheme can in principle be applied to an arbitrary polygonal mesh, we focus on meshes composed of triangles for simplicity in this work. Among the many graphic simplification schemes [1], [2], [3], [4], we adopt the vertex decimation method described in [1]. Multiple passes are made over all vertices in the mesh. Each vertex and its local topology are checked for eligibility of removal. According to its local topology, a vertex is classified into one of three types:
Oct., 1997 Santa Barbara
simple: A simple vertex is surrounded by a complete cycle of triangles and each edge connected to the vertex is shared by two triangles.
complex: A complex vertex has a connecting edge that is not shared by two triangles, or its surrounding triangles do not form a complete cycle.
boundary: A boundary vertex is on the boundary of a mesh, i.e., it is surrounded by a semi-cycle of triangles.
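The three-way classification above can be sketched from a plain triangle list by counting how many triangles share each edge. The following is a minimal sketch, not the paper's implementation; all function and variable names are ours, and it checks edge counts only (it does not verify that the surrounding triangles form a single connected cycle):

```python
from collections import defaultdict

def classify_vertices(triangles):
    """Classify each vertex as 'simple', 'boundary' or 'complex'.

    `triangles` is a list of (i, j, k) vertex-index triples.
    An edge used by exactly two triangles is interior; an edge used
    once lies on the mesh boundary; more than twice is non-manifold.
    """
    edge_count = defaultdict(int)    # undirected edge -> #triangles using it
    vertex_edges = defaultdict(set)  # vertex -> incident edges
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            e = (min(u, v), max(u, v))
            edge_count[e] += 1
            vertex_edges[u].add(e)
            vertex_edges[v].add(e)

    labels = {}
    for v, edges in vertex_edges.items():
        counts = [edge_count[e] for e in edges]
        if any(c > 2 for c in counts):
            labels[v] = "complex"    # some edge is non-manifold
        elif all(c == 2 for c in counts):
            labels[v] = "simple"     # complete cycle of triangles
        elif counts.count(1) == 2:
            labels[v] = "boundary"   # semi-cycle: exactly two boundary edges
        else:
            labels[v] = "complex"
    return labels
```

For a closed fan of triangles around a center vertex, the center classifies as simple and the ring vertices as boundary, matching the definitions above.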
In order to preserve the topology, only simple and boundary vertices are candidates for removal. The removal of a vertex and all triangles incident on it results in a hole in the mesh. This hole is filled by local re-triangulation as shown in Fig. 1. The delete-and-fill operation is repeated until the removal of any remaining vertex would result in a topological violation. The resulting mesh serves as the base mesh, which is the coarsest approximation of the original mesh connectivity.
Figure 1: Removal and addition of a vertex.

Every removed vertex is associated with a list of neighboring vertices. By neighboring, we mean the neighborhood of the vertex with respect to the currently updated mesh rather than the original mesh structure. For the first vertex removed from the original mesh, its neighboring vertices are truly its nearest vertices. However, a vertex removed later may have some neighboring vertices which are not its nearest vertices, since some of its nearest vertices may have been removed beforehand and the local topology has changed. A hierarchical representation of the connectivity of the mesh is then constructed from this neighborhood structure. By coding the base mesh and each vertex addition step, we can completely record the entire original mesh. The base mesh can be stored in a regular graphic format. As to the coding of vertex additions, we consider two different types of data. One is the structure data, which specify where and how the mesh topology is locally updated. The other is the attribute data, which describe the individual information of the new vertex, such as its position, normal and color.
2.2. Coding of Structure Data
We explain the procedure to encode the local topology update with the simple example shown in Fig. 1, where the addition (or removal) of one vertex in a local region is illustrated. The first step is to delete all edges inside the region. Then, a new vertex is added somewhere inside the region and connected to all neighboring vertices. This local region is called the neighborhood of the central vertex. The vertex addition (or removal) process could be encoded by recording a list of boundary vertices (or composing triangles) in the updated meshes. However, this approach requires many coding bits. In our algorithm, we take a different approach by implementing a pattern lookup table, where the region is specified by one local index and one global index. The local index determines the pattern of the neighborhood topology while the global index locates the neighborhood in the entire mesh. This approach can be justified as follows. There is a fixed number of topological patterns associated with a neighborhood. Even though the number of patterns grows very rapidly with the number of edges (NOE) of a neighborhood, the number of edges of most neighborhoods is between 5 and 7, and only a small fraction of neighborhoods have NOE larger than 10. We find that there are in total 231 patterns with NOE less than 10, so an 8-bit index set can be used to represent these patterns and stored in both the encoder and the decoder. We impose a constraint in the vertex removal process that only vertices with NOE less than 10 are eligible for removal, so that every neighborhood thus generated can be uniquely encoded with the 8-bit index set. The index of the first triangle in the neighborhood partition is used as the global index to locate the pattern in the updated mesh.
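The two-index structure code lends itself to a compact fixed layout: since the 231 patterns fit in 8 bits, one vertex addition can be packed as the global triangle index followed by the 8-bit local pattern index. This is a minimal sketch of that packing only (how the shared 231-entry pattern table is enumerated is not shown, and the function names are ours):

```python
def pack_structure_code(global_index, local_index):
    """Pack one vertex addition: global triangle index + 8-bit pattern id.

    `local_index` is the position of the neighborhood's topology pattern
    in the shared lookup table (231 patterns, so 8 bits suffice).
    """
    assert 0 <= local_index < 256, "pattern id must fit in 8 bits"
    assert global_index >= 0
    return (global_index << 8) | local_index

def unpack_structure_code(code):
    """Recover (global triangle index, pattern id) from a packed code."""
    return code >> 8, code & 0xFF
```

The decoder looks the pattern id up in its copy of the table, locates the first triangle of the neighborhood via the global index, and replays the local re-triangulation.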
2.3. Coding of Attribute Data
One essential attribute datum is the vertex position. Other commonly used attribute data include the normal and the color of the vertex. Generally speaking, attribute data bear a strong correlation within a local neighborhood. Due to the irregularity of the mesh structure, it is very difficult to perform a transform such as the DCT or the wavelet transform for energy compaction. Instead, we predict a certain attribute of a vertex by simply averaging the same attribute over its neighboring vertices. Thus, each attribute datum of each vertex has a prediction residue. Residues of the same attribute are then arranged into a 1-D array in the same order as the sequence of vertex addition, and compressed by successive quantization [5], [6] and context-based arithmetic coding [7].
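The neighbor-average prediction can be sketched per attribute component as follows (a minimal sketch with hypothetical names; each vector attribute such as position would be handled one component at a time, and the successive-quantization and arithmetic-coding stages of [5], [6], [7] are not shown):

```python
def predict_residues(order, attr, neighbors):
    """Prediction residues for one scalar attribute component.

    order:     vertices in the sequence they are added back to the mesh
    attr:      vertex -> attribute value (one component, e.g. x-coordinate)
    neighbors: vertex -> neighboring vertices in the current updated mesh

    The prediction for a vertex is the average of its neighbors' values;
    the residue (actual - predicted) is what gets quantized and coded.
    """
    residues = []
    for v in order:
        ns = neighbors[v]
        prediction = sum(attr[u] for u in ns) / len(ns)
        residues.append(attr[v] - prediction)
    return residues
```

Because nearby vertices are strongly correlated, the residues concentrate near zero and compress well under successive quantization and arithmetic coding.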
2.4. Integration
The purpose of integration is not simply to mix the two bit streams of structure and attribute data together. The bit stream of structure data provides information about the mesh structure while the bit stream of attribute data provides information about the model geometry. The quality of the reconstructed model depends on both the number of vertices and the precision of the attribute data, and the more bits of each bit stream are decoded, the more similar the decoded model is to the original model. It is desirable that this property be preserved in the final bit stream as well. The problem is similar to merging two arrays, each individually sorted according to a certain measurement, into one array sorted by the same measurement. In this case, the measurement is the distortion reduction. A rate-distortion model is built for each bit stream to estimate the average distortion reduction per bit so that the two bit streams can be multiplexed in the proper order.

In the coding of structure data, a vertex deleted later is most likely to have a larger residue, since at each step the vertex with the smallest residue is removed. Rigorously speaking, this is not absolutely correct, since vertex removal is performed locally and each update can change the mesh structure and the residues of neighboring vertices. Generally speaking, however, the distortion reductions due to successive vertex additions tend to be in decreasing order. In the coding of attribute data, the quantization residue of each vertex is uniformly distributed in the interval [-T, T] for a certain threshold T. Consequently, the distortion reduction at a certain vertex is random and is not correlated with that of the preceding or following vertices.

According to these features, we propose a multiplexing scheme as follows. We first find the prediction residue of every vertex to be removed, and determine the maximum magnitude T of all residues. Vertices are added back in a layer-by-layer fashion. Two sets of thresholds, S_i and T_i, are chosen to control the vertex addition and the coding of attribute data, respectively. The thresholds are defined as T_0 = T/2 and T_{i+1} = T_i/2 for i = 1, 2, 3, ..., while S_i is a monotonically decreasing sequence S_0 > S_1 > S_2 > .... At layer i, all vertices with prediction residue in the interval [S_i, S_{i-1}] are added back to the mesh in order. For each newly added vertex, its attribute data are immediately coded progressively up to the quantization layer i-1, which is controlled by threshold T_{i-1}. Then, the attribute data of all existing vertices, i.e., old vertices introduced in the previous layers and new vertices introduced in layer i, are further quantized and encoded up to threshold T_i. This finishes the coding of information at layer i. The same procedure is repeated for all layers. After all the vertices have been added back, only the coding of attribute data is conducted.

3. EXPERIMENTAL RESULTS

We have tested our algorithm on various 3D graphic models. All models include 6 different attribute data: 3 for the vertex position and 3 for the vertex normal.

(a) original mesh and its wireframe
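The layer-by-layer multiplexing can be sketched as follows. This is a minimal sketch under our reading of the threshold definitions (a halving schedule T_{i+1} = T_i/2 and a decreasing addition sequence S_i); the function names are ours, and `refine_vertex` stands in for one step of the successive-quantization coder:

```python
def halving_thresholds(t_max, layers):
    """Quantization thresholds T_0 = t_max/2, T_{i+1} = T_i/2."""
    thresholds = [t_max / 2.0]
    for _ in range(layers - 1):
        thresholds.append(thresholds[-1] / 2.0)
    return thresholds

def integrate(vertices, residue, S, refine_vertex):
    """Layer-by-layer multiplexing of structure and attribute bits.

    At layer i, vertices whose prediction-residue magnitude falls in
    [S[i], S[i-1]) are added to the mesh; then the attributes of all
    vertices added so far are refined by one more quantization layer
    via refine_vertex(v, layer=i). Returns the vertex addition order.
    """
    added = []
    for i in range(1, len(S)):
        newcomers = [v for v in vertices
                     if S[i] <= abs(residue[v]) < S[i - 1] and v not in added]
        added.extend(newcomers)          # structure bits for this layer
        for v in added:
            refine_vertex(v, layer=i)    # attribute bits: one more layer
    return added
```

Truncating the resulting bit stream after any layer yields a mesh with fewer vertices and coarser attributes, which is exactly the embedding property exploited in Section 3.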
(b) 100:1 model and its wireframe

Figure 2: Compression of the dinosaur.

To give a rough idea of the performance of the proposed progressive 3D graphic coder, we show the original dinosaur model and its wireframe structure in Fig. 2 (a). The 100:1 compressed model and its wireframe are shown in Fig. 2 (b). For this particular case, the compressed model contains around one tenth of the vertices and triangles of the original model. By comparing Figs. 2 (a) and (b), it is clear that the compressed model has a much simpler wireframe structure. Also, the vertex position has only a coarse representation, where each coordinate is represented with 4 quantization layers (in contrast with the 32-bit full resolution). Even with such a mesh structure and precision of vertex positions, we are able to render a 2D image with a very small distortion. More experimental results are
listed in Table 1, where #v is the number of vertices in the model, #t is the number of triangles, CR is the compression ratio and SNR stands for the signal-to-noise ratio. The compression ratio (CR) is defined as the ratio of the original file size over the compressed file size. According to our experience, a very high quality rendered image with a resolution of 512x512 pixels can be obtained from a compressed graphic model with an SNR in the range of 40-45 dB.

object     original #v / #t    compressed #v / #t    CR    SNR (dB)
dinosaur     2832 /  5647         1091 /  2178       20     41.39
tube         6292 / 12584         2627 /  5254       20     44.81
spock       16386 / 32768         5182 / 10369       20     45.93
bunny       34835 / 69473         5688 / 11217       40     45.03
Table 1: Comparison of the compression performance.

Figure 3: Rate-distortion performances (SNR in dB versus compression ratio for the spock, bunny, dinosaur and tube models).

The tradeoff curves between the compression ratio and the SNR for the four test graphic models are shown in Fig. 3. Since the proposed coding scheme is embedded, we are able to encode one 3D graphic model into a single bit stream and obtain all results on one curve from it. At the decoder end, as more bits are received and decoded, the reconstructed model becomes more accurate. Correspondingly, the SNR gradually increases and the compression ratio decreases. Different compression performances are achieved for different models due to the different degrees of redundancy in each model. For example, the bunny model has a very high degree of redundancy and can be compressed most, while the dinosaur model has a very low degree of redundancy and can be compressed least. This observation is in fact consistent with the visual perception of the models. The bunny model is roughly round in shape, without many visually important details in the body, and its total number of vertices is 34,835. The dinosaur is more complicated due to the four legs, two ears, two horns, and some special features around the mouth region, yet its total number of vertices is only 2,832.

4. CONCLUSION

A progressive compression algorithm, which encodes a 3D geometric model into an embedded bit stream, was studied in this work. A 3D geometric model consists of two kinds of data: structure data and attribute data. Structure data characterize the connectivity information among vertices, while attribute data describe other relevant information of the vertices, such as positions, colors and normals. They are encoded separately according to their importance and then integrated into a single bit stream. The decoder can stop decoding at any point while giving a reasonable reconstruction of the original model.

5. REFERENCES

[1] William J. Schroeder. Decimation of triangle meshes. In Computer Graphics Proceedings, Annual Conference Series, pages 65-70. ACM SIGGRAPH, Jul. 1992.
[2] Greg Turk. Re-tiling polygonal surfaces. In Computer Graphics Proceedings, Annual Conference Series, pages 55-64. ACM SIGGRAPH, Jul. 1992.
[3] Hugues Hoppe. Progressive meshes. In Computer Graphics Proceedings, Annual Conference Series, pages 99-108. ACM SIGGRAPH, Aug. 1996.
[4] Jonathan Cohen, Amitabh Varshney, Dinesh Manocha, Greg Turk, Hans Weber, Pankaj Agarwal, Frederick Brooks, and William Wright. Simplification envelopes. In Computer Graphics Proceedings, Annual Conference Series, pages 119-128. ACM SIGGRAPH, Aug. 1996.
[5] J. M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. on Signal Processing, 41(12):3445-3462, 1993.
[6] D. Taubman and A. Zakhor. Multirate 3D subband coding of video. IEEE Trans. on Image Processing, 3(3):572-588, 1994.
[7] William B. Pennebaker and Joan L. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993.