Zhao JY,Tang M, Tong RF. Connectivity-based segmentation for GPU-accelerated mesh decompression. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(6): 1110–1118 Nov. 2012. DOI 10.1007/s11390-012-1289-x

Connectivity-Based Segmentation for GPU-Accelerated Mesh Decompression

Jie-Yi Zhao (赵杰伊), Min Tang∗ (唐 敏), and Ruo-Feng Tong (童若锋), Member, CCF

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
E-mail: {su27, tang m, trf}@zju.edu.cn

Received September 5, 2012; revised October 6, 2012.

Abstract We present a novel algorithm to partition large 3D meshes for GPU-accelerated decompression. Our formulation focuses on minimizing the number of vertices replicated between patches and balancing the numbers of faces across patches for efficient parallel computing. First we generate a topology model of the original mesh by removing vertex positions. Then we assign the patch centers using geodesic farthest point sampling and cluster the faces according to their geodesic distances to the centers. After the segmentation we swap boundary faces to fix jagged boundaries and store the boundary vertices so that the whole mesh is preserved. The decompression of each patch runs on one GPU thread, and we evaluate the performance on various large benchmarks. In practice, the GPU-based decompression algorithm runs more than 48x faster on an NVIDIA GeForce GTX 580 GPU than on a single CPU core.

Keywords parallel decompression, mesh segmentation, connectivity compression, GPU, Edgebreaker

1 Introduction

In computer graphics and animation, the amount of 3D mesh data is growing rapidly, motivated by the need for more detail and higher accuracy in representing and creating objects. Different algorithms[1-5] have been proposed to compress these meshes. These algorithms can be categorized as lossless or lossy according to whether the original mesh can be exactly reconstructed from the compressed data. Lossless algorithms compress the connectivity information and reduce repeated references to vertices that are shared by many faces. However, in time- and space-sensitive situations, these algorithms cannot reach an ideal balance between speed and compression ratio. The recent trend in computer architecture is toward parallel commodity processors, including multi-core CPUs and many-core GPUs (graphics processing units), and the number of cores is expected to keep increasing rapidly. Given this trend, many parallel algorithms have been proposed for accelerating mesh decompression on GPUs. In this paper, we mainly deal with designing algorithms that can exploit the thread-level parallelism of GPUs.

Contributions. We present a novel connectivity-based segmentation algorithm that can partition a large mesh into many patches to match the architecture of the GPU for decompression. Our formulation accelerates mesh decompression significantly with little increase in the compressed data. We test it with several benchmarks and evaluate the performance on an NVIDIA GeForce GTX 580 GPU. In practice, the GPU-based decompression algorithm runs more than 48x faster than the sequential algorithm on the CPU.

Organization. The rest of the paper is organized as follows. Section 2 gives a brief survey of prior work. We present the mesh connectivity segmentation algorithm in Section 3, discuss the determination of the segmentation strategy in Section 4, and present the implementation details and highlight the performance in Section 5. Finally, we analyze our approach and point out some of its limitations in Section 6.

2 Related Work

2.1 Mesh Connectivity Compression

Regular Paper. The work is supported in part by the National Basic Research 973 Program of China under Grant No. 2011CB302205, the National High Technology Research and Development 863 Program of China under Grant No. 2012BAD35B01, the National Natural Science Foundation of China under Grant No. 61170140, and the Natural Science Foundation of Zhejiang Province of China under Grant No. Y1100069. ∗ Corresponding Author. ∗∗ The preliminary version of the paper was published in the Proceedings of the 2012 Computational Visual Media Conference. ©2012 Springer Science + Business Media, LLC & Science Press, China

Turan[3] proposed an algorithm using the spanning trees of edges to encode planar graphs, storing 12 bits per vertex (bpv). Keeler[4] improved Turan's algorithm to store 9 bpv for planar graphs and 4.6 bpv for triangle meshes. Taubin[5] proposed the Topological Surgery algorithm that stores around 4 bpv. Rossignac's Edgebreaker[6] and its later improvements give the best compression rate for triangle mesh connectivity. The Edgebreaker algorithm has five operators to include triangles into a boundary: C, L, E, R and S, and it can guarantee a bit rate of 3.67 bpv. Gumhold and Strasser[7] introduced a triangle mesh compression algorithm that is similar to Edgebreaker. However, these algorithms decompress through sequential decoding, which cannot be used for random-access mesh traversal. Gurung et al.[8] proposed an efficient algorithm that supports random-access mesh traversal, using a simple data structure to represent the connectivity of manifold triangle meshes. It stores on average 26.2 bits per triangle. But compared with Edgebreaker, which stores about 2 bits per triangle, the compression rate of such algorithms is not ideal for space-sensitive uses.

2.2 Mesh Segmentation

Mesh segmentation is widely used in many areas such as 3D shape retrieval, compression, texture mapping, deformation, and simplification. There are mainly two kinds of mesh segmentation algorithms: patch-type and part-type. Patch-type algorithms partition meshes into disk-like patches. Segmentation algorithms cluster small regions with similar attributes into large regions, mainly following three schemes: the region-growing scheme, the hierarchical-clustering scheme, and the k-means based clustering scheme. The k-means based scheme starts from k centers of k clusters and assigns each element to one of the clusters; then the centers are updated in each iteration until they stop changing[9]. Geodesic distance[10], random walks[11] and isotropic remeshing based[12] techniques have also received much attention in recent years. There are also algorithms that can partition a 3D mesh into a prescribed number of disjoint, meaningful parts[13].

2.3 GPU Computing

Modern GPUs are regarded as high-throughput processors with a theoretical peak performance of a few Tera-FLOPS. Most of these GPUs operate on an SIMD (single-instruction multiple-data) basis and perform computations by executing a large number of threads simultaneously. At a broad level, GPUs consist of several multiprocessors, each of which contains a number of streaming processors and a small shared memory unit. For example, the NVIDIA GeForce GTX 580 GPU has 16 streaming multiprocessors (SMs), each SM contains 32 CUDA cores, and each CUDA core can run 48 threads. The computing power of GPUs scales almost linearly with the number of cores. To fully exploit the computational capabilities of modern GPUs, a good task decomposition scheme needs to be designed. GPU techniques have been widely used for rendering acceleration[14-16], and this paper explores the utilization of GPUs for geometry processing.

3 Mesh Connectivity Segmentation

3.1 Motivation

Recently many studies[8,17] have focused on mesh connectivity compression, and the best randomly accessible compression stores 26 bits per triangle[8]. Because of the limited memory of GPUs and the increasing size of 3D models, the compression and decompression of certain large 3D models need to be processed with non-random-access algorithms such as Edgebreaker, which can store 3 bits per triangle. Our algorithm partitions 3D meshes into many patches to match the multi-thread architecture of GPUs. Then we use a connectivity compression algorithm to compress each patch, so that the decompression can be highly accelerated to meet speed-priority needs. The patches share vertices after the segmentation and therefore store more data than the original 3D mesh, so we need to make the segmentation result contain fewer replicated vertices. Moreover, we need to make each patch have a balanced number of faces so that the decompression speed is optimized.

3.2 Algorithm Overview

The overview of the algorithm is as follows.
• Evolution of Topology. First we transform the original mesh into a topology model by removing vertex coordinates, representing only the topology information, and redefine the distance between faces.
• Assignment of Centers. We assign the patch centers using the topology model; the centers are distributed uniformly on the surface with geodesic farthest point sampling.
• Clustering the Faces. We cluster the faces of the mesh according to their distances to the center points assigned in the second step. A patch is composed of the faces that are nearest to its center point in geodesic distance.
• Refinement of Faces. After the clustering of faces, some jagged boundaries appear between the patches. We refine the jagged boundaries to reduce the number of boundary vertices so that less data is repeated.
• Storage of Boundary Vertices. Our segmentation result should be equivalent to the original mesh after decompression, so we store the boundary information needed to combine the patches. The coordinates of boundary vertices are stored in an extra array, while each patch stores only the indices of these vertices.

3.3 Evolution of Topology

Because our segmentation deals only with mesh connectivity, we can simplify the original mesh into a topology model without vertex positions. As Fig.1 shows, the center of a face is the geometric center of an equilateral triangle. The distance between points is defined by the geodesic distance on the topology model, and we use 1 instead of the actual edge lengths for the geodesic distance calculation.

Fig.1. Evolution of topology. In the topology model the length of each edge is defined as 1 and vertex coordinates are removed.

The topology model cannot represent an actual mesh, but it can be used to judge the relationship among faces and vertices. As Fig.2 shows, the distance in the topology model is computed as a geodesic distance. The geodesic distance on the topology model is calculated by setting every edge length to 1 in geodesic distance algorithms[18], so it also works for meshes with high-degree vertices.
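Since every edge of the topology model has length 1, the geodesic distance between face centers reduces to a breadth-first search over the dual graph, in which two faces are adjacent when they share an edge. The following sketch is illustrative rather than the paper's code; it assumes the mesh is given as vertex-index triples.

```python
from collections import deque

def face_adjacency(faces):
    """Dual graph of a triangle mesh: faces sharing an edge are neighbors."""
    edge_to_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset(e), []).append(fi)
    adj = [[] for _ in faces]
    for fs in edge_to_faces.values():
        if len(fs) == 2:                  # interior (manifold) edge
            adj[fs[0]].append(fs[1])
            adj[fs[1]].append(fs[0])
    return adj

def topology_distance(adj, source):
    """Unit-edge geodesic distance from one face to all others (plain BFS)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    q = deque([source])
    while q:
        f = q.popleft()
        for g in adj[f]:
            if dist[g] == -1:
                dist[g] = dist[f] + 1
                q.append(g)
    return dist
```

Because all weights are 1, BFS already gives shortest paths; no priority queue or Fast Marching update is needed on the topology model.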

Given two faces, the distance between them is defined as the geodesic distance between their centers, where each center is the geometric center of an equilateral triangle. Triangle meshes with holes can also be transformed into topology models with this distance definition.

Fig.2. Distance in the topology model. The distance is defined by the geodesic distance between the centers of faces.

3.4 Assignment of Centers

To obtain parallel execution of decompression across different GPU threads, the mesh needs to be uniformly segmented so that the number of faces of each patch is balanced. If one or more patches have many more faces than the others, the whole parallel decompression will slow down. The algorithm to assign the patch centers is taken directly from previous work on geodesic remeshing[20]. A uniform sampling of points on a surface is obtained using greedy farthest point sampling. A first point is picked, and the geodesic distances to this point are computed. The Fast Marching algorithm[21], which was presented for finding 2D paths and later extended and improved for 3D, is used to find the farthest point. The farthest point is selected as the next sampling point, and the distance map is then updated using a local propagation. The geodesic farthest points are generated from the distance map one by one. The geodesic farthest point sampling procedure runs on the topology model, after which each sampled point is snapped to the nearest face center. An example of center assignment is shown in Fig.3: 50 points are uniformly distributed on the surface of the Armadillo 3D mesh, and these points become the centers for the following clustering step.

Fig.3. Assignment of centers[19]. Using geodesic farthest point sampling on the topology model, 50 points are uniformly positioned on the Armadillo 3D mesh.
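As a concrete sketch of this sampling on the topology model: because all edges have unit length, the Fast Marching propagation can be replaced by a breadth-first relaxation from each new sample. The function below is illustrative, not the paper's implementation; `adj` is assumed to be a face-adjacency list such as the dual graph of the mesh.

```python
from collections import deque

def farthest_point_sample(adj, k, start=0):
    """Greedy farthest-point sampling on a graph with unit edge lengths.
    Maintains a distance map to the current sample set via local BFS
    propagation, a stand-in for the Fast Marching update in the paper."""
    n = len(adj)
    dist = [float('inf')] * n
    centers = []
    seed = start
    for _ in range(k):
        centers.append(seed)
        dist[seed] = 0
        q = deque([seed])
        while q:                          # relax distances from the new center
            f = q.popleft()
            for g in adj[f]:
                if dist[f] + 1 < dist[g]:
                    dist[g] = dist[f] + 1
                    q.append(g)
        seed = max(range(n), key=lambda f: dist[f])  # next farthest face
    return centers
```

Each iteration only re-propagates from the newest sample, so the distance map is updated locally rather than recomputed from scratch, mirroring the local propagation described above.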


3.5 Clustering the Faces

After the assignment of patch centers, we can cluster the faces using the distances between them and the centers. We use the distance defined in Subsection 3.3, and for each face of the model we find the nearest center point. As Fig.2 shows, face A is clustered into the patch whose center is point 1, because it has the shortest distance to point 1. This clustering procedure can be done with a massive number of GPU threads. The clustering can also run in multiple levels to make the segmentation more flexible and accurate. For instance, if we want to partition the mesh into M patches, we can first assign M × N centers, where N is an integer, and then use a clustering algorithm such as k-means or affinity propagation[22] to further cluster the faces, which improves the quality of the segmentation. Fig.4 shows the Armadillo 3D model partitioned into 50 clusters.
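With unit edge lengths, the nearest-center assignment can be written as one multi-source breadth-first search that grows all patches at the same speed, so each face is reached first by its closest center (ties broken by visit order). This sequential sketch uses illustrative names and stands in for the parallel GPU version.

```python
from collections import deque

def cluster_faces(adj, centers):
    """Assign each face to its nearest center in unit-edge geodesic
    distance via a multi-source BFS over the face-adjacency list."""
    labels = [-1] * len(adj)
    q = deque()
    for patch, f in enumerate(centers):
        labels[f] = patch                 # seed each patch at its center face
        q.append(f)
    while q:
        f = q.popleft()
        for g in adj[f]:
            if labels[g] == -1:           # first patch to arrive claims the face
                labels[g] = labels[f]
                q.append(g)
    return labels
```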

Fig.4. Clustering result of the Armadillo 3D model. It is partitioned into 50 patches.

Fig.5. Refinement of faces. The faces on the border of 2 partitions need to be swapped, while those on the border of 3 partitions need not.

3.6 Refinement of Faces

After the clustering step, the mesh is partitioned into several patches. Because the clustering only considers the distances between faces and centers, the patches are likely to have jagged boundaries. We need to refine such faces so that the number of replicated vertices can be further decreased. As Fig.5 shows, if a face of one patch has two neighbor faces that belong to another patch, we swap the patch it belongs to. In the swapping procedure we compare the indices of the two patches to avoid conflicts: if the index of the original patch is larger than the other, we swap the face; otherwise we do not. For a face on the jagged boundary whose neighbor faces belong to more than two patches, there is no need to swap its patch, because the number of boundary vertices would not decrease in this circumstance. Fig.6 shows the partition result of a 3D mesh before and after the refinement of faces on the jagged boundary.

Fig.6. Refinement of faces. The faces on the jagged boundary need to be swapped in order to reduce replicated vertices.
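The swap rule of this subsection can be sketched sequentially as follows (the paper runs it in parallel on the GPU; names are illustrative). A face is moved when at least two of its edge-neighbors lie in one single other patch, and the patch-index comparison prevents two neighboring faces from swapping past each other.

```python
def refine_boundaries(adj, labels):
    """One pass of the jagged-boundary fix over a face-adjacency list.
    The index comparison mirrors the paper's conflict-avoidance rule."""
    new_labels = labels[:]
    for f, nbrs in enumerate(adj):
        other = [labels[g] for g in nbrs if labels[g] != labels[f]]
        if len(other) < 2:
            continue                      # interior face or smooth boundary
        if len(set(other)) > 1:
            continue                      # borders more than two patches: skip
        if labels[f] > other[0]:          # swap only from the larger-index patch
            new_labels[f] = other[0]
    return new_labels
```

The pass can be repeated until no face changes patch, at which point the boundaries are as smooth as this local rule can make them.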

3.7 Storage of Boundary Vertices

The goal of our algorithm is to generate a mesh that is equivalent to the original one after parallel decompression. The patches need to connect with each other, so the overall boundary information needs to be stored. We store the boundary vertices in a separate array. For each patch of the segmentation, boundary vertices are stored as indices into this array, while the other vertices are stored directly with their coordinates. When the parallel decompression procedure runs, the boundary vertices are directly indexed, while the other vertices are indexed after a prefix-sum operation[23]. In this scheme, the indices of the boundary vertices are stored two or more times, which makes the result larger than the mesh before segmentation.
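The vertex classification behind this storage scheme can be sketched as follows (illustrative, assuming faces given as index triples and a per-face patch label): a vertex touched by faces of two or more patches is a boundary vertex whose coordinates would go into the shared array, with each patch keeping only its index.

```python
def split_boundary_vertices(faces, labels):
    """Classify vertices of a labeled segmentation: a vertex used by faces
    of two or more patches is a boundary vertex; the rest are interior."""
    patches_of_vertex = {}
    for f, tri in enumerate(faces):
        for v in tri:
            patches_of_vertex.setdefault(v, set()).add(labels[f])
    boundary = sorted(v for v, ps in patches_of_vertex.items() if len(ps) > 1)
    interior = sorted(v for v, ps in patches_of_vertex.items() if len(ps) == 1)
    return boundary, interior
```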

4 Determination of Segmentation Strategy

4.1 Partition Number

As the size of each partition grows, the ratio of redundant data becomes smaller. We can adopt various strategies to determine the partition number, and for different meshes and purposes we can choose to optimize for speed or for size. In the CUDA programming model, the recent NVIDIA GTX 580 GPU can in theory run 65 536 × 65 536 × 512 threads. However, the GTX 580 has 16 streaming multiprocessors (SMs), each SM contains 32 CUDA cores, and each CUDA core can run 48 threads, so the peak thread count of the GTX 580 is 24 576. Because of memory latency and the limited size of graphics memory, the optimal value is in most cases much lower than 24 576. OpenCL devices such as the ATI Radeon HD series graphics cards are similar. We should also take low-end GPUs into account, whose peak thread counts are much lower than the GTX 580's. In general, the partition number should be close to a multiple of 512, and as large as possible to increase the decompression performance. For applications using this algorithm, users may choose speed-first or size-first, and they may also specify the characteristics of the GPUs they are going to run on. The number of GPU threads will also affect the best partition number.

4.2 Evaluation of Redundant Data

As mentioned above, the indices of boundary vertices need to be stored twice or more if the mesh is segmented into many partitions. Assume every partition has a hexagonal shape as shown in Fig.7. If such a hexagonal partition has P faces, it has √(6P) boundary vertices, six of which have three neighboring partitions. So for a hexagonal partition the indices of √(6P) vertices are stored twice and six vertices are stored three times, that is, √(6P) × 2 + 18 additional indices. If there are F faces in the whole mesh, there are F/P partitions, and 2F/√(P/6) + 18 × F/P additional indices are needed to store the replicated vertices. As the vertex number V is merely F/2 for triangle meshes, there are 4√(6/P) + 36/P additional indices per vertex. So we can estimate the situation for common cases: as the number of faces of each partition grows, the bpv (bits per vertex) value for redundant vertices decreases. When there are hundreds of thousands of faces in each partition, the bpv value to store redundant vertices can be neglected. Moreover, there are many algorithms to compress the indices of the redundant vertices. However, for today's graphics cards and 3D models, graphics memory is quite limited, and for 3D models that are segmented into many partitions, the face number of each partition is relatively small.
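The estimate above can be evaluated directly; for example, with P = 600 faces per patch the overhead is about 0.46 extra index entries per vertex, and it shrinks toward zero as P grows. A one-line sketch:

```python
from math import sqrt

def extra_indices_per_vertex(P):
    """Estimated additional index entries per vertex for idealized
    hexagonal patches of P faces each: 4*sqrt(6/P) + 36/P."""
    return 4 * sqrt(6 / P) + 36 / P
```

Note this models only the idealized hexagonal-patch case; real patches with jagged or elongated boundaries carry more overhead, which is what the refinement step of Subsection 3.6 reduces.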

4.3 Large Model of Several Objects

For large models made of several objects, the strategy is similar to that of a single mesh. If a few partitions are smaller than the others, the overall performance is not affected; but if a few partitions are much larger than the others, the performance decreases considerably. So for large models made of several objects, the overall partition number is calculated first, and then the partition number for each mesh is derived. The partitioning is done on each mesh separately in order to balance the sizes of the patches. If the triangle number of the large model is S and the partition number is N, then each partition has S/N triangles. So for a mesh that has M faces, its partition number is ⌊M × N/S⌋. The final partition number may not exactly match the premeditated number, which has little effect on the overall performance. Fig.8 shows an example in which three meshes

Fig.7. Repeated vertices. If a mesh is segmented into several partitions, there will be redundant vertices whose indices will be stored twice or more times. A hexagonal case is evaluated to demonstrate this situation.

Fig.8. Large model made of several objects. The clustering result of a 3D scene, which has three models.

compose a large 3D model which is segmented into multiple patches.
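The per-mesh budget rule can be sketched as below. The `max(1, ...)` guard for very small meshes is our addition for illustration, not stated in the paper; it keeps every object decodable by at least one thread.

```python
def allocate_partitions(face_counts, total_partitions):
    """Distribute a global partition budget over the meshes of a scene in
    proportion to face count, following the floor(M * N / S) rule."""
    S = sum(face_counts)                  # total triangles in the scene
    return [max(1, (M * total_partitions) // S) for M in face_counts]
```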

4.4 Special Mesh Topology

A potential problem with the current patch decomposition method is whether all the patches have disk topology. In practice, however, the number of patches is large, the input mesh is usually simple, and it has already been transformed into the topology model; moreover, for multi-thread decompression, a few small irregular patches do not affect the overall performance. Connectivity-based compression algorithms such as Edgebreaker can also deal with special cases: the algorithm can be modified to fit manifold meshes with holes and handles, non-triangle meshes, and non-manifold meshes[24]. To deal with holes and non-manifold situations efficiently, additional pre-processing and post-processing procedures are needed. We can convert non-manifold situations to manifold ones by replicating edges and vertices. Meshes with holes are treated specially in these connectivity-based compression algorithms: each hole is plugged by a fan of dummy triangles incident upon a dummy vertex. The IDs of the dummy vertices are encoded, and after decompression the dummy vertices and their incident triangles are removed.

4.5 Acceleration of Mesh Segmentation

Our approach mainly focuses on the speed of decompression. However, the acceleration of the segmentation and compression steps can also be taken into account. The computation of geodesic distance is accurate but not very efficient. Simply connecting the center of each face with its vertices and using the Dijkstra shortest path algorithm instead can greatly accelerate this computation. When using Dijkstra distances on large 3D models, the result does not differ much but the segmentation is much faster. In the clustering and farthest point sampling steps, the performance can also be improved using multiple GPU threads, because the operations on different patches do not affect each other and can run in parallel. After segmentation, the compression of each patch can also execute in parallel.

5 Implementation and Performance

In this section, we describe our implementation of the compression and decompression on the GPU, and highlight the performance of our algorithm on various benchmarks. We use Matlab 2010 for the center assignment and Visual C++ for partitioning and subsequent processing. We use CUDA Toolkit 4.0 as the development environment for the GPU, and NVIDIA Visual Profiler to measure the kernel execution time and the data input/output time between the GPU and host memory. The sequential version of decompression runs on a standard PC (AMD Phenom II 2.8 GHz CPU with 6 cores, of which we use a single core for the CPU-based Edgebreaker algorithm, and 12 GB RAM).

5.1 Compression

After the connectivity segmentation, the original mesh is partitioned into many patches. The segmentation result is stored in a specified file format, so that many patches reside in one file. Then for each patch, we use the Edgebreaker algorithm to compress its connectivity information. The Edgebreaker algorithm[6] encodes the connectivity of triangle meshes homeomorphic to a sphere with a guaranteed 2 bits per triangle or less. The compression produces an op-code describing the topological relation between the current triangle and the boundary of the remaining part of the mesh. Each triangle of the mesh is visited in a depth-first order using five different operations called C, L, E, R, and S, and each triangle is labeled according to the operation that processes it. The resulting CLERS string is a compact encoding of the connectivity of the mesh. The operations of the Edgebreaker encoding algorithm are shown in Fig.9.

Fig.9. Edgebreaker compression. In the encoding algorithm, the CLERS string is used to represent the connectivity of the mesh.
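To make the bit-rate argument concrete: since C operations typically label about half the triangles, giving C a 1-bit code and the other four operations 3-bit codes yields roughly 2 bits per triangle. The prefix code below is illustrative; actual Edgebreaker implementations use varying code tables.

```python
# Illustrative prefix-free code for the CLERS stream: C gets one bit,
# the other four operations three bits each.
CODE = {'C': '0', 'S': '100', 'R': '101', 'L': '110', 'E': '111'}
DECODE = {v: k for k, v in CODE.items()}

def pack_clers(clers):
    """Serialize a CLERS operation string to a bit string."""
    return ''.join(CODE[op] for op in clers)

def unpack_clers(bits):
    """Recover the CLERS string; the code is prefix-free, so the first
    matching codeword is always the right symbol."""
    ops, cur = [], ''
    for b in bits:
        cur += b
        if cur in DECODE:
            ops.append(DECODE[cur])
            cur = ''
    return ''.join(ops)
```

When exactly half the operations are C, the cost is (1 + 3) / 2 = 2 bits per triangle, matching the guarantee cited above.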

The whole compression procedure runs on the GPU, and the Edgebreaker compression of each patch runs in parallel across different threads.

5.2 Decompression

We use the Edgebreaker decoding algorithm to decompress the compressed mesh file. The decompression procedure of each patch runs on a single GPU thread. Fig.10 shows an example of the GPU decompression procedure, where Edgebreaker runs on each partition of the Lucy 3D model. After the decompression, we use the prefix-sum operator[23] to mark the vertices, so that the different patches and faces of the mesh can be combined into a whole mesh. The state of the decompression procedure on the GPU is shown in Fig.11, and the final result is the whole mesh.

Fig.10. Edgebreaker decompression. A large 3D model is partitioned into many patches. The Edgebreaker algorithm runs in parallel on each patch on the GPU.

Fig.11. Decompression procedure on GPU. Each thread handles a patch of the mesh, and the patches are combined into a whole mesh using the prefix-sum operator.

5.3 Performance

We have tested several 3D models for segmentation and decompression, including the 41.6 MB Happy Buddha model (Happy), the 162 MB Dragon model, the 428 MB Statue model, and a 355 MB scene model (Scene) consisting of six models. The acceleration rate is measured only on the decompression procedure on the GPU. Fig.12 shows the acceleration rate of our parallel decompression over the sequential CPU Edgebreaker algorithm. The speedup gets higher when there are more partitions, and we obtain up to a 48.5x speedup when the Dragon model is partitioned into 8 192 patches. Because the unprocessed Happy Buddha is randomly partitioned and no jagged boundaries are swapped, its acceleration rate is much lower. The acceleration rate of the scene benchmark is lower than that of 3D models of similar size.

Fig.12. Performance. Acceleration rate of the parallel decompression algorithm on GPU over the sequential algorithm on CPU. Happy (*) means the unprocessed Happy Buddha 3D model.

The model files are stored in text format with only vertex coordinates and face indices, while the compressed files store vertex coordinates and corner operators for each patch. Table 1 shows the face number of the models and the size of the compressed data resulting from the sequential Edgebreaker algorithm and from the parallel algorithm with 512, 2 048, and 8 192 patches. The result for the unprocessed Happy Buddha model is less optimized than that for the processed one. The vertex numbers of these models are about half the face numbers.

Table 1. Comparison of Size of Compressed Data for the Benchmarks Using Different Numbers of Segments

3D Model               Face Number (M)   Sequential Edgebreaker (MB)   512 (MB)   2 048 (MB)   8 192 (MB)
Happy (41.6 MB)        1.1               6.8                           7.0        7.2          –
Happy (Unprocessed)    1.1               6.8                           7.6        8.5          –
Dragon (162 MB)        3.6               20.5                          21.6       22.9         25.4
Statue (428 MB)        10.0              44.9                          47.1       49.2         52.7
Scene (355 MB)         8.4               37.8                          40.1       42.9         46.1

For the 3D models segmented into 8 192 partitions, there is about a 20% increase in the compressed size, because each patch then has relatively few faces. As mentioned in Subsection 4.2, if the patches have hundreds of thousands of faces, the influence of repeated vertices can be neglected.
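The roughly-20% figure can be recomputed directly from the Table 1 numbers for the models with an 8 192-patch entry; a quick check:

```python
# Compressed sizes (MB) from Table 1: sequential Edgebreaker vs 8 192 patches.
SIZES = {
    'Dragon': (20.5, 25.4),
    'Statue': (44.9, 52.7),
    'Scene':  (37.8, 46.1),
}

def overhead(seq, par):
    """Relative growth of the compressed data caused by partitioning."""
    return (par - seq) / seq

for name, (seq, par) in SIZES.items():
    print(f'{name}: {overhead(seq, par):.1%}')
```

The overheads come out to roughly 17–24%, consistent with the 20% figure stated above.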


6 Comparison and Analysis

Compared with the sequential Edgebreaker decompression algorithm, we obtain up to a 48x decompression speedup, which benefits time-sensitive uses. However, the segmentation procedure is relatively slow compared with decompression: partitioning and compressing these large 3D models took several minutes.

6.1 Analysis

Our algorithm maps well to current GPUs, and we have evaluated its performance and obtained high acceleration rates. The benefits of our approach include:
• By mapping the decompression process to the GPU's architecture, our algorithm can fully exploit the parallelism of commodity GPUs.
• The performance scales almost linearly with the number of patches.
• The data upload and download time between GPU and host memory is a relatively small part of the whole time for large 3D models. If the decompression result is used directly on the GPU, the decompression gains further benefit.
• Because the large 3D models are segmented into small patches, the partition and compression results can be used for out-of-core decompression on computers that do not have enough GPU memory.
A typical situation of good and bad cases of a single patch is shown in Fig.13; in both cases there are P faces. In the left case there are √(6P) boundary vertices, while in the right case there are P + 2 boundary vertices.

Fig.13. Analysis of partitioning. Example of good and bad segmentation results for a single patch.

If the number of faces of the patches is not balanced, for example when a single patch has many more faces than the other patches, the decompression speed will also severely slow down.

6.2 Limitations

Our approach has some limitations.
• Though the decompression speed is much accelerated, the segmentation step is relatively slow due to the center assignment and geodesic distance computation.
• The size of the compressed data increases compared with the sequential algorithms, as there are many replicated boundary vertices whose indices are stored twice. This improves as the size of each patch grows.
• The segmented patches are meaningless. If we could generate more meaningful partitions, the result would be more convincing; however, we may not achieve meaningfulness and balance together in most cases.

7 Conclusions and Future Work

We presented an algorithm for parallel decompression of mesh connectivity on the GPU. Our formulation focuses on the even segmentation of mesh connectivity for parallel connectivity decompression, which needs less boundary vertex replication and a balanced face number for each patch. Our approach is flexible, and the decompression procedure maps well to commodity GPUs. In practice, our algorithm improves the performance of mesh decompression on current GPU architectures: we observed up to 48x speedups over the sequential CPU algorithm. There are many avenues for future work. We would like to make the segmentation step faster, and to consider improvements that achieve both meaningfulness and balance in the partitioning. Moreover, our partitioning approach can be extended to accelerate other compression and decompression algorithms.

References

[1] Ahn J K, Lee D Y, Ahn M, Kim C S. R-D optimized progressive compression of 3D meshes using prioritized gate selection and curvature prediction. The Visual Computer, 2011, 27(6/8): 769-779.
[2] Lee H, Lavoué G, Dupont F. Rate-distortion optimization for progressive compression of 3D mesh with color attributes. The Visual Computer, 2012, 28(2): 137-153.
[3] Turan G. On the succinct representation of graphs. Discrete Applied Mathematics, 1984, 8(3): 289-294.
[4] Keeler K, Westbrook J. Short encodings of planar graphs and maps. Discrete Applied Mathematics, 1995, 58(3): 239-252.
[5] Taubin G, Rossignac J. Geometric compression through topological surgery. ACM Transactions on Graphics, 1998, 17(2): 84-115.
[6] Rossignac J. Edgebreaker: Connectivity compression for triangle meshes. IEEE Transactions on Visualization and Computer Graphics, 1999, 5(1): 47-61.
[7] Gumhold S, Straßer W. Real time compression of triangle mesh connectivity. In Proc. the 25th SIGGRAPH, Jul. 1998, pp.133-140.
[8] Gurung T, Luffel M, Lindstrom P, Rossignac J. LR: Compact connectivity representation for triangle meshes. ACM Transactions on Graphics, 2011, 30(4), Article No.67.
[9] Yamauchi H, Lee S, Lee Y, Ohtake Y, Belyaev A, Seidel H P. Feature sensitive mesh segmentation with mean shift. In Proc. SMI, Jun. 2005, pp.238-245.
[10] Agathos A, Pratikakis I, Perantonis S, Sapidis N. Protrusion-oriented 3D mesh segmentation. The Visual Computer, 2010, 26(1): 63-81.
[11] Lai Y K, Hu S M, Martin R R, Rosin P L. Rapid and effective segmentation of 3D models using random walks. Computer Aided Geometric Design, 2009, 26(6): 665-679.
[12] Lai Y K, Zhou Q Y, Hu S M, Martin R R. Feature sensitive mesh segmentation. In Proc. SPM, Jun. 2006, pp.17-25.
[13] Zhang J Y, Zheng J M, Wu C L, Cai J F. Variational mesh decomposition. ACM Transactions on Graphics, 2012, 31(3), Article No.21.
[14] Liu F, Hua W, Bao H J. GPU-based dynamic quad stream for forest rendering. Science China Information Sciences, 2010, 53(8): 1539-1545.
[15] Xu K, Ma L Q, Ren B, Wang R, Hu S M. Interactive hair rendering and appearance editing under environment lighting. ACM Transactions on Graphics, 2011, 30(6), Article No.173.
[16] Xu K, Jia Y T, Fu H, Hu S M, Tai C L. Spherical piecewise constant basis functions for all-frequency precomputed radiance transfer. IEEE Transactions on Visualization and Computer Graphics, 2008, 14(2): 454-467.
[17] Yoon S E, Lindstrom P. Random-accessible compressed triangle meshes. IEEE Transactions on Visualization and Computer Graphics, 2007, 13(6): 1536-1543.
[18] Xin S Q, Wang G J. Improving Chen and Han's algorithm on the discrete geodesic problem. ACM Transactions on Graphics, 2009, 28(4), Article No.104.
[19] Zhao J Y, Tang M, Tong R F. Mesh segmentation for parallel decompression on GPU. In Proc. CVM, Nov. 2012.
[20] Peyré G, Cohen L D. Geodesic remeshing using front propagation. International Journal of Computer Vision, 2006, 69(1): 145-156.
[21] Sethian J A. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge, England: Cambridge University Press, 2000.
[22] Frey B J, Dueck D. Clustering by passing messages between data points. Science, 2007, 315(5814): 972-976.
[23] Harris M, Sengupta S, Owens J D. Scan primitives for GPU computing. In Proc. HWWS, Aug. 2007.
[24] Rossignac J. 3D compression made simple: Edgebreaker with zip&wrap on a corner-table. In Proc. SMI, May 2001, pp.278-283.

Jie-Yi Zhao is a Ph.D. candidate in the Department of Computer Science, Zhejiang University, China. He received his B.Eng. degree in computer engineering from Chu Kochen Honors College, Zhejiang University in 2005. His research interests include computer graphics, virtual reality, and parallel computing on GPU.

Min Tang received his B.Sc. degree in 1994 and Ph.D. degree in 1999 from the Department of Computer Science and Engineering of Zhejiang University, China. He was a visiting scholar at the Department of Computer Science, Wichita State University in 2006, and at UNC-Chapel Hill in 2008. Currently, he is an associate professor at Zhejiang University, China. His research interests include collision detection, parallel computing, GPU-based real-time rendering, and volume rendering algorithms.

Ruo-Feng Tong is a member of CCF. He received his B.Sc. degree from the Department of Mathematics, Fudan University, China in 1991 and his Ph.D. degree from the Department of Mathematics, Zhejiang University, China in 1996. He continued his research as a postdoctoral researcher at the Intelligent Systems and Modeling Laboratory, Hiroshima University, Japan. Currently, he is a professor in the Department of Computer Science and Engineering, Zhejiang University. His research interests include computer graphics and CAD, medical image reconstruction, and virtual reality.
