IEICE TRANS. INF. & SYST., VOL.E90–D, NO.7 JULY 2007
1073
PAPER
3D Animation Compression Using Affine Transformation Matrix and Principal Component Analysis
Pai-Feng LEE†a), Chi-Kang KAO†, Nonmembers, Juin-Ling TSENG††, Student Member, Bin-Shyan JONG†††, and Tsong-Wuu LIN††††, Nonmembers
SUMMARY This paper investigates the use of the affine transformation matrix when employing principal component analysis (PCA) to compress the data of 3D animation models. Satisfactory results were achieved for the common 3D models by using PCA because it can simplify several related variables to a few independent main factors, and can approximate the original animation by using linear combinations of these factors. The selection of the principal component factor (also known as the base) is still a subject for further research. Selecting a large number of bases could improve the precision of the animation and reduce distortion, at the cost of a large data volume. Hence, a formula is required for base selection. This study develops an automatic PCA selection method, which includes the selection of suitable bases and a separate PCA on each of the three axes to select the number of suitable bases for each axis. PCA is more suitable for animation models with apparent stationary movement. If the original animation model is integrated with transformation movements such as rotation, translation, and scaling (RTS), the resulting animation model will have a greater distortion than the model with apparent stationary movement for the same number of base vectors. This paper is the first to extract the model movement characteristics using the affine transformation matrix and then to compress 3D animation using PCA. The affine transformation matrix can record the changes in the geometric transformation by using 4 × 4 matrices. The transformed model can eliminate the influences of geometric transformations, with the animation model normalized to a limited space. Subsequently, by using PCA, the most suitable base vector (variance) can be selected more precisely.
key words: computer animation, mesh decomposition, principal component analysis, affine transformation matrix
Manuscript received November 24, 2006.
Manuscript revised March 12, 2007.
† The authors are with the Dept. of Electronic Engineering, Chung Yuan Christian University, Taiwan.
†† The author is with the Chin Min Institute of Technology, Taiwan.
††† The author is with the Dept. of Information and Computer Engineering, Chung Yuan Christian University, Taiwan.
†††† The author is with the Dept. of Computer and Information Science, SooChow University, Taiwan.
a) E-mail: [email protected]
DOI: 10.1093/ietisy/e90–d.7.1073

1. Introduction
Advances in computer graphics technology have given rise to an increase in the number of high-quality 3D animation models. However, a single 3D static model requires a large storage space for the vertex information and the adjacency relationship of triangles (vertex+connectivity). The amount of storage space required is even greater if the model changes with time ((frame×vertex)+connectivity). Therefore, the model must be simplified in order to reduce the required storage space. A few effective methods [6], [18] have been proposed in this field. The direct application of the simplification algorithm of the static 3D model [18] to the 3D animation model could achieve an excellent reduction in the storage space of each frame; however, the overall storage space could still be very large since the adjacency relationship of each frame has to be recorded. On the other hand, if all the frames use the same reduction sequence [6], the resulting animation might not be able to retain the characteristics of the model despite the retention of the adjacency relationship of the triangles. Since a direct reduction in the number of triangles could result in the inconsistency of their adjacency relationship, the compression of the storage space is proposed as an alternative method to reduce the storage space of the 3D animation model. Taking advantage of the fixed adjacency relationship of the triangles, only the vertex coordinates of the continuous frames are analysed, and the storage space of the vertices is compressed by a geometric prediction method [5], [15], wavelet transformation [8], [10], or PCA [4], [14], [16], [20]–[23], [25], [27]. Although such a method requires considerable calculation, it provides a desirable level of compression. This paper mainly concerns itself with the use of PCA. The PCA technique originated from statistics; it simplifies several related data into a few independent main factors, which can be combined linearly to approximate the original data. Therefore, PCA can select relatively few base vectors to compress the 3D animation data. However, for 3D animation models that undergo geometric transformations in practice, PCA cannot keep the distortion within a fixed bound because the reference space is uncertain. Hence, this paper proposes a new 3D animation compression method for dealing with geometrically transformed animation models by first using the affine transformation matrix to normalize the movement animation. Subsequently, the approximate animation is achieved by PCA compression.
Therefore, the PCA distortion caused by geometric transformation will be effectively avoided since the distortion will be confined to an acceptable range. The selection of the base is still a subject of research. This study develops an automatic PCA selection method that employs the separated PCA axes to select the number of suitable bases for each axis.
Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
2. Related Work
In this section, we review the most promising techniques developed recently within the emerging field of dynamic mesh compression. The underlying principles of the different approaches are presented, and their advantages and limitations are analysed and discussed.
Vertex-based predictive techniques: The Interpolation Compression scheme [5] was recently proposed as an interpolation procedure to exploit the redundancies between successive key frames. The principle consists of reducing the amount of data by sub-sampling the initial sequence of key frames. The Dynapack approach [15] exploits a prediction similar to that of [5], involving more elaborate local spatiotemporal predictors. In that study, the connectivity compression scheme described in [13] guides the traversal of the mesh vertices. Dynapack [15] proposes two different predictors, the so-called Extended Lorenzo Predictor (ELP) and Replica. The ELP perfectly predicts translations, while the Replica predictor further predicts rigid-body motions and uniform scaling transformations. The above studies attempted to predict all the vertex positions or displacements and, finally, to encode the residual errors. The predictive approaches possess the advantages of simplicity and low computational cost, making them well adapted for real-time decoding applications. However, being based on a deterministic traversal of mesh vertices, these works cannot support more advanced functionalities such as progressive transmission and scalable rendering. This weakness is overcome by using multiresolution representations, which are based on wavelet techniques.
Wavelet-based approaches: Wavelet techniques have been widely used in still image and progressive video compression applications. In [8], Briceno et al. presented an original approach. The technique was to project each frame onto a 2D image and then encode the entire 2D image sequence with well-known video techniques.
Recently, a wavelet-based compression method was presented by Guskov and Khodakovsky [10]. They proposed to apply a multi-resolution analysis to the frames to exploit the spatial coherence of individual frames. Each frame was consequently transformed into several sets of details, namely, the wavelet coefficients, and the resulting details were progressively encoded with a predictive coding scheme. The main challenge of the above approach concerns the wavelet filter coefficients, which represent the parametric information that is determined and optimized for the first frame. In order to ensure good compression results, these parameters should be adequate for all the frames in the sequence, which is not always the case in practice. To fully exploit the multi-resolution character of the representation, this approach is specifically adapted for large meshes, for which the maximum amount of detail can be obtained.
Time-varying geometries: Studies concerning the compression of animated sequences of 3D meshes with a fixed connectivity were carried out in [1], [12], [19] by applying affine transformations. In [12], Lengyel proposed to split a mesh into several sub-meshes, and considered each sub-mesh to have a rigid-body motion. Each sub-mesh is transformed independently, and the results are linearly combined to form a predictor. Each frame in the animation is represented as the sum of a predictor and a residual. The residual and the transformation coefficients are quantized for compression. A novel approach to modelling deformable objects was proposed by Müller et al. [19]. Meshless deformation allows three deformation modes: rigid, linear, and quadratic. When the transformation matrix only contains rotation and translation operations, we have a rigid-body simulator. When the transformation matrix also allows shearing, we have linear deformations; finally, when the transformation matrix also allows bending and twisting, we have what are called quadratic deformations. Shamir and Pascucci [1] proposed an approach based on affine transformations that simultaneously exploits a multi-resolution approach. Their technique was to find the best affine transformations between each frame and the first one, and to encode the temporal prediction errors. The solution allows both temporal and spatial levels of detail to be combined in an efficient manner. The separation of the low and high frequency temporal information makes it possible to create very fast coarse updates in the temporal dimension. By using a temporal directed acyclic graph (TDAG) structure to encode the residual, it can be adaptively refined to obtain greater detail in the spatial dimension. Shamir and Pascucci obtained different and good level-of-detail (LOD) effects by applying different updates (e.g. spatial and temporal); however, their encoding is not very compact, and it is actually larger than the original mesh size. Their study is complementary to Lengyel's [12], and therefore it is possible to combine these two studies to produce a more compact encoding.
PCA-based approaches: PCA [11] is a powerful tool that has become a widely accepted animation technique for representing motion as the linear sum of principal components [4], [14], [16], [20], [22], [23], [25], [27]. The first PCA-based approach for 3D animation compression was introduced by Alexa and Müller [16]. The vertex locations at each frame (columns) are decomposed into bases and weights. Compression is achieved by discarding the less important bases. The PCA approach is efficient only for long animation sequences of small meshes, for which the number of frames f is much larger than the number of vertices n. When this condition is not satisfied, the PCA approach yields poor compression results. On the other hand, the singular value decomposition (SVD) is usually computed using a batch $O(nf^2 + n^2 f + f^3)$ time algorithm [7], which means that all the data must be processed simultaneously, and the computational complexity increases as the cube of the number of frames. Their method is computationally costly, particularly for a large number of frames; it also suffers from serious distortions if important bases are discarded. However, no formal method is available for base selection. Karni and Gotsman enhanced the PCA approach by further exploiting the temporal coherence and finally encoding the PCA coefficients using a second-order linear predictive coding (LPC) technique [27]. Combining the PCA with LPC techniques makes it possible to better exploit the temporal redundancies between successive frames, and the compression ratio can reach a value 1.7 times greater than that of PCA alone. For the animation data, Sattler et al. [20] used a variation of the PCA to segment the mesh into parts that are coherent over time prior to compression. The segmentation is achieved by applying the modified Lloyd algorithm described in [21] to the trajectories of the mesh vertices. The obtained clusters are compressed independently by applying the PCA technique. However, the analytical results presented by Sattler et al. [20] and Lee et al. [25] indicated that the cutting boundary may deviate from the deformable regions.
Our approach is based on the affine transformation matrix and PCA. We find that PCA compression is sensitive to the movement of the animation model. If the original animation model is integrated with transformation movements such as rotation, translation, and scaling (RTS), the resulting animation model will have a greater distortion than the model with apparent stationary movement for the same number of base vectors. This paper is the first to extract the model movement characteristics using the affine transformation matrix [1] and normalize the animation model, making it suitable for PCA techniques [16].
3. Suitable Bases Selection for PCA
The most important part of PCA is determining the variables with the highest variation, so as to explain most of the variation in the original data while using the fewest possible variables. The original data, which are highly correlated, are transformed into mutually independent variables, known as the principal components, and these function as comprehensive indices to explain the data. Consequently, the most important process of PCA is selecting and transforming the original variables into the comprehensive indices. According to Alexa and Müller [16], who first proposed the compression of 3D animation by PCA in 2000, all frames should be analysed to determine a frame $B_1$ that has the minimal mean square error among all the frames. This frame is designated as the reference frame. The next reference frame $B_{k+1}$ is then found such that it conforms to $A - \sum_{i=1}^{k} B_i$ and has the smallest mean square error. This determines $A = B_1 + B_2 + B_3 + \cdots + B_f$, where $f$ is the number of frames in the animation and the mean square error of $B_i$ is less than that of $B_{i+1}$. With regard to mathematical calculations, the SVD conforms to the above requirements. Apart from $B_1$, the mean square error of the other frames can be acquired from their singular values. A higher singular value indicates a smaller mean square error. Therefore, the 3D animation formula can be revised as follows:

$$A = [V \times F] = \begin{bmatrix} v_{1,1} & \cdots & v_{f,1} \\ v_{1,2} & \cdots & v_{f,2} \\ \vdots & \cdots & \vdots \\ v_{1,n} & \cdots & v_{f,n} \end{bmatrix} = B \cdot S \cdot U^T$$

$V$ is the set of vertices and $F$ is the set of frames. $S$ denotes an n-by-f diagonal matrix, in which the diagonal values equal the singular values arranged in decreasing order. $B$ is an n-by-n unitary matrix representing the base vector of the corresponding singular value (known as the base). $U^T$ is an f-by-f unitary matrix representing the frame information of the corresponding singular value. Such a factorization is termed the SVD of $A$. Mathematically, PCA signifies a reduction in dimensions: determine the subspace of dimension $k$ in the linear space of the original data, and then simulate the original data by using linear combinations of the $k$ base vectors. In other words, PCA of 3D animation involves identifying the $k$ roots with the highest eigenvalues, on the basis of $Ax = \lambda x$, so that the sum of the corresponding principal components approaches the original data, reducing the storage space while containing the error. To investigate the error, four error measurement methods are proposed. The average distortion $D_f$ is defined as the L2-norm of the difference between the original vertex positions $(X_{ti}, Y_{ti}, Z_{ti})$ and the approximate vertex positions $(\hat{X}_{ti}, \hat{Y}_{ti}, \hat{Z}_{ti})$ of the frames, as follows:

$$D_f = \sum_{t=1}^{f} D_t \Big/ f = \sum_{t=1}^{f} \left[ \sum_{i=1}^{n} \frac{\sqrt{(X_{ti}-\hat{X}_{ti})^2 + (Y_{ti}-\hat{Y}_{ti})^2 + (Z_{ti}-\hat{Z}_{ti})^2}}{(\text{diagonal of bounding box})_t} \right] \Big/ f$$
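As an illustration of the factorization $A = B \cdot S \cdot U^T$ and of the average distortion $D_f$, the following is a minimal sketch using NumPy. It assumes that the columns of $A$ are frames and the rows are stacked vertex coordinates; the function names and the synthetic test animation are hypothetical and are not part of the original experiments.

```python
import numpy as np

def pca_compress(A, k):
    """Truncated SVD of A (rows: vertex coordinates, columns: frames)."""
    B, s, Ut = np.linalg.svd(A, full_matrices=False)
    return B[:, :k], s[:k], Ut[:k, :]

def pca_reconstruct(B, s, Ut):
    """Rebuild the approximation from the k retained bases."""
    return (B * s) @ Ut

def average_distortion(orig, approx):
    """D_f for vertex arrays of shape (f, n, 3)."""
    per_vertex = np.linalg.norm(orig - approx, axis=2)   # (f, n) vertex errors
    extent = orig.max(axis=1) - orig.min(axis=1)         # (f, 3) box extents
    diagonal = np.linalg.norm(extent, axis=1)            # (f,) box diagonals
    return float(np.mean(per_vertex.sum(axis=1) / diagonal))

def synthetic_animation(f=40, n=50, seed=0):
    """Hypothetical data: a rest shape plus two periodic deformation modes."""
    rng = np.random.default_rng(seed)
    rest, d1, d2 = rng.normal(size=(3, n, 3))
    t = np.linspace(0.0, 2.0 * np.pi, f)[:, None, None]
    return rest + 0.3 * np.sin(t) * d1 + 0.3 * np.cos(t) * d2

frames = synthetic_animation()
f, n, _ = frames.shape
A = frames.reshape(f, 3 * n).T                           # shape (3n, f)
for k in (1, 2, 3):
    approx = pca_reconstruct(*pca_compress(A, k)).T.reshape(f, n, 3)
    print(k, average_distortion(frames, approx))
```

Because the synthetic data are built from three linearly independent components, three bases reproduce them up to rounding error, whereas fewer bases leave a visible residual.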
A distortion factor $D_a$, similar to that of [27], measures the quality of the reconstruction of the vertex positions for the complete animation. $D_a$ is defined as follows:

$$D_a = 100 \, \frac{\| A - \hat{A} \|}{\| A - E(A) \|}$$

where $A$ denotes a matrix with dimensions $V \times F$ containing the original animation sequence, $\hat{A}$ denotes the same matrix following the compression and reconstruction stage, and $E(A)$ is a matrix in which each column comprises the average vertex positions over all the frames. The distortion factor $D_n$ is defined as follows:

$$D_n = \sum_{t=1}^{f} \sum_{i=1}^{n} \left( 1 - \cos^{-1}(n_{ti} - \hat{n}_{ti}) \right) \Big/ (f \times n)$$

where $D_n$ denotes the difference in the normal vector, $n_{ti}$ denotes the normal vector of the original triangle $i$ of frame $t$, and $\hat{n}_{ti}$ is the normal vector of the approximate triangle $i$ of frame $t$. Because the 3D animation model presents a series of frames and each frame is formed by a triangle mesh, the normal difference between the reconstructed model and the
original models can be used to display the variation in the model shape. Besides using $D_f$ and $D_a$ to measure errors in coordinate changes, we also used $D_n$ to measure the error in the curvature. A discrete shape operator (DSO) can be used to detect the surface variation of the 3D model [2], including the variation in curvature [26] and torsion [3]. In this paper, $D_s$ is proposed to measure the distortion in the animation features. Suppose the adjacent vertices of vertex $v$ are represented as $\{v_1, v_2, \cdots, v_k\}$, $N(v)$ is the unit normal vector on $v$, and $N(v_1), N(v_2), \cdots, N(v_k)$ individually represent the unit normal vectors of vertices $v_1, v_2, \cdots, v_k$. The DSO of the vertex $v$ can then be obtained as:

$$S(v) = \sum_{i=1}^{k} \left( N(v_i) - N(v) \right) \Big/ \sum_{i=1}^{k} (v_i - v)$$

Further, the distortion of animation features $D_s$ is defined as follows:

$$D_s = \sum_{t=1}^{f} \sum_{i=1}^{n} \left( S(\hat{v}_{ti}) - S(v_{ti}) \right)$$

where $S(\hat{v}_{ti})$ denotes the DSO of the approximate vertex $v$ of frame $t$ and $S(v_{ti})$ denotes the DSO of the original vertex $v$ of frame $t$. The compression ratio $C$ is expressed in terms of bits per vertex per frame (bpvf). Compression ratio $C$ is defined as [14]:

$$C = B_0 / B_c$$

where $B_c$ and $B_0$ represent the sizes of the compressed stream and the original sequence, respectively. $B_0$ and $B_c$ are described in kilobytes (kb) and are measured as follows:

$$B_0 = f \times 3n \times r_{bytes} \times 2^{-10}$$

where $r_{bytes} = 4$ bytes represents the size of a floating-point value.
In order to perform an objective comparison, we used as the data set the animation sequences used by the majority of studies reported in the literature. Table 1 shows the properties of the models. Figures 1 and 2 show the PCA method applied to the Chicken and Cow animations using different numbers of bases. The selection of the principal component factor remains a subject of further research. Table 2 shows the compression ratios (c) for the Chicken and Cow animations. Selecting a large number of bases could improve the precision of the animation and reduce distortion, at the cost of maintaining a large data volume. Hence, a formula for base selection is required.

Fig. 1 PCA applied to the Chicken animation.

Table 1 Properties of the animated mesh sequences.
Model     Vertices   Triangles   Frames   Size (kb)
Horse     8431       16843       192      19425
Cow       2904       5804        204      7108
Chicken   3030       5664        400      14544

Fig. 2 PCA applied to the Cow animation.

Table 2 Compression ratios (c).
PCA encoding   Chicken size (kb)   Chicken ratio (c)   Cow size (kb)   Cow ratio (c)
original       14544               -                   7108            -
30 base        1138.8              12.77               1069.92         6.64
10 base        379.6               38.31               356.64          19.93
5 base         189.8               76.63               178.32          39.86
3 base         113.88              127.71              106.99          66.43

The number of suitable bases can be calculated automatically using the singular values. The diagonal matrices $S_x$, $S_y$, and $S_z$ are obtained by analysing the X, Y, and Z axes. The matrices are then transformed into unidimensional vectors $[S_{x1}, S_{x2}, \cdots, S_{xf}]$, $[S_{y1}, S_{y2}, \cdots, S_{yf}]$, and $[S_{z1}, S_{z2}, \cdots, S_{zf}]$. The L2-norm ($\sqrt{S_{xi}^2 + S_{yi}^2 + S_{zi}^2}$, $i = 1 \sim f$) of the three vectors is then calculated. The logarithms of the L2-norm are then obtained to represent the contribution $E = [e_1, e_2, \cdots]$, where the $e_i$ are arranged in descending order. Finally, $k$ is obtained from the vector $E$, ensuring that $e_k \le 0$ and $e_{k-1} > 0$. When the log of the L2-norm is less than zero, it indicates that the corresponding singular value is less than one. A singular value that is less than one is multiplied by the corresponding base vector and may make the characteristics represented by the base vector less obvious. This study adopts logarithm ≥ 0 as the principle for determining the number of bases in order to retain the geometric characteristics of the animation model. The experiments demonstrated that the error could be reduced to a satisfactory level when $e_{k-1} > 0$. If the bases with $e_k \le 0$ are also used, the error can be reduced further, but the improvement is insignificant. Figure 3 illustrates the relationship between the base selection and the error. Since the number of selected principal component bases influences both the reduction in data volume and the distortion of the animation model, this study measures the contribution $e$ of each base based on the L2-norm. The bases with contributions less than zero are omitted. In the case of the Chicken animation, the contribution of 42 bases is greater than zero. However, the number of suitable bases for each axis depends on the variation on each coordinate axis; the combined contribution of the axes $E$ is therefore separated into the contributions $E_x$, $E_y$, $E_z$ corresponding to each axis, and the number of suitable bases for each axis is calculated separately. Figure 4 illustrates the relationship between the automatic base selection and the error. In the case of the Chicken animation, the X-axis has 36 bases with contributions greater than zero, the Y-axis has 37 such bases, and the Z-axis has 33, as also revealed in Fig. 4. The standard PCA uses a contribution ratio ($E_k$) derived from the sum of absolute singular values as $E_k = \sum_{j=1}^{k} S_j \big/ \sum_{i=1}^{f} S_i$, where $k$ and $f$ denote the selected number of singular values and the total number of singular values, respectively, and the $S_j$ are arranged in descending order [24]. The selected proportion is usually set to 90% or 95%, as defined by the user. With such a threshold, it is difficult to select suitable bases to represent the model. To solve this problem, this study refers
Fig. 3 The red points indicate the automatically selected base number, and in practice, the improvement of the error is insignificant with the increase in base number.
Fig. 4 Our method automatically calculates the optimal number of bases and limits the error to a certain range.
to and amends the bases selection method of Kaiser [9], and then proposes an algorithm using the logarithm of the L2-norm of the singular values as the contribution value. Table 3 shows that the proposed method produces fewer errors than the methods in which $E_k$ = 90% or 95%. Due to the irregular distribution of eigenvalues, the conventional method may select fewer bases and produce a larger distortion (the average distortion for $E_k$ = 90% is 146.7 and the average distortion for $E_k$ = 95% is 103.79). The proposed automatic base selection method produces an average distortion of 16.96, and selects approximately 97.55% sufficient bases on average.
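The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the number of retained bases equals the number of contributions $e_i$ that are strictly greater than zero, and that the per-axis counts apply the same rule to the singular values of each axis. The function names are hypothetical; `select_by_ratio` implements the conventional $E_k$ threshold for comparison.

```python
import numpy as np

def count_positive_contributions(singular_values):
    """Number of bases whose log-contribution e_i is strictly positive."""
    s = np.asarray(singular_values, dtype=float)
    with np.errstate(divide="ignore"):      # log(0) = -inf is acceptable here
        e = np.log(s)
    return int(np.count_nonzero(e > 0))

def select_bases(frames):
    """frames: (f, n, 3). Returns (combined count, (count_x, count_y, count_z))."""
    per_axis = [np.linalg.svd(frames[:, :, a].T, compute_uv=False)
                for a in range(3)]
    combined = np.sqrt(sum(s ** 2 for s in per_axis))   # L2-norm per index i
    return (count_positive_contributions(combined),
            tuple(count_positive_contributions(s) for s in per_axis))

def select_by_ratio(singular_values, proportion):
    """Smallest k whose contribution ratio E_k reaches the given proportion."""
    s = np.asarray(singular_values, dtype=float)
    ratio = np.cumsum(s) / np.sum(s)
    k = int(np.searchsorted(ratio, proportion)) + 1
    return min(k, len(s))

print(count_positive_contributions([10.0, 2.0, 1.0, 0.5]))
print(select_by_ratio([6.0, 3.0, 1.0], 0.8))
```

Since the combined L2-norm at index $i$ is never smaller than any single-axis singular value at that index, the per-axis counts cannot exceed the combined count, which is consistent with the Chicken example (42 combined bases versus 36, 37, and 33 per axis).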
Table 3 Comparison between the proposed method and the methods in which Ek = 90% and 95%.
Model     Row             Our method      Ek = 90%      Ek = 95%
Cow       Bases (X/Y/Z)   9 / 9 / 9       7 / 6 / 10    11 / 12 / 18
          Df              38.216          54.7017       21.9285
Chicken   Bases (X/Y/Z)   36 / 37 / 33    1 / 5 / 1     1 / 11 / 2
          Df              1.9693          295.512       244.0055
Horse     Bases (X/Y/Z)   5 / 14 / 13     4 / 2 / 3     7 / 4 / 5
          Df              10.710          89.8925       45.4327
4. Exploration of Storage Requirements Using PCA
The SVD can decompose a matrix into three matrices. In practice, matrix $B$ (eigenvector) and matrix $S$ (eigenvalue) are merged to reduce the memory storage. In this study, the 3D animation model was decomposed into components along the X, Y, and Z axes, and PCA was applied to each axis. The experimental results indicate that the error was less than that of the undecomposed model. The experimental procedure is summarized below.

Table 4 Storage requirements for the Chicken animation. With close distortion, our method reduces the memory storage.
             Combined axes   Combined axes   Separated axes
Bases        61              62              36 + 37 + 33
Size (kb)    2316            2354            1454
Df           2.0065          1.9292          1.9693

In the combined axes PCA, the components along the X, Y, and Z axes are calculated together:

$$A = B \cdot S \cdot U^T = [(V_x + V_y + V_z) \times base] \, [base \times base] \, [base \times frame]$$
If the number of base vectors is $k$, then the data storage requirement is $((3n \times k) + (k \times f)) \times r_{bytes} \times 2^{-10}$ (kb). The storage requirements for the separated axes PCA of the 3D animation model are calculated as follows:

$$A_x = B \cdot S \cdot U^T = A_y = A_z = [V \times base] \, [base \times base] \, [base \times frame]$$

If the number of base vectors is $k$, then the data storage requirement is $3((n \times k) + (k \times f)) \times r_{bytes} \times 2^{-10}$ (kb). At a comparable distortion, the proposed method requires less storage, as shown in Table 4. These findings indicate that performing PCA separately on the X, Y, and Z axes results in a reduced error as compared to the case in which the joint calculations are performed.
5. Animation Model with Geometric Transformation
Almost all of the 3D animation models developed thus far have the characteristics of apparent stationary movement. In
Fig. 5 Relation between the animation of the Horse model and that of the model with geometric transformation: rotation, translation, and scaling (RTS).
Fig. 6 PCA applied to the Horse animation by using 3 bases.
other words, when the 3D animation model is played as galloping ahead, the point coordinates in 3D do not actually move forward, although the moving effect of the object model is displayed. Hence, although compression without significant deterioration can be achieved by analysing the apparent stationary movement 3D model using PCA, PCA cannot maintain a low distortion for an animation model obtained from 3D animation with geometric transformation. Consider the Horse animation as an example; analysing the animation model shows that in each frame, the X-axis is in the range {−0.121361∼0.123228}; the Y-axis, {−0.0199589∼0.862465}; and the Z-axis, {−0.79931∼0.530979}. Therefore, each frame is displayed as an animation galloping ahead, while the actual vertex coordinates move within the same bounding box. We believe that a real-world 3D animation model should move in real space. Hence, the 3D animation model with geometric transformation is considered in this study, since it not only includes movement changes but also performs transformation movements in 3D space. We find that 3D animation models with geometric tracks, whose vertices are uncertainly distributed in space, are unfavorable for the PCA method to select the main factors, and they may lead to a comparatively larger distortion, as shown in Fig. 5. Figure 6 shows the PCA method applied to the Horse animation. The result reveals that PCA on the apparent stationary movement model yields distortion without significant deterioration, because the bounding box is immovable.
6. Affine Transformation Matrix
By using the transformation matrix, the reference system can be transformed to produce the same effects as that of the geometrically transformed objects. Hence, we can use the space transformation method and the transformation matrix to produce animation models with movements.
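As a concrete illustration of producing movement with a transformation matrix, the sketch below composes a 4 × 4 homogeneous matrix from a rotation, a translation, and a uniform scaling, and applies it to the vertices of one frame. The function names, the choice of a rotation about the Z-axis, and the order scale, rotate, translate are assumptions made for the example only.

```python
import numpy as np

def rts_matrix(angle_z, translation, scale):
    """4x4 matrix: uniform scale, then rotation about Z, then translation."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    R = np.array([[c, -s, 0.0, 0.0],
                  [s,  c, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, 3] = translation
    S = np.diag([scale, scale, scale, 1.0])
    return T @ R @ S

def transform_frame(M, vertices):
    """Apply a 4x4 matrix to an (n, 3) array of vertex positions."""
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    return (M @ homogeneous.T).T[:, :3]

M = rts_matrix(np.pi / 2, [1.0, 0.0, 0.0], 2.0)
print(transform_frame(M, np.array([[1.0, 0.0, 0.0]])))
```

Applying a different matrix to each frame of an apparent stationary movement sequence yields an animation whose vertices travel through space, which is the situation studied in this section.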
The transformation matrix allows various space transformations in order to realize rotation, translation, scaling, and various possible movement combinations. The transformation matrix can maintain the relativity between the coordinates in the reference system. Moreover, the necessary mathematical computations can be simplified by matrix representation, as shown in the following:

$$A_{sports} = transformation \times \begin{bmatrix} v_{1,1} & \cdots & v_{f,1} \\ v_{1,2} & \cdots & v_{f,2} \\ \vdots & \cdots & \vdots \\ v_{1,n} & \cdots & v_{f,n} \end{bmatrix}$$

Since almost all of the currently available animation model archives show apparent stationary movement, the vertex sets that have been determined can be regarded as being distributed in the same space. Thus, PCA can select suitable bases for data compression. However, after the geometric transformation of the animation models, it can be
found that the reference space of the computation increases, making it impossible for PCA to select suitable bases and resulting in increased distortion. The authors aim to carry out coordinate normalization on the vertex coordinates after the transformation, by means of the affine transformation matrix, to increase the clustering of the vertex coordinates; this will enable PCA to select suitable bases as well as lower the distortion. Many papers have proposed object transformation methods [1], [12], [19]; in this study, we adopt the method proposed by Shamir and Pascucci [1]. Each frame represents a set of model vertices at a given time, which is denoted as $F_t$. To determine the relativity between two frames, we suppose that the relativity between $F_t$ and $F_r$ (the reference frame) is given by an affine transformation matrix $A_t$ [1], as follows:

$$A_t \times F_r = F_t$$

Hence, the relativity equation can be inferred as follows:

$$A_t F_r F_r^T = F_t F_r^T$$
$$A_t = F_t F_r^T (F_r F_r^T)^{-1}$$

Using $F_r$ as the reference frame, the affine transformation matrix $A_t$ relating $F_t$ and $F_r$ can be obtained for each frame by the above calculation. This step is termed the normalization of the animation frame. The relative variation of a frame with respect to $F_r$ can be determined by multiplying the inverse matrix of $A_t$ with $F_t$:

$$A_t^{-1} \times F_t = F'_t$$

The suitable bases can be selected by performing PCA on the variation $F'_t$ after normalization. Finally, the 3D animation can be recovered by reverting with $A_t$:

$$A_t \times \hat{F}'_t = \hat{F}_t$$

where $\hat{F}'_t$ denotes the PCA approximation of the normalized frame and $\hat{F}_t$ denotes the reconstructed frame. The normalization of the moving frames can increase the clustering level of the vertices, enabling PCA to select more suitable bases and effectively reduce distortion. Table 5 shows the results of the animation model obtained by using the affine transformation matrix. The result of the normalization of the moving frames shows that the error rate can be maintained at a stable level regardless of the space transformations. It can be inferred from the distortion that applying PCA to the original animation with apparent stationary movement produces compression without significant deterioration. However, with the application of geometric transformations, PCA is ineffective in controlling distortion, since the transformations cause a larger distortion. Therefore, the normalization of the animation model before the execution of PCA can effectively reduce the distortion. Moreover, even without the application of geometric transformation, normalization prior to PCA can slightly improve the distortion as compared to the case where PCA is performed directly.
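The normalization step can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' implementation: each frame is written as a 4 × n matrix in homogeneous coordinates, $A_t$ is obtained from $A_t = F_t F_r^T (F_r F_r^T)^{-1}$, PCA is applied with combined axes, and the distortion is measured with $D_a$ as defined in Section 3 using the Frobenius norm. All function names and the synthetic animation are hypothetical.

```python
import numpy as np

def to_homogeneous(frame):
    """(n, 3) vertex array -> 4 x n matrix with a final row of ones."""
    return np.vstack([frame.T, np.ones(len(frame))])

def fit_affine(F_t, F_r):
    """A_t = F_t F_r^T (F_r F_r^T)^-1, the least-squares fit of A_t F_r = F_t."""
    return F_t @ F_r.T @ np.linalg.inv(F_r @ F_r.T)

def truncate(frames, k):
    """Combined-axes PCA approximation of (f, n, 3) frames with k bases."""
    A = frames.reshape(len(frames), -1).T
    B, s, Ut = np.linalg.svd(A, full_matrices=False)
    return ((B[:, :k] * s[:k]) @ Ut[:k, :]).T.reshape(frames.shape)

def compress_normalized(frames, k, ref=0):
    """Normalize every frame by A_t^-1, apply PCA, then revert with A_t."""
    F_r = to_homogeneous(frames[ref])
    mats = [fit_affine(to_homogeneous(fr), F_r) for fr in frames]
    normalized = np.stack([(np.linalg.inv(M) @ to_homogeneous(fr))[:3].T
                           for M, fr in zip(mats, frames)])
    approx = truncate(normalized, k)
    return np.stack([(M @ to_homogeneous(a))[:3].T
                     for M, a in zip(mats, approx)])

def distortion_da(frames, approx):
    """D_a = 100 * ||A - A_hat|| / ||A - E(A)|| with the Frobenius norm."""
    A = frames.reshape(len(frames), -1).T
    A_hat = approx.reshape(len(frames), -1).T
    mean = A.mean(axis=1, keepdims=True)
    return 100.0 * np.linalg.norm(A - A_hat) / np.linalg.norm(A - mean)

def demo_animation(f=24, n=40, deform=0.05, seed=0):
    """Hypothetical data: a slightly deforming shape moved by per-frame RTS."""
    rng = np.random.default_rng(seed)
    rest, d1 = rng.normal(size=(2, n, 3))
    frames = []
    for t in np.linspace(0.0, 1.0, f):
        shape = rest + deform * np.sin(4.0 * t) * d1
        c, s = np.cos(2.0 * t), np.sin(2.0 * t)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        frames.append((1.0 + t) * shape @ R.T + np.array([3.0 * t, 0.0, 0.0]))
    return np.stack(frames)

frames = demo_animation()
print("direct PCA     D_a:", distortion_da(frames, truncate(frames, 3)))
print("normalized PCA D_a:", distortion_da(frames, compress_normalized(frames, 3)))
```

In the limiting case of a purely rigid RTS motion (`deform=0.0`), every normalized frame coincides with the reference frame, so a single base reproduces the animation up to rounding error, whereas direct PCA with one base does not. The per-frame matrices $A_t$ must be stored alongside the PCA data in order to revert the normalization.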
Table 5 PCA applied to the animation models with and without normalization.

Base        Error  Horse            Horse norm.      Horse+RTS        Horse+RTS norm.  Chicken          Chicken norm.    Cow              Cow norm.
3+3+3       Df     70.0663          67.2008          288.782          67.059           117.9299         78.7473          172.9813         87.9919
            Da     3.6422           3.0976           13.0831          3.186            22.2296          14.4996          21.8438          11.5525
            Dn     8.4295           9.4328           32.9352          9.433            34.2918          32.3714          48.4576          28.5249
            Ds     355.5            383.2            1999.3           513.5            11133.1          8652.9           314.0            132.6
5+5+5       Df     40.5873          38.741           59.6986          38.6643          62.3313          47.1925          87.1811          56.9566
            Da     1.8769           1.7594           2.9003           1.8005           11.5205          8.2156           11.3955          7.6543
            Dn     6.6918           6.7185           7.6798           6.7187           25.806           24.016           27.7377          21.2731
            Ds     265.5            265.7            401.6            356              7177.9           5720.5           150.9            108.3
10+10+10    Df     17.2589          16.0301          22.8614          16.0006          25.9256          20.7188          31.2876          23.3921
            Da     0.7724           0.732            1.0567           0.7501           4.6304           3.9433           4.1767           3.0606
            Dn     3.9287           3.9497           4.8388           3.9501           15.7928          14.5917          15.1382          14.5811
            Ds     156.4            162.1            241.7            217.2            3932.9           3432.5           83.1             77.7
30+30+30    Df     0.0009           0.0011           1.8116           0.0054           3.1492           3.0971           5.7364           5.4382
            Da     0.000055         0.000068         0.0955           0.0002           0.5893           0.584            0.7456           0.709
            Dn     0.0026           0.0031           0.9996           0.0257           7.5826           8.9194           6.0509           5.8509
            Ds     0.0323           0.0387           43.6             0.3554           672.6            742.5            32.3864          31.8301
7. Comparison of Representation Methods and Distortion for 3D Animation

To produce 3D animation and compress data, we may use the following methods, as shown in Fig. 7:
1. Multiply the original 3D model with the geometric transformation matrix and then use PCA.
2. Use PCA on the original 3D model and then multiply with the geometric transformation matrix.
3. Multiply the original 3D model with the geometric transformation matrix, normalize the animation to the same reference system and then use PCA.

All the above methods can achieve data compression of the 3D animation models. However, with method 1 the reference space increases, which is unfavourable for PCA, because the geometric transformation matrix is multiplied in first. Method 2 is used in most of the current papers. First, the 3D models that produce the animation are compressed; the compressed data is then multiplied with the preset track lines, so that 3D animations with different moving tracks can be produced at the same time. This method produces results without significant deterioration in the sequential operation of 3D animation, because the original 3D model has only apparent stationary movement, with a comparatively smaller reference space that is favorable for PCA to determine suitable bases. Moreover, only one model requires compression for setting up several 3D animations with different sizes and tracks. However, if method 2 is applied in the reverse engineering of a 3D animation, the reference space of the animation tracks produced earlier increases when PCA is applied, leading to relatively increased distortion. To deal with this problem, this paper proposes method 3, which normalizes the frames with regard to the same axis; this enables PCA to select more suitable bases and provides reduced distortion. The method proposed in this paper is suitable for use
Fig. 7 Production methods for 3D animation.

Table 6 Distortion for 3 types of animation compressions.

Base        Error  Method 1     Method 2     Method 3
3+3+3       Df     288.782      69.9207      67.059
            Da     13.0831      3.7347       3.186
            Dn     32.9352      8.4297       9.433
            Ds     1248.9       476.4        513.5
5+5+5       Df     59.6986      40.5046      38.6643
            Da     2.9003       1.9353       1.8005
            Dn     7.6798       6.692        6.7187
            Ds     1018.8       355.8        356.1
10+10+10    Df     22.8614      17.2239      16.0006
            Da     1.0567       0.8011       0.7501
            Dn     4.8388       3.9292       3.9501
            Ds     479.7        209.5        217.2
30+30+30    Df     1.8116       0.006        0.0054
            Da     0.0955       0.000093     0.0001
            Dn     0.9996       0.0626       0.0257
            Ds     196.3        0.2275       0.3554
with low-resolution equipment. We compare the 3 types of animation compression, using the same bases, on the Horse animation; the errors obtained are listed in Table 6. It can be observed from Table 5 and Table 6 that, with respect to the data compression of animation models, an average distortion (Df) of 70.0663 is obtained by applying PCA with 3 bases to the original animation separately for
Fig. 8 Depiction of the Horse animation frames 42, 64, 73, 90, and 125 by using the 3 methods mentioned in the text. Method 3 obtains the best effect owing to the affine matrix.
the X, Y, and Z axes (including apparent stationary movement only). Method 1 first applies the geometric transformation to the original animation and then uses PCA; for the same number of bases, the distortion increases to 288.782. Method 2 first applies PCA to the original model with apparent stationary movement and then adds the transformation tracks after selecting 3 bases, which reduces the distortion to 69.9207. Finally, as shown by the experiment, our method first transforms the moving animation model to the normalization axis and then applies PCA. After selecting 3 bases, it transforms the model back to the original axis; this method obtains the lowest distortion, 67.059. Fig. 8 shows the Horse animation produced by the different methods. A comparatively lower distortion is therefore achieved by first normalizing the animation model that contains the transformation movement, at a storage cost of $(f - 1) \times 16 \times r_{bytes} \times 2^{-10}$ (kb). In practice, the first row of the affine transformation matrix is $[1, 0, 0, 0]$ because a homogeneous coordinate system is used, so the storage space can be reduced to $(f - 1) \times 12 \times r_{bytes} \times 2^{-10}$ (kb). For a similar level of distortion, the data requirement of the proposed method can be compared with that of the original method [16], as shown in Table 7. The compression ratio is the ratio between the sizes of the original sequence $B_0$ and the compressed stream $B_c$. The improve ratio, which is the ratio between the data sizes of the original PCA method [16] and the proposed method, can reach approximately 2.38 times that of the PCA method used alone, as shown in Fig. 9. Table 8 compares PCA without the affine transformation and the proposed affine transformation within PCA in terms of computation cost, rendering speed, storage size, and error. Affine transformation matrices require additional calculation time;
Table 7 Comparison ratio between the data requirement for application of normalization and PCA for similar distortion.

            Horse+RTS    Horse+RTS    Normalization
Base        12           13           3+3+3
Size (kb)   1223         1325         319
Df          68.4374      59.7166      67.059
Da          3.0134       2.5999       3.186
Dn          8.4187       8.0817       9.433
Ds          400          378.2        513.5
Compression ratio = 19425 / 319 = 60.89
Improve ratio = 1223 / 319 = 3.83

            Horse+RTS    Horse+RTS    Normalization
Base        31           32           10+10+10
Size (kb)   3160         3262         1044
Df          16.5891      15.6535      16.0006
Da          0.7289       0.6876       0.7501
Dn          3.8478       3.7093       3.95
Ds          189.1        181.4        217.2
Compression ratio = 19425 / 1044 = 18.6
Improve ratio = 3160 / 1044 = 3.03
therefore, the computation cost is larger than that of PCA without the affine transformation. The experimental results show that calculating the affine transformation matrices accounts for 10.51% of the computation cost of the proposed method, the average encoding speed is 49 frames per second (fps), and the average decoding speed is 124.5 fps. The results also show that the proposed method is fast enough to be used for the real-time animation of large 3D meshes. Our method reduces the storage cost by 16% on average and improves the error rate by approximately 6.7% (the bases of each model are selected under the optimal condition). PCA compresses individual frames. CPCA [20] and LPCA [27] further carry out second-order compression based on predictions in frame space and time. Our
Table 8 The comparison measured on a Pentium 4 (3.0 GHz) with 2 GB of main memory and implemented with Matlab 6.5.

PCA without affine transformation method
Model       Selected base  Encode   Encode    Decode   Decode    Size    Error
            (x, y, z)      (s)      (fps)     (s)      (fps)     (kb)    (Df)
Cow         (9,9,9)        2.014    101.291   0.797    255.96    336     38.216
Chicken     (36,37,33)     8.563    46.713    3.016    132.626   1454    1.9693
Horse       (5,14,13)      5.532    34.707    2.03     94.581    1104    10.71
Horse+RTS   (17,17,18)     5.751    33.385    2.047    93.796    1794    7.6776
Average                    5.465    54.024    1.973    144.241   1172    14.643

Proposed affine transformation within PCA
Model       Selected base  Affine   Encode   Encode    Decode   Affine   Decode    Size     Error
            (x, y, z)      (s)      (s)      (fps)     (s)      (s)      (fps)     (kb)     (Df)
Cow         (6,8,9)        0.235    2.015    90.667    0.782    0.125    224.917   295      33.104
Chicken     (39,38,25)     0.532    8.484    44.366    3.047    0.296    119.653   1418     2.0477
Horse       (5,13,14)      0.672    5.53     30.958    2.031    0.469    76.8      1113     10.519
Horse+RTS   (5,13,14)      0.687    5.734    29.902    2.032    0.468    76.8      1113     8.9559
Average                    0.532    5.441    48.973    1.973    0.34     124.543   984.75   13.657
Fig. 9 Improve ratio curve for animation models using our method for various bases with close distortion. The improve ratio for the Horse animation can reach approximately 3.27 times that of PCA used alone, and 1.6 and 2.24 times for the Chicken and Cow animations, respectively.
affine transformation matrix method can carry out further compression based on the moving features in the model. In a practical application of our method, the affine transformation matrix first performs the normalization, compression based on PCA follows, and CPCA or LPCA compression is performed last to achieve third-order compression effects.

8. Contributions and Future Direction

This study proposes the compression of 3D animation data by PCA, which has the following advantages:
• PCA effectively reduces the compressed data volume. By determining the contribution of bases, this study presents a method to automatically calculate the required number of bases.
• Experiments indicate that with similar data volume, compressing the 3D animation model by conducting PCA separately on the three axes improves both the error rate and the processing speed.
• The affine transformation matrices are used to transform the 3D model to the normalized space and to enable PCA to select more precise bases.
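The storage overhead of the affine matrices and the ratios reported in Table 7 can be checked with a short calculation. The sketch below evaluates the cost expression from Section 7, $(f - 1) \times 16 \times r_{bytes} \times 2^{-10}$ kb, and its 12-entry variant, and recomputes the Table 7 ratios. The function names and the example values of 100 frames and 4 bytes per matrix entry are hypothetical and chosen only for illustration; the sizes 19425, 1223, 3160, 319 and 1044 kb are taken from Table 7.

```python
def affine_storage_kb(frames, bytes_per_value, entries_per_matrix=16):
    """Storage for the per-frame affine matrices, in kb.

    Follows (f - 1) * entries * r_bytes * 2**-10 from Section 7.
    """
    return (frames - 1) * entries_per_matrix * bytes_per_value / 1024


def size_ratio(larger_kb, smaller_kb):
    """Ratio between two data sizes, e.g. original stream over compressed stream."""
    return larger_kb / smaller_kb


# Hypothetical animation: 100 frames, 4-byte matrix entries.
full_cost = affine_storage_kb(100, 4)          # 16 entries per matrix -> 6.1875 kb
reduced_cost = affine_storage_kb(100, 4, 12)   # first row omitted -> 4.640625 kb

# Ratios from Table 7 (Horse+RTS, normalization with 3+3+3 bases).
compression_ratio = size_ratio(19425, 319)     # original sequence / compressed stream
improve_ratio = size_ratio(1223, 319)          # PCA alone (12 bases) / proposed method

# Ratios from Table 7 (normalization with 10+10+10 bases).
compression_ratio_10 = size_ratio(19425, 1044)
improve_ratio_10 = size_ratio(3160, 1044)

print(full_cost, reduced_cost)
print(round(compression_ratio, 2), round(improve_ratio, 2))
print(round(compression_ratio_10, 1), round(improve_ratio_10, 2))
```

Dropping the constant first row saves a quarter of the matrix storage, which in this hypothetical case is small next to the compressed sizes listed in Table 7.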
This paper applies PCA to analyse the vertex coordinates of the 3D animation model and selects suitable bases to compress the huge volume of data involved in 3D animation. Moreover, the movement of 3D animation in real space is realized by a geometric transformation. Because this changes the reference space, PCA cannot effectively select suitable bases. This paper uses the affine transformation matrix to transform the moving frames to normalization coordinates, limiting the PCA space to a certain scope. At the expense of storing only 4 × 4 matrices, this lowers the distortion and restricts it to a certain range. Future research should attempt to reduce the calculation time of PCA, to facilitate applications in real-time systems, and to reduce its memory requirements. Such attempts could enable a mobile device to play high-quality 3D animation. In addition, further work may examine how to select the most suitable bases when the number of bases is low, so as to obtain an approximate animation model with the desired amount of stored data.

References
[1] A. Shamir and V. Pascucci, “Temporal and spatial level of details for dynamic meshes,” Proc. Virtual Reality Systems and Techniques, pp.423–430, 2001.
[2] B. O’Neill, Elementary Differential Geometry, Academic Press, 1997.
[3] B.S. Jong, J.L. Tseng, and W.H. Yang, “An efficient and low-error mesh simplification method based on torsion detection,” The Visual Computer, vol.22, no.1, pp.56–67, 2006.
[4] D. Zhang and Z. Zhou, “2-Directional 2-dimensional PCA for efficient face representation and recognition,” Neurocomputing, vol.69, no.1-3, pp.224–231, 2005.
[5] E.S. Jang, J.D.K. Kim, S.Y. Jung, M.J. Han, S.O. Woo, and S.J. Lee, “Interpolator data compression for MPEG-4 animation,” IEEE Trans. Circuits Syst. Video Technol., vol.14, no.7, pp.989–1008, 2004.
[6] F.C. Huang, B.Y. Chen, Y.Y. Chuang, and M. Ouhyoung, “Animation model simplification,” Proc. Workshop on Computer Graphics, 2005, http://www.cmlab.csie.ntu.edu.tw/~robin/plist.html, last viewed date: 2007/5/17.
[7] G.H. Golub and C.F. van Loan, Matrix computations, 3rd ed., Johns Hopkins University Press, Baltimore, London, 1996.
[8] H. Briceno, P. Sander, L. McMillan, S. Gotler, and H. Hoppe, “Geometry videos: A new representation for 3D animations,” ACM Symp. Computer Animation, pp.136–146, 2003.
[9] H.K. Kaiser, “The application of electronic computers to factor analysis,” Educational and Psychological Measurement, vol.20, pp.141–151, 1960.
[10] I. Guskov and A. Khodakovsky, “Wavelet compression of parametrically coherent mesh sequences,” ACM Siggraph Symposium on Computer Animation, pp.183–192, 2004.
[11] I.T. Jolliffe, Principal Component Analysis, Springer series in statistics, Springer-Verlag, 1986.
[12] J.E. Lengyel, “Compression of time-dependent geometry,” ACM Symposium on Interactive 3D Graphics, pp.89–96, 1999.
[13] J. Rossignac, A. Safonova, and A. Szymczak, “3D compression made simple: Edgebreaker on a corner table,” Proc. Shape Modeling International Conference, pp.278–283, 2001.
[14] K. Mamou, T. Zaharia, and F. Preteux, “A preliminary evaluation of 3D mesh animation coding techniques,” Proc. SPIE Conference on Mathematical Methods in Pattern and Image Analysis, vol.5916, pp.44–55, 2005.
[15] L. Ibarria and J. Rossignac, “Dynapack: Space-time compression of the 3D animations of triangle meshes with fixed connectivity,” ACM Siggraph Symposium on Computer Animation, pp.126–135, 2003.
[16] M. Alexa and W. Müller, “Representing animations by principal components,” Comput. Graph. Forum, vol.19, pp.411–418, 2000.
[17] M. Brand, “Incremental singular value decomposition of uncertain data with missing values,” Proc. European Conference on Computer Vision, vol.1, pp.707–720, 2002.
[18] M. Garland and P.S. Heckbert, “Surface simplification using quadric error metrics,” SIGGRAPH Conference Proc., pp.209–216, 1997.
[19] M. Mueller, B. Heidelberger, M. Teschner, and M. Gross, “Meshless deformations based on shape matching,” Proc. SIGGRAPH, pp.471–478, 2005.
[20] M. Sattler, R. Sarlette, and R. Klein, “Simple and efficient compression of animation sequences,” ACM Siggraph Symposium on Computer Animation, pp.209–217, 2005.
[21] N. Kambhatla and T.K. Leen, “Dimension reduction by local PCA,” Neural Comput., vol.9, pp.1493–1516, 1997.
[22] P. Glardon, R. Boulic, and D. Thalmann, “PCA-based walking engine using motion capture data,” Proc. Computer Graphics International, pp.292–298, 2004.
[23] P. Glardon, R. Boulic, and D. Thalmann, “A coherent locomotion engine extrapolating beyond experimental data,” Proc. Computer Animation and Social Agent, pp.73–84, 2004.
[24] S.C. Snook and R.L. Gorsuch, “Component analysis versus common factor analysis: A Monte Carlo study,” Psychological Bulletin, vol.106, no.1, pp.148–154, 1989.
[25] T.Y. Lee, P.H. Lin, S.U. Yan, and C.H. Lin, “Mesh decomposition using motion information from animation sequence,” Computer Animation and Virtual Worlds, vol.16, pp.519–529, 2005.
[26] Y. Miao, J. Feng, and Q. Peng, “Curvature estimation of point-sampled surface and its applications,” International Conference on Computational Science and Its Applications, Lecture Notes in Computer Science, vol.3482, pp.1023–1032, 2005.
[27] Z. Karni and C. Gotsman, “Compression of soft-body animation sequences,” Computer Graph., vol.28, pp.25–34, 2004.
Pai-Feng Lee received the BS degree in computer science from Tamkang University in 1999 and the MS degree from Chung-Yuan Christian University in 2003. He is now a PhD candidate in electronic engineering at Chung-Yuan Christian University, Taiwan, and will defend his dissertation in the 2007 academic year. His research interests include computer graphics (reconstruction, simplification) and computer animation.
Chi-Kang Kao received the BS and MS degrees in computer engineering from Chung-Yuan Christian University, Taiwan. He is now a PhD candidate in electronic engineering at Chung-Yuan Christian University, Taiwan. His research interests include computer graphics and computer animation.
Juin-Ling Tseng received his BS from SooChow University in 1994 and his MS and PhD from Chung Yuan Christian University, Taiwan, in 1996 and 2006, respectively. Currently, he is an assistant professor in the Dept. of Management Information System, Chin Min Institute of Technology, Taiwan. His research interests include computer graphics.
Bin-Shyan Jong received the BS degree in computer science from Chung-Yuan Christian University in 1978, and the MS and PhD degrees from the Institute of Computer Science at National Tsing Hua University in 1983 and 1989. He is a professor in the Dept. of Information and Computer Engineering, Chung Yuan Christian University. His research interests include computer graphics and computer aided education.
Tsong-Wuu Lin received the BS degree in Computer Science from Tamkang University in 1985. He received the MS and PhD degrees in Computer Science in 1987 and 1991 from National Tsing Hua University, Hsinchu, Taiwan. He is now a professor in the Dept. of Computer and Information Science, Soochow University. His research interests include computer graphics and computer aided education.