Int’l Symposium on Multimedia, Dec 2006, pp. 509-516
Perceptually Enhanced Multimedia Processing, Visualization and Transmission Irene Cheng and Randy Goebel Dept. of Computing Science, University of Alberta, Edmonton, Canada Contact:
[email protected]
Abstract Data reduction has long been a method for adapting to limited computational and network resources. But one major concern is the trade-off between preserving visual quality and reducing data size. Furthermore, the presence of multi-modal data, e.g. visual and aural, is common, and distributing competing resources among multi-modal data to achieve optimal visual quality therefore becomes a major challenge. Since humans are typically the ultimate viewers of multimedia data, it is reasonable to take human perception into consideration during the data reduction and resource distribution process, in order to estimate and control the resulting visual quality. Psychophysical experiments reported in the literature have shown that better performance can be achieved in multimedia processing, visualization and transmission by incorporating perceptual factors. This paper gives an overview of how human perception plays a role in the development of multimedia applications, so as to inspire and inform future research in this direction.
1. Introduction
Following the rapid development of hardware, software, and network technology, multimedia content has become more accessible to the public. Voluminous and computationally expensive content---otherwise too costly and time consuming to capture, store, process and transmit---is more readily available. However, the increasing demand for high quality multimedia content surpasses the supply of resources, especially in real-time transmission and visualization applications. Despite the promises of super networks and quality of service (QoS) management, latency and poor quality interactivity remain an issue when using high resolution data. Consequently, data reduction
continues to be a necessary and challenging research topic. Many algorithms have been proposed in the literature for data compression and reduction, for both two- and three-dimensional, and time-varying data. This approach often involves two stages. The first stage converts the original data to another representation that occupies less space and thus can be transmitted faster. The second stage recovers the original data before visualization or other usage. However, there is a trade-off between transmission efficiency and data integrity: higher compression ratios are often associated with higher data loss. This leads to two research issues: (1) how to obtain an optimal balance between transmission efficiency and visual quality, and (2) how to distribute available resources among multi-modal data in order to achieve the best overall visual quality. The traditional approach is to apply data reduction algorithms and then rely on Signal-to-Noise Ratio (SNR) measurements to assess the quality of the resulting data. Recently, however, there has been increased interest in aspects of human perception when developing data reduction algorithms. Human perception has been studied extensively in the psychology discipline, and it is often beneficial for other disciplines, such as computer graphics, image processing, computer vision, and multimedia communication, to share well-established psychology methodologies. From the human computer interface (HCI) perspective, aural and visual data can trigger user responses consciously or unconsciously. The premise of the perceptual approach is that detail is important only if it is perceptible to the human observer. In other words, presenting imperceptible detail to the viewer only wastes computational and network resources. Extensive psychophysical experiments have been conducted to assess the
limitation of the Human Visual System (HVS). Experiments show that multimedia applications can take advantage of human perceptual thresholds and bypass redundant data that fall below them. They also show that redundant data which cannot be detected using traditional approaches can be detected using a perceptual approach. The perceptual approach is not meant to replace traditional methods, but to be used in conjunction with other techniques to further reduce redundant data and achieve higher resource efficiency. This paper gives an overview of how human perception plays a role in the development of multimedia applications, and highlights some recent developments that exploit the perceptual approach. The goal is to inspire more state-of-the-art research on perceptually enhanced multimedia communication. The rest of the paper is organized as follows: Section 2 reviews some well recognized and commonly used perceptual evaluation methods. Section 3 discusses Just-Noticeable-Difference (JND) in human perception and its applications. Section 4 looks at perceptually driven texture reduction. Section 5 reviews perceptually driven and geometrically driven 3D mesh simplification and refinement. Section 6 discusses perceptually optimized transmission of integrated texture and mesh, taking packet loss into consideration. Section 7 reviews the foveation technique. Section 8 discusses perceptually motivated multimedia education, before the paper is concluded in Section 9.
2. Perceptual evaluation methodology
The quality of any presented data affects the understanding of that data, as well as the user response to it. Achieving high quality data presentation is therefore an important issue in the visualization of multimedia content. Different techniques, such as illumination [45], perception of depth [49], and visual masking [22], have been proposed to improve visual quality and thus enhance the realism and (hopefully) understanding of the content in the virtual world. Many evaluation metrics have been suggested to compare the performance of these techniques. Traditional metrics are often based on geometric measurement but, in recent years, perceptual metrics have gained increasing attention [34] [31] [38] [42] [46] [48]. While geometric metrics provide cues that guide compression choices, perceptual metrics are more reliable because visual quality is ultimately determined by the HVS. But human judgment is rarely
unbiased, being affected by environmental and psychological factors. Accurate perceptual evaluation experiments must therefore use a sufficiently large sample size, and preconditions sometimes have to be set. For example, in some experiments the subjects (judges) are categorized based on age and academic background. The goal is to obtain the general perceptual behavior of the population, based on the statistics drawn from the sample. Rating is an evaluation technique where subjects are asked to rate a stimulus on a quality scale from 1 to 5, corresponding to very bad, bad, average, good and very good [38]. However, different subjects may have different interpretations of good and bad, which affects the accuracy of the rating technique. The naming time technique [46] is useful in studying visual latency (the time between receiving a signal and taking an action), but is unlikely to capture visual quality because judgment is influenced by the subject’s prior knowledge of the stimulus. In other words, the time required to recognize and name the stimulus does not necessarily reflect the quality of the simplified 3D object. An efficient and commonly used perceptual evaluation technique is Two-Alternative Forced-Choice (2AFC). In the 2AFC procedure, a subject is forced to choose one of two given stimuli. The subject’s task is to state which stimulus better matches some predefined criteria. The subject’s decision is recorded as either correct or incorrect. The format of the 2AFC technique varies depending on the application:
§ With or without reference − a reference stimulus can be displayed to guide the subject’s decision.
§ Temporal or spatial comparison − the stimuli may occur in successive intervals, or they may occur side by side [2].
§ Adaptive or random generation − the next pair of stimuli can be generated adaptively based on the current response. For example, if the current response is wrong, the next pair is easier to discriminate; if it is correct, the next pair is more difficult to discriminate. In random generation, the next pair is randomly selected from a predefined set; the subject’s current response does not dictate the choice of the next pair of stimuli.
Psychophysical experiments can be used at a verification stage or at an analysis stage. At the verification stage, psychophysical experimental results are used to support the performance of a proposed algorithm. For example, instead of using
geometric measurement, subjective measurement involving human subjects was conducted in [28] to support the proposed method for improving visual quality by removing artifacts introduced during image compression. At the analysis stage, psychophysical experiments are often used to locate a threshold, which can be estimated using one of the following:
1. A psychometric function (sigmoid curve) − the percentage of correct judgments for each class of stimulus pairs is plotted as a psychometric function, where the vertical scale denotes the percentage of correct responses and the horizontal scale denotes the class values. The threshold can be located at 75% correctness [2]. The reason is that when the two stimuli are clearly distinguishable, the judgment is correct 100% of the time; if the difference between the stimuli is not apparent, the subject is forced to guess, and the probability of picking the correct answer is 50% on average.
2. A staircase response graph − the vertical scale of the graph denotes stimulus values, beginning at a starting value. Based on the correct/wrong responses, a response graph can be plotted, where stepping up means correct and stepping down means wrong. The tests are designed so that, after a sufficient number of trials, the graph reaches an equilibrium, which corresponds to a threshold value. However, this one-up-one-down staircase does not reflect the true equilibrium if the subject gives random answers, because random answers generate a horizontal path (an apparent equilibrium) around the starting value. The two-up-one-down staircase algorithm [47] is found to be more accurate in tracking the true pattern. In this procedure, the graph moves up only after two correct responses and moves down after one incorrect response. This staircase procedure satisfies the equation:
p² = (1 − p) + p(1 − p) = 1 − p²   (1)

where p is the probability of a correct answer. The graph converges to an equilibrium for p = 70.7% (the solution of p² = 1 − p²). Equation (1) guarantees that random responses by observers (p = 50%) do not converge to the equilibrium. Once a threshold or equilibrium is obtained, the threshold value, e.g., the just noticeable difference (JND) discussed in the next section, can then be used in algorithms to measure perceptual impact.
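As an illustration, the two-up-one-down procedure can be simulated against a synthetic observer. The psychometric function, step size and trial count below are arbitrary choices for this sketch, not values from [47]:

```python
import random
random.seed(0)

def p_correct(level):
    # Hypothetical psychometric function for a synthetic observer:
    # pure guessing (0.5) when the stimulus difference is 0, rising
    # linearly toward certainty (1.0) at level 10.
    return 0.5 + 0.5 * min(level, 10.0) / 10.0

def staircase(trials=20000, level=10.0, step=0.25):
    # Two-up-one-down staircase: the stimulus difference is reduced
    # (harder trial) only after two consecutive correct responses,
    # and increased (easier trial) after any wrong response.
    correct_run = 0
    history = []
    for _ in range(trials):
        if random.random() < p_correct(level):
            correct_run += 1
            if correct_run == 2:
                level = max(level - step, 0.0)  # harder
                correct_run = 0
        else:
            level = min(level + step, 10.0)     # easier
            correct_run = 0
        history.append(level)
    # Average over the second half, after the staircase has settled.
    tail = history[len(history) // 2:]
    return sum(tail) / len(tail)

eq = staircase()
# Should settle near the level where p_correct = 0.707, i.e. about 4.1
# for this synthetic observer, consistent with Equation (1).
print(round(eq, 2))
```

The equilibrium tracks the 70.7% point of the observer's psychometric function, exactly the convergence value predicted by Equation (1).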
3. Just-Noticeable-Difference (JND) and human perception
Signal-to-Noise Ratio (SNR) is used in traditional methods to measure the distortion of visual quality in images, which may not be consistent with the way the HVS assesses quality. For example, the Mean Square Error (MSE) used to measure the visual distortion of an image is shown to be inaccurate compared with the results of human judgment [44]. This leads to the study of perceptual parameters such as Just-Noticeable-Difference (JND). JND is defined as the threshold ∆x below which the HVS is incapable of discriminating the difference between x and (x + ∆x). JND can be associated with different multimedia content, including images, video, sound and 3D objects. For example, a perceptual compression scheme is proposed in [13] to remove visually redundant data from images. A perceptually adaptive color quantizer is used to estimate the error visibility threshold, which defines the perceptually indistinguishable region for each color. Colors in this region can be quantized more coarsely than those in other regions without noticeable visual degradation. Researchers studying this kind of visual perception are inspired by three basic principles:
a) Weber’s Law of relative change − the impact on human perception is related to the relative change in magnitude, not the absolute change.
b) Weber-Fechner’s Law − the visual impact of a stimulus is determined by the intensity distribution in a region. For example, given two light spots L1 and L2 of equal brightness B, where L1 is surrounded by light spots of the same brightness B while L2 is surrounded by spots of decreasing brightness, L2 will appear brighter than L1 as perceived by the HVS.
c) Contrast sensitivity function (CSF) − the minimum detectable difference (sensitivity threshold) in luminance between a test spot and a uniform visual field increases linearly with background luminance at daylight levels.
However, virtual scenes and stimuli are not uniform, and contain complex frequency content. Outside a small frequency range, the sensitivity threshold drops off significantly. This phenomenon has led to the study of the perceptibility of contrast gratings (sinusoidal patterns that alternate between two extreme luminance values). Threshold contrast at a given spatial frequency is the minimum contrast that can be perceived in a grating. Contrast sensitivity is the reciprocal of threshold contrast.
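The band-pass shape of contrast sensitivity can be illustrated with the analytic CSF fit commonly attributed to Mannos and Sakrison [36]; the constants below should be treated as illustrative rather than authoritative:

```python
import math

def csf(f):
    # Contrast sensitivity as a function of spatial frequency f in
    # cycles/degree, using the analytic fit commonly attributed to
    # Mannos and Sakrison [36].
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-(0.114 * f) ** 1.1)

# Sensitivity is band-pass: it peaks at mid frequencies (around 8
# cycles/degree for this fit) and falls off at both very low and
# very high spatial frequencies.
peak_f = max(range(1, 61), key=csf)
print(peak_f)
```

Gratings at frequencies far from the peak need much higher contrast to be perceived, which is what perceptually driven simplification exploits.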
CSF plots contrast sensitivity against spatial frequency. It describes the range of perceptible contrast gratings [31]. Given a spatial frequency, the ability to discriminate a stimulus depends on the luminance contrast. The ability to discriminate a moving stimulus decreases as its velocity increases [2]. Visual cues, e.g., brightness and spatial frequency, have been studied extensively for the purpose of assessing the perceptual effects of different texture properties. More perceptual parameters have been discovered and included in studies of color images, video, sound and music. Following the rapid development of the entertainment and games industry, perceptual evaluation has also been extended to 3D graphics and animation applications. Examples of how human perception plays a role in multimedia applications are given in the following sections.
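Returning to principle (a), Weber's Law implies that whether a change is noticeable depends on its size relative to the base magnitude. A minimal sketch, with a purely illustrative 2% Weber fraction:

```python
def jnd(x, weber_fraction=0.02):
    # Weber's Law: the just-noticeable increment grows with the base
    # magnitude x. The 2% fraction is illustrative only; real values
    # depend on the modality (brightness, loudness, ...).
    return weber_fraction * x

def distinguishable(x, delta, weber_fraction=0.02):
    # A change is perceptible only if it exceeds the JND at level x.
    return abs(delta) >= jnd(x, weber_fraction)

# The same 1-unit change is visible on a dim background (x=10) but
# not on a bright one (x=100): equal absolute change, different
# relative change.
print(distinguishable(10, 1))   # True  (1 >= 0.2)
print(distinguishable(100, 1))  # False (1 < 2.0)
```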
4. Perceptually driven texture/image reduction
Feature point distribution on 3D surfaces is used as a visual quality predictor on texture [3]. This is based on Weber-Fechner’s Law and the observation that the intensity distribution in a region is induced by the underlying geometry, represented by the feature point distribution. In addition to feature point distribution, 2D properties of the texture can also affect the visual quality of a texture mapped surface. Examples of these properties include persistence against quantization, texture brightness, and texture pattern complexity. These 2D and 3D factors are classified as geometry driven (ρ_g) and texture driven (ρ_t) visual predictors, and are assigned different weights ω_g and ω_t respectively [5]:

℘ = Σ_{k=1}^{c} ω_k ρ_k = Σ_{g=1}^{a} ω_g ρ_g + Σ_{t=1}^{b} ω_t ρ_t   (2)
An overall prediction value ℘ is computed and used to estimate the degree of data reduction that can be applied while preserving satisfactory visual quality. This visual prediction model ensures that visually important regions are preserved with higher quality. While the data size remains the same, this model produces a better visual effect than simply assigning equal quality to the entire image (Figure 1).
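The weighted combination in Equation (2) can be sketched as follows; the predictor values and weights are made-up illustrative numbers, not those of [5]:

```python
def overall_prediction(geometry_preds, texture_preds, w_g, w_t):
    # Weighted sum of geometry-driven and texture-driven visual
    # predictors, in the spirit of Equation (2).
    assert len(geometry_preds) == len(w_g) and len(texture_preds) == len(w_t)
    return (sum(w * p for w, p in zip(w_g, geometry_preds)) +
            sum(w * p for w, p in zip(w_t, texture_preds)))

# Two hypothetical geometry predictors (e.g. feature-point density)
# and two texture predictors (e.g. brightness, pattern complexity),
# each normalized to [0, 1].
score = overall_prediction([0.8, 0.6], [0.9, 0.4],
                           w_g=[0.3, 0.2], w_t=[0.3, 0.2])
print(round(score, 2))  # 0.24 + 0.12 + 0.27 + 0.08 = 0.71
```

Regions with a higher ℘ would then be allotted more bits during reduction.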
Figure 1: (II) shows the original images; (I) shows images with equal quality; and (III) shows images assigned different qualities based on the visual quality prediction model. The data size is the same in both (I) and (III). It can be seen that the texture property in (III) is preserved better, especially in the baboon region.

There are other examples of perceptually driven texture quality assessment. An image fidelity assessor was discussed in [44]; the assessor accepts two grayscale images as input and produces a distortion value. A visual difference predictor was used to select the appropriate global illumination algorithm [45]. The visibility of differences between two images is used to determine whether a particular area of a synthetic scene needs refinement [50]. Perceptual redundancy in color images is measured in [14] so that perceptually indistinguishable colors are suppressed. Since video is a sequence of image frames, techniques used to evaluate image quality can be applied with additional considerations such as inter-frame redundancy and motion attention features [51]. Experimental results [35] examining a number of perceptual quality assessment metrics show that perceptual metrics can capture distortions in video quality that cannot be captured by the Peak-Signal-to-Noise Ratio (PSNR) metric. By taking human perception into consideration, visually redundant data, and thus data size, can be reduced, achieving higher efficiency in image compression. In addition to compression, texture data can be further reduced at a global level. It is observed that the HVS cannot discriminate the degradation when a texture is scaled down within a certain percentage [8]. In other words, a smaller image of (n − ∆n)² pixels is sufficient to map onto a display area of n² pixels by up-scaling, without noticeable degradation in visual quality.
An illustration is given in Figure 2, which compares the original image of 96² pixels (left) with an image interpolated from 82² to 96² pixels (right).
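Assuming the Figure 2 dimensions are indeed 96x96 and 82x82 pixels, the data saving from this kind of texture scaling is straightforward to compute:

```python
def scaled_size_ratio(n, delta):
    # A display region of n*n pixels is covered by a texture stored at
    # (n - delta)^2 pixels and up-scaled at render time; the ratio is
    # the fraction of the original texture data actually stored.
    return (n - delta) ** 2 / n ** 2

# The Figure 2 example: an 82^2 image interpolated up to 96^2 pixels.
ratio = scaled_size_ratio(96, 14)
print(f"{ratio:.2%} of the data kept, {1 - ratio:.2%} saved")
```

For these dimensions, roughly 27% of the texture data is saved while staying below the perceptual threshold reported in [8].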
Figure 2: (Left) Original image of 96² pixels. (Right) Image of 82² pixels interpolated to 96² pixels. No significant degradation is found in the interpolated image.
5. Perceptually vs. geometrically guided 3D mesh simplification and refinement
3D meshes are another major multimedia content type. An important consideration in designing efficient real-time, especially interactive, systems is to adapt to available resources by simplifying 3D geometry, possibly compromising quality. Detailed surveys on mesh simplification, or level-of-detail (LOD), can be found in [24] [40] [17] [34] [39]. The idea of applying LOD was discussed as early as 1976 by Clark [15], and static, dynamic and view-dependent LOD approaches were gradually developed. The static approach preprocesses a number of discrete versions of each 3D object at different resolutions, corresponding to different levels of detail. At runtime, the appropriate version is chosen and rendered. Instead of creating discrete versions, a dynamic approach uses a data structure to encode a continuous spectrum of detail (CLOD); the desired level can be extracted at runtime. The view-dependent approach is an extension of the dynamic approach: criteria are defined to select the most appropriate level for the current view. Each approach has its advantages and disadvantages, but these conventional methods often rely on geometric metrics and assume that visual quality improves as the mesh resolution increases [25] [23] [27] [29], which contradicts the experimental finding that texture resolution has a more significant impact on quality once the mesh resolution has reached a certain threshold [38] [43]. Simplification techniques taking human perception into consideration emerged towards the late twentieth century [18] [33]. Perceptual metrics started to gain attention among researchers because assessment relying on geometric criteria, such as mean square error (MSE) or quadric error [23], is not sufficient: geometrically different objects can be visually indistinguishable to the HVS. A number of visual quality models are discussed in [37].
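The static (discrete) LOD scheme described above can be sketched as a simple distance-based lookup; the switch distances are hypothetical:

```python
def select_lod(distance, thresholds):
    # Static (discrete) LOD: pick the precomputed mesh version whose
    # distance band contains the viewer. thresholds[i] is the maximum
    # distance at which level i (0 = most detailed) is still used.
    for level, limit in enumerate(thresholds):
        if distance <= limit:
            return level
    return len(thresholds)  # beyond the last band: coarsest version

# Three precomputed versions plus a coarsest fallback, with made-up
# switch distances in scene units.
assert select_lod(5.0, [10.0, 50.0, 200.0]) == 0
assert select_lod(75.0, [10.0, 50.0, 200.0]) == 2
assert select_lod(999.0, [10.0, 50.0, 200.0]) == 3
print("ok")
```

A perceptual variant would replace the hand-tuned distance bands with thresholds derived from the CSF or from JND experiments.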
Simplification guided by human perception often builds on the Contrast Sensitivity Function (CSF). Reddy [41] approximates the CSF in dynamic scenes to optimize the amount of detail removed from the scene without the user noticing; the velocity of the object is incorporated in his model. Luebke and Hallen [31] derive perceptual metrics from the CSF to measure the perceptibility of visual stimuli; in their approach, only simplification operations inducing imperceptible changes in contrast and spatial frequency are performed. Williams et al. [48] improve these approaches by accounting for textures and dynamic lighting. Two visual fidelity algorithms for mesh simplification are discussed in [46]. These techniques are view-dependent, while the JND approach [4] is view-independent and can be applied to measure the perceptual impact generated by a change on the 3D surface; this makes online transmission more efficient by suppressing visually redundant mesh data. Furthermore, JND experiments show that screen rendering does not need to go down to the grid point level on the display, because the HVS is unable to discriminate the additional detail well before reaching the grid point. Given a desired display dimension, a mesh of the appropriate resolution can be automatically selected [6]. The JND approach can also be applied to compare and evaluate the performance of different simplification algorithms [19].
6. Perceptually optimized transmission of integrated texture and mesh taking packet loss into consideration
Multimedia content often contains more than one component, e.g., a 3D mesh and its surface texture. Given limited network resources, the question is how to distribute the available bandwidth between the different components while optimizing visual quality. Strategies that take packet loss into consideration [1] [10] [30] focus on either geometric details or images, but not on integrated texture mapped mesh data. This leads to the introduction of a perceptually optimized model and a transmission strategy (discussed below), which can handle packet loss on a mesh mapped with photo-realistic texture.
6.1. Optimization given limited resources and unreliability
Since MSE and SNR do not always correlate well with perceived quality based on human evaluation [36], a
number of new quality metrics based on the human visual system have been developed [32]. In perceptual evaluations [38], it was observed that: (i) perceived quality varies linearly with texture resolution; (ii) perceived quality follows an exponential curve for geometry. The mathematical model proposed in [38] was later extended to estimate the optimal combination of geometry and texture data given limited bandwidth [20]:

1/Q_{b,G,T}(t) = 1/(1 + 4t) + (1 − (b − t²T)/G)^2.7 (1 − 1/(1 + 4t))   (3)
where b is the bandwidth, G is the original geometry data size, T is the original texture data size, and t is the scaling factor on texture. Minimizing the right-hand side of Equation (3) maximizes the visual quality. An example curve is shown in Figure 3, based on b = 12 Mbits, T = 20 Mbits, and G = 10 Mbits.
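A model of this family can be optimized with a simple sweep over the texture scaling factor. The inverse-quality expression below is a stand-in with the same overall shape (one term improving as more texture is kept, one exponential term degrading as the leftover geometry budget shrinks), not the exact model of [20]:

```python
def inv_quality(t, b, G, T):
    # Stand-in inverse-quality model: the texture term falls as t grows,
    # while the geometry penalty rises as the leftover bandwidth for
    # geometry, b - t*t*T, shrinks. Coefficients are illustrative.
    geometry_budget = max(0.0, min(1.0, (b - t * t * T) / G))
    tex_term = 1.0 / (1.0 + 4.0 * t)
    return tex_term + (1.0 - tex_term) * (1.0 - geometry_budget) ** 2.7

def best_texture_scale(b=12.0, G=10.0, T=20.0, steps=1000):
    # Sweep the texture scaling factor t over [0, 1] and keep the value
    # minimizing inverse quality (i.e. maximizing quality).
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: inv_quality(t, b, G, T))

# Using the example budget from Figure 3: b = 12, T = 20, G = 10.
t_star = best_texture_scale()
print(t_star)
```

The optimum lies strictly between "all geometry" (t = 0) and "all texture", reproducing the interior minimum visible in the Figure 3 curve.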
Figure 3: Example of a quality curve for different texture scaling factors (t) given limited bandwidth.
Figure 4: From left to right, 30%, 60% and 80% randomly selected packet loss was applied to a Cow mesh mapped with color.
While most transmission strategies [16] assume reliable networks, such an assumption is not always valid, especially as wireless and mobile devices become commonplace. Adapting to bandwidth fluctuations while preserving visual quality, using fragmentation and the Harmonic Time Compensation Algorithm (HTCA), was discussed in [7], but again reliable transmission was assumed. When multimedia data are transmitted over unreliable networks, the challenge is how to preserve visual quality when a percentage of the data is lost. A 3D Perceptually Optimized Partial Information Transmission (3POPIT) strategy was proposed in [20]. The main idea is to distribute neighboring data in different packets, to avoid generating a big void on the mesh (Figure 4) or texture (Figure 5) surface because of lost data. Experimental results show that visual quality can be preserved well for up to 60% loss by using the proposed perceptually optimized model [21].
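The neighbor-scattering idea can be sketched as round-robin striping of consecutive vertices across packets; this is a simplified illustration, not the actual 3POPIT algorithm:

```python
def interleave(vertices, n_packets):
    # Stripe consecutive (neighboring) vertices across packets
    # round-robin, so losing one packet removes scattered samples
    # instead of one contiguous hole in the surface.
    packets = [[] for _ in range(n_packets)]
    for i, v in enumerate(vertices):
        packets[i % n_packets].append(v)
    return packets

packets = interleave(list(range(10)), 3)
print(packets)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# If packet 0 is lost, the surviving vertices 1, 2, 4, 5, 7, 8 still
# cover the whole strip, so the missing ones can be interpolated from
# nearby neighbors rather than leaving a large void.
```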
7. Foveation based simplification and transmission
Spatially varying sensing (foveation) has been used in many different areas of computer vision, such as image compression and video teleconferencing, and in perceptually driven LOD representations in graphics. Foveation is advantageous for interactive mesh and texture transmission in online 3D applications [12] [53] [54] [55] [56]. Unlike traditional mesh representations, where all 3D vertices need to be transmitted, only a collection of points-of-interest (foveae) and information along one (rather than three) axis need to be transmitted, so a threefold data reduction can be achieved in the new 3D model. Foveation can also be used in conjunction with eye-tracking techniques: when a region of interest (ROI) is the perceived target of the viewer, the foveation approach is more effective than other LOD techniques. Foveation also leads to the study of predicting and evaluating saliency [26], so that visually more important regions can be displayed at better quality.
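The spatially varying allocation can be sketched as a detail budget that decays with eccentricity from the fovea; the exponential falloff and constants are illustrative assumptions, not the model used in the cited work:

```python
import math

def foveated_detail(x, y, fovea, max_detail=1.0, falloff=0.05):
    # Spatially varying level of detail: full resolution at the
    # point-of-interest (fovea), decaying with eccentricity.
    ecc = math.hypot(x - fovea[0], y - fovea[1])
    return max_detail * math.exp(-falloff * ecc)

# Detail is highest at the fovea and drops monotonically away from it.
d0 = foveated_detail(50, 50, fovea=(50, 50))
d1 = foveated_detail(80, 50, fovea=(50, 50))
d2 = foveated_detail(120, 50, fovea=(50, 50))
print(d0 > d1 > d2)  # True
```

Combined with eye tracking, the fovea would follow the viewer's gaze so that full detail is spent only where it can actually be perceived.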
Figure 5: Interpolating and reconstructing the Lena image when 1 (left), 4 (middle) and 8 (right) of 16 packets are received.

8. Perceptually motivated web-based multimedia education and other applications
A major trend in education is toward online lectures and testing. In recent years, education has been migrating from the traditional pen and paper style to online multimedia computer-based learning. Online multimedia education creates new learning
opportunities, including remote access and sharing of learning resources, which would be unlikely to be accessible or available so promptly without online support. The rich and inspiring multimedia content also far surpasses what can be obtained in pen and paper format. Color content is expensive in printed form but is widely available in e-education. Instead of a small number of restrictive presentation formats, such as multiple choice, e-education opens up far more innovative presentation formats, including animation, 3D display, audio, video, etc. In general, multimedia content can arouse students’ interest and foster higher-level skills. A number of innovative learning objects have already been implemented, the usage of which requires different types of visual response actions from the students [11]. Adaptive testing can be conducted, and the response data from students can be analyzed using the 2AFC methodology [9]. Human perception is not limited to vision and hearing, but also includes touch and smell. Copyright [52] and watermarking are two other areas to which the perceptual approach can be applied. In addition, multi-modal data, e.g., audio and visual data, are often present in multimedia content such as video sequences, games and virtual conferences, and their perceptual impacts are inter-dependent. For example, introducing audio can distract the viewer’s attention from a virtual scene, so rendering at a lower geometric quality is unlikely to cause significant visual degradation. These are just some of the aspects open to further research in the years to come.
9. Conclusion
We discuss how human perception can play a role in multimedia applications and review recent developments, including some of our contributions, in this research direction. The quantitative aspect of data reduction techniques has been explored extensively in the literature, but the qualitative aspects, taking human perception into consideration, are still inadequately studied. It is important to note that the perceptual approach is not meant to replace traditional geometry-based approaches, but to complement previous methods to achieve higher efficiency. However, perceptual impacts cannot be represented by numeric values directly, and the average perceptual behavior of a sample is used to represent the general behavior of the target population. Proving the utility of a perceptual model is therefore very different from proving a mathematical model. In order to pursue research on perceptual enhancement in multimedia, a more in-depth understanding of psychophysical methodologies, as well as more inter-disciplinary collaboration, is necessary.
References
[1] Alregib G., Altunbasak Y. and Rossignac J., “Error-Resilient Transmission of 3D Models,” ACM Trans. on Graphics, April 2005. (Early version in ICASSP 02.) [2] Boff K., Kaufman L. and Thomas J., “Handbook of Perception and Human Performance, Sensory Processes and Perception,” A Wiley-Interscience Publication, vol. I, 1986. [3] Cheng I. and Boulanger P., “Feature Extraction on 3D TexMesh Using Scale-space Analysis and Perceptual Evaluation,” IEEE Trans. on CSVT Special Issue, vol. 15, no. 10 (11 pages), October 2005. [4] Cheng I. and Boulanger P., “A 3D Perceptual Metric using Just-Noticeable-Difference,” EUROGRAPHICS 2005, Dublin, short paper (4 pages). [5] Cheng I. and Boulanger P., “A Visual Quality Prediction Model for 3D Texture,” EUROGRAPHICS 2005, Dublin, short paper (4 pages). [6] Cheng I. and Boulanger P., “Automatic Selection of Level-of-Detail based on Just-Noticeable-Difference (JND),” in Proc. SIGGRAPH Poster Session (1 page abstract), 2005. [7] Cheng I. and Boulanger P., “Adaptive Online Transmission of 3D TexMesh Using Scale-space and Visual Perception Analysis,” IEEE Transactions on Multimedia, vol. 8, no. 3, 12 pages, June 2006. [8] Cheng I. and Bischof W., “A Perceptual Approach to Texture Scaling based on Human Computer Interaction,” EUROGRAPHICS 2006, Vienna, short paper (4 pages). [9] Cheng I. and Basu A., “Improving Multimedia Innovative Item Types for Computer Based Testing,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages). [10] Chen Z., Bodenheimer B. and Barnes J., “Robust Transmission of 3D Geometry over Wireless Networks,” in Web3D, 2003, pp. 161-172. [11] Cheng I., Gierl M. and Basu A., “Evaluating Performance Features with 3D Item Types for use with Computer-Based Tests in Education,” Frontiers in CS & CSE Education (FECS'06), 7 pages, June 2006, Las Vegas, USA.
[12] Cheng I., “Foveated 3D Model Simplification,” 7th Int'l Symposium on Signal Processing & its Applications (ISSPA), 4 pages, July 2003, Paris, France. [13] Chou C. and Liu K., “Color Image Compression using Adaptive Color Quantization,” ICIP 2004, pp. 2331-2334. [14] Chou C. and Liu K., “Perceptually Optimal JPEG2000 Coding of Color Images,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages). [15] Clark J., “Hierarchical geometric models for visible surface algorithms,” Communications of the ACM, 19(10): 547-554, 1976.
[16] Cohen-Or D., Mann Y. and Fleishman S., “Deep compression for streaming texture intensive animations,” in Proc. SIGGRAPH 1999, pp. 261-267. [17] Cignoni P., Montani C. and Scopigno R., “A comparison of mesh simplification algorithms,” Computers and Graphics, 22(1): 37-54, 1998. [18] Cohen J., Olano M. and Manocha D., “Appearance-preserving simplification,” in Proc. SIGGRAPH 1998, pp. 115-122. [19] Cheng I., Shen R., Yang X. and Boulanger P., “Perceptual Analysis of Level-of-Detail: The JND Approach,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages). [20] Cheng I., Ying L. and Basu A., “Packet Loss Modeling for Perceptually Optimized 3D Transmission,” IEEE Int'l Conference on Multimedia, 4 pages, July 2006, Toronto, Canada. [21] Cheng I., Ying L. and Basu A., “A Perceptually Driven Model for 3D Transmission over Unreliable Networks,” 3rd Int’l Symposium on 3DPVT, 8 pages, June 2006, North Carolina, USA. [22] Ferwerda J., Pattanaik S., Shirley P. and Greenberg D., “A model of visual masking for computer graphics,” in Proc. SIGGRAPH 1997, pp. 143-152. [23] Garland M. and Heckbert P., “Simplifying surfaces with color and texture using quadric error metrics,” in Proc. IEEE Visualization, pp. 263-269, 1998, North Carolina, USA. [24] Heckbert P. and Garland M., “Survey of polygonal surface simplification algorithms,” Conf. Course: Multiresolution Surface Modeling, SIGGRAPH 1997. [25] Hinker P. and Hansen C., “Geometric optimization,” in Proc. Visualization 1993, pp. 189-195. [26] Howlett S., Hamill J. and O’Sullivan C., “Predicting and Evaluating Saliency for Simplified Polygonal Models,” ACM Transactions on Applied Perception, vol. 2, no. 3, pp. 1-23, Jul 2005. [27] Hoppe H., “Progressive meshes,” in Proc. SIGGRAPH 1996, pp. 99-108. [28] Kopilovic I. and Szirányi T., “Artifact Reduction with Diffusion Preprocessing for Image Compression,” Optical Engineering, vol. 44, no. 2, 2005. [29] Khodakovsky A., Schroder P.
and Sweldens W., “Progressive geometry compression,” in Proc. SIGGRAPH 2000, pp. 271-278. [30] Lee K. and Chanson S., “Packet Loss Probability for Real-time Wireless Communications,” IEEE Trans. on Vehicular Technology, Nov. 2002. [31] Luebke D. and Hallen B., “Perceptually driven simplification for interactive rendering,” in Proc. 12th Eurographics Workshop on Rendering Techniques, pp. 223224, London, UK 2001. [32] Limb J., “Distortion Criteria of the Human Viewer,” IEEE Transactions on SMC, 778-793, 1979. [33] Lindstrom P. and Turk G., “Image-driven simplification,” ACM Trans. on Graphics, vol. 19, no. 3, pp. 204-241, Jul. 2000.
[34] Luebke D., “A Developer’s survey of polygonal simplification algorithms,” IEEE Computer Graphics and Applications, vol. 21, no. 3, pp. 24-35, May/June 2001.
[35] Martinez-Rach M., Lopez O., Piñol P., Malumbres M. and Oliver J., “A Study of Objective Quality Assessment Metrics for Video Codec Design and Evaluation,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages).
[36] Mannos J. and Sakrison D., “The Effects of a Visual Fidelity Criterion on the Encoding of Images,” IEEE Trans. on Information Theory, 1974.
[37] O’Sullivan C., Howlett S., McDonnell R., Morvan Y. and O’Conor K., “Perceptually adaptive graphics,” STAR State of The Art Report, Eurographics 2004.
[38] Pan Y., Cheng I. and Basu A., “Quantitative metric for estimating perceptual quality of 3D objects,” IEEE Trans. on Multimedia, vol. 7, no. 2, pp. 269-279, Apr. 2005.
[39] Pauly M., Gross M. and Kobbelt L., “Efficient simplification of point-sampled surfaces,” in Proc. IEEE Visualization, pp. 163-170, 2002.
[40] Reddy M., “Perceptually modulated level of detail for virtual environments,” Ph.D. Thesis, University of Edinburgh, 1997.
[41] Reddy M., “Perceptually optimized 3D graphics,” Applied Perception, vol. 21, pp. 68-75, September/October 2001.
[42] Reitsma P. and Pollard N., “Perceptual metrics for character animation: sensitivity to errors in ballistic motion,” ACM Transactions on Graphics, 22(3): 537-542, 2003.
[43] Rushmeier H., Rogowitz B. and Piatko C., “Perceptual issues in substituting texture for geometry,” in Proc. of SPIE Human Vision and Electronic Imaging V, vol. 3959, pp. 372-383, 2000.
[44] Taylor C., Pizlo Z. and Allebach J., “Perceptually relevant image fidelity,” in Proc. of IS&T/SPIE Int’l Symposium on Electronic Imaging Science and Technology, pp. 110-118, 1998.
[45] Volevich V., Myszkowski K., Khodulev A. and Kopylov E., “Using the visual differences predictor to improve performance of progressive global illumination computation,” ACM Trans. on Graphics, 19(1): 122-161, 2000.
[46] Watson B., Friedman A. and McGaffey A., “Measuring and predicting visual fidelity,” in Proc. SIGGRAPH 2001, pp. 213-220.
[47] Wetherill G. B. and Levitt H., “Sequential estimation of points on a psychometric function,” British Journal of Mathematical and Statistical Psychology, 18, 1965, pp. 1-10.
[48] Williams N., Luebke D., Cohen J., Kelley M. and Schubert B., “Perceptually guided simplification of lit, textured meshes,” in Proc. SIGGRAPH 2003, pp. 113-121.
[49] Nagata S., “How to reinforce perception of depth in single two-dimensional pictures,” in Proc. of the SID, 35(3), 1984.
[50] Bolin M. and Meyer G., “A Perceptually based adaptive sampling algorithm,” in Proc. SIGGRAPH 1998, pp. 299-309.
[51] Yang K. C., Clark C. and Pankaj K., “Human Visual Attention Map for Compressed Video,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages).
[52] Wong A. and Bishop W., “Adaptive Perceptual Degradation based on Video Usage,” IEEE Int’l Symposium on Multimedia Special Track, Dec 2006 (8 pages).
[53] Basu A., Cheng I. and Pan Y., “Foveated Online 3D Visualization,” in Proc. IAPR/IEEE Int’l Conference on Pattern Recognition, August 2002, Quebec City (4 pages).
[54] Reeves T. and Robinson J., “Adaptive Foveation of MPEG video,” ACM Multimedia Conference, pp. 231-241, 1996.
[55] Cheng I., Basu A. and Pan Y., “Parametric foveation for progressive texture and model transmission,” Eurographics, Granada, Spain, September 2003 (4 pages).
[56] Basu A. and Wiebe L., “Videoconferencing using spatially varying sensing,” IEEE Transactions on Systems, Man, and Cybernetics, March 1998, pp. 137-148. (Earlier publications in IEEE SMC and ICPR conferences, 1993 and 1994.)