A Perceptually Based Quantization Technique for MPEG Encoding

Wilfried Osberger (a), Anthony J. Maeder (b) and Neil Bergmann (a)

(a) Space Centre for Satellite Navigation, Queensland University of Technology, Brisbane, Australia.
(b) School of Engineering, University of Ballarat, Ballarat, Australia.

ABSTRACT

We present a technique for controlling the adaptive quantization process in an MPEG encoder, which improves upon the commonly used Test Model 5 (TM5) rate controller. The method combines a spatial masking model with a technique for automatically determining the visually important areas in a scene. The spatial masking model has been designed with consideration of the structure of compressed natural images. It takes into account the different levels of distortion that viewers can tolerate in different parts of a picture by segmenting the scene into flat, edge, and textured regions and quantizing these regions differently. The visually important areas are represented by Importance Maps. These maps are generated by combining factors known to influence human visual attention and eye movements. Lower quantization is assigned to visually important regions, while areas classified as being of low visual importance are more harshly quantized. Results indicate a subjective improvement in picture quality in comparison to the TM5 method: less ringing occurs at edges, and the visually important areas of a picture are coded more accurately. This is particularly noticeable at low bit rates. The technique is computationally efficient and flexible, and can easily be extended to specific applications.

Keywords: Perceptual Video Coding, MPEG, Importance Maps, Human Visual System, Visual Attention

1. INTRODUCTION

The modeling of human visual processes is an area of active research. Many applications have been identified which can benefit from different Human Visual System (HVS) models. This is because a human is often the viewer and final judge of the quality of a process. Applications involving human appraisal should be designed with consideration of the properties of the HVS, rather than with regard to simple objective factors. A good example of such an application is the well known Moving Picture Experts Group (MPEG) video compression standard. MPEG has been designed in a flexible and open manner, which allows it to be optimized for particular scenes, applications, and bit rates. In particular, the provision of locally adaptive quantization allows HVS models to be used to achieve perceptually improved coding. Due to the limited computational capabilities of the HVS, much of the data in a typical video scene cannot be processed and can be degraded without any loss in scene understanding or quality. The locations of visually important areas in a scene are determined by our visual attention and eye movements. Studies on attention and eye movements indicate that humans generally attend to only a few areas in an image.[1] These areas are often highly correlated amongst different subjects, as long as the subjects are viewing the scene in the same context. As a result, knowledge of the viewers' probable focus of attention can be utilized to vary the level of compression across a scene. In order to automatically determine the parts of an image that a human considers important, we need to understand the operation of human visual attention and eye movements. Research into the attentional mechanisms of the HVS has revealed several low level and high level factors which influence attention and eye movements. In previous work we have used these factors to develop an Importance Map (IM) for still images, which predicts the perceptual importance of each region in a scene.[2] In this paper, we extend the IMs to video[3] and use them to control the adaptive quantization process in an MPEG encoder.

Further author information: W.O. (correspondence): E-mail: [email protected]; A.J.M.: E-mail: [email protected]; N.B.: E-mail: [email protected]

We also replace the TM5 activity measure with a spatial masking model more in tune with human perception. The effect is that coding errors are redistributed: important areas and regions where compression errors are easily visible undergo a fine quantization, whereas areas of lower perceptual importance and areas capable of strong masking are more coarsely quantized. The organization of the paper is as follows. Section 2 discusses the HVS, focusing in particular on spatial masking and on the factors which influence visual attention and eye movements. In Section 3 we give an overview of the operation of an MPEG encoder, looking in particular at the quantization strategy. We present our method for perceptual quantization in Section 4, and give results of the algorithm on typical scenes in Section 5. Finally, the discussion looks at possible enhancements to our model, and we propose further applications for IMs.

2. IMPORTANT PROPERTIES OF THE HVS

The work of vision researchers over the last 35 years has led to a rapid advancement in our knowledge of the operation of the HVS. However, it is only more recently that this knowledge has been put to practical use in image processing algorithms. The MPEG standard explicitly takes into account several low level vision properties (e.g. reduced sensitivity to chrominance), and its flexibility allows other HVS features to be incorporated. This section discusses some of the most important HVS characteristics which we need to consider when designing a perceptually based MPEG quantizer.

2.1. Frequency Sensitivities

The HVS possesses a varying sensitivity to stimuli (e.g. coding errors) at different spatial and temporal frequencies. The relationship at the threshold of visibility for spatial frequency sensitivity is represented by a Contrast Sensitivity Function (CSF), which is approximated by the Quantization Matrix (QM) of an MPEG encoder. The shape of the CSF depends upon the stimulus used in obtaining it. For natural stimuli, it is low-pass or slightly band-pass, with a peak at mid-frequencies (4-8 cycles/degree).
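For illustration only, a classical analytic CSF such as the Mannos and Sakrison model reproduces this band-pass shape; the formula is a textbook assumption of this sketch, not part of the paper's method:

    import numpy as np

    def csf_mannos_sakrison(f):
        """Approximate contrast sensitivity at spatial frequency f (cycles/degree).
        Peaks near 8 cycles/degree and falls off at high frequencies."""
        return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

    # Sensitivity is low at very low frequencies and peaks at mid frequencies:
    for f in (0.5, 2.0, 4.0, 8.0, 16.0, 32.0):
        print(f"{f:5.1f} c/deg -> {csf_mannos_sakrison(f):.3f}")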

2.2. Masking Effects

Masking refers to our reduced ability to detect a stimulus when the background is either spatially or temporally complex. Thus errors are less visible along strong edges (but only for a few pixels either side of the edge), in textured areas, in fast moving areas which our eyes are unable to track, or immediately following a scene change. Masking has been shown to occur over a wide range of orientations and frequencies.[4] The amount of spatial masking caused by a background depends not only on the background's contrast, but also on the level of uncertainty created by the background.[5,6] Areas of high uncertainty (e.g. complex areas of a scene) induce higher masking than areas of the same contrast with lower uncertainty. This explains why we can tolerate greater error in textured areas than along edges of the same contrast. MPEG provides for spatial masking by allowing spatially variable quantization through the MQUANT parameter.

2.3. Visual Attention and Eye Movements

In order to deal efficiently with the mass of information present in the surrounding environment, the HVS operates using variable resolution. Although our field of view is around 180 degrees horizontally and 140 degrees vertically, we only possess a high degree of visual acuity over a very small area (around 2 degrees in diameter) called the fovea. Thus, in order to inspect the various objects in our environment accurately, eye movements are required. Rapid shifts in the eye's focus of attention are called saccades, and occur every 100-500 milliseconds. Visual attention mechanisms are used to control these saccades. Our pre-attentive vision operates in parallel, looking in the periphery for important and uncertain areas for the eye to foveate on at the next saccade. Thus a very strong relationship exists between eye movements and attention. Although areas must be attended to before we will foveate on them, the converse is not true: we will not always foveate on an object that attracts our attention. However, unless an attended object is close to the fovea, only limited processing can be performed on it, so only a low resolution representation of the object is required.[7] This suggests that high resolution is needed only in areas on which we are focusing. However, care must be taken when reducing peripheral resolution not to remove future visual attractors or create new attractors.[8]

2.3.1. Correlation of Eye Movements between Subjects

If there were not a strong correlation between the directions of gaze of different people, then eye movements would be impossible to predict, and it would be difficult to make general use of eye movement information. However, studies on human eye movement patterns for both images and video indicate that eye movements are indeed highly correlated amongst subjects. The original work of Yarbus showed that a strong correlation between viewer eye movements exists, as long as the subjects view the image in the same context and with the same motivation.[9] Yarbus also demonstrated that even when given unlimited viewing time, we do not scan all areas of a scene, but instead attend to a handful of important regions which continually attract our attention. Similar results have been found for video by Stelmach et al., who recorded viewer eye movements for 15 sequences containing a wide variety of typical television content.[1] Results showed that for all of the scenes, the gaze of over 90% of the viewers was directed at only 2-3 locations in the scene. This suggests that eye movements are not idiosyncratic, and that a strong relationship exists between the directions of gaze of different subjects viewing an image in the same context.

2.3.2. Factors which Influence Eye Movements and Attention

In order to automatically determine the importance of the different regions in an image, we need to determine the factors which influence human visual attention. Research indicates that our attention is controlled by both high and low level factors. High level factors, sometimes referred to as top-down processes, generally involve some feedback process from memory and may involve template matching. Low level or bottom-up processes are generally fast, feed-forward mechanisms involving relatively simple processing. A general observation is that objects which stand out from their surrounds are likely to attract our attention, since one of the main goals of the HVS is to minimize uncertainty. This is also in agreement with Gestalt organization theories. Low level factors which have been found to influence visual attention include:

- Motion. Motion has been found to be one of the strongest influences on visual attention.[10,11] Our peripheral vision is highly tuned to detecting changes in motion, and our attention is involuntarily drawn to peripheral areas undergoing motion which is distinct from their surrounds. Areas undergoing smooth, steady motion can be tracked by the eye, and humans cannot tolerate any more distortion in these regions than in stationary regions.[12]

- Contrast. The HVS converts a luminance image into contrasts at an early stage of processing. Region contrast is consequently a very strong low-level visual attractor. Regions which have a high contrast with their surrounds attract our attention and are likely to be of greater visual importance. This has been demonstrated in several studies.[9,13,14]

- Size. Findlay has shown that region size also has an important effect in attracting attention.[13] Larger regions are more likely to attract our attention than smaller ones. However, a saturation point exists, after which the importance due to size levels off.

- Shape. Regions whose shape is long and thin (edge-like) have been found to be visual attractors.[10,14,15] They are more likely to attract attention than rounder regions of the same area and contrast.

- Colour. Colour has been found to be important in attracting attention.[10,11] Some particular colours (e.g. red) have been shown to attract our attention. A strong influence occurs when the colour of a region is distinct from the colour of its background.

Other low level factors which have been found to influence attention include brightness, orientation, and line ends. Several high level factors have also been determined:

- Location. Eye-tracking experiments have shown that viewers' eyes are directed at the centre 25% of a screen for the majority of viewing material.[16]

- Foreground / Background. Viewers are more likely to be attracted to objects in the foreground than those in the background.

- People. Many studies have shown that we are drawn to focus on people in a scene, in particular their faces, eyes, mouths, and hands.[9,14,15]

- Context. Viewers' eye movements can be dramatically changed, depending on the instructions they are given prior to or during the observation of an image.[9,15]

2.3.3. Considerations when Modeling Visual Attention

Although many factors which influence visual attention have been identified, little quantitative data exists regarding the exact weighting of the different factors and their inter-relationships. Some factors are clearly of very high importance. For example, motion is known to be an extremely important visual attractor; Niebur and Koch have weighted the importance of motion as five times that of any other factor.[17] However, it is difficult in general to determine exactly how much more important one factor is than another: Factor X may be more important than Factor Y in one image, while in another image the opposite may be true. Due to this lack of information, it is necessary to consider a large number of factors when modeling visual attention.[11,18-20] This caters for the case where not all of the factors are in use all of the time. It is also desirable that the factors used be reasonably independent, so that a particular type of factor does not exert undue influence on the overall importance. High level factors, and context in particular, can be very useful in determining a region's importance. In situations where a template of a target is known a priori, viewer eye movements can be modeled with high accuracy.[21] However, in the general case little is known about the context of viewing and about the content of the scene, so such high level information cannot be used.

3. OPERATION OF AN MPEG ENCODER

Currently there are two MPEG standards which have been released: MPEG-1 (1991) and MPEG-2 (1995). The much awaited MPEG-4 standard is scheduled for release in late 1998, and promises to be more flexible and to provide many new features such as object coding. The basic operation of the MPEG-1 and MPEG-2 standards is similar, and we refer to them generically as MPEG unless otherwise specified. Although we have used an MPEG-2 encoder to implement the quantizer,[22] our algorithm could readily be applied to an MPEG-1 encoder.

MPEG is a block-based coding method based on the Discrete Cosine Transform (DCT). A sequence is made up of groups of pictures, which contain pictures of three different types: I- (intra-coded), P- (predictive-coded) and B- (bi-directionally predictive-coded). Each picture is broken into 16 × 16 pixel regions (macroblocks), which are further broken into 8 × 8 pixel regions before a DCT transform is applied. This transform significantly reduces the spatial redundancy in the frame and allows efficient representation following quantization via a user-defined QM. The QM is designed based both on the statistics of natural images and on the spatial frequency sensitivity of the HVS. The significant temporal redundancy is reduced by allowing predictive P- and B- pictures, as well as independent I- pictures. Thus, through the use of motion compensation in P- and B- pictures, only differences between adjacent pictures need to be coded. For a complete description of the MPEG standard, refer to Mitchell et al.[23]

The quantization strategy used in MPEG revolves around the QM. Although we may specify a new QM only once per picture, we would like to use spatially variable quantization within a picture, because some areas of the picture can tolerate more severe distortion than others, due to the perceptual factors discussed previously. MPEG has allowed for this via the MQUANT parameter, which allows a scaling of the QM (by a factor of 1 to 31) for each macroblock. MPEG-2 provides further flexibility by allowing a non-linear scaling of the QM via MQUANT. The MPEG-2 committee developed a strategy for the usage of MQUANT in their TM5 rate controller.[24] This, however, was designed only as a basic strategy, and a more complex strategy can easily be adopted. TM5 involves three basic processes:

1. Target bit allocation for a frame,
2. Rate control via buffer monitoring,
3. Adaptive quantization based on local activity.

The adaptive quantization strategy crudely models our reduced sensitivity to complex spatial areas by varying MQUANT in proportion to the amount of local spatial activity in the macroblock. The spatial activity measure used in TM5 is the minimum variance among the four 8 × 8 luminance blocks in each macroblock, as sketched below. However, this simple activity measure fails to take into account such HVS properties as our different sensitivities to edges and textures, as well as higher level perceptual factors. For these reasons, a more complex model is required to produce compressed pictures with improved subjective quality.
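As a concrete sketch, the TM5-style activity computation for one 16 × 16 luminance macroblock might look as follows (the function names are ours; the "1 +" offset follows the TM5 definition):

    import numpy as np

    def tm5_activity(mb):
        """Spatial activity of a 16x16 luminance macroblock:
        1 + the minimum variance of its four 8x8 sub-blocks."""
        variances = [mb[y:y + 8, x:x + 8].var() for y in (0, 8) for x in (0, 8)]
        return 1.0 + min(variances)

    def tm5_normalized_activity(act_j, avg_act):
        """TM5 normalization into [0.5, 2.0]; avg_act is the mean
        activity of the previous frame."""
        return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)

    # MQUANT for the macroblock is the reference quantizer scaled by this factor.
    mb = np.random.randint(0, 256, (16, 16)).astype(np.float64)
    print(tm5_normalized_activity(tm5_activity(mb), avg_act=400.0))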

Figure 1. Block diagram for the MPEG adaptive quantization controller. (Diagram summary: the original frame is segmented and its blocks classified for the activity masking model; contrast, size, shape, location and background factors are combined with motion importance, calculated from motion vectors between the previous and current frames, to produce the IM; both branches feed the MQUANT scaling that yields the final MQUANT for the frame.)

4. ALGORITHM FOR PERCEPTUALLY IMPROVED QUANTIZATION

The method of adaptive quantization described here is an extension of our previous work[25] to include the influence of IMs. The basic operation can be seen in Figure 1. The algorithm has been developed based on the masking and attention mechanisms of the HVS discussed in Section 2, and has also been designed to be computationally inexpensive. The original frame is input, along with the previous frame for motion calculation. In our current implementation only luminance frames are used, but chrominance factors may readily be incorporated. The upper branch of Figure 1 shows the spatial masking strategy. The image is broken up into 8 × 8 blocks, and each block is classified as either flat, edge, or texture. This is performed using the technique of Gong and Hang.[26] Activity is then measured in each block by computing the variance (as done in TM5). This activity value is adjusted based on the block classification as follows:

    act'_j = \begin{cases} \min(act_j,\, act_{th}) & \text{if region is flat} \\ act_{th}\, (act_j / act_{th})^{\varepsilon} & \text{if region is edge or texture} \end{cases}    (1)

where act'_j is the adjusted activity for block j, act_j is the variance of block j, act_{th} is the variance visibility threshold, ε = 0.7 for edge areas, and ε = 1.0 for textured areas. We have used a value for act_{th} of 5.0, which is an estimate of the variance allowable in a flat block before distortion becomes visible. The adjusted activity is then used to control MQUANT as in TM5:

    Nact_j = \frac{2.0\, act'_j + act_{avg}}{act'_j + 2.0\, act_{avg}}    (2)

where Nact_j is the normalized activity for block j, and act_{avg} is the average value of act'_j for the previous frame. Nact_j is thus constrained to the range [0.5, 2.0]. This technique ensures minimal quantization error in flat regions, increases quantization gradually with activity along edges, and increases quantization more significantly with activity in textured regions.
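To make the operation concrete, a minimal Python sketch of equations (1) and (2) follows. The gradient-based classifier is only a crude stand-in for the Gong and Hang technique that the paper actually uses, and its edge test and threshold are assumptions of this sketch:

    import numpy as np

    ACT_TH = 5.0  # variance visibility threshold used in the paper

    def classify_block(block):
        """Crude flat/edge/texture stand-in for the Gong-Hang classifier
        (illustrative only): low variance -> flat; a few dominant strong
        gradients -> edge; uniformly moderate gradients -> texture."""
        if block.var() < ACT_TH:
            return "flat"
        gy, gx = np.gradient(block.astype(np.float64))
        mag = np.hypot(gx, gy)
        # Ratio threshold of 4.0 is an arbitrary illustrative choice.
        return "edge" if mag.max() > 4.0 * (mag.mean() + 1e-9) else "texture"

    def adjusted_activity(block):
        """Equation (1): cap flat blocks at the threshold, damp edge
        activity (eps = 0.7), leave texture activity unchanged (eps = 1.0)."""
        act = block.var()
        kind = classify_block(block)
        if kind == "flat":
            return min(act, ACT_TH)
        eps = 0.7 if kind == "edge" else 1.0
        return ACT_TH * (act / ACT_TH) ** eps

    def normalized_activity(act_adj, act_avg):
        """Equation (2): normalize into [0.5, 2.0] using the previous
        frame's average adjusted activity."""
        return (2.0 * act_adj + act_avg) / (act_adj + 2.0 * act_avg)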

The incorporation of IMs into the coder can be seen in the lower branch of Figure 1. The spatial importance measure is calculated using the method discussed in our earlier work.[2] The image is first segmented into regions, and importance factors for each of those regions are then calculated considering a number of the factors known to influence attention. These factors are combined linearly to produce a spatial importance map, scaled to the range [0.0, 1.0]. In this paper we extend this method to include a motion importance factor, which is weighted equally with the overall spatial importance. The calculation of motion importance requires the motion vectors for each 8 × 8 block of the image to be computed. We use the technique of Westen et al.,[27] which involves an initial coarse motion vector estimation followed by successive refinement. The result is an estimate of motion which smoothly groups together regions undergoing motion. The importance of the motion is then calculated as:

    ImpMot_j = \begin{cases} 0.0 & \text{if } mot_j < mot_{min} \\ \dfrac{mot_j - mot_{min}}{mot_{p1} - mot_{min}} & \text{if } mot_{min} < mot_j < mot_{p1} \\ 1.0 & \text{if } mot_{p1} < mot_j < mot_{p2} \\ \dfrac{mot_{max} - mot_j}{mot_{max} - mot_{p2}} & \text{if } mot_{p2} < mot_j < mot_{max} \\ 0.0 & \text{if } mot_j > mot_{max} \end{cases}    (3)

where mot_j is the magnitude of the motion vector for block j, mot_{min} is the minimum important motion parameter (set to 0.0 deg/sec), mot_{p1} and mot_{p2} are the peak motion importance parameters (both set to 10.0 deg/sec), and mot_{max} is the threshold for maximum important motion (set to 20.0 deg/sec). High importance is therefore assigned to regions undergoing medium to high motion, while areas of low motion and areas undergoing very high motion (i.e. untrackable motion) are assigned low motion importance.
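Equation (3) transcribes directly into Python; the behaviour exactly at the breakpoints is our choice, since the paper does not specify it:

    MOT_MIN, MOT_P1, MOT_P2, MOT_MAX = 0.0, 10.0, 10.0, 20.0  # deg/sec, as in the paper

    def motion_importance(mot_j):
        """Equation (3): trapezoidal importance of motion magnitude mot_j.
        With mot_p1 == mot_p2 == 10 deg/sec the plateau collapses to a peak."""
        if mot_j <= MOT_MIN or mot_j >= MOT_MAX:
            return 0.0
        if mot_j < MOT_P1:
            return (mot_j - MOT_MIN) / (MOT_P1 - MOT_MIN)
        if mot_j <= MOT_P2:
            return 1.0
        return (MOT_MAX - mot_j) / (MOT_MAX - MOT_P2)

    # Example: trackable medium motion scores high, untrackable motion scores low.
    print([round(motion_importance(v), 2) for v in (0.0, 5.0, 10.0, 15.0, 19.9)])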

The spatial and motion importance maps are combined by simple averaging for each 8 × 8 block in the image. Since we only require a value for every 16 × 16 pixel macroblock, the highest importance value among the four 8 × 8 blocks constituting the macroblock is used as the macroblock importance value. This factor is used to control the adaptive quantization in a similar manner to that used by the spatial masking process:

    Nimp_j = \frac{imp_j + 2.0\, imp_{avg}}{2.0\, imp_j + imp_{avg}}    (4)

where Nimp_j is the normalized importance for block j, and imp_{avg} is the average value of imp_j for the previous frame. The final value of MQUANT is calculated using the results of both the spatial masking and importance map processes:

    MQUANT_j = Nact_j \cdot Nimp_j \cdot Q_j    (5)

where Q_j is the reference quantization parameter calculated by the MPEG rate control procedures.
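Tying these steps together for a single macroblock, the sketch below reuses the helper names from the earlier sketches; the clamp to MPEG's legal 1-31 MQUANT range and the example figures are our additions:

    def macroblock_importance(spatial_imp, motion_imp):
        """Average the spatial and motion IMs for each of the four 8x8 blocks,
        then keep the highest value as the macroblock importance."""
        per_block = [(s + m) / 2.0 for s, m in zip(spatial_imp, motion_imp)]
        return max(per_block)

    def normalized_importance(imp_j, imp_avg):
        """Equation (4): blocks more important than average get a factor < 1."""
        return (imp_j + 2.0 * imp_avg) / (2.0 * imp_j + imp_avg)

    def perceptual_mquant(n_act_j, n_imp_j, q_j):
        """Equation (5), clamped to MPEG's 1..31 MQUANT range."""
        return max(1, min(31, round(n_act_j * n_imp_j * q_j)))

    # Example: an important, low-activity macroblock ends up below Q_j = 12.
    imp = macroblock_importance([0.9, 0.8, 0.7, 0.9], [1.0, 1.0, 0.5, 0.8])
    print(perceptual_mquant(0.6, normalized_importance(imp, imp_avg=0.4), q_j=12))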

5. RESULTS

Figure 2 shows the results of the IM calculation for a frame of the Football sequence. The original scene is quite complex, with several objects undergoing large amounts of motion. The 8 × 8 resolution spatial IM (Figure 2(b)) has correctly identified the players as the visually important parts of the scene. The motion IM (Figure 2(c)) has identified the objects' motion and classified its importance based on velocity. Note the extremely high motion in the upper body of the player at the right of the frame: the motion IM has classified this region as untrackable by the eye, and it is therefore assigned low importance. The final macroblock (16 × 16) resolution IM shown in Figure 2(d) was produced by averaging the spatial and motion IMs. It gives a good indication of the visually salient regions in the frame. The results of the IM calculation for a frame of the Table Tennis sequence are shown in Figure 3. Once again, the visually important regions have been accurately identified by our technique.

We have tested our perceptual quantizer on a wide variety of sequences. The results have been promising, even for relatively complex scenes. The activity masking model provides better subjective quality than TM5's activity measure, since distortions are transferred to areas which can tolerate large amounts of error without introducing strongly visible artifacts. This is demonstrated by a reduction in ringing near object edges. The incorporation of IMs in the adaptive quantization process provides further subjective improvement, since distortions are reduced in visually important regions. This is particularly evident at low bit rates.

Figure 2. Importance Map produced using our perceptual classification algorithm for a frame of the Football sequence: (a) original image, (b) spatial Importance Map at 8 × 8 resolution, (c) motion Importance Map at 8 × 8 resolution, and (d) overall Importance Map at 16 × 16 resolution. Lighter regions represent higher importance.

Figure 3. Importance Map produced using our perceptual classification algorithm for a frame of the Table Tennis sequence: (a) original image, (b) spatial Importance Map at 8 × 8 resolution, (c) motion Importance Map at 8 × 8 resolution, and (d) overall Importance Map at 16 × 16 resolution. Lighter regions represent higher importance.

Figure 4. Coding results for a frame of the Miss America sequence at 350 kbit/sec: (a) original frame, (b) frame coded using our perceptual quantizer, and (c) frame coded using TM5.

Although it is difficult to demonstrate video quality by showing still frames, we have included some coding results comparing our technique to the standard TM5 rate controller. Low bit rates have been used in order to accentuate the artifacts. Figure 4 shows the results for a frame of the Miss America sequence. Severe blocking can be seen in the visually important facial regions of the TM5-coded scene, whereas the results for our coder show reduced distortion in the visually salient areas. For a frame from the Table Tennis scene (Figure 5), the TM5 coder has created ringing artifacts around the hand, arm, racquet, and the edge of the table. Blocking effects can also be seen in the hand. These distortions are reduced using our technique, at the expense of increased degradation in the (lower importance) background.

Figure 5. Coding results for a frame of the Table Tennis sequence at 1.5 Mbit/sec: (a) frame coded using our perceptual quantizer, and (b) frame coded using TM5. The original frame is shown in Figure 3(a).

6. DISCUSSION

In this paper we have proposed a novel technique for controlling the quantization in an MPEG encoder. The method is based on the properties of the HVS, yet is computationally efficient. It has been designed to be flexible, so that future modifications and application-specific requirements can be easily incorporated. The results shown in Section 5 indicate improved subjective quality for sequences compressed using this technique when compared to those compressed using TM5. Formal subjective testing is required to quantify this improvement.

Despite the promising results, many areas for improvement remain. Additional factors which influence human visual attention, such as colour, could be included in the IM calculation. In some applications a priori information about scene content may be known (e.g. video-conferencing), or user interaction may be possible; our model can easily be extended to include such high level factors. Calculation of the motion IM could be improved by including factors relating to the stability and trackability of motion over time, and also by taking into account camera pan and zoom. Our methods for calculating and combining factors are currently quite simple, due to computational considerations and a lack of quantitative HVS data. However, such data can be incorporated once available.

We have demonstrated the utility of IMs for providing improved perceptual quality for compressed video sequences. However, there are many other applications that could make use of accurate knowledge of the perceptually important regions of a scene. These include objective image quality assessment techniques, machine vision, image databases, and the MPEG-4 coding standard. Any application which requires focus on perceptually relevant areas of a scene while reducing resolution in peripheral regions could readily benefit from the use of IMs.

REFERENCES

1. L. Stelmach, W. Tam, and P. Hearty, "Static and dynamic spatial resolution in image coding: An investigation of eye movements," in Proceedings SPIE 1453, pp. 147-152, (San Jose), Feb 1992.
2. W. Osberger and A. Maeder, "Automatic identification of perceptually important regions in an image using a model of the human visual system." Submitted to 14th International Conference on Pattern Recognition, Aug 1998.
3. A. Maeder, J. Diederich, and E. Niebur, "Limiting human perception for image sequences," in Proceedings SPIE 2657, pp. 330-337, (San Jose), Feb 1996.
4. J. Foley and G. Boynton, "A new model of human luminance pattern vision mechanisms: analysis of the effects of pattern orientation, spatial phase and temporal frequency," in Proceedings SPIE 2054, pp. 32-42, (San Jose), Feb 1994.
5. A. van Meeteren and J. Valeton, "Effects of pictorial noise interfering with visual detection," J. Opt. Soc. Am. A 5(3), pp. 438-444, 1988.
6. A. Watson, R. Borthwick, and M. Taylor, "Image quality and entropy masking," in Proceedings SPIE 3016, pp. 2-12, (San Jose), Feb 1997.
7. S. He, P. Cavanagh, and J. Intrilligator, "Attentional resolution and the locus of visual awareness," Nature 383, pp. 334-337, Sep 1996.
8. A. Duchowski and B. McCormick, "Pre-attentive considerations for gaze-contingent processing," in Proceedings SPIE 2411, pp. 128-139, (San Jose), Feb 1995.
9. A. Yarbus, Eye Movements and Vision, Plenum Press, New York NY, 1967.
10. H. Zabrodsky and S. Peleg, "Attentive transmission," Journal of Visual Communication and Image Representation 1, pp. 189-198, Nov 1990.
11. E. Niebur and C. Koch, Computational architectures for attention, in The Attentive Brain, MIT Press, 1997.
12. M. Eckert and G. Buchsbaum, The significance of eye movements and image acceleration for coding television image sequences, pp. 149-162, in Digital Images and Human Vision, MIT Press, 1993.
13. J. Findlay, "The visual stimulus for saccadic eye movement in human observers," Perception 9, pp. 7-21, Sept 1980.
14. J. Senders, "Distribution of attention in static and dynamic scenes," in Proceedings SPIE 3016, pp. 186-194, (San Jose), Feb 1997.
15. A. Gale, Human response to visual stimuli, pp. 127-147, in The Perception of Visual Information, Springer-Verlag, 1997.
16. G. Elias, G. Sherwin, and J. Wise, "Eye movements while viewing NTSC format television." SMPTE Psychophysics Subcommittee white paper, Mar 1984.
17. E. Niebur and C. Koch, Control of selective visual attention: Modelling the where pathway, pp. 802-808, in Advances in Neural Information Processing Systems, MIT Press, 1996.
18. J. Tsotsos, An inhibitory beam for attentional selection, pp. 313-331, in Spatial Vision in Humans and Robots, Cambridge University Press, 1993.
19. X. Marichal, T. Delmot, V. De Vleeschouwer, and B. Macq, "Automatic detection of interest areas of an image or a sequence of images," in ICIP, pp. 371-374, (Lausanne, Switzerland), Sep 1996.
20. J. Zhao, Y. Shimazu, K. Ohta, R. Hayasaka, and Y. Matsushita, "An outstandingness oriented image segmentation and its application," in ISSPA, pp. 45-48, (Gold Coast, Australia), Aug 1996.
21. R. Rao, G. Zelinsky, M. Hayhoe, and D. Ballard, "Eye movements and visual cognition: A computational study," Tech. Report 97.1, University of Rochester, 1997.
22. S. Eckart and C. Fogg, "MPEG software simulation group, MPEG-2 encoder/decoder." Available from: ftp://ftp.mpeg.org/pub/mpeg/mssg/.
23. J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, 1997.
24. "Test Model 5," ISO/IEC JTC1/SC29/WG11 No. 400 (MPEG93/457), Apr 1993.
25. W. Osberger, S. Hammond, and N. Bergmann, "An MPEG encoder incorporating perceptually based quantisation," in Proceedings IEEE TENCON, pp. 731-734, (Brisbane, Australia), Dec 1997.
26. H. Gong and H. Hang, Scene analysis for DCT image coding, pp. 425-434, in Signal Processing of HDTV V, Elsevier Science, 1994.
27. S. Westen, R. Lagendijk, and J. Biemond, "Spatio-temporal model of human vision for digital video compression," in Proceedings SPIE 3016, pp. 260-267, (San Jose), Feb 1997.
