Charting Image Artifacts in Digital Image Sequences Using Velocital Information Content
Gregory J. Power (a), Mohammad A. Karim (b), and Farid Ahmed (c)
(a) Air Force Research Laboratory, AFRL/SNAT/Target Recognition Branch, 2010 5th Street, Wright-Patterson AFB, Ohio 45433-7001
(b) University of Dayton, Department of Electrical and Computer Engineering, 300 College Park, Dayton, Ohio 45469-0226
(c) Electrical Engineering, School of Engineering and Engineering Technology, Pennsylvania State University, The Behrend College, Erie, PA, 16563-1701
ABSTRACT This paper introduces a metric called Velocital Information Content (VIC) which is used to chart quality variations in digital image sequences. Both spatially-based and temporally-based artifacts are charted using this single metric. VIC is based on the velocital information in each image. A mathematical formulation for VIC is shown along with its relation to the spatial and temporal information content. Some strengths and weaknesses of the VIC formulation are discussed. VIC is tested on some standard image sequences with various spatio-temporal attributes. VIC is also tested on a standard image sequence with various degrees of blurring using a linear blurring algorithm. Additionally, VIC is tested using standard sequences that have been processed through a digital transmission algorithm. The transmission algorithm is based on the discrete cosine transform (DCT), and thus introduces many of the known digital artifacts such as blocking. Finally, the ability of VIC to chart image artifacts is compared to a few other traditional quality metrics. VIC offers a different role from traditional transmission-based quality metrics which
require two images, the original input image and the degraded output image, to calculate the quality metric. VIC can detect artifacts from a single image sequence by charting variations from the norm. Therefore, VIC offers a metric for judging the quality of the image frames prior to transmission, without a transmission system and without any knowledge of the higher quality input image. These differences from transmission-oriented quality metrics give VIC a different role in analysis and image sequence processing.
Keywords: Image Quality, Spatial-Temporal Analysis, Digital Transmission, Image Sequence, Quality Metric, Velocital
1. INTRODUCTION Digital is currently marketed to the consumer as higher quality. However, digital image sequences still incur image quality degradation from both analog and digital sources [1]. The analog sources are predominantly the environment, such as atmospherics, and the early analog stages of the digital camera, such as lens focus. Digital cameras and digital transmission systems have eliminated many of the quality degradation problems associated with old analog systems, but at the same time digital image sequences have introduced new image quality problems due to quantization, compression, and transmission [1]. The impact of image artifacts can be charted in a sequence using a metric called velocital information content (VIC). The metric is introduced here along with examples of using it on imagery suffering from blur, quantization, compression artifacts, and spurious noise.
Other author information: (Send correspondence to G.J.Power) G.J.Power: E-mail: powergj@sensors.wpafb.af.mil; M.A.Karim: E-mail: mkarim@engr.udayton.edu; F.Ahmed: E-mail: fxa9@psu.edu
Part of the SPIE Conference on Applications of Digital Image Processing XXI, San Diego, California, July 1998. SPIE Vol. 3460, 0277-786X/98/$10.00
2. VELOCITAL INFORMATION CONTENT (VIC) 2.1. Introduction to Velocital Metric Research done by the Institute for Telecommunication Sciences (ITS) has shown that spatial information content and temporal information content can be used in formulations to determine image quality for transmission systems given the original input and degraded output images. Further, ITS showed that formulations can be made that correlate with subjective quality assessment [2]. This paper offers a new velocital information metric that embodies both the spatial and temporal information features in a single metric in order to determine quality based on output images only. This new metric, VIC, is derived by using a sequence of ratios where the numerator is based on the temporal information and the denominator is based on the spatial information. Obtaining the velocital information feature, VI, by dividing the temporal information by the spatial information may seem counter-intuitive since velocity itself is defined as a spatial distance divided by a change in time. The relation of this formulation to velocity becomes clear if one considers the one-dimensional discrete intensity image, $I(x, t)$, where the temporal information can be defined over time ($t$) as $|\Delta I(x,t)|_t / \Delta t$ and the spatial information can be defined over distance ($x$) as $|\Delta I(x,t)|_x / \Delta x$, where $\Delta$ represents a discrete change in $I$, $x$, and $t$. Taking the ratio of the temporal information to the spatial information, the velocital information for a particular pixel is written as
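$$VI(x,t) = \frac{|\Delta I(x,t)|_t / \Delta t}{|\Delta I(x,t)|_x / \Delta x} = \frac{\Delta x}{\Delta t}\,\frac{|\Delta I(x,t)|_t}{|\Delta I(x,t)|_x},$$
which reduces to reveal the velocity units due to $\Delta x / \Delta t$, since the ratio $|\Delta I(x,t)|_t / |\Delta I(x,t)|_x$ is unitless.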
2.2. Formulas for Spatial and Temporal Information Features Spatial information and temporal information features for digital image sequences have been obtained by determining a spatial standard deviation and a temporal standard deviation. This paper uses the formulations defined in an ANSI standard [3]. The spatial content is obtained by first acquiring spatial edge information using Sobel operations [4] on each image frame. The spatial edges are expressed mathematically for the horizontal mask as
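$$SI_h(i, j, t_n) = \text{(horizontal Sobel mask response at pixel } (i, j) \text{ of frame } t_n\text{)},$$
where $i$ and $j$ represent a particular pixel position in an image and $t_n$ represents a particular image frame in time. For the vertical mask the Sobel operation is expressed as
$$SI_v(i, j, t_n) = \text{(vertical Sobel mask response at pixel } (i, j) \text{ of frame } t_n\text{)},$$
with the magnitude of the spatial information defined as
$$SI_r(i, j, t_n) = \sqrt{SI_h^2(i, j, t_n) + SI_v^2(i, j, t_n)}.$$
For the total pixels, $P$, in an image, the standard deviation of each Sobel filtered image is calculated as
$$SI_{stdev}(t_n) = \sqrt{\frac{1}{P}\sum_i \sum_j SI_r^2(i, j, t_n) - SI_{mean}^2(t_n)},$$
where
$$SI_{mean}(t_n) = \frac{1}{P}\sum_i \sum_j SI_r(i, j, t_n).$$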
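As an illustration of the spatial information feature, the following is a minimal NumPy sketch rather than the implementation used for the results in this paper. It assumes the standard 3x3 Sobel masks, evaluates only interior pixels, and uses hypothetical helper names (`filter3x3`, `sobel_magnitude`, `si_stdev`); which mask is called horizontal and which vertical is also an assumption.

```python
import numpy as np

# Standard 3x3 Sobel masks (assumed; orientation naming is a convention).
SOBEL_H = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
SOBEL_V = SOBEL_H.T


def filter3x3(frame, mask):
    """Correlate a 2-D frame with a 3x3 mask, keeping only interior pixels."""
    frame = np.asarray(frame, dtype=float)
    rows, cols = frame.shape
    out = np.zeros((rows - 2, cols - 2))
    for a in range(3):
        for b in range(3):
            out += mask[a, b] * frame[a:a + rows - 2, b:b + cols - 2]
    return out


def sobel_magnitude(frame):
    """SI_r: gradient magnitude built from the two Sobel responses."""
    si_h = filter3x3(frame, SOBEL_H)
    si_v = filter3x3(frame, SOBEL_V)
    return np.sqrt(si_h ** 2 + si_v ** 2)


def si_stdev(frame):
    """Spatial information feature: standard deviation of SI_r over the frame."""
    si_r = sobel_magnitude(frame)
    si_mean = si_r.mean()
    # Clamp at zero to guard against tiny negative values from rounding.
    return np.sqrt(max((si_r ** 2).mean() - si_mean ** 2, 0.0))
```

Calling `si_stdev` on each frame of a sequence gives the time series that is plotted on the spatial axis. A frame with no edges gives a value of zero.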
$SI_{stdev}(t_n)$ results in a time series of standard deviations which can be plotted on the spatial axis of the spatio-temporal plot. A similar approach, as well as a rationale for using the Sobel operator, is documented by Lee [5]. To obtain the temporal content, a frame difference is computed for frame $Y$, defined as
$$TI(i, j, t_n) = \Delta Y(t_n) = Y(t_n) - Y(t_{n-1}).$$
The temporal information feature, $TI_{stdev}[t_n]$, is then obtained using the standard deviation of each $\Delta Y(t_n)$, calculated as
$$TI_{stdev}[t_n] = \sqrt{\frac{1}{P}\sum_i \sum_j TI^2(i, j, t_n) - TI_{mean}^2(t_n)},$$
where
$$TI_{mean}(t_n) = \frac{1}{P}\sum_i \sum_j TI(i, j, t_n).$$
The temporal information feature is plotted on the temporal axis of the spatio-temporal plot. The values for this feature increase with increased motion, panning, zooming and scene cuts. Figure 1 shows a spatio-temporal plot based on the first few frames of the Miss America and Table Tennis standard image sequences. The Miss America sequence has lower spatial and temporal values than the table tennis sequence. Evidence that these values differ is shown in Figure 2, which shows more spatial and temporal activity for the table tennis sequence. The table tennis sequence has more spatial content due to textures in the scene, and it has more temporal content due to the ping pong motion and slow pan.
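The temporal information feature defined above can be sketched in a few lines. This is an illustrative example only; the function names `ti_stdev` and `ti_series` are hypothetical, and frames are assumed to be 2-D luminance arrays of equal size.

```python
import numpy as np


def ti_stdev(prev_frame, curr_frame):
    """Temporal information feature for one pair of consecutive frames.

    TI is the pixel-wise frame difference; the feature is its standard
    deviation over all P pixels.
    """
    ti = np.asarray(curr_frame, dtype=float) - np.asarray(prev_frame, dtype=float)
    ti_mean = ti.mean()
    # Clamp at zero to guard against tiny negative values from rounding.
    return np.sqrt(max((ti ** 2).mean() - ti_mean ** 2, 0.0))


def ti_series(frames):
    """TI_stdev for every consecutive pair in a sequence of frames."""
    return [ti_stdev(a, b) for a, b in zip(frames[:-1], frames[1:])]
```

Because the feature is a standard deviation, a uniform brightness change across the whole frame contributes nothing; only spatially varying change, such as motion, raises the value.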
Figure 1. The Spatio-Temporal trajectories for the first ten frames of the Miss America and Table Tennis standard sequences. (Horizontal axis: Spatial Information.)
2.3. Formula for Velocital Information Feature A mathematical formulation of VIC is based on velocital information, $VI[i, j, t_n]$, which will be shown to be the magnitude of the optical flow map. Numerous techniques exist for obtaining optical flow [6], but gradient-based techniques fit well with the spatial and temporal information features defined above. For gradient-based techniques, optical flow is obtained by assuming that an image intensity, $I(i, j, t)$, remains constant as it moves from frame to frame such that $i$ and $j$ vary with $t$. This yields
$$\frac{dI(i, j, t)}{dt} = 0,$$
which, after applying the chain rule for differentiation, yields the optical flow equation
$$\frac{\partial I}{\partial i} v_i + \frac{\partial I}{\partial j} v_j + \frac{\partial I}{\partial t} = 0,$$
with velocity vector components $v_i = di/dt$ and $v_j = dj/dt$. The flow vector that is in the direction of the spatial image gradient can be estimated from
$$v(i, j, t) = -\frac{\partial I(i, j, t) / \partial t}{\|\nabla I(i, j, t)\|}.$$
This flow vector formulation is sufficient for estimating the magnitude of the optical flow map. In terms of the spatial and temporal information defined earlier, the velocital information can be defined as
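$$VI[i, j, t_n] = \frac{|TI(i, j, t_n)|}{SI_r(i, j, t_n)},$$
and VIC can be defined as the sequence
$$VI_{stdev}[t_n] = \sqrt{\frac{1}{P}\sum_i \sum_j VI^2(i, j, t_n) - VI_{mean}^2(t_n)},$$
where
$$VI_{mean}(t_n) = \frac{1}{P}\sum_i \sum_j VI[i, j, t_n].$$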
Weaknesses with this formulation of VIC are the same as those that impact the gradient-based optical flow techniques, such as the assumption that the intensity of a pixel remains constant as it moves through time. Another weakness is the indeterminate case in the calculation of $VI[i, j, t_n]$. In practice, a small constant is added to $SI_r(i, j, t_n)$ to avoid the indeterminate case.
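The per-frame VIC computation, including the small constant that avoids the indeterminate case, can be sketched as follows. This is a hedged illustration and not the authors' code: the standard Sobel masks, the restriction to interior pixels, the use of the magnitude of the frame difference in the numerator, the function names, and the default `eps=1.0` are all assumptions made for the example.

```python
import numpy as np


def sobel_magnitude(frame):
    """SI_r on interior pixels, using the standard Sobel masks (assumed)."""
    f = np.asarray(frame, dtype=float)
    # Row-direction and column-direction Sobel responses via array slicing.
    si_h = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]
            - f[:-2, :-2] - 2 * f[:-2, 1:-1] - f[:-2, 2:])
    si_v = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]
            - f[:-2, :-2] - 2 * f[1:-1, :-2] - f[2:, :-2])
    return np.sqrt(si_h ** 2 + si_v ** 2)


def velocital_information(prev_frame, curr_frame, eps=1.0):
    """VI = |TI| / (SI_r + eps) on interior pixels.

    eps is the small constant that avoids division by zero; its value
    here is an arbitrary placeholder, not a value from the paper.
    """
    prev = np.asarray(prev_frame, dtype=float)
    curr = np.asarray(curr_frame, dtype=float)
    ti = (curr - prev)[1:-1, 1:-1]      # crop to match the Sobel output
    return np.abs(ti) / (sobel_magnitude(curr) + eps)


def vic(prev_frame, curr_frame, eps=1.0):
    """VIC for one frame: standard deviation of VI over the frame."""
    vi = velocital_information(prev_frame, curr_frame, eps)
    vi_mean = vi.mean()
    # Clamp at zero to guard against tiny negative values from rounding.
    return np.sqrt(max((vi ** 2).mean() - vi_mean ** 2, 0.0))
```

Applying `vic` to each consecutive frame pair yields the sequence that is charted in the following sections. The choice of `eps` sets how strongly flat image regions contribute, so it would need to be held fixed when comparing sequences.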
Figure 2. The left two frames show original images. The middle frames show spatial content through Sobel images. The rightmost frames show temporal content via frame differencing.
3. CHARTING ARTIFACTS WITH VELOCITAL INFORMATION CONTENT Artifacts can be the result of sudden or gradual changes in the scene. For a fixed camera system, the sudden and gradual changes are the result of changes such as spurious noise and atmospherics. For sequences with editing capabilities from highly variable camera systems, the changes can also be the result of changes such as frame cuts, panning, zooming and fading.
3.1. Charting Sudden Changes The table tennis sequence is 150 frames of high spatial and temporal content with gradual changes represented by panning and zooming and sudden changes represented by two frame cuts at frames 88 and 147. Figure 3 shows the gradual changes as smooth transitions from one area to another area in the spatio-temporal plot while sudden changes such as frame cuts appear as jumps out of the general flow of the scene. VIC is more sensitive to the sudden and less sensitive to the gradual change in this sequence as shown in Figure 4.
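The paper identifies sudden changes by inspecting the VIC chart and does not prescribe an automatic detection rule. As one possible sketch, frames can be flagged when their VIC value lies far from the median of the series, measured in units of the median absolute deviation. The function name `flag_sudden_changes` and the threshold `k` are hypothetical choices for illustration.

```python
import numpy as np


def flag_sudden_changes(vic_series, k=5.0):
    """Return indices whose value lies more than k robust deviations
    from the median of the series (illustrative rule, not from the paper)."""
    v = np.asarray(vic_series, dtype=float)
    median = np.median(v)
    mad = np.median(np.abs(v - median))     # median absolute deviation
    # 1.4826 scales the MAD to match a standard deviation for Gaussian data.
    scale = 1.4826 * mad if mad > 0 else np.finfo(float).eps
    return np.flatnonzero(np.abs(v - median) > k * scale)
```

A median-based rule is used here because the jumps themselves would inflate a mean and standard deviation computed over the same series.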
Figure 3. Spatio-temporal information is plotted for the original and a compressed version of the table tennis sequence. Gradual changes are shown as smooth transitions from one area of the plot to another area. Sudden changes are jumps out of the expected area. (Horizontal axis: Spatial Information.)
3.2. Charting Gradual Changes The gradual changes in VIC are noticeable by observing the change in mean VIC values. The results of using VIC to chart artifacts in blurred and compressed sequences are demonstrated. For blurring, the tests are limited to linear blurring. For compression, the tests are limited to one representative lossy compression technique, H.261.
3.2.1. Charting Blurred Sequence Figure 5 shows the results of linear blurring on the Miss America sequence. The first 10 frames of the Miss America sequence are blurred and the associated VIC is plotted in Figure 6 showing a definite frame by frame consistent change based on blurring. Figure 7 shows the mean VIC demonstrating the change as the blur gradually changes.
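The linear blurring algorithm is not specified in detail here. As an assumption for illustration, a linear blur of a given pixel length can be modeled as a uniform moving average along each row, as in the sketch below. The function name `linear_blur`, the horizontal direction, and the edge padding are all choices made for this example.

```python
import numpy as np


def linear_blur(frame, length):
    """Blur each row with a uniform kernel of `length` pixels.

    This models a horizontal linear (motion-like) blur; the direction and
    kernel shape are assumptions, since the paper does not specify them.
    """
    f = np.asarray(frame, dtype=float)
    if length <= 1:
        return f.copy()
    kernel = np.ones(length) / length
    # Pad with edge values so the output keeps the input width.
    left = (length - 1) // 2
    right = length - 1 - left
    padded = np.pad(f, ((0, 0), (left, right)), mode="edge")
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded)
```

Under this model, a unit step edge blurred with a 4 pixel kernel is spread into a ramp with steps of 0.25, which lowers the spatial gradient magnitude along the blur direction.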
3.2.2. Charting Transmission Sequence To obtain reasonable reception and display of image sequences, digital transmission systems use lossy compression techniques. Techniques such as quantization and the discrete cosine transform are used within the compression and transmission algorithms, causing degraded images. The artifacts resulting from the compression techniques are well documented [7]. H.261 is a representative lossy compression technique that uses quantization and the discrete cosine transform. Figure 8 shows one frame of the table tennis sequence before and after the H.261 lossy compression with a channel rate of 400 kb/s. The H.261 coding was modified to allow for constant bit rate encoding. Figure 9 shows the result of the mean VIC value for various channel rates on 39 sequential frames of the table tennis sequence.
Figure 4. Velocital information is plotted for the original and a compressed version of the table tennis sequence. The sudden changes are more obvious than the gradual changes. (Horizontal axis: frame number.)
Figure 5. A frame from the Miss America sequence is shown unblurred on the left and with a 10 pixel linear blur on the right.
Figure 6. VIC result for linear blurring of first 10 frames of the Miss America sequence. The highest VIC plot is the original unblurred sequence. (Horizontal axis: frame number.)
Figure 7. Mean VIC result for linear blurring on Miss America sequence. (Horizontal axis: Linear Pixel Blur.)
Figure 8. This image is a frame from the table tennis sequence before and after H.261 lossy compression. The right image was produced after compression at a channel rate of 400 kb/s. Artifacts such as blocking and mosquito noise are evident.
Figure 9. Mean VIC result for lossy compression of table tennis sequence. (Horizontal axis: Channel Rate (kb/s).)
4. COMPARISON TO TRADITIONAL QUALITY METRICS For a transmission system, the Institute for Telecommunication Sciences (ITS) derived a quality metric which was formulated from the spatial and temporal content in the original and the degraded images [2]. ITS showed correlation with subjective assessment. Unlike the ITS metric, VIC does not need an original high quality input image to chart the quality variations in the image sequence. The ITS metric yields one value for a sequence clip whereas VIC yields a value for each frame. Therefore, it is difficult to compare the two metrics beyond simply stating that they both use information about the spatial and temporal gradient data.
Although Peak Signal-to-Noise Ratio, PSNR, has been shown not to correlate well with the human visual system [8], it is one of the few metrics available that can show a frame-by-frame value. Another variable that has been used in image sequence frame analysis is a spatial information feature [5]. To compare metrics using only a single image sequence, the PSNR can be calculated using a sequential frame difference as the error. Therefore, as a comparative example, VIC is compared to PSNR as well as the temporal and spatial information features for the table tennis sequence. The results plotted in Figure 10 show that the PSNR and spatial metric react to the gradual changes, but it is difficult to pick out the sudden changes due to all the variations. For this example, the temporal metric picks out the sudden changes but still varies with the gradual changes. VIC reacts best for the sudden changes.
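The single-sequence PSNR described above, with the sequential frame difference taking the place of the usual reference error, can be sketched as follows. The function name `sequential_psnr` is hypothetical, and the default peak value of 255 assumes 8-bit imagery, which is not stated in the text.

```python
import numpy as np


def sequential_psnr(prev_frame, curr_frame, peak=255.0):
    """PSNR in dB, treating the difference between consecutive frames as the error."""
    err = np.asarray(curr_frame, dtype=float) - np.asarray(prev_frame, dtype=float)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")     # identical frames: no error energy
    return 10.0 * np.log10(peak ** 2 / mse)
```

With this definition a scene cut produces a large frame difference and therefore a low PSNR, while a static scene drives the value toward infinity.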
Figure 10. VIC, spatial information, temporal information, and PSNR are charted for the table tennis sequence. Gradual changes occur in the scene due to zooming out with scene cuts at frames 88 and 147. (Horizontal axis: frame number.)
5. CONCLUSION A formulation for calculating velocital information features has been introduced based on gradients. A metric called VIC assigns a single qualitative value to each frame in an image sequence that is a result of temporal and spatial variations in image information content. Sudden changes that are evidenced in image artifacts such as spurious noise, frame cuts, frame repeats, and various compression anomalies are picked out by observing the sudden jump in the VIC metric. Gradual changes like slow zoom, slow pan, and slow focus (blurring) are evident from gradual changes in the VIC metric mean. VIC shows promise for a variety of uses including applications that might require smart editing to understand where scene cuts, pans and zooms might occur. VIC also shows promise as a comparative quality metric that can improve applications such as object recognition [1] that need to filter out frames with sudden unwanted changes.
REFERENCES
1. G. J. Power and M. A. Karim, "Determining a confidence factor for automatic target recognition based on image sequence quality," in Algorithms for Synthetic Aperture Radar Imagery V, E. G. Zelnio et al., eds., Proc. SPIE 3370, April 1998.
2. S. Wolf, "Features for automated quality assessment of digitally transmitted video," Tech. Rep. 90-264, US Department of Commerce, National Telecommunications and Information Administration, June 1990.
3. "Digital transport of one-way video signals - parameters for objective performance assessment," ANSI T1.801.03-1996, February 1996.
4. R. Gonzalez and P. Wintz, Digital Image Processing, Addison-Wesley Publishing Co., Reading, Massachusetts, 2nd ed., 1987.
5. D. J. Lee, "Objective quality metrics: Applications for partially compensated images of space objects," Master's thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio, 45433-7765, 1993.
6. A. M. Tekalp, Digital Video Processing, Prentice-Hall, Inc., Upper Saddle River, NJ, 1995.
7. I. Dalgic and F. A. Tobagi, "Constant quality video encoding," in Proc. IEEE ICC 95, (Seattle, WA), June 1995.
8. S. Daly, "Visible differences predictor: An algorithm for the assessment of image fidelity," in Human Vision, Visual Processing and Digital Display, Proc. SPIE 1666, pp. 2-14, 1992.