© 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
To appear in Proc. International Conference on Image Processing, ICIP-2000, Sep. 10-13, 2000 Vancouver, Canada
POCS-BASED IMAGE RECONSTRUCTION FROM IRREGULARLY-SPACED SAMPLES

Ryszard Stasiński
Høgskolen i Narvik, Lodve Langes gt. 2, N-8501 Narvik, Norway

Janusz Konrad
INRS-Télécommunications, Place Bonaventure, P.O. Box 644, Montréal, Québec, Canada, H5A 1C6
ABSTRACT

This paper presents a method for the reconstruction of a regularly-sampled image from its irregularly-spaced samples. Such reconstruction is often needed in image processing and coding, for example when using motion compensation. The proposed approach is based on the theory of projections onto convex sets. Two projection operators are used: bandwidth limitation and sample substitution. The approach is similar to some methods presented in the literature in the past, but differs in the implementation. The bandwidth limitation is implemented in the frequency domain on an oversampled grid, thus allowing substantial flexibility in spectrum shaping of the reconstructed image. Additionally, a fast Fourier transform algorithm specifically designed for irregularly-sampled images is used to reduce the computational complexity. A number of experimental results on natural images are presented.

1. INTRODUCTION

In image processing and coding, interpolation of intensity and/or color from a set of known samples is a common task. The grids of unknown and known samples can each be defined as a regular (periodic) or irregular sampling structure (grid). There are 4 scenarios for sampling-grid combinations of unknown/known samples:

1. regular/regular - the simplest case, where interpolation filters are space-invariant; for example, interpolating filters in typical image up-conversion,

2. irregular/regular - a more difficult case, where interpolation filters are space-variant since the samples to be recovered have varying positions with respect to the regularly-spaced known samples (the analog interpolating kernel has finite support, and the coefficients of each space-variant digital filter are samples from this kernel; the number of known samples within the digital filter mask is fixed regardless of unknown sample location); for example, bilinear or bicubic interpolation [1] in backward motion compensation in video coding,

3. regular/irregular - a still more difficult case, where interpolation filters are space-variant but may not be easily specified in the general case; for example, forward motion compensation in advanced video coding/interpolation,

4. irregular/irregular - the most general case, for which applications have not clearly emerged yet.
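For case 2, the space-variant weights can be made concrete with a minimal bilinear-interpolation sketch (assuming numpy; the function name `bilinear_at` and the toy setup are ours, not from [1]): four known regular-grid samples always contribute, but their weights change with the fractional position of the unknown sample.

```python
import numpy as np

def bilinear_at(img, xs, ys):
    """Bilinearly interpolate a regularly-sampled image at irregular
    (possibly fractional) positions (xs, ys). The four nearest known
    samples always form the filter mask; their weights are space-variant,
    determined by the fractional offsets (dx, dy)."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    dx = xs - x0
    dy = ys - y0
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

img = np.arange(16, dtype=float).reshape(4, 4)
vals = bilinear_at(img, np.array([1.5, 2.0]), np.array([2.0, 1.0]))
```

Note that the mask size (2×2 known samples) is fixed regardless of where the unknown sample falls, matching the description of case 2 above.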
Whereas the first two cases have been extensively treated in the literature and have found numerous practical applications in image processing and coding, the third case has been explored to a lesser degree. The primary reason for this is the difficulty associated with the extension of Shannon's sampling theory to signals defined over irregular sampling grids. In such cases, Shannon's theory is not applicable and alternative methods must be found to reconstruct or approximate the original continuous signal. Although some results on the reconstruction of band-limited functions from their irregularly-spaced samples are available (e.g., [2]), their usefulness in the case of motion-compensated video coding or interpolation is quite limited; theoretically derived constraints on the maximum spacing of irregular samples under the perfect reconstruction condition cannot be satisfied by arbitrarily-distributed image samples after motion compensation. By relaxing the perfect reconstruction condition, other methods have been proposed, such as polynomial interpolation or iterative reconstruction [3].

In this paper, we propose a method for the reconstruction of images from irregularly-spaced samples that is based on the theory of projections onto convex sets (POCS) [4]. This theory has been successfully used for various image processing/coding applications in the past (e.g., in [5]). The underlying principle of our iterative algorithm is identical to that proposed by Sauer and Allebach [6] and quite similar to that recently developed by Sharaf and Marvasti [3]; however, its implementation is quite different. We employ frequency-domain processing to reduce computational complexity and oversampled intermediate grids for more flexible spectral shaping of the underlying image.

2. PROPOSED APPROACH

Let g = {g(x); x = (x, y)^T ∈ R²} be a continuous 2-D projection of the 3-D world onto an image plane and let g_Λ = {g(x); x ∈ Λ} be a discrete image obtained from g by sampling over a lattice Λ [7]. Let's assume that g is band-limited, i.e., G(f) = F{g} = 0 for f ∉ Ω, where F denotes the Fourier transform, f = (f₁, f₂)^T ∈ R² is a frequency vector and Ω ⊂ R² is the spectral support of g.

If the lattice Λ satisfies the multi-dimensional Nyquist criterion [7], Shannon sampling theory allows perfect reconstruction of g from g_Λ. However, in the case of irregular sampling the theory is not applicable. Therefore, the general goal is to develop a method for the reconstruction of g_Λ from an irregular set of samples g_Ψ = {g(x_i); x_i ∈ Ψ ⊂ R², i = 1, ..., K}, where Ψ is an irregular sampling grid.
This work was supported by the Natural Sciences and Engineering Research Council of Canada under Strategic Grant STR224122.

2.1. POCS-based reconstruction algorithm
We propose to use the POCS methodology [4] to reconstruct image g_Λ. This methodology involves a set-theoretic formulation, i.e., finding a solution as an intersection of property sets rather than by a minimization of a cost function. Each property set is constructed in such a way that it contains all solutions to the problem obeying one particular property. Therefore, an intersection of all property sets, if it exists, obeys all of the properties selected. POCS methods are iterative and typically project partial solutions onto consecutive property sets. We use sets similar to those proposed in [6]:

A0 - the set of all images g such that g(x_i) = g_Ψ(x_i) at x_i ∈ Ψ, i = 1, ..., K (Ψ is the irregular sampling grid),

A1 - the set of all band-limited images g, i.e., such that G(f) = 0 for f ∉ Ω.
Let's assume that membership in the set A0 can be assured by a sample replacement operator R (to enforce proper image values on Ψ), and membership in the set A1 by a suitable bandwidth limitation (low-pass filtering) B. Clearly, both R and B are projections. The iterative reconstruction algorithm then consists of interleaved application of R and B:

g^{k+1} = B R g^k = B[g^k + S(g_Ψ − g^k)],    (1)
where g^k is the reconstructed image after k iterations and S is a sampling operator that extracts image values (luminance/color) on the irregular grid Ψ. Note that equation (1), proposed in [6], results in an approximation rather than an interpolation of g_Ψ; the last step is that of low-pass filtering. In order to implement equation (1) on a computer, a suitable discretization must be applied. In [6], equation (1) was implemented as follows:

g^{k+1} = B[g^k + λ I_{Ψ→Λ}(g_Ψ − g̃^k_Ψ)],    (2)

where the lowpass filtering B is implemented over Λ and λ is a parameter that allows control of convergence and stability of the algorithm. The symbol g̃^k_Ψ denotes a bilinearly-interpolated image g^k permitting the recovery of image samples on Ψ. Also, note that an interpolation operator I_{Ψ→Λ} replaces the sampling operator S; it computes image samples on Λ based on those defined on Ψ. Note that I_{Ψ→Λ} is the interpolation that we are seeking. Implemented iteratively by equation (2), I_{Ψ→Λ} gradually improves the final reconstruction g^{k+1}. Sauer and Allebach have studied three interpolators I_{Ψ→Λ}: one derived from bilinear interpolation and two based on triangulation with planar facets [6].

The implementation (2) of the reconstruction algorithm (1) suffers from two deficiencies. First, by processing all images on Λ there is little flexibility in shaping the spectrum of g^k; any practical lowpass filtering on Λ must suppress high frequencies, since a slow roll-off transition band must be used to minimize ringing at sharp luminance/color transitions. Secondly, the interpolation operator I_{Ψ→Λ}, especially the one based on triangulation (better performance), is computationally involved.

We propose an alternative implementation of the algorithm (1). Since our goal is the reconstruction of image samples obtained from motion or disparity compensation, a 1/2-, 1/4- or 1/8-pixel precision of motion or disparity vectors is usually sufficient. Therefore, we propose to implement (1) on an oversampled grid matching this precision:
g_P^{k+1} = B[g_P^k + S_P(g_{Ψ_P} − g_P^k)],    (3)

where B is implemented on Λ_P, a P×P-times denser (oversampled) lattice than Λ, and P equals 2, 4, or 8 depending on motion/disparity vector precision. Clearly, Λ is a sub-grid of Λ_P, i.e., x ∈ Λ ⇒ x ∈ Λ_P. In (3), Ψ_P denotes the irregular sampling positions from Ψ moved to their nearest neighbors on Λ_P. Then, g_{Ψ_P} is the nearest-neighbor interpolation of g_Ψ on Ψ_P, defined at each x_i as follows:

g_{Ψ_P}(y) = g_Ψ(x_i)   if ‖x_i − y‖ ≤ ‖x_i − z‖ ∀ z ∈ Λ_P, y ∈ Λ_P,
           = 0          otherwise.

Similarly, S_P denotes the nearest-neighbor sampling, i.e., sampling at the position y ∈ Λ_P that is nearest to x_i. In other words, the implementation (3) is performed on a denser lattice Λ_P and the positions of the irregular samples from Ψ are quantized to the nearest position on Λ_P. This allows us to avoid the cumbersome interpolation I_{Ψ→Λ} in (2), under the assumption that a suitable value of P is selected. We will demonstrate the impact of P in Section 3. The implementation of equation (3) would, in general, require more memory and be less efficient than that of equation (2); however, we opt for an implementation in the frequency domain in order to reduce the computational complexity. Let
e^k = g_{Ψ_P} − g_P^k

be the reconstruction error defined on Λ_P. Then, each iteration of the reconstruction algorithm (3) consists of 3 steps:

1. Fourier transform of the error e^k sampled on Ψ_P:

E^k(f) = F{ Σ_{x_i ∈ Ψ_P} e^k(x_i) δ(x − x_i) },    (4)

where δ is the Kronecker delta.

2. Update and bandwidth limitation:
G^{k+1}(f) = G^k(f) + λ E^k(f)   for f ∈ Ω,
           = 0                   otherwise,    (5)

with G^0 = 0. Note that the smaller the Ω, the fewer the samples of E^k(f) that need to be computed. This allows a significant reduction of memory and computational requirements by means of a pruned FFT [8] (see Section 2.2). A simple bandwidth limitation by zeroing parts of the spectrum leads in the spatial domain to oscillations at sharp luminance/color transitions. In Section 2.3 we discuss the design of a lowpass filter that minimizes these effects.
3. Computation of g^{k+1} on Ψ_P:

g^{k+1}(x_i) = F^{−1}{G^{k+1}(f)} |_{x = x_i ∈ Ψ_P}.    (6)
This operation can be very efficiently implemented by computing the pruned inverse Fourier transform [8].

2.2. Memory management

The main practical problem for methods operating on oversampled images is the amount of needed memory; e.g., for P = 8, 64 times more intermediate memory is needed than for the final reconstructed image. On the other hand, the sampling theorem implies that the number of non-zero Fourier-domain samples for an oversampled signal does not depend on the oversampling coefficient P; only 1/P² of the DFT coefficients of the oversampled signal lie in the spectral support (2-D bandwidth) of the non-oversampled signal. Since in practice the operator B is a lowpass filter with a non-zero transition band, the fraction of non-zero DFT samples after filtering is m²/P², where m is the filter bandwidth excess (Section 2.3). For practical values of m the Fourier-domain operations require much less memory than the space-domain ones. Note that the zero DFT coefficients need not be computed, and can be ignored at the input of the inverse FFT (step 3).

In [8] it was shown that the computation of no more than K consecutive coefficients of an N-point DFT, where K is a divisor of N, requires only O(log K) operations per sample of the N-point data vector, in comparison with O(log N) operations needed for the complete transform. What is even more important, during the computations the input data are split into independently-processed segments of size K that are simply added up in the final step of the algorithm. This means that if the data are not yet in the computer memory, only two vectors of size K need be used: one for processing consecutively-loaded data blocks, and the other to accumulate results. Similarly, the inverse DFT algorithm consists of independent calculation of consecutive data blocks of size K from K frequency-domain samples.

This can be exploited in the 2-D case as follows. Let's assume that an M×N-point vector can hold all samples of the (finite) lattice Λ; the oversampled image is then of size (PM)×(PN). The vector sizes should be chosen in such a way that there exists an efficient L×K-point DFT algorithm, where L is a divisor of PM such that L ≥ mM to accommodate all non-zero DFT samples after filtering, and K is a divisor of PN, K ≥ mN. The segmentation of the oversampled image is performed as follows:

g_{i,j}[m, n] = g[i + m(PM/L), j + n(PN/K)],
i = 0, 1, ..., PM/L − 1;  j = 0, 1, ..., PN/K − 1;
m = 0, 1, ..., L − 1;  n = 0, 1, ..., K − 1,

where i, j are the segment indices [8].

As in the 1-D case, the algorithm requires only two data vectors of the segment size, i.e., of size K×L. Moreover, the accumulation of results can be combined with the data update in (5); hence, we do not need a separate vector for keeping frequency-domain data. The space-domain oversampled image g_P^k need not be formed at all. Processing of a data segment starts with filling it with zeros; then at positions on Ψ_P we put P² e^k(x_i). The factor P² is due to the multiplication of the error terms by Kronecker deltas in (4). In the inverse DFT, we collect the currently-generated segment samples on Ψ_P and discard the rest. Then, apart from the two vectors of size L×K, we need three space-domain vectors: one with the addresses of the irregularly-positioned samples (Ψ_P), one with the samples of the original image g_Ψ, and one with the space-domain results g^{k+1} on Ψ_P. Usually, the space-domain vectors are even smaller than M×N. For comparison, in space-domain methods each oversampled image requires nearly P²MN data words. In the last iteration, the reconstructed image is obtained by collecting the regularly-positioned samples at the output of the inverse DFT. For one sample of the oversampled image the algorithm requires O(log(L×K)) operations, which is comparable to the computational cost of direct filtering by a short-kernel separable FIR filter. Moreover, for processing L×K-point data segments, we can use specialized DFT hardware [8].
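The segment-splitting idea can be illustrated in 1-D. The pruned FFT of [8] retains K consecutive bins; the sketch below (assuming numpy; not code from the paper) shows the closely related folding identity for K evenly-spaced bins: the input is split into length-K segments that are simply added up, and a single K-point FFT of the folded data yields every (N/K)-th coefficient of the N-point DFT.

```python
import numpy as np

# Folding identity behind segment-based DFT pruning: with N = S*K,
# X[mS] = sum_n x[n] exp(-2*pi*i*n*m/K), and writing n = s*K + r makes the
# phase independent of s, so the S segments can be summed before a single
# K-point FFT. Cost per retained bin is O(log K) instead of O(log N).
rng = np.random.default_rng(0)
N, K = 64, 16
S = N // K                             # number of length-K segments
x = rng.standard_normal(N)

folded = x.reshape(S, K).sum(axis=0)   # add up the segments
pruned = np.fft.fft(folded)            # K-point FFT of the folded data

full = np.fft.fft(x)                   # reference: complete N-point DFT
assert np.allclose(pruned, full[::S])  # every S-th bin of the full DFT
```

The 2-D segmentation of Section 2.2 applies the same decomposition along both image dimensions, with the segments processed independently and accumulated.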
2.3. Filter design

Equation (5) implements low-pass filtering by zeroing the DFT samples outside Ω, which is equivalent to brick-wall filtering in the 1-D case. This results in often objectionable ringing at sharp luminance/color transitions in the filtered image. Moreover, the effect is cumulative due to the iterative nature of the algorithm. Therefore, we need to design a filter with as narrow a transition band as possible, a quickly-decaying impulse response, and few parameters to adjust. A narrow transition band is obviously needed in order to minimize detail loss in the image without introducing excessive aliasing, but also in order to stay as close as possible to the idempotent nature of B (i.e., BB = B); B must be idempotent to be a projection. We verify in Section 3 that a controlled departure from this property does not overly affect the convergence of the algorithm. A quickly-decaying impulse response is needed for perceptual reasons (ringing).

For simplicity, we will use separable 2-D filters with the 1-D prototype obtained by convolving a rectangular window of width N/(2P), where N is the DFT size, with a Gaussian window defined as follows:

w_G(n) = e^{−(2n/N_G)²},  n = −N_G + 1, ..., −1, 0, 1, ..., N_G − 1,

under suitable normalization. Although this is a finite-support Gaussian window, the amplitude at its edges is less than 3% of w_G(0) for N_G > 8. Clearly, the sole design parameter to be determined is the ratio of the window widths: ρ = N_G/(N/2P). This means that the filter transition band extends between (1 − ρ)(N/2P) and (1 + ρ)(N/2P). The term m = 1 + ρ tells us how much the filter bandwidth exceeds the Nyquist limit N/2P; hence, we shall call it the filter bandwidth excess.
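A sketch of this prototype construction (assuming numpy; the function `prototype`, its default values, and the peak normalization are our illustrative choices): the frequency-domain window is a rectangle of N/(2P) bins smoothed by the truncated Gaussian, which widens the passband edge into a transition band of relative width ρ.

```python
import numpy as np

def prototype(N=512, P=4, rho=0.22):
    """1-D frequency-domain prototype, as we read Section 2.3: a rectangular
    window of width N/(2P) convolved with the truncated Gaussian
    w_G(n) = exp(-(2n/N_G)^2), n = -N_G+1, ..., N_G-1, then normalized.
    rho = N_G/(N/2P) fixes the transition band; bandwidth excess m = 1+rho."""
    half = N // (2 * P)                # Nyquist limit N/(2P), in DFT bins
    NG = max(2, round(rho * half))     # Gaussian half-width N_G
    n = np.arange(-NG + 1, NG)
    gauss = np.exp(-(2.0 * n / NG) ** 2)
    rect = np.ones(half)
    w = np.convolve(rect, gauss)       # smoothed rectangular window
    return w / w.max()                 # normalize the passband peak to 1

w = prototype()
```

The resulting window multiplies the in-band DFT coefficients in (5) instead of the 0/1 brick-wall mask; the transition band extends roughly from (1 − ρ)(N/2P) to (1 + ρ)(N/2P).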
3. EXPERIMENTAL RESULTS

The proposed reconstruction algorithm has been tested experimentally on several synthetic and natural images with various irregular sampling grids Ψ. First, we have tested the convergence of the algorithm on two images: zoneplate (a synthetic circular pattern with radially varying frequency) and bouquet (a natural image). The images were processed twice by using synthetic displacement fields to induce irregular sampling: forward rotation by π/8 followed by reverse rotation by the same angle, and 1.5× zoom-in followed by 1.5× zoom-out. The PSNR was computed between the original and the reconstructed images for pixels that remained "visible" throughout the process. For λ = 1.0, P = 4 and m = 1.25, we obtained PSNR values between 36 and 41 dB; the reconstructed images were virtually indistinguishable from the originals.

Then, we have tested the algorithm in a more realistic scenario where the irregular grid Ψ was generated by disparity compensation. Subsequently, the luminance and color of g_Ψ were computed from the ITU-R 601-resolution image flower using bicubic interpolation [1]. We have tested the algorithm for P = 4, 8 and 16 and a variety of λ's. We have established experimentally that m = 1.22 gave a good compromise between detail loss and aliasing. Fig. 1(a) shows the evolution of PSNR for λ = 0.2. Clearly, the algorithm is stable and rapidly improves PSNR in the first few iterations, but begins to saturate above 10 iterations. The PSNR gain between P = 4 and P = 8 is non-negligible (about 1 dB), while it is much smaller between P = 8 and P = 16. Since the stability of the algorithm depends on the coefficient λ, we have experimentally established the highest stable λ for each P: 0.4, 0.3 and 0.2 for P = 4, 8 and 16, respectively. Fig. 1(b) shows the PSNR evolution for each P and its optimized λ. Note that higher P's require smaller values of λ to assure stability, but this impedes the algorithm's rate of convergence; P = 8 with λ = 0.3 gave the best performance, although our quantization of the λ's was quite coarse.
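Our reading of how such a synthetic irregular grid can be produced (a sketch under our assumptions, not code from the paper): regular pixel positions are displaced by the synthetic field, here a rotation by π/8, and the resulting irregular positions are quantized to their nearest neighbors on the P-times oversampled grid, as required by implementation (3).

```python
import numpy as np

theta = np.pi / 8                 # synthetic forward rotation
P = 4                             # oversampling factor (1/4-pixel grid)
h = w = 8                         # toy image size
ys, xs = np.mgrid[0:h, 0:w].astype(float)

c, s = np.cos(theta), np.sin(theta)
xr = c * xs - s * ys              # rotated (irregular) sample positions
yr = s * xs + c * ys

xq = np.round(xr * P) / P         # nearest neighbors on the 1/P-pixel grid
yq = np.round(yr * P) / P

# quantization error is at most half the oversampled-grid spacing, 1/(2P)
assert np.max(np.abs(xq - xr)) <= 0.5 / P
assert np.max(np.abs(yq - yr)) <= 0.5 / P
```

Larger P shrinks the position-quantization error, which is consistent with the PSNR gain observed between P = 4 and P = 8 above.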
[Fig. 1. Evolution of PSNR of the reconstructed flower image (PSNR 21-32 dB vs. 2-14 iterations, curves for P=4, 8, 16): (a) impact of P for λ=0.2; (b) best performance obtained for each P.]

[Fig. 2. Example of reconstruction error magnified by 8 for P=4: (a) after 1 iteration (20.7 dB); (b) after 12 iterations (31.5 dB).]

Visually, the reconstructed images are practically indiscernible from the originals. Fig. 2 shows the reconstruction error (magnified by 8) for P = 4 after 1 and after 12 iterations. Note the dramatic reduction of the error; after 12 iterations only errors at sharp luminance transitions remain, and those are largely invisible due to the masking effect of the human visual system. After about 15 iterations the gains are much smaller as the algorithm saturates. The lowpass filtering causes some minor ringing in smooth image areas next to high-contrast details. We have observed that for m close to 1.2 the extent of ringing (in 1-D) is about two times smaller than that for a brick-wall filter, i.e., for m = 1.
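To make the overall loop concrete, here is a toy 1-D version of iterations (3)-(6) (assuming numpy; variable names and the test signal are ours, and the brick-wall band limitation corresponds to m = 1, whereas the paper uses the smoother filter of Section 2.3): sample substitution on the irregular grid alternates with bandwidth limitation in the DFT domain.

```python
import numpy as np

rng = np.random.default_rng(1)
N, B, lam = 256, 20, 1.0            # grid size, one-sided band, step lambda

# Band-limited real ground truth: random bins 1..B-1 plus conjugate mirror.
spec = np.zeros(N, complex)
spec[1:B] = rng.standard_normal(B - 1) + 1j * rng.standard_normal(B - 1)
truth = np.real(np.fft.ifft(spec + np.conj(np.roll(spec[::-1], 1))))

# Irregular sample positions, already quantized to the reconstruction grid.
idx = np.unique(rng.integers(0, N, size=3 * N // 4))

g = np.zeros(N)                      # g^0 = 0, i.e., G^0 = 0
for _ in range(20):
    err = np.zeros(N)
    err[idx] = truth[idx] - g[idx]   # error known on the irregular grid only
    E = np.fft.fft(err)              # (4): transform of the sparse error
    E[B:N - B + 1] = 0.0             # (5): zero the out-of-band bins (m = 1)
    g = g + lam * np.real(np.fft.ifft(E))   # accumulate and invert, cf. (6)

mse = np.mean((truth - g) ** 2)
psnr = 10 * np.log10(np.ptp(truth) ** 2 / mse)
```

Both projections leave the band-limited ground truth fixed, so the distance to it is non-increasing over the iterations; in this toy setup the error drops rapidly in the first few iterations, mirroring the saturation behavior reported above.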
4. REFERENCES

[1] R.G. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust. Speech Signal Process., vol. 29, no. 6, pp. 1153–1160, Dec. 1981.

[2] H. Feichtinger and K. Gröchenig, "Theory and practice of irregular sampling," in Wavelets: Mathematics and Applications, J. Benedetto and M. Frazier, Eds., chapter 8, pp. 305–363. CRC Press, Boca Raton, FL, 1994.

[3] A. Sharaf and F. Marvasti, "Motion compensation using spatial transformations with forward mapping," Signal Process., Image Commun., vol. 14, pp. 209–227, 1999.

[4] P. Combettes, "The foundations of set theoretic estimation," Proc. IEEE, vol. 81, no. 2, pp. 182–208, Feb. 1993.

[5] H. Chen, M. Civanlar, and B. Haskell, "A block transform coder for arbitrarily-shaped image segments," in Proc. IEEE Int. Conf. Image Processing, Nov. 1994, pp. 85–89.

[6] K.D. Sauer and J.P. Allebach, "Iterative reconstruction of band-limited images from nonuniformly spaced samples," IEEE Trans. Circuits Syst., vol. 34, no. 12, pp. 1497–1506, Dec. 1987.

[7] E. Dubois, "The sampling and reconstruction of time-varying imagery with application in video systems," Proc. IEEE, vol. 73, no. 4, pp. 502–522, Apr. 1985.

[8] R. Stasiński, "FFT pruning - a new approach," in Signal Process. III: Theories and Applications (Proc. Third European Signal Process. Conf.), 1986, pp. 267–270.