IMAGE PREDICTION BASED ON NON-NEGATIVE MATRIX FACTORIZATION
Mehmet Türkan, Christine Guillemot
INRIA/IRISA, Campus Universitaire de Beaulieu, 35042 Rennes, France
[email protected]

ABSTRACT

This paper presents a novel spatial texture prediction method based on non-negative matrix factorization. As an extension of template matching, approximation based iterative texture prediction methods have recently been considered for image prediction. These approaches rely on the assumption that the given basis functions (atoms) span the signal residue space at each iteration of the algorithm. However, in the case of signal prediction with a support region approximation, the atoms may not approximate residue signals very well even though the dictionary has been well adapted in the spatial domain. The main idea of the proposed method is to consider a factorization based algorithm in which the given atoms approximate the signal without going further into the signal residue space. The proposed spatial prediction method has first been assessed against the prediction methods based on template matching and sparse approximations. It has then been assessed in a compression scheme where the prediction residue is transform encoded. Experimental results show that the proposed method outperforms the template matching and sparse approximations based techniques in terms of encoding efficiency.

Index Terms— Image compression, texture prediction, non-negative matrix factorization, template matching, sparse approximations

1. INTRODUCTION

In image and video compression algorithms, closed-loop intra prediction plays an important role in minimizing the encoded information. For example, in H.264/AVC, there are two intra prediction types, called Intra-16x16 and Intra-4x4 respectively [1]. The Intra-16x16 type supports four intra prediction modes while the Intra-4x4 type supports nine modes. Each 4x4 block is predicted from previously encoded pixels of spatially neighboring blocks. In addition to the so-called "DC" mode, which consists in predicting the entire 4x4 block from the mean of neighboring pixels, eight directional prediction modes are specified.
The prediction is done by simply propagating (or interpolating) the pixel values along the specified direction. This approach is suitable in the presence of contours, when the directional mode chosen corresponds to the orientation of the contour. However, it fails in more complex textured areas.

An alternative spatial prediction algorithm based on template matching (TM) has been described in [2]. In this method, the 4x4 block to be predicted is further divided into four 2x2 sub-blocks. Template matching based prediction is performed for each sub-block accordingly. The best candidate sub-block of size 2x2 is determined by minimizing the sum of absolute differences (SAD) between the template and the candidate neighborhood. The four best match candidate sub-blocks constitute the prediction of the 4x4
978-1-4577-0539-7/11/$26.00 ©2011 IEEE
block to be predicted. This approach has later been improved in [3] by averaging multiple TM predictors, including larger and directional templates, resulting in a coding efficiency gain of more than 15% in H.264/AVC. Extensions and variations of this method are straightforward. In the experiments reported in this paper, an 8x8 block size has been used without further dividing the block into sub-blocks.

It has recently been shown in [4] that a prediction method based on sparse signal approximation (e.g., matching pursuit (MP) [5], orthogonal matching pursuit (OMP) [6]) outperforms TM. The principle of the approach, as initially proposed in [7], is to first search for a linear combination of basis functions (atoms) which best approximates the known pixel values in a causal neighborhood (template or support region), and to keep the same linear combination of atoms to approximate the unknown pixel values in the block to be predicted. Since a good representation of the support region does not necessarily lead to a good approximation of the block to be predicted, the sparsity level (or the corresponding iteration number) which minimizes a chosen criterion needs to be transmitted to the decoder. The considered criteria are the mean square error (MSE) of the predicted signal and a rate-distortion cost function. The basic drawback of this approach is that the atoms may not span the signal residue space (of the block to be predicted) at each iteration, as the correlation between the template and the unknown block gets weaker and weaker along the iterations of the sparse approximation algorithm. As a result, the encoded residual information is not minimized even if the signal prediction seems sufficient in terms of the chosen criterion.

In this paper, a novel spatial texture prediction method based on non-negative matrix factorization (NMF) [8] is presented.
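The sparse-approximation prediction principle described above can be illustrated with a minimal pure-Python sketch using plain matching pursuit. It is an illustration under stated assumptions, not the implementation evaluated in [4] or [7]: the function names `mp_predict` and `best_sparsity`, the convention that each atom stores its support-region samples first and its block samples last, and the use of MSE as the only selection criterion are all hypothetical choices made for this example.

```python
def mp_predict(atoms, support, max_iters):
    """Matching pursuit on the support region, extrapolated to the block.

    atoms:   list of atoms; each atom is a list whose first len(support)
             entries cover the support region and whose remaining entries
             cover the block to be predicted (assumed layout).
    support: known pixel values of the support region.
    Returns one block prediction per iteration (sparsity level 1, 2, ...).
    """
    n_known = len(support)
    known = [a[:n_known] for a in atoms]      # atom parts over the support
    unknown = [a[n_known:] for a in atoms]    # atom parts over the block
    energy = [sum(v * v for v in k) for k in known]
    residue = [float(v) for v in support]
    weights = [0.0] * len(atoms)
    block_len = len(unknown[0])
    predictions = []

    for _ in range(max_iters):
        # Select the atom most correlated with the current support residue.
        best_j, best_score, best_dot = None, 0.0, 0.0
        for j, k in enumerate(known):
            if energy[j] == 0.0:
                continue
            dot = sum(r * v for r, v in zip(residue, k))
            score = dot * dot / energy[j]
            if score > best_score:
                best_j, best_score, best_dot = j, score, dot
        if best_j is None:
            break  # residue is orthogonal to every atom: nothing left to fit

        # Update the weight of the selected atom and the support residue.
        coeff = best_dot / energy[best_j]
        weights[best_j] += coeff
        residue = [r - coeff * v for r, v in zip(residue, known[best_j])]

        # Keep the same linear combination to approximate the unknown block.
        predictions.append([
            sum(w * u[i] for w, u in zip(weights, unknown))
            for i in range(block_len)
        ])
    return predictions


def best_sparsity(predictions, block):
    """Encoder side: sparsity level (1-based) minimizing the block MSE."""
    errors = [
        sum((p - b) ** 2 for p, b in zip(pred, block)) / len(block)
        for pred in predictions
    ]
    return errors.index(min(errors)) + 1
```

Because the weights are fitted on the support region only, the encoder, which knows the true block, has to pick the iteration count and signal it to the decoder; `best_sparsity` plays that role here with the MSE criterion.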
Given a fixed non-negative dictionary (basis functions), the main idea is to first obtain an NMF representation of the support region and to keep the same representation parameters to approximate the unknown pixel values in the block to be predicted. With locally adaptive dictionaries as defined in [7], the non-negativity constraint of the factorization is satisfied, since pixel values in the spatial domain range between 0 and 255. The proposed spatial prediction method has first been compared with the TM and sparse approximations based techniques in terms of prediction quality. It has then been assessed in an image coding scheme in which the residue blocks are encoded with the JPEG standard. The encoding PSNR/bit-rate performance shows a significant gain (up to 3 dB) when compared with the TM and sparse approximations based prediction.

2. SPATIAL PREDICTION

Let S denote a region in the image containing a block B of n × n pixels and its causal neighborhood C used as approximation sup-
ICASSP 2011
The extrapolated signal $\hat{b}$ is then simply assigned the pixel values of the candidate $a_{j_{opt}}$ as $\hat{b} = a_{j_{opt}}$.

2.2. Sparse approximations

Given $A \in \mathbb{R}^{N \times M}$ and $b \in \mathbb{R}^{N}$ with N