IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 9, SEPTEMBER 2005, p. 1367

A New Image Representation Algorithm Inspired by Image Submodality Models, Redundancy Reduction, and Learning in Biological Vision

Nikhil Balakrishnan, Karthik Hariharakrishnan, and Dan Schonfeld, Senior Member, IEEE

Abstract—We develop a new biologically motivated algorithm for representing natural images using successive projections into complementary subspaces. An image is first projected into an edge subspace spanned by an ICA basis adapted to natural images, which captures the sharp features of an image like edges and curves. The residual image obtained after extraction of the sharp image features is approximated using a mixture of probabilistic principal component analyzers (MPPCA) model. The model is consistent with cellular, functional, information theoretic, and learning paradigms in visual pathway modeling. We demonstrate the efficiency of our model for representing different attributes of natural images like color and luminance. We compare the performance of our model in terms of quality of representation against commonly used bases, such as the discrete cosine transform (DCT), independent component analysis (ICA), and principal components analysis (PCA), based on their entropies. Chrominance and luminance components of images are represented using codes having lower entropy than DCT, ICA, or PCA for similar visual quality. The model attains considerable simplification for learning from images by using a sparse independent code for representing edges and explicitly evaluating probabilities in the residual subspace.

Index Terms—Computer vision, feature representation, statistical models, clustering algorithms, machine learning, color.

1 INTRODUCTION

THE human visual system has developed efficient coding strategies to represent natural images: we make sense of millions of bytes of data every day in a seemingly effortless manner [1]. Visual information is transduced by the rods and cones of the retina and carried to the brain by the optic nerve, which comprises the axons of cells in the ganglion cell layer. The fibers synapse in the lateral geniculate body of the thalamus before terminating in the primary visual cortex, or V1 [2]. Sensory information is represented using the output of an array of neurons. The type of stimulus is inferred from the location and amplitude of the neurons that show maximal activity. Encoding information using the activity of a large number of neurons is known as population coding [3]. Mathematical models of the visual system have been approached from various viewpoints. These can be broadly classified into three types: 1) single cell models, where the attempt is to model responses of single cells in the visual

• N. Balakrishnan is with the Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218 (M/C 063), Chicago, IL 60607. E-mail: [email protected].
• K. Hariharakrishnan is with Motorola India Electronics Limited, #66/1, Plot No. 5, 6th Main, Hoysala Nagar, Bangalore 560093, Karnataka, India. E-mail: [email protected].
• D. Schonfeld is with the Department of Electrical and Computer Engineering, University of Illinois at Chicago, 851 S. Morgan Street (M/C 154), Chicago, IL 60607. E-mail: [email protected].

Manuscript received 8 Jan. 2004; revised 30 Sept. 2004; accepted 22 Nov. 2004; published online 14 July 2005. Recommended for acceptance by E.R. Hancock. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0021-0104.

0162-8828/05/$20.00 © 2005 IEEE

cortex to various stimuli [4], 2) visual processing models, where the focus is on modeling higher order functions of the visual pathway, such as detecting and representing contours, edges, surfaces, etc. [5], and 3) statistical models, which represent the visual pathway as effecting sensory transformations for achieving diverse objectives like redundancy reduction and learning [4], [6], [7], [8]. These models are often interlinked and complementary, as outputs of simple cell models are used for representing higher order structure in a visual scene by visual processing models. It is increasingly clear that sense perception and learning are interlinked and that learning involves evaluating the probabilities of different hypotheses from sensory data [8]. In this sense, neural representations need not be seen as merely transformations of stimulus energies, but as approximate estimates of probable truths of hypotheses in the current environment [8]. In this paper, we develop a mathematical model for visual representation which is consistent with these diverse perspectives. The algorithm is formulated by the use of an independent component analysis (ICA) model for edge representation followed by a mixture of probabilistic principal component analyzers (MPPCA) model for surface representation.

Section 2 briefly describes the salient mathematical features of the individual models described above. Section 2.1 discusses mathematical models of simple and complex cells, Section 2.2 discusses data streams in image representation, and Section 2.3 discusses neural representation in the framework of redundancy reduction. We describe the salient mathematical features of our algorithm in Section 3. In Section 4, we compare the performance of our model against representations obtained using an ICA basis, a PCA basis, and a discrete cosine transform (DCT) basis. The results of the comparisons are presented in Section 5. We discuss our model in light of current trends in neuroscience and learning theory in Section 6. Section 7 presents a summary and suggests directions for future work.

2 BACKGROUND AND NEURAL MOTIVATION

2.1 Structure and Organization of the Visual System
V1 neurons are classified into simple and complex cells. A systematic representation of the visual space is formed on the cortical surface. Neurons that share the same orientation preference are grouped together into columns, and successive columns rotate systematically in preferred angle [4]. Mathematically, the receptive fields of simple cells of the visual cortex are modeled as Gabor wavelets, parameterized as

f(x, y) = \exp\{-[(x - x_o)^2/\alpha_o^2 + (y - y_o)^2/\beta_o^2]\} \exp\{-2\pi i [u_o (x - x_o) + v_o (y - y_o)]\},   (1)

where (x_o, y_o) specifies position in the image, (\alpha_o, \beta_o) specifies the filter's effective width, and (u_o, v_o) specifies the filter's frequency. The real and imaginary parts of this complex filter function describe associated pairs of simple cells in quadrature phase [4]. The 2D Fourier transform F(u, v) of a 2D Gabor wavelet is given by

F(u, v) = \exp\{-[(u - u_o)^2 \alpha_o^2 + (v - v_o)^2 \beta_o^2]\} \exp\{-2\pi i [x_o (u - u_o) + y_o (v - v_o)]\}.   (2)

The power spectrum of a 2D Gabor wavelet takes the form of a bivariate Gaussian centered at (u_o, v_o), having magnitude \omega_o = (u_o^2 + v_o^2)^{1/2} and phase \theta_o = \tan^{-1}(v_o/u_o). Therefore, the peak response occurs for an orientation of \theta_o and an angular frequency of \omega_o corresponding to the excitatory field [4]. Complex cells form the other major cell type in V1. Mathematically, they are modeled as summing the squared magnitudes of simple cell inputs, thus enabling phase invariance and limited translation invariance. These cells become important in higher order tasks, like pattern recognition, which take place in adjacent parts of the brain, like V2.
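The Gabor wavelet of (1) is straightforward to sketch numerically. The following is a minimal NumPy illustration; the function name and the particular parameter values are ours, chosen for demonstration, not taken from the paper:

```python
import numpy as np

def gabor_2d(size, x0, y0, alpha, beta, u0, v0):
    """Complex 2D Gabor wavelet in the form of (1): a Gaussian envelope
    centered at (x0, y0) with widths (alpha, beta), modulated by a complex
    sinusoid of spatial frequency (u0, v0)."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    envelope = np.exp(-(((x - x0) / alpha) ** 2 + ((y - y0) / beta) ** 2))
    carrier = np.exp(-2j * np.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier

# The real and imaginary parts model a pair of simple cells in quadrature.
g = gabor_2d(16, x0=8, y0=8, alpha=3.0, beta=3.0, u0=0.2, v0=0.0)
even, odd = g.real, g.imag
```

Per (2), the magnitude of the Fourier transform of such a filter is a Gaussian bump centered at (u0, v0), i.e., the filter is orientation and frequency selective.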

2.2 Data Streams in Visual Processing
This section describes image representation from a functional perspective. Neurophysiology and psychophysics data suggest that visual object representation is organized in parallel interacting data streams which follow different computational strategies. Attributes of an image, such as color, edges, boundaries, luminance, etc., are extracted and processed separately. This is referred to as the boundary contour system/feature contour system (BCS/FCS) [5]. Boundary formation proceeds by linking oriented contrast measures (small edges) along smooth contours. The BCS performs edge detection and edge completion and sends outputs to the FCS. The FCS performs diffusion of uniform color or brightness in perceptually similar areas and inhibits diffusion of uniform luminance or color across boundaries [5]. The major input to the FCS is from the BCS. Thus, surfaces are formed from edges. Computational strategies for the generation of perceptual surface qualities generally follow one of three basic approaches [5]: 1) filtering and rule-based symbolic interpretations [9], 2) spatial integration, inverse filtering, and labeling models, which follow the operations sequence filtering/differentiation → boundaries/thresholding → integration [10], [11], and 3) filling-in models, which follow the sequence filtering → boundaries → filling-in [12], [13]. Differentiation and subsequent thresholding operations are intended to detect salient changes in the luminance signal, which serve to create region boundaries and, subsequently, trigger integration from local luminance ratios. The latter two approaches begin with local luminance ratios estimated along boundaries. Surface properties are computed by propagating local estimates into region interiors to generate a spatially contiguous representation. See Neumann and Mingolla [5] for a review of the topic.

This concept also underlies color processing. Variations in the wavelength of light are processed distinctly from variations in intensity [14]. The continuous color space is partitioned into discrete color categories [14]. Color is transduced by differentially wavelength-sensitive cones in the retina. Cones are classified as L, M, and S (long, medium, and short) types based on their peak wavelength sensitivity. There are three distinct channels conveying color information from the retina to V1. The M-channel contributes to the perception of luminance and motion, but does not convey wavelength-coded signals. The P-channel conveys long and medium wavelength information and fine detail. The K-channel conveys information regarding color sensations. K and P channels innervate cytochrome oxidase stained areas called "blobs" in V1. P and M channels innervate the remaining regions, known as "interblobs" [14]. See Kentridge et al. [14] for a review of the topic.

2.3 Information Theory Perspective—The Redundancy Reduction Principle
A close link exists between this observed anatomy and physiology of the visual pathway and the data compression principles of information theory. Classical models in computational neuroscience assume that efficient coding in the sensory system has been attained through Barlow's principle of redundancy reduction [6], [7]. According to this principle, sensory transformations should reduce redundancies in the input sensory stream [1], with the aim of achieving a more parsimonious representation in the brain. Therefore, visual image processing should serve to transform the input visual data into statistically independent components so that there is no overlap in the information conveyed by different components. The Karhunen-Loève (KL) transform has been used extensively in the literature for decorrelating or "whitening" multivariate input data. The KL transform removes redundancies due to linear pairwise correlations among image pixels. However, natural images have an abundance of features like oriented lines, edges, and curves, which give rise to higher order statistical dependencies between pixels beyond linear pairwise correlations [1]. ICA refers to a broad range of algorithms exploiting different objective criteria to effect a transformation of a vector into statistically independent components, a stronger form of independence than the linear decorrelation resulting from PCA. An image patch x is represented as a linear superposition of the columns of the mixing matrix A, the columns of A being chosen such that the components of the vector s are statistically independent:

x = As.   (3)
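The KL (PCA) whitening step mentioned above can be illustrated in a few lines. This is a generic sketch on synthetic correlated data, not the paper's pipeline; it shows only that whitening removes linear pairwise correlations, which is the weaker property that ICA then strengthens:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "patch" data with strong pairwise correlations between dimensions.
n, d = 5000, 8
mixing = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ mixing.T

# KL (PCA) whitening: rotate onto the eigenvectors of the covariance and
# rescale each coordinate, removing linear pairwise correlations.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
evals, evecs = np.linalg.eigh(cov)
Z = Xc @ evecs / np.sqrt(evals)

# After whitening the sample covariance is (numerically) the identity.
print(np.abs(Z.T @ Z / n - np.eye(d)).max())
```

Whitening fixes the covariance but leaves a rotational degree of freedom; ICA uses higher order statistics to choose the rotation that makes the components statistically independent.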


Fig. 1. (a) Activity profile of a typical source in response to natural image patches and (b) corresponding histogram. The histogram reveals a peak at 0 and a sharp fall at higher activations, suggesting a super-Gaussian probability density function.

3 HYBRID ICA—MIXTURE OF PPCA ALGORITHM

3.1 ICA Model
We need to estimate a matrix A such that s = A^{-1} x has components that are statistically independent. "Sparseness" is an additional condition imposed on s. A sparse activation pattern is modeled as a Laplacian density characterized by high kurtosis. Kurtosis is defined as the fourth-order cumulant of a random variable and is 0 for a Gaussian random variable. Therefore, to determine A, the non-Gaussianity of s can be used as a cost function. Kurtosis, which serves as a measure of the non-Gaussianity of a random variable, can be approximated using negentropy [15]. The negentropy of a random variable y is defined as the difference between the differential entropy of y assuming a Gaussian pdf and its differential entropy estimated using its actual density, for the same covariance [15]:

J(y) = H(y_{gauss}) - H(y).   (4)

Since a Gaussian pdf has the greatest entropy for a given variance, negentropy is always positive and equal to zero if and only if y is Gaussian [15]. The negentropy of a random variable s is approximated as [15]

J(s) \approx \frac{1}{12} E[s^3]^2 + \frac{1}{48} \mathrm{kurt}(s)^2,   (5)

where "kurt" denotes the kurtosis of the random variable in the argument. We use the FastICA Matlab package [16] to estimate the independent components of natural images. This is a fast fixed-point algorithm that achieves a sparse, independent components transformation by iteratively maximizing an approximation to the negentropy. The estimated matrix A is projected into the manifold of orthogonal matrices at each step so that (3) is constrained to be an orthogonal transformation. The resulting filters (columns of A) resemble Gabor functions and wavelets and function as edge detectors which respond maximally in the presence of a corresponding edge in the visual field. The vector s is a population code of corresponding neuronal activations and is obtained by

s = A^T x.   (6)
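The cumulant-based negentropy approximation of (5) is easy to evaluate directly. The sketch below (function name ours) scores a super-Gaussian (Laplacian) sample against a Gaussian one; FastICA maximizes a contrast of this kind when estimating A:

```python
import numpy as np

def negentropy_approx(s):
    """Approximate negentropy of a sample via the expansion of (5):
    J(s) ~ (1/12) E[s^3]^2 + (1/48) kurt(s)^2,
    after standardizing s to zero mean and unit variance."""
    s = (s - s.mean()) / s.std()
    skew_term = np.mean(s ** 3) ** 2 / 12.0
    kurt = np.mean(s ** 4) - 3.0          # fourth-order cumulant
    return skew_term + kurt ** 2 / 48.0

rng = np.random.default_rng(1)
gaussian = rng.normal(size=100_000)
laplacian = rng.laplace(size=100_000)      # sparse, super-Gaussian source

# The Laplacian (kurtosis 3) scores far higher than the Gaussian (kurtosis 0).
print(negentropy_approx(gaussian), negentropy_approx(laplacian))
```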

The elements of s (referred to as sources) with large magnitudes signify the presence of corresponding "sharp features" in an image patch—features such as edges, oriented curves, etc. The components of s with smaller magnitudes represent the summation of a number of weak edges yielding less interesting features in the image patch. The columns of A and the components of the vector s are determined up to a permutation. Independent subspace analysis (ISA) and topographic independent component analysis (TICA) are algorithms that allow limited dependencies between components of s to allow for local clustering of cells with similar receptive fields [17], [18]. In these models, there is local dependence between the components of a subset of neurons (vector s) and independence of the subset with respect to the rest of the vector. This allows neurons with similar orientation and frequency selectivity to be grouped together.

Fig. 1a shows the activation profile of a source in response to different image patches. Fig. 1b shows the corresponding histogram of source activity. The histogram gives a nonparametric approximation of the probability density. The peak occurs at 0, with rapidly decaying tails at higher values of activation, suggestive of a sparse, super-Gaussian activation pattern. The kurtosis was estimated to be 9.11, which confirms the super-Gaussian nature. Fig. 2 shows the same source after a threshold is applied on activation. As seen in Fig. 2, the source is significantly active only a fraction of the time, which, in this case, is 27/256 or 10.5 percent of the time. From trial and error, an absolute value of 1.5 is set as the threshold for a component of s to be considered significantly active. Let s_{sharp} denote the result of thresholding the vector s for an image patch x, where all components of s that have an absolute value less than 1.5 are set equal to zero. Denoting the reconstructed image vector by x_{sharp}, we get

x_{sharp} = A s_{sharp}.   (7)

The residual subspace is given by

x_{residual} = x - x_{sharp}.   (8)

The residual subspace nevertheless plays an important role in attaining a smooth representation and cannot be neglected entirely. Fig. 3 shows the sharp and residual subspaces of two images. The sharp features image would correspond to the BCS discussed in Section 2.2. We use a latent variable model formulation to arrive at an efficient basis for the residual subspace, described in Section 3.2.
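The decomposition of (6)-(8) can be sketched compactly. In the snippet below, the function name is ours, the threshold 1.5 is the value quoted above, and a random orthogonal matrix stands in for the FastICA-learned basis:

```python
import numpy as np

def himpa_split(x, A, threshold=1.5):
    """Split a patch x into sharp and residual parts, a sketch of (6)-(8).
    A is assumed orthogonal, so the source vector is s = A.T @ x."""
    s = A.T @ x                                   # population code, (6)
    s_sharp = np.where(np.abs(s) >= threshold, s, 0.0)
    x_sharp = A @ s_sharp                         # edge subspace, (7)
    x_residual = x - x_sharp                      # residual subspace, (8)
    return x_sharp, x_residual

rng = np.random.default_rng(2)
d = 64
A, _ = np.linalg.qr(rng.normal(size=(d, d)))      # stand-in orthogonal basis
x = rng.normal(size=d)
x_sharp, x_residual = himpa_split(x, A)

# The two projections are complementary: they sum back to the patch.
assert np.allclose(x_sharp + x_residual, x)
```

Because A is orthogonal and the two source supports are disjoint, the sharp and residual components are also mutually orthogonal.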


Fig. 2. Source activity following application of a threshold on activation, suggesting a sparse activation profile.

3.2 MPPCA Model
The residual image is assumed to originate by sampling from a limited number of lower dimensional self-similar clusters. Surfaces having a particular orientation originate from a single cluster for which an efficient basis can be constructed. A cluster k is first selected at random, following which an observation is generated using the linear model

x_k = W_k s_k + \mu_k + \epsilon_k,   k = 1, ..., K,   (9)

where x \in \mathbb{R}^d is the observation, s \in \mathbb{R}^q is the lower dimensional source, assumed to be Gaussian distributed with zero mean and identity covariance, d > q, \mu_k is the cluster observation mean, and \epsilon_k \sim N(0, \Psi_k) is Gaussian white noise with zero mean and diagonal covariance. W_k, s_k, \mu_k, and \epsilon_k are hidden variables that need to be estimated from data; hence the term latent variable model [19]. Tipping and Bishop formulate estimating the parameters of the model in a maximum-likelihood framework [19]. We describe the salient features of their algorithm. The statistical factor analysis model (9) can be simplified by assuming an isotropic noise model \epsilon_k \sim N(0, \sigma_k^2 I). For the one-cluster case (K = 1), the conditional density of x given s is given by

p(x|s) = (2\pi\sigma^2)^{-d/2} \exp\{-\|x - Ws - \mu\|^2 / (2\sigma^2)\},   (10)

where

p(s) = (2\pi)^{-q/2} \exp\{-s^T s / 2\}.   (11)

The marginal distribution of x is obtained by the integral

p(x) = \int p(x|s) p(s) \, ds   (12)
     = (2\pi)^{-d/2} |C|^{-1/2} \exp\{-(x - \mu)^T C^{-1} (x - \mu) / 2\},   (13)

where the model covariance is given by C = \sigma^2 I + W W^T. The posterior density of the latent variables s given x follows from Bayes' rule:

p(s|x) = p(x|s) p(s) / p(x)   (14)
       = (2\pi)^{-q/2} |\sigma^{-2} M|^{1/2} \exp\{-z^T (\sigma^{-2} M) z / 2\},   (15)

where z = s - M^{-1} W^T (x - \mu), M = \sigma^2 I + W^T W, and the posterior covariance matrix is given by \sigma^2 M^{-1} = \sigma^2 (\sigma^2 I + W^T W)^{-1}. The Gaussian assumption governing the density of s and the conditional density of x given s is exploited to obtain solutions

Fig. 3. Image decomposition into (b) edge and (c) residual subspaces. The sharp features image corresponds to the BCS, while the residual subspace corresponds to the FCS. Thus, visual information is split into two complementary subspaces, each following different computational strategies. Residual subspaces may be common across diverse images: the residual subspaces (c) of the Lions (top) and Forest (bottom) images (a) are very similar, though the edge subspaces (b) are very different and contain the discriminating information.


Fig. 4. Space and frequency domain characteristics of MPPCA filters. (a) Space domain—surface orientation of MPPCA filters. (b) Frequency domain—Fourier transforms of MPPCA filters. In the space domain, the MPPCA basis resembles tiles at specific orientations. The corresponding Fourier transforms are low pass and oriented.

of (12) and (14) in closed form. Further details of parameter estimation for the model are presented in Appendix A. In the PPCA framework (derived in Appendix A.1), an observation vector x is represented in latent space not by a single point as in conventional PCA, but by a posterior conditional density, which is the Gaussian given by (15). Therefore, the best point estimate for the nth observation in the lower dimensional latent space is the posterior mean given by [19]

\langle s_n \rangle = M^{-1} W^T (x_n - \mu).   (16)

The optimal least squares linear reconstruction of an observation from its posterior mean vector \langle s_n \rangle is given by [19]

\hat{x}_n = W (W^T W)^{-1} M \langle s_n \rangle + \mu.   (17)

Extending this to the MPPCA framework (Appendix A.2), each observation has a posterior density associated with each latent space. The corresponding posterior mean for an observation within a cluster i is given by

\langle s_i \rangle = (\sigma_i^2 I + W_i^T W_i)^{-1} W_i^T (x - \mu_i).   (18)

Reconstruction of an observation from its latent space representation can be obtained by substituting the corresponding W_i in (17). Therefore, the sparse code obtained by ICA represents edges and sharp discontinuities, while the MPPCA basis codes for smooth manifolds in an image using a latent variable model. Representing the residual subspace involves first performing pattern recognition, followed by coding a pattern using the basis for that cluster. We refer to this method as the Hybrid ICA-Mixture of PPCA algorithm (HIMPA).
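Equations (16) and (17) compose into an orthogonal projection onto the span of W, which gives a simple correctness check. The sketch below (function names ours, random stand-in parameters) implements the posterior mean and the least squares reconstruction:

```python
import numpy as np

def ppca_posterior_mean(x, W, mu, sigma2):
    """Posterior mean <s> = M^{-1} W^T (x - mu), with M = sigma^2 I + W^T W,
    following (16)."""
    q = W.shape[1]
    M = sigma2 * np.eye(q) + W.T @ W
    return np.linalg.solve(M, W.T @ (x - mu))

def ppca_reconstruct(s_mean, W, mu, sigma2):
    """Optimal least squares reconstruction of (17):
    x_hat = W (W^T W)^{-1} M <s> + mu."""
    q = W.shape[1]
    M = sigma2 * np.eye(q) + W.T @ W
    return W @ np.linalg.solve(W.T @ W, M @ s_mean) + mu

rng = np.random.default_rng(3)
d, q = 64, 4                       # ambient and latent dimensions, as a toy
W = rng.normal(size=(d, q))
mu = rng.normal(size=d)
x = rng.normal(size=d)

s_mean = ppca_posterior_mean(x, W, mu, sigma2=0.1)
x_hat = ppca_reconstruct(s_mean, W, mu, sigma2=0.1)
```

Substituting (16) into (17), the factors of M cancel, so x_hat is exactly the orthogonal projection of x - mu onto the columns of W, plus mu, for any noise level sigma2.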

4 EXPERIMENTS

Model parameters were estimated using samples from a training set of 13 natural images downloaded with the FastICA package [16]. Each image was first normalized to zero mean and unit variance. 16 × 16 block samples were drawn randomly from the training set and vectorized, following which dimensionality reduction to 160 × 1 was done using PCA. ICA estimation was then performed on the reduced dimension vectors. Once the ICA mixing matrix was determined, sharp features were extracted and residual image vectors were obtained. During practical implementation, the MPPCA model failed to converge for the original 256 × 1 vectors. Therefore, residual image vectors were converted into 64 × 1 vectors, and a model having eight clusters with four principal components in each cluster was estimated using the EM algorithm with the parameter updates outlined in Appendix A. We used the NETLAB Matlab package for estimating the MPPCA parameters [20].
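The patch extraction and PCA reduction steps above can be sketched as follows. This is a schematic only: a random array stands in for a natural image, and FastICA/NETLAB estimation is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)
image = rng.normal(size=(256, 256))            # stand-in for a natural image
image = (image - image.mean()) / image.std()   # zero mean, unit variance

# Draw random 16x16 blocks and vectorize them to 256-dimensional rows.
n_patches, p = 1000, 16
rows = rng.integers(0, image.shape[0] - p, n_patches)
cols = rng.integers(0, image.shape[1] - p, n_patches)
patches = np.stack([image[r:r + p, c:c + p].ravel() for r, c in zip(rows, cols)])

# PCA reduction 256 -> 160 prior to ICA estimation, as in the paper.
centered = patches - patches.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:160].T
print(patches.shape, reduced.shape)
```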

4.1 Nature of Obtained Solutions
Figs. 4a and 4b show the solutions obtained for the different clusters in the space and frequency domains, respectively. The figures were obtained by summing the columns of the individual W_i's in the ratio of their eigenvalues. In the space domain (Fig. 4a), the filters resemble tiles at various orientations. Fig. 4b shows the corresponding Fourier transform magnitudes. In the frequency domain, the filters possess significant magnitudes only at low frequencies and specific orientations. Functionally, such neurons belong to the FCS discussed in Section 2.2.

4.2 Performance Comparisons with Conventional Basis Systems
Model parameters obtained from training were used to represent images in the test set using the same sequence of operations outlined above. Hard clustering was performed to assign each observation in residual space to the cluster that leads to the least reconstruction error. For each image patch, the relevant parameters to be stored are the 160 × 1 ICA features, the local mean, the MPPCA cluster identifier, and a 4 × 1 vector of probabilistic principal components. Since the different coefficients have very different distributions, the entropies were estimated separately and added to arrive at a measure of bits per pixel. Appendix B gives a brief outline of the method used to compute entropies.

We compare the performance of our neurally inspired model with the DCT, ICA, and PCA. The DCT forms the basis of JPEG, one of the most popular image compression algorithms in use [21]. We examine the performance of our algorithm for different attributes of vision—color and luminance. The DCT is a fixed orthogonal basis whose cosine basis functions are not dependent on data. A PCA basis is data dependent (basis vectors are eigenvectors of the data covariance matrix) [21]. We use image coding efficiency measured in bits/pixel as an index of performance. Comparisons are made by compressing the same image using HIMPA, DCT, ICA, and PCA to the same level of visual quality and similar signal-to-noise ratio (SNR). SNR alone was not found to be a good measure of performance as, in many cases, a high SNR did not correspond to good visual quality and vice versa.

Fig. 5. Color representation using (a) original, (b) DCT, and (c) HIMPA. The figures show that color can be well represented as a surface attribute.

TABLE 1 Comparison of HIMPA and DCT for Chrominance Representation for Images in Fig. 5
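The per-coefficient entropy estimates can be illustrated with a generic histogram (plug-in) estimator. The paper's Appendix B is not reproduced in this excerpt, so the function below is a stand-in of our own; it does show why a sparse, peaked coefficient distribution yields fewer bits per coefficient than a dense one:

```python
import numpy as np

def empirical_entropy(coeffs, n_bins=256, value_range=(-5.0, 5.0)):
    """Plug-in histogram estimate of code entropy in bits per coefficient.
    A generic stand-in, not the method of the paper's Appendix B."""
    counts, _ = np.histogram(coeffs, bins=n_bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(6)
dense = rng.normal(size=10_000)                 # dense, Gaussian-like code
sparse = rng.laplace(scale=0.2, size=10_000)    # sparse, sharply peaked code

# A sparse code concentrates probability mass in few bins -> lower entropy.
print(empirical_entropy(dense), empirical_entropy(sparse))
```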

5 RESULTS

We discuss results for color and luminance in the following sections.

5.1 Chrominance Representation
A true color RGB image is converted into a YIQ (luminance-in-phase-quadrature) image using the linear transformation [21]

\begin{pmatrix} Y \\ I \\ Q \end{pmatrix} = \begin{pmatrix} 0.30 & 0.59 & 0.11 \\ 0.60 & -0.28 & -0.32 \\ 0.21 & -0.52 & 0.31 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix},

where R, G, and B denote the intensities of the red, green, and blue colors, respectively. The Y component contains luminance information, while the I and Q components contain color information. We consider color a "surface" attribute of vision, i.e., only smooth approximations are needed in an image for preserving high quality perception. In these experiments, the luminance component was kept constant and the representation of the chrominance components using DCT and HIMPA was studied. Color was represented using a single cluster PPCA model having four or fewer components, with little variation within an 8 × 8 image patch. Excellent visual appearance and SNR were obtained for virtually all natural images using this method. We show two images in Fig. 5 covering a broad spectrum of color. Table 1 shows the corresponding entropies and combined SNR of the I and Q components for HIMPA and DCT. This can be extended to an MPPCA model to yield probabilistic partitions of the color space for color discrimination.
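The RGB-to-YIQ transformation above is a single matrix multiply per pixel. A minimal sketch (function name ours; coefficients are the standard NTSC values quoted above, with the minus signs of the I and Q rows restored):

```python
import numpy as np

# NTSC RGB -> YIQ transformation used in Section 5.1.
RGB2YIQ = np.array([[0.30,  0.59,  0.11],
                    [0.60, -0.28, -0.32],
                    [0.21, -0.52,  0.31]])

def rgb_to_yiq(img):
    """img: H x W x 3 array of R, G, B values in [0, 1];
    returns the stacked Y, I, Q planes."""
    return img @ RGB2YIQ.T

rng = np.random.default_rng(5)
img = rng.random((4, 4, 3))
yiq = rgb_to_yiq(img)
y = yiq[..., 0]   # luminance plane, held fixed in the chrominance experiments
```

A pure white pixel (R = G = B = 1) maps to Y = 1 with I = Q = 0, since the I and Q rows each sum to zero.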

5.2 Luminance Representation
Edges are important in this component for preserving the quality of representation. Fig. 6 shows examples of the performance obtained on four test images which are very diverse from one another. Fig. 7 shows the representation obtained using ICA alone. SNR (dB) and entropy (bits/pixel) for the images in Figs. 6 and 7 are displayed in Table 2. The results show that HIMPA codes have lower entropy for comparable SNR and visual quality. Although ICA attains a marginally higher SNR than HIMPA, it is evident from comparing Figs. 6 and 7 that the HIMPA images have a better visual appearance. This is most apparent in the "landscape," "parrot," and "bees" images and less discernible in the "tulips" image. The MPPCA component of HIMPA attains a much better representation of relatively smooth areas in an image when compared to ICA codes alone. Images rich in smooth surfaces like grass, sand, etc., and lacking edge information are not represented well using HIMPA. In such images, the residual subspace is the dominating feature, and the MPPCA model which encodes this space projects the images onto an overly smooth manifold, leading to blocking. Thus, HIMPA performs best on images involving a blend of edge and surface attributes.


Fig. 6. Natural image representation using (a) original (tulips, landscape, bees, and parrots), (b) HIMPA, and (c) DCT. Representation using an ICA basis only is shown in Fig. 7. Images in this experiment are very different from one another and HIMPA attains comparable visual quality and SNR at lower entropy (Table 2).

Fig. 7. Representation of the images shown in Fig. 6 using an ICA basis adapted to natural images. The "Tulips" image, which is rich in edges, is well represented. Smooth regions of the other images are poorly represented, resulting in poor visual quality at similar SNR. See Table 2. (a) Tulips, (b) bees, (c) landscape, and (d) parrots.

6 DISCUSSION

HIMPA achieves decomposition of an image into parallel data streams. This is similar to the BCS/FCS paradigm discussed in Section 2.2 and is shown schematically in Fig. 8. The HIMPA architecture allows higher order data transformations on individual streams to be performed in relative isolation from the other streams. This is a significant advantage from the viewpoint of the data processing inequality of information theory. The data processing inequality states that, for a random variable x undergoing successive transformations x → y → z, the mutual information between x and z is bounded from above by the mutual information between x and y [22]. Thus, successive transformations reduce the mutual information between an observation and its neural representation. In the series-parallel architecture of HIMPA, successive transformations on one attribute do not degrade mutual information in another stream. Thus, hypothesis testing and learning can be performed on individual data streams and combined to arrive at coherent conclusions given visual observations.

TABLE 2 Comparison of HIMPA, DCT, and PCA for Luminance Representation for Images in Fig. 6

6.1 General Discussion of HIMPA in Relation to DCT, ICA, and PCA
Figs. 5 and 6 show that different attributes of vision require different levels of descriptive precision for high quality perception. HIMPA exploits the strengths of different basis systems to represent different attributes of an image. This is exploited to a limited extent in JPEG by the limited bits allotted to the chrominance components [21]. However, the DCT does not yield a partition of an image into residual surfaces and sharp features for the luminance component. PCA, too, does not achieve this partition, and it achieves better feature representation only with increasing dimensionality. An ICA basis has limitations in representing relatively smooth areas in an image, evident from Fig. 7.

The coding length of an observation x is related to its probability by

L \propto \log \frac{1}{p(x)}.   (19)

For the residual subspace, under the assumption of Gaussianity, the probability of an observation is p(x) \propto \exp(-\mathrm{error}^2). Therefore, assigning each observation to the cluster with the least reconstruction error maximizes the probability of the observation and thus leads to the minimum coding length by (19). The method is in agreement with the minimum description length principle, which is believed to govern neural coding strategies [23].
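The least-reconstruction-error cluster assignment described above (Section 4.2) can be sketched directly; by the argument around (19), this choice is also the minimum-coding-length choice. Function name and stand-in parameters are ours:

```python
import numpy as np

def assign_cluster(x, Ws, mus):
    """Hard clustering sketch: pick the cluster whose low-dimensional basis
    reconstructs x with least error. Under the Gaussian assumption
    p(x) ~ exp(-error^2), this maximizes p(x) and, by L ~ log(1/p(x)),
    minimizes the coding length of (19)."""
    errors = []
    for W, mu in zip(Ws, mus):
        # Orthogonal projection of (x - mu) onto the span of W.
        proj = W @ np.linalg.solve(W.T @ W, W.T @ (x - mu))
        errors.append(np.linalg.norm(x - mu - proj))
    return int(np.argmin(errors))

rng = np.random.default_rng(7)
d, q, K = 64, 4, 8                 # eight clusters, four components each
Ws = [rng.normal(size=(d, q)) for _ in range(K)]
mus = [rng.normal(size=d) for _ in range(K)]

# A point generated from cluster 3 is assigned back to cluster 3,
# since it lies exactly in that cluster's subspace (zero error).
k = 3
x = mus[k] + Ws[k] @ rng.normal(size=q)
print(assign_cluster(x, Ws, mus))
```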

6.2 HIMPA and Learning
Sense perception blends imperceptibly with learning. Learning is understood in a Bayesian sense as evaluating the probabilities of truth of hypotheses about the current environment in order to decide upon an optimum course of action [8]. In Appendix C, we derive the following bound on the joint probability of a hypothesis H and an observation x:

P(H, x) \leq P(H) \prod_{i \in S_0} P(s_{ed}^{i} = 0 \mid H) \prod_{l \in \bar{S}_0} P(s_{ed}^{l} \mid H) + \sum_{k=1}^{K} P(H \mid s_{sur,m}^{k}) P(s_{sur,m}^{k} \mid x_{sur}) P(x_{sur} \mid k) \pi_k.   (20)

In this equation, x represents the observation having edge component x_{ed} and surface component x_{sur}, s_{ed} is the sparse edge component, and s_{sur,m}^{k} is the posterior mean of x in the kth cluster; the product indices in the first term of the equation run over components of s_{ed}. S_0 is the set of components of s_{ed} which are inactive for the observation, and \bar{S}_0 denotes the complementary set. HIMPA explicitly evaluates the terms in the second half of (20) for all values of k using (13) and (15). The P(s_{ed} \mid H) and P(H \mid s_{sur,m}^{k}) terms in (20) are context dependent and depend solely on the hypothesis being evaluated. Equation (20), which relates P(H, x) to P(H, s), achieves great simplification for hypothesis testing. The first term evaluates the joint probability in terms of edge information and the second in terms of residual information. Being sparse, the overwhelming majority of indices in the edge code (around 85 percent) belong to the set S_0, the first product in (20). This term can be estimated easily, as it evaluates the probability of an edge being inactive under a certain hypothesis without requiring the level of activity to be discerned. Being factorial, a complex decision can be obtained quite simply by a product of simpler decisions.

The residual subspace may contain very little discriminatory information. Fig. 3 shows two very different images and the information subspaces in each of them. The residual subspaces of the "Lions" and "Forest" images are very similar, and it is very unlikely that this subspace would contribute much toward evaluating a hypothesis about the environment. The edge subspaces clearly are very different and hold the discriminating information. In such a situation, the latter term in (20) can be ignored and Bayesian hypothesis testing can be performed using edge information alone, which possesses all the desirable properties discussed in [8].

Fig. 8. Data transformations and data streams in the HIMPA algorithm. HIMPA channels an incident image into parallel data streams using an ICA basis for edges and an MPPCA basis for surfaces. This resembles the BCS/FCS paradigm discussed in Section 2.2.
In general, weighted combinations of the two terms in (20) are highly desirable, as their relative importance is hypothesis dependent. This is perhaps the most important difference between HIMPA and ICA, DCT, or PCA. HIMPA performs image decomposition and coding in a manner that blends image perception and learning. In comparison, none of the other basis systems yield parallel data streams in which the relative contribution of a visual attribute to the task at hand is flexible and hypothesis dependent.

6.3 Recent Viewpoints about Redundancy in Neural Systems

In a recent paper [8], Barlow re-examined the redundancy reduction paradigm advanced in the 1950s and 1960s [6], [7]. The paper is critical of redundancy reduction for several reasons and concludes that redundancy in sensory messages conveys crucial information about the environment and that its importance does not lie in compressive coding. Barlow classifies redundancy in image data as manifest or nonmanifest (also referred to as hidden) [8]. Manifest redundancy, caused by unequal probabilities of primary message elements, is benign. Nonmanifest redundancy, caused by unequal joint probabilities of higher-order combinations of elements, is believed to be of critical importance; it is considered an important source of knowledge about the environment and vital for estimating the probabilities of hypotheses about the environment [8]. Redundancy in living systems also has a temporal element, as those features of sensory stimulation that are accurately predictable from knowledge acquired through experience become redundant [8]. Compressive representations, which reduce redundancy, have high activity ratios of individual neurons in response to a stimulus. Such representations are easily degraded by noise, and learning from them is difficult owing to the high degree of overlap in neuronal activities in response to different stimuli [8]. Barlow bases some of his conclusions on empirical evidence revealing greater-than-expected neuronal activity at higher levels of visual processing. However, the nature of the redundancy that is added is not discussed. Redundancy added in the form of error-correcting codes, to prevent signal degradation at higher levels of the nervous system, is considerably different from redundancy in perceived stimuli. This distinction is vital for understanding the importance of redundancy in high-level processing of visual information.
In the following sections, we demonstrate that our proposed algorithm is consistent with the new paradigm proposed by Barlow.

6.3.1 HIMPA Separates Manifest Redundancies in an Image and Makes Nonmanifest Redundancies Easier to Identify

In our model, the residual subspace captures the manifest redundancies in a visual scene. This manifold lacks structure, and the neural system is able to generate an internal representation for it through knowledge acquired from prior experience using the generative model given in (9). Once these redundancies have been identified and removed, the remaining source activations are sparse and contain the dominant edge information, from which higher-order redundancies governing the co-occurrence of edges can be learned. Edge co-occurrence statistics have been shown to be vital for deriving probabilistic rules governing the grouping of edges to form longer contours and boundaries (BCS, discussed in Section 2.2) [24], [25]. The sparse ICA code no longer has a high activity ratio, and two different patterns that need to be distinguished are less likely to overlap in their representations, enabling more robust detection and probability evaluation. Therefore, HIMPA codes do not suffer from the limitations of compressed representations discussed in [8]. HIMPA does not discard nonmanifest redundancies, which would be undesirable according to Barlow [8]; rather, it helps extract them from the manifest redundancies in which they are buried. By extracting and preserving manifest redundancies in a separate stream, HIMPA improves the signal-to-noise ratio of important signals in the environment, improving their detection and identification, a desirable goal of sensory coding as discussed in [8]. This is important in reinforcement learning as well, because all reinforcement learning is a response to the statistical structure of sensory signals.
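The reduced overlap of sparse codes is easy to verify numerically. The following is a toy illustration using random binary activity patterns, not actual HIMPA outputs: two patterns with roughly 15 percent activity share far fewer active units than two dense patterns.

```python
import numpy as np

# Toy demonstration: the expected overlap of two independent random binary
# activity patterns is roughly p^2, so sparse codes (p ~ 0.15) collide far
# less often than dense ones (p ~ 0.8).
rng = np.random.default_rng(1)
n, trials = 160, 2000

def mean_overlap(p_active):
    """Average fraction of units active in both of two random patterns."""
    overlaps = []
    for _ in range(trials):
        a = rng.random(n) < p_active
        b = rng.random(n) < p_active
        overlaps.append(np.sum(a & b) / n)
    return float(np.mean(overlaps))

sparse_overlap = mean_overlap(0.15)   # expect roughly 0.15**2 = 0.0225
dense_overlap = mean_overlap(0.80)    # expect roughly 0.80**2 = 0.64
print(sparse_overlap, dense_overlap)
```

The quadratic drop in overlap is what makes detection and probability evaluation from a sparse code more robust.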

6.3.2 HIMPA Codes Are Well-Suited for Learning in Neural Systems

The advantages of a sparse independent code for learning were discussed in Section 6.2. In addition to this property, HIMPA edge codes span a limited range (between -8 and +8, as shown in Fig. 1b). A small dynamic range confines discrete approximations of neuronal activity to a small alphabet, making probabilistic estimates of distinct activation levels easier, a highly desirable property according to Barlow [8]. Further, HIMPA uses closed-form expressions for the probabilities in the residual space, which can be evaluated easily.
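To make the small-alphabet point concrete, here is a toy sketch using synthetic Laplacian coefficients (not actual HIMPA outputs): values confined to [-8, +8] and rounded to integers yield at most 17 symbols, so per-symbol probabilities can be estimated reliably from modest sample counts.

```python
import numpy as np

# Synthetic stand-in for HIMPA edge coefficients: Laplacian values clipped
# to the [-8, 8] range reported in the paper, then rounded to integers.
rng = np.random.default_rng(2)
coeffs = np.clip(rng.laplace(scale=1.2, size=5000), -8, 8)

symbols = np.round(coeffs).astype(int)        # alphabet is a subset of {-8..8}
values, counts = np.unique(symbols, return_counts=True)
probs = counts / counts.sum()                 # per-symbol probability estimates

print(len(values), "symbols; most probable:", values[np.argmax(probs)])
```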

7 SUMMARY AND FUTURE WORK

We present a hybrid algorithm that uses an ICA basis to represent edges and an MPPCA basis to span the residual subspace. The residual subspace combines a large number of individually insignificant events into a concise summary, which is vital for adapting to changing goals. The method has its origins in current mathematical models of the visual cortex and of visual data processing. We demonstrate the application of the model in representing the chrominance and luminance components of natural images. Further, HIMPA codes greatly simplify estimating the probabilities of hypotheses about the environment, a task intimately connected to learning. The model can be extended to data sets other than natural images using a mixture-of-ICA formulation. The selection of active versus inactive sources can be extended beyond the simple threshold rule we employ, using experimental data to discern which subset of low source activities is worth preserving for good visual reconstructions. Estimation of the residual subspace parameters can be formulated in a Bayesian framework and solved using approximation methods, such as Monte Carlo methods or variational learning, to improve generalization.
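To ground the residual-subspace machinery, the closed-form single-cluster PPCA fit derived in Appendix A can be sketched as follows; the data, dimensionality d, and latent dimension q are assumed values for illustration.

```python
import numpy as np

# Minimal single-cluster PPCA fit via eigendecomposition (see Appendix A.1).
# Synthetic data; d, q, N are illustrative choices.
rng = np.random.default_rng(3)
d, q, N = 16, 3, 2000
X = rng.standard_normal((N, d)) @ (0.1 * rng.standard_normal((d, d)))

mu = X.mean(axis=0)                        # mu_ML: mean of the data
S = np.cov(X, rowvar=False, bias=True)     # sample covariance matrix

evals, evecs = np.linalg.eigh(S)           # eigh returns ascending order
order = np.argsort(evals)[::-1]            # sort descending
evals, evecs = evals[order], evecs[:, order]

sigma2 = evals[q:].mean()                  # ML noise variance: mean of discarded eigenvalues
W = evecs[:, :q] @ np.diag(np.sqrt(evals[:q] - sigma2))   # W_ML with R = I

# The model covariance C = W W^T + sigma2 I matches S exactly in the
# top-q principal subspace.
C = W @ W.T + sigma2 * np.eye(d)
print("sigma2 =", round(float(sigma2), 5))
```

The mixture case iterates the EM updates of Appendix A.2, applying this same eigendecomposition to each responsibility-weighted covariance matrix.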

APPENDIX A
MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERS IN THE MPPCA MODEL

A.1 One Cluster Case
Assuming independence of the observations, for the one cluster case, the log likelihood of observing the data under the model is given by

\mathcal{L} = \sum_{n=1}^{N} \ln\{p(x_n)\} = -\frac{N}{2}\left\{ d\ln(2\pi) + \ln|C| + \mathrm{tr}(C^{-1}S) \right\},   (A.21)

where

S = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu)(x_n - \mu)^{T}

is the sample covariance matrix. The parameters in the model are determined such that the likelihood (A.21) is maximized. The maximum-likelihood estimator of the parameter \mu is given by the mean of the data [19]:

\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n.   (A.22)

The maximum-likelihood solution of W is given by

W_{ML} = U_q\,(\Lambda_q - \sigma^2 I)^{1/2}\,R,

where the columns of U_q are the eigenvectors of S corresponding to the first q eigenvalues, \Lambda_q is a q \times q matrix containing the principal eigenvalues of S arranged in descending order, and R is an arbitrary q \times q orthogonal rotation matrix [19]. This is the probabilistic principal components analysis (PPCA) framework for modeling a mapping from the latent space to the principal subspace of the observed data. The maximum-likelihood estimator for \sigma^2 is given by \sigma^2 = \frac{1}{d-q}\sum_{j=q+1}^{d}\lambda_j, where \lambda_{q+1} to \lambda_d are the smallest eigenvalues of S [19].

A.2 Multicluster Framework
In this case, the complete data log likelihood is given by

\mathcal{L} = \sum_{n=1}^{N} \ln\left[\sum_{k=1}^{K} p(x_n \mid k)\,\pi_k\right],   (A.23)

where p(x_n \mid k) is the probability of observing x_n given that it originated from cluster k, and the \pi_k are the prior mixing probabilities, subject to the condition \sum_k \pi_k = 1. An iterative Expectation-Maximization (EM) algorithm is used to estimate the model parameters. In a manner analogous to estimating the Gaussian mixture model, closed-form expressions can be derived for updating the model parameters [19]. We denote the posterior probability of membership of observation x_n to cluster i, p(i \mid x_n), by R_{ni}. By application of Bayes' rule, R_{ni} is given by

R_{ni} = p(i \mid x_n) = \frac{p(x_n \mid i)\,\pi_i}{p(x_n)}.   (A.24)

The maximum-likelihood update equations for the mixing probabilities and cluster means are given by [19]

\tilde{\pi}_i = \frac{1}{N}\sum_{n=1}^{N} R_{ni}   (A.25)

\tilde{\mu}_i = \frac{\sum_{n=1}^{N} R_{ni}\,x_n}{\sum_{n=1}^{N} R_{ni}}.   (A.26)

The updates for W_i and \sigma_i are obtained from the local responsibility-weighted covariance matrix [19]

\tilde{S}_i = \frac{1}{\tilde{\pi}_i N}\sum_{n=1}^{N} R_{ni}\,(x_n - \tilde{\mu}_i)(x_n - \tilde{\mu}_i)^{T}   (A.27)

using the same eigendecomposition outlined above (Appendix A.1) for the single cluster case. The above equations are iterated until convergence.

APPENDIX B
ENTROPY ESTIMATION

Entropy denotes the average number of bits needed to encode one symbol of the source [15], [22]. The outputs of the filters (ICA or MPPCA) are first quantized, thereby mapping a continuous random variable to a discrete set A. The entropy of the resulting discrete random variable is then estimated according to the formula

H = -\sum_{k \in A} p_k \log_2 p_k,

where p_k denotes the probability of observing the symbol in A indexed by k and the sum is over all symbols in the set A. (p_k is estimated by dividing the number of observations of the symbol indexed by k by the total number of observations.) The number of bits needed to encode an image patch is divided by the total number of pixels in the patch to arrive at a measure in bits/pixel. Let H_{ICA}, H_{mean}, H_{PPCA}, and H_{ClusterID} denote the entropies of the ICA codes, the local means, the MPPCA codes, and the identity of the MPPCA clusters, respectively. From Sections 3 and 4, the entropy in bits/pixel for the I component is then given by

\mathrm{Bits/Pixel} = H_{ICA}\,\frac{160}{256} + \frac{H_{mean}}{256} + H_{PPCA}\,\frac{4}{64} + \frac{H_{ClusterID}}{64}.

The entropies for the chrominance components, DCT, and PCA are obtained in a similar manner.
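The entropy bookkeeping of Appendix B can be sketched as below. The coefficient distributions are synthetic stand-ins; only the weighting of the four entropies follows the bits/pixel formula above.

```python
import numpy as np

# Sketch of the Appendix B estimate: quantize coefficients, estimate symbol
# probabilities by counting, and combine the four entropies with the
# symbols-per-pixel weights. Coefficient statistics are toy assumptions.
rng = np.random.default_rng(4)

def entropy_bits(symbols):
    """Empirical entropy H = -sum p_k log2 p_k of a discrete sample."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

H_ica = entropy_bits(np.round(rng.laplace(scale=1.0, size=4096)))
H_mean = entropy_bits(np.round(rng.normal(scale=4.0, size=4096)))
H_ppca = entropy_bits(np.round(rng.normal(scale=2.0, size=4096)))
H_cluster = entropy_bits(rng.integers(0, 4, size=4096))    # 4 cluster IDs

bits_per_pixel = (H_ica * 160 / 256 + H_mean / 256
                  + H_ppca * 4 / 64 + H_cluster / 64)
print(f"{bits_per_pixel:.2f} bits/pixel")
```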

APPENDIX C
EVALUATING THE JOINT PROBABILITY OF A HYPOTHESIS AND AN OBSERVATION USING HIMPA

Let x represent the actual observation, which is decomposed into an observed edge component x_{ed} and surface component x_{sur}. Let s_{ed} and s_{sur} be the sparse edge component and surface component generated by HIMPA, respectively. Let H be the hypothesis being evaluated. Since HIMPA partitions an observation into its edge and residual surface components, P(H, x), the joint probability of hypothesis H and observation x, is the sum of the joint probabilities of the hypothesis with the corresponding edge and surface components:

P(H, x) = P(H, x_{ed}) + P(H, x_{sur}).   (C.28)

Assuming the surface x_{sur} can originate from K clusters, the total probability becomes

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} P(H, x_{sur}, k).   (C.29)

Let s_{sur}^{k} be the latent space representation of x_{sur} in cluster k. Therefore, the joint probability can be expressed as

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} \sum_{s_{sur}^{k}} P(H, x_{sur}, s_{sur}^{k}, k).   (C.30)

Using the chain rule, P(H, x_{sur}, s_{sur}^{k}, k) can be expressed as

P(H, x_{sur}, s_{sur}^{k}, k) = P(H \mid s_{sur}^{k}, x_{sur}, k)\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, P(k).   (C.31)

P(k) is the prior mixing probability of cluster k, denoted by \pi_k in (A.25). Given s_{sur}^{k}, P(H \mid x_{sur}, s_{sur}^{k}, k) is conditionally independent of x_{sur} and k. Therefore, the joint probability can be simplified as

P(H, x_{sur}, s_{sur}^{k}, k) = P(H \mid s_{sur}^{k})\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.32)

Substituting (C.32) in (C.30), we get

P(H, x) = P(H, x_{ed}) + \sum_{k=1}^{K} \sum_{s_{sur}^{k}} P(H \mid s_{sur}^{k})\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.33)

From Bayes' rule, P(H, x_{ed}) can be expressed as

P(H, x_{ed}) = P(x_{ed} \mid H)\, P(H).   (C.34)

For a linear, invertible transformation x_{ed} = A s_{ed}, the pdfs are related by [15]

P(x_{ed} \mid H) = \frac{1}{|\det A|}\, P(s_{ed} \mid H).   (C.35)

The mixing matrix A was determined to maximize the negentropy, as discussed in Section 3.1. Since x is assumed to be multivariate Gaussian, from the definition of negentropy in (4) it follows that

H(x) \geq H(s).   (C.36)

For a linear transformation, H(x) and H(s) are related by [15]

H(x) = H(s) + \log|\det A|.   (C.37)

It therefore follows that \log|\det A| \geq 0, or |\det A| \geq 1. Substituting this lower bound for |\det A| in (C.34), we get the following inequality:

P(H, x_{ed}) \leq P(s_{ed} \mid H)\, P(H),   (C.38)

where P(H) denotes the prior probability of the hypothesis under consideration. Substituting (C.38) in (C.33), we get

P(H, x) \leq P(s_{ed} \mid H)\, P(H) + \sum_{k=1}^{K} \sum_{s_{sur}^{k}} P(H \mid s_{sur}^{k})\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.39)

Being independent, the joint probability of activation of the individual representational elements in a distributed network can be obtained quite simply by multiplication of the individual probabilities, making Bayesian inference about hypotheses much easier. Therefore, the likelihood of s_{ed} conditioned on H can be written as

P(s_{ed} \mid H) = \prod_{i} P(s_{ed}^{i} \mid H),   (C.40)

where s_{ed}^{i} denotes the ith component of the edge subspace representation s_{ed}. Substituting (C.40) in (C.39), we obtain an approximation to the total posterior probability:

P(H, x) \leq P(H) \prod_{i} P(s_{ed}^{i} \mid H) + \sum_{k=1}^{K} \sum_{s_{sur}^{k}} P(H \mid s_{sur}^{k})\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.41)

We can split the product in (C.41) into two sets: the set S_0 of indices i where s_{ed}^{i} = 0, and the complementary set \bar{S}_0, where s_{ed}^{l} \neq 0. Therefore, (C.41) can be expressed as

P(H, x) \leq P(H) \prod_{i \in S_0} P(s_{ed}^{i} = 0 \mid H) \prod_{l \in \bar{S}_0} P(s_{ed}^{l} \mid H) + \sum_{k=1}^{K} \sum_{s_{sur}^{k}} P(H \mid s_{sur}^{k})\, P(s_{sur}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.42)

The inner sum over s_{sur}^{k} can be dropped if we evaluate the probability of the hypothesis only at the posterior mean of s_{sur} (denoted by s_{sur,m}^{k} for a particular choice of cluster k) given x_{sur}, using (16). Therefore, (C.42) simplifies to

P(H, x) \approx P(H) \prod_{i \in S_0} P(s_{ed}^{i} = 0 \mid H) \prod_{l \in \bar{S}_0} P(s_{ed}^{l} \mid H) + \sum_{k=1}^{K} P(H \mid s_{sur,m}^{k})\, P(s_{sur,m}^{k} \mid x_{sur})\, P(x_{sur} \mid k)\, \pi_k.   (C.43)

Equation (C.43) evaluates the joint probability of a hypothesis and an observation in terms of its neural representation.

ACKNOWLEDGMENTS
The authors express their gratitude to the associate editor and five anonymous reviewers for helpful suggestions and corrections which helped improve the quality of the manuscript in great measure. N. Balakrishnan would like to express his profound gratitude to Dr. L.R. Chary and Mr. S. Easwaran for introducing him to the world of signal processing.

REFERENCES
[1] B.A. Olshausen and D.J. Field, "Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311-3325, 1997.
[2] R.S. Snell, Clinical Neuroanatomy for Medical Students, pp. 702-720. Philadelphia: Lippincott Williams and Wilkins, 1997.
[3] Neural Codes and Distributed Representations: Foundations of Neural Computation, L. Abbott and T.J. Sejnowski, eds. Cambridge, Mass.: MIT Press, 1999.
[4] J. Daugman, "Gabor Wavelets and Statistical Pattern Recognition," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 457-463. Cambridge: MIT Press, 2003.
[5] H. Neumann and E. Mingolla, "Contour and Surface Perception," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 271-276. Cambridge: MIT Press, 2003.
[6] H.B. Barlow, "Possible Principles Underlying the Transformations of Sensory Messages," Sensory Comm., W.A. Rosenblith, ed., pp. 217-234. Cambridge: MIT Press, 1961.
[7] H.B. Barlow, "Unsupervised Learning," Neural Computation, vol. 1, pp. 295-311, 1989.

[8] H.B. Barlow, "Redundancy Reduction Revisited," Network: Computation in Neural Systems, vol. 12, pp. 241-253, 2001.
[9] J.A. McArthur and B. Moulden, "A Two-Dimensional Model of Brightness Perception Based on Spatial Filtering Consistent with Retinal Processing," Vision Research, vol. 39, pp. 1199-1219, 1999.
[10] E.H. Land and J.J. McCann, "Lightness and Retinex Theory," J. Optical Soc. Am., vol. 61, pp. 1-11, 1971.
[11] A. Hurlbert, "Formal Connections between Lightness Algorithms," J. Optical Soc. Am., vol. 3, pp. 1684-1693, 1986.
[12] S. Grossberg and D. Todorovic, "Neural Dynamics of 1-D and 2-D Brightness Perception: A Unified Model of Classical and Recent Phenomena," Perception & Psychophysics, vol. 43, pp. 723-742, 1988.
[13] M.A. Paradiso and K. Nakayama, "Brightness Perception and Filling-In," Vision Research, vol. 39, pp. 1221-1236, 1991.
[14] R. Kentridge, C. Heywood, and J. Davidoff, "Color Perception," The Handbook of Brain Theory and Neural Networks, second ed., M.A. Arbib, ed., pp. 230-233. Cambridge, Mass.: MIT Press, 2003.
[15] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: John Wiley & Sons, 2001.
[16] A. Hyvärinen, "Fast ICA Matlab Package," http://www.cis.hut.fi/projects/ica/fastlab, Apr. 2003.
[17] A. Hyvärinen and P. Hoyer, "Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces," Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.
[18] A. Hyvärinen, P.O. Hoyer, and M. Inki, "Topographic Independent Component Analysis," Neural Computation, vol. 13, no. 7, pp. 1527-1558, 2001.
[19] M. Tipping and C. Bishop, "Mixtures of Probabilistic Principal Component Analyzers," Neural Computation, vol. 11, no. 2, pp. 443-482, 1999.
[20] I. Nabney and C. Bishop, "Netlab Neural Network Software," http://www.ncrg.aston.ac.uk/netlab, July 2003.
[21] R.C. Gonzalez and R.E. Woods, Digital Image Processing. Reading, Mass.: Addison Wesley, 1993.
[22] T.M. Cover and J.A. Thomas, Elements of Information Theory, pp. 12-49. New York: John Wiley & Sons, 1991.
[23] D.H. Ballard, An Introduction to Natural Computation, pp. 46-48. Cambridge, Mass.: MIT Press, 1999.
[24] N. Krüger, "Collinearity and Parallelism are Statistically Significant Second Order Relations of Complex Cell Responses," Neural Processing Letters, vol. 8, pp. 117-129, 1998.
[25] W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly, "Edge Co-Occurrence in Natural Images Predicts Contour Grouping Performance," Vision Research, vol. 41, pp. 711-724, 2001.

Nikhil Balakrishnan received the Bachelor of Medicine and the Bachelor of Surgery (MBBS) degree from Mahatma Gandhi University, India, in September 1999 and the MS degree in biomedical engineering from the University of Illinois at Chicago in 2004. His research interests include computer vision, neural networks, and machine learning.

Karthik Hariharakrishnan received the BS degree in engineering from the Birla Institute of Technology and Sciences, India, in 2000 and the MS degree in electrical and computer engineering from the University of Illinois at Chicago in 2003. He is currently with the Multimedia Group in Motorola India Electronics Ltd. His research interests are video compression, object tracking, segmentation, and music synthesis.

Dan Schonfeld received the BS degree in electrical engineering and computer science from the University of California, Berkeley, and the MS and PhD degrees in electrical and computer engineering from the Johns Hopkins University, Baltimore, Maryland, in 1986, 1988, and 1990, respectively. In August 1990, he joined the Department of Electrical Engineering and Computer Science at the University of Illinois, Chicago, where he is currently an associate professor in the Departments of Electrical and Computer Engineering, Computer Science, and Bioengineering, and codirector of the Multimedia Communications Laboratory (MCL) and member of the Signal and Image Research Laboratory (SIRL). He has authored more than 60 technical papers in various journals and conferences. He has served as a consultant and technical standards committee member in the areas of multimedia compression, storage, retrieval, communications, and networks. He has also served as an associate editor of the IEEE Transactions on Image Processing on nonlinear filtering as well as an associate editor of the IEEE Transactions on Signal Processing on multidimensional signal processing and multimedia signal processing. He was a member of the organizing committees of the IEEE International Conference on Image Processing and the IEEE Workshop on Nonlinear Signal and Image Processing. He was the plenary speaker at the INPT/ASME International Conference on Communications, Signals, and Systems. He has been elected a senior member of the IEEE. He has previously served as president of Multimedia Systems Corp. and provided consulting and technical services to various corporations, including AOL Time Warner, Chicago Mercantile Exchange, Dell Computer Corp., Getco Corp., EarthLink, Fish & Richardson, IBM, Jones Day, Latham & Watkins, Mirror Image Internet, Motorola, Multimedia Systems Corp., nCUBE, NeoMagic, Nixon & Vanderhye, PrairieComm, Teledyne Systems, Touchtunes Music, Xcelera, and 24/7 Media.
His current research interests are in multimedia communication networks, multimedia compression, storage, and retrieval, signal, image, and video processing, image analysis and computer vision, pattern recognition, and medical imaging.

