AMOVIP: Advanced Modeling of Visual Information Processing


M. Keil & G. Cristóbal, Instituto de Óptica (CSIC), Serrano 121, 28006 Madrid (Spain), {mat,gabriel}@optica.csic.es

H. du Buf University of Algarve, Dept. of Electronics and Computer Science Campus de Gambelas, 8000 Faro (Portugal) [email protected]

B. Escalante, Universidad Nacional Autónoma de México, Faculty of Engineering, Graduate Division, Mexico, D.F. 04510 (Mexico), boris@verona. -p.unam.mx

L. Pessoa, Universidade Federal do Rio de Janeiro, Programa de Sistemas e Computação, Rio de Janeiro, 68501 (Brazil), [email protected]

Abstract

The AMOVIP project (1) is a joint initiative for cooperation in specific aspects of vision research and their applications, involving EU partners in Spain and Portugal together with the developing countries Mexico and Brazil in Latin America. Models of visual perception concern primarily the local brightness as perceived at different spatial luminance patterns. The main aim of this project is to develop a 2D brightness model that is able to predict most if not all known brightness effects, such as illusions, and to apply this in practical applications for improving medical diagnosis, feature detection in remote sensing, and image coding.

(1) This work has been supported in part by the "AMOVIP" EU INCO-DC project No. 961646.

1 Introduction

The AMOVIP project is part of the specific EU programme on cooperation with third countries and international organizations (INCO) for the Latin America region (Mexico and Brazil). The project duration will be 36 months, and the starting date was March 1998. The two main objectives are: (i) to develop and test a unified theory of models based on Gabor (and also Hermite) filters capable of describing the major findings on human spatial brightness perception. One such model relies on a symbolic image representation consisting of multiscale lines, edges and vertices, with rules for local brightness induction as caused by the interpretation of foreground, background and transparency patterns [1]. (ii) To apply this theory in practical applications such as image enhancement/restoration and coding, where the goal is to include knowledge about visual perception in order to develop better methods in terms of the resulting perceived image quality. To accomplish these tasks, a strong link between the development of models and critical psychophysical experiments to test them will be applied. On the other hand, the theoretical modeling and the image restoration and coding algorithms developed will be studied by concentrating on practical vision applications, notably in the field of medical imaging, e.g. Magnetic Resonance Imaging (MRI), Digital Cardio Imaging (DCI) sequences and tomographic images. Apart from developing improved image restoration and coding methods, this project contributes to a deeper understanding of the visual system and therefore also to improved artificial vision. One of the objectives of AMOVIP is to apply and assess the previously described methods for helping medical diagnosis and video communication, which will lead to more efficient techniques for image enhancement and compression in these fields.

There are many data sets available for starting brightness modeling, but the unification of all of these by a single model has never been successfully achieved. The reasons for this failure are that (1) not many labs working in visual psychophysics have the necessary signal processing knowledge, (2) modeling takes as long as doing experiments, and (3) the visual system is highly nonlinear and complex. The interaction between psychophysics

on one side and mathematics and image processing on the other has resulted in the development of new image representation models that allow for a more efficient description of the luminance transitions within an image. There are two possibilities for modeling brightness perception: "simple" and "advanced" models (described below). Speaking in biological terms, for simple models it seems to be enough to model simple cells and complex cells, perhaps also including the retina and the LGN. For advanced modeling one also has to take into account hyper-complex cells (which can be modeled by end-stopped operators) and some kind of grouping mechanism that realizes the Gestalt principle of "good continuation" [8].
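To make the "simple model" ingredients concrete: a cortical simple cell is commonly approximated by a 2D Gabor filter (a Gaussian envelope times a cosine carrier). The following minimal sketch uses our own illustrative parameter names and defaults, not code from the project:

```python
import numpy as np

def gabor(size=31, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """Minimal 2D Gabor filter as a simple-cell model sketch.

    size: odd kernel width; wavelength: carrier period in pixels;
    theta: orientation in radians. All defaults are illustrative.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates so the carrier runs along orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))  # Gaussian window
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)  # sinusoidal carrier
    return envelope * carrier
```

For phase 0 the kernel is even-symmetric, so it acts as a line/bar detector; a phase of pi/2 gives the odd-symmetric (edge-sensitive) counterpart.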

2 Structure of the project

The project consists of four main Workpackages (WPs) plus two additional WPs dedicated to Management and to Exploitation and Dissemination aspects, respectively. The four main packages are:

- WP1: Brightness modeling. In this WP a one-dimensional model based on complex Gabor functions will be extended to 2D. The model will include schemes for multiscale line/edge detection, vertex detection, and amplitude nonlinearities, plus rules for brightness induction and assimilation related to the interpretation of foreground, background, and transparency patterns in the image formation. These patterns with related induction effects are the only ones that will be studied in this project (brightness also has temporal and depth/stereo effects). The symbolic image description in terms of multiscale lines, edges, etc., together with nonlinearities and induction rules, will be used to construct, for an arbitrary input image, a brightness map that corresponds to our visual perception.

- WP2: Psychophysical experiments. Although many data are already available in the literature, additional data for specific observers are required because significant inter-subject differences are known to exist. This means that some general data on disks and gratings are needed in order to calibrate the model for one observer, and then very specific data will be measured in order to test model details. It is also very important that all data are available for at least one subject, at one background luminance level, and with (quasi-)static temporal stimulus presentations. Some data, in the form of exponentially approximated luminance-brightness relations for gratings at different frequencies, will also be used in WPs 3 and 4 in order to optimize the quantizer design in image denoising and coding.

- WP3: Multiresolution enhancement/coding. The aim of this WP is to develop methods that can characterize the degradations and statistics of images for a selected number of applications, with special emphasis on medical imaging and active vision in robotics. This will be done by using image multiresolution models based on Gabor (and also wavelet-Hermite) representations. For the coding application, image coders will be developed based on pattern features and statistics from images. Characteristics of brightness models will be used for quantizer design.

- WP4: Motion estimation and video processing. Image segmentation will be considered here as a first step to video coding. Region boundaries will be characterized by Markov Random Field modeling and by 3D wavelet transformation. The image sequence can be modeled using a Markov Random Field approach that will provide a segmentation result through a Maximum a Posteriori (MAP) estimate. A 3D wavelet transformation will be considered as an alternative image sequence representation. The selection of the basis functions for this transformation is an open issue. Gabor functions and/or Hermite polynomials will be used for the spatial domain, while Laguerre polynomials will be used for the temporal domain. Segmentation will be achieved by statistical detection of 3D contours.

The AMOVIP partners are:

- CSIC - Consejo Superior de Investigaciones Científicas (Madrid, Spain) - Coordinator
- UALG - University of Algarve (Faro, Portugal)
- UNAM - Universidad Nacional Autónoma de México (Mexico)
- UFRJ - Universidade Federal do Rio de Janeiro (Brazil)

We have identified four areas of application: (a) a mathematical simulator; (b) a medical package; (c) a video-conference package; and (d) ground (soil) analysis. Details about these applications, together with the main updated information about AMOVIP, are available from the project Web page (2). Besides facilitating the interaction between partners, it is an excellent gateway for disseminating the results of the project.
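The MAP segmentation via Markov Random Field modeling outlined in WP4 can be sketched, for a single frame, with a Potts-style smoothness prior optimized by iterated conditional modes (ICM). ICM is one standard approximate MAP solver; the project does not specify the optimizer, so the energy weights and all names below are our own illustrative choices:

```python
import numpy as np

def icm_segment(img, labels=(0.0, 1.0), beta=1.0, iters=5):
    """Approximate MAP segmentation under a Potts-style MRF prior,
    optimized with iterated conditional modes (ICM). Illustrative only."""
    # initial labeling: nearest label per pixel (maximum likelihood)
    seg = np.argmin([np.abs(img - l) for l in labels], axis=0)
    h, w = img.shape
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                best_k, best_e = seg[y, x], np.inf
                for k, l in enumerate(labels):
                    data = (img[y, x] - l) ** 2   # likelihood (data) term
                    smooth = 0.0                  # Potts smoothness term
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            smooth += seg[ny, nx] != k
                    e = data + beta * smooth
                    if e < best_e:
                        best_k, best_e = k, e
                seg[y, x] = best_k
    return seg
```

ICM only finds a local optimum of the posterior energy, but it illustrates the two ingredients of the MAP formulation: a data term tying labels to observed intensities and a prior rewarding smooth region boundaries.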
We have contacted potential users interested in collaborating in the area of medical applications: the Surgery and Experimental Medicine Unit of the Hospital General "Gregorio Marañón" in Madrid and the Instituto Mexicano de Psiquiatría (Mexico). Another application that will be considered in AMOVIP is ground analysis. We have already contacted research groups belonging to the Brazilian Agricultural Research Corporation, with the aim of applying the multiscale line/edge and vertex detection as well as the more general multiscale brightness model to be developed.

(2) http://www.iv.optica.csic.es/projects/amovip.html

3 Current results and future work

3.1 Modeling brightness in 2D

There are mainly two classes of models concerned with brightness perception. On the one hand, one finds models, mainly in one dimension, that account for Mach bands, the Chevreul illusion, luminance staircases and so on [5, 1, 7, 6, 4]. We will refer to them as simple models. On the other hand, we have two-dimensional models which focus more on the formation of illusory contours and related brightness phenomena [2], and to some extent [3]. We will call those advanced models, for one has to include more computational stages compared to simple models. Fig. 1 shows some examples of common brightness illusions.

We have designed and implemented new Gabor filter sets. One purpose of these sets is to obtain a quasi-continuous scaling of the frequency, necessary later on for multiscale line and edge detection. We have started with the first stages of brightness modeling (on-off cells, simple cells, end-stopped cells, etc.) and psychophysical experiments (for 1D signal tests). We have implemented admissible Gabor wavelets, which were proposed as simple-cell models by the Systems Biophysics Group at the Ruhr-Universität Bochum, polar-separable Gabors, and stretchable Gabors. Inspired by the stretchable Gabors, we have defined a set of filters given by

    S_choco(r, theta) = [1 - cos(2 pi r)] * cos^{2n}(theta),    (1)

with r in [-1, +1]. The first factor thus looks like a donut, and the second factor cuts out a pie, hence the name. Through proper scaling, selectivity for specific spatial frequencies is obtained. This filter makes it possible to reach even higher wave-numbers than the stretchable Gabor without aliasing. In Fig. 2 the radial part (i.e., a slice through the donut) of the choco-pie is compared with a DOG filter and a stretchable Gabor. As can be seen, the valley that represents the drop-off to low frequencies of the choco-pie follows the DOG's valley. Gabor wavelets are known to provide an optimal set of basis functions for the representation of natural images.
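Assuming the radial factor of Eq. (1) is 1 - cos(2*pi*r) (the minus sign and pi are not legible in the source), the choco-pie filter can be evaluated on a discrete Fourier grid as follows. Grid size, the scale parameter and the coordinate conventions are our own illustrative choices:

```python
import numpy as np

def choco_pie(shape=(129, 129), n=4, scale=1.0):
    """Sketch of the 'choco-pie' filter of Eq. (1) on a Fourier grid.

    r is the (clipped) radial frequency in [0, 1] and theta the orientation
    angle; 'scale' tunes which radial frequency the donut peaks at.
    """
    h, w = shape
    fy = np.linspace(-1, 1, h)
    fx = np.linspace(-1, 1, w)
    FX, FY = np.meshgrid(fx, fy)
    r = np.clip(np.hypot(FX, FY) * scale, -1, 1)
    theta = np.arctan2(FY, FX)
    donut = 1.0 - np.cos(2 * np.pi * r)  # radial "donut" factor
    pie = np.cos(theta) ** (2 * n)       # angular "pie slice" factor
    return donut * pie
```

The donut factor vanishes at zero frequency, so the filter is DC-free, and the even power 2n keeps the angular factor non-negative while sharpening orientation selectivity as n grows.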
Most image coding schemes rely on a multi-orientation

Figure 1. Some common brightness illusions. The first picture shows a triangular-wave luminance profile; at the tips of this wave, Mach-band-like features are perceived rather than sharp edges. The third image shows the Ehrenstein disk, where an illusory circle can be seen that appears brighter than the background. The rightmost picture shows a Hermann grid: gray spots seem to appear at the crossings in peripheral vision, suggesting a dependency on spatial frequency.

Figure 2. Fourier-space comparison of the radial parts of the choco-pie filter (dotted lines) with the DOG filter (top) and the stretchable Gabor filter (bottom), respectively. Amplitude is plotted against radial frequency variation.

and multi-resolution analysis of a given stationary image, highly inspired by the simple cells found in the primary visual cortex of cats and monkeys. The receptive fields of these simple cells have been successfully described by self-similar or scale-invariant Gabor functions, which in this case provide a wavelet transform. Gabor wavelets are supposed to act as edge- or bar-sensitive linear filters, and many approaches to computer vision start by filtering a given image with Gabor wavelets. However, a large amount of post-processing is usually necessary to identify corresponding zero-crossings and/or associated maxima in the set of wavelet responses in order to detect non-spurious events like edges or bars. This becomes even more important when the image is degraded by noise. These "engineering approaches" often perform quite well in practice for certain image classes. Our current work nevertheless suggests that, from a biological point of view, these methods cannot be considered plausible models, because cortical simple cells already respond robustly in nearly all stimulus contexts; this has been confirmed by recent examinations. We suggest instead to start with an image that corresponds to the response of the parasol ganglion cells or the magnocellular cells of the lateral geniculate nucleus. It can be verified by symmetry arguments that only with such filtered images does the commonly used Gabor-wavelet transform actually act as a reasonable line or bar detector.

In doing so, we are quickly faced with another problem for which, up to now, no satisfactory answer exists: how is luminance information passed to the visual cortex? This problem arises especially if one decides to use so-called DC-free Gabor wavelets, which are insensitive to mean illumination (i.e., their response to a uniformly illuminated surface is zero). The problem is generally resolved by considering coarser scales (corresponding to large filter kernels) together with a separate brightness channel (e.g., a low-pass filtered image) apart from the contrast channels (obtained from the Gabor wavelets). This, however, contradicts the most recent biological results, because (i) up to now only (non-oriented) contrast channels have been identified, and (ii) cortical simple cells respond to homogeneously illuminated surfaces extending over their classical receptive field, thus representing brightness information as well as contrast information at the same time. This issue is currently under investigation in order to construct a biologically plausible model.

3.2 Medical image enhancement

In order to analyze an image on a local basis, the image is multiplied by a window function. This windowing takes place at several positions over the entire input image, comprising a sampling lattice S. Within every window, the image is described by a weighted sum of polynomials. We use polynomials that are orthogonal with respect to the window function. When a Gaussian window is used, for instance, the Hermite polynomials are chosen for the expansion. The mapping from the input image to the weights of the polynomials, henceforth referred to as the polynomial coefficients, is called a forward polynomial transform. By interpolating the polynomial coefficients with specific functions, the original image can be resynthesized. This process is called an inverse polynomial transform. Figure 3 illustrates one example of image enhancement in the case of Computer Tomography by a Hermite polynomial decomposition.

3.3 Psychophysical experiments

We have investigated Mach band appearance as a function of two main stimulus parameters: (a) contrast and (b) edge sharpness. Supra-threshold measurements of (i) bandwidth and (ii) position are

collected and reported as a function of the two parameters. A key aspect that remains to be developed is the integration of brightness modeling into quantizer design and motion estimation. Designing optimal quantizers is an iterative process of refining quantizer representation and decision levels. The adaptation of quantization functions to the nonlinear brightness perception will provide a significant improvement in quality at the same bit-rate.
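The iterative refinement of representation and decision levels mentioned above corresponds to the classical Lloyd-Max procedure. A minimal sketch follows; the uniform initialization, iteration count and all names are our own choices, and a perceptually adapted design would additionally weight samples by a brightness model:

```python
import numpy as np

def lloyd_max(samples, levels=4, iters=50):
    """Sketch of Lloyd-Max quantizer design: alternate between
    decision boundaries (midpoints) and representation levels
    (cell centroids) until they stabilize. Illustrative only."""
    reps = np.linspace(samples.min(), samples.max(), levels)
    for _ in range(iters):
        bounds = (reps[:-1] + reps[1:]) / 2   # decision levels
        idx = np.digitize(samples, bounds)    # assign samples to cells
        for k in range(levels):
            cell = samples[idx == k]
            if cell.size:
                reps[k] = cell.mean()         # centroid update
    return reps, bounds
```

Each iteration cannot increase the mean-squared quantization error, which is why the alternation converges to a (locally) optimal quantizer.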

3.3.1 Methods

We used displays consisting of two versions of the same type of stimulus placed side by side (Fig. 4). Each stimulus consisted of black, white and gray regions. In the stimulus on the right, the colors of the black and white regions were reversed with respect to the one on the left. The first display was a minimal version of the standard White's display, consisting of three stacked horizontal bars, with a rectangular gray patch covering the central third of the middle bar. In the second display, we modified White's display by slanting the stem of the T-junction. This display was consistent with an interpretation of the middle strip as slanted in depth. In the third and fourth displays we modified White's display by transforming the T-junctions into Y- and W-junctions. These displays were consistent with interpretations of the middle strip as consisting of three segments slanted with respect to each other. The gray patch in one of the stimuli always had constant luminance (L1 = 140, on a scale of 0 to 255), while that of the other was varied parametrically (L2 = 120, 140, 160, 180). Results are shown in Fig. 5. For displays #1 and #2, the left gray patch appeared darker to all subjects. For display #3, the right patch appeared darker to all subjects except one. For display #4 there was large disagreement between subjects about which patch appeared darker.
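For illustration, the minimal White's display described for the first stimulus (three stacked bars with a gray patch on the central third of the middle bar) can be generated as follows; the pixel sizes and the use of raw 0-255 values instead of calibrated luminances are our own simplifications:

```python
import numpy as np

def whites_display(bar_h=20, width=180, gray=140):
    """Sketch of the minimal White's display: three stacked horizontal
    bars (black, white, black) with a gray patch covering the central
    third of the middle bar. Sizes are illustrative."""
    img = np.empty((3 * bar_h, width), dtype=np.uint8)
    img[0:bar_h] = 0              # top bar: black
    img[bar_h:2 * bar_h] = 255    # middle bar: white
    img[2 * bar_h:] = 0           # bottom bar: black
    third = width // 3
    img[bar_h:2 * bar_h, third:2 * third] = gray  # central gray patch
    return img
```

The reversed-contrast partner stimulus is obtained by swapping the 0 and 255 assignments, so that the same gray patch is co-axial with white rather than black bars.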

Figure 3. Image enhancement in Computer Tomography (CT). (a) One slice of a CT scan. (b) Image enhanced using hierarchical Hermite filtering and a fixed thresholding method.
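The forward polynomial transform behind the enhancement in Figure 3 can be sketched in 1D at a single lattice position: project the Gaussian-windowed signal onto the (probabilists') Hermite polynomials, which are orthogonal under that window. The discrete normalization below is our own choice:

```python
import numpy as np

def hermite_coeffs(signal, center, sigma=2.0, order=3):
    """Sketch of a forward polynomial transform at one lattice point:
    weighted projection of the signal onto Hermite polynomials He_k
    under a Gaussian analysis window. Illustrative normalization."""
    n = np.arange(len(signal), dtype=float)
    x = (n - center) / sigma
    w = np.exp(-x**2 / 2)  # Gaussian analysis window
    coeffs = []
    for k in range(order + 1):
        # He_k(x): Hermite series with a single unit coefficient at index k
        Hk = np.polynomial.hermite_e.hermeval(x, [0] * k + [1])
        norm = np.sum(w * Hk**2)
        coeffs.append(np.sum(w * Hk * signal) / norm)
    return np.array(coeffs)
```

Coefficient 0 captures the local mean, coefficient 1 the local slope, and so on; repeating this at every lattice position and interpolating the coefficients back gives the inverse transform described in the text.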

4 Conclusions

Future extensions of AMOVIP will be in the area of multimodality fusion in medical imaging. There exist a number of challenging medical applications that require the integration of different image modalities, e.g. in neurosurgery (the study of epilepsy), psychiatry (schizophrenia) and neurology (brain pathologies). The end results of AMOVIP will be of great benefit for devising new criteria for image fusion. Finally, some active collaborations between AMOVIP partners and other groups have been started, in particular with the

Figure 4. Set of stimuli. (a) Display #1. (b) Display #2. (c) Display #3. (d) Display #4.

Dept. of Psychology at the University of Münster (Germany) and with the Brain Imaging Center, McLean Hospital, MA (USA).

References

[1] H. du Buf and S. Fisher. Modeling brightness perception and syntactical image coding. Optical Engineering, 34(7):1900-1911, 1995.
[2] A. Gove, S. Grossberg, and E. Mingolla. Brightness perception, illusory contours, and corticogeniculate feedback. Visual Neuroscience, 12:1027-1052, 1995.
[3] F. Heitger, L. Rosenthaler, R. von der Heydt, E. Peterhans, and O. Kübler. Simulation of neural contour mechanisms: From simple to end-stopped cells. Vision Research, 32(5):963-981, 1992.
[4] F. Kingdom and B. Moulden. A multi-channel approach to brightness coding. Vision Research, 32:1565-1582, 1992.
[5] J. McArthur and B. Moulden. A two-dimensional model of brightness perception based on spatial filtering consistent with retinal processing. Vision Research, 39:1199-1219, 1998.
[6] H. Neumann. An outline of a neural architecture for unified visual contrast and brightness perception. Technical report, CAS/CNS, Boston University Center for Adaptive and Neural Systems, 1998.
[7] L. Pessoa, E. Mingolla, and H. Neumann. A contrast- and luminance-driven multiscale network model of brightness perception. Vision Research, 35:2201-2223, 1995.
[8] M. Wertheimer. Laws of organization in perceptual forms. Psychologische Forschung, 4:301-350, 1923.

Figure 5. Results. Psychometric functions indicating the percentage of times the subject indicated that the stimulus with the gray patch placed co-axially with black regions was darker. Top to bottom: results for displays #1, #2, #3 and #4, respectively.