IEEE TRANSACTIONS ON MAGNETICS, VOL. 36, NO. 4, JULY 2000

The Method of Maximum Mutual Information for Biomedical Electromagnetic Inverse Problems Renjie He, Liyun Rao, Shuo Liu, Weili Yan, Ponnada A. Narayana, and Hartmut Brauer

Abstract—This paper proposes a new method for biomedical electromagnetic inverse problems based on the technique of maximum mutual information. The new method differs from conventional methods in that it does not depend on the widely used regularization techniques. It provides a general paradigm for developing algorithms for various biomedical inverse problems that can be described using the lead field, including the localization and imaging of neural source activities in the brain and heart from EEG/MEG and ECG/MCG; it may also be applied to other electromagnetic inverse problems such as electrical impedance tomography. This paper mainly provides the theoretical development, with the MEG inverse problem as a case study.

Index Terms—Biomedical electromagnetic imaging, source localization, inverse problems, maximum mutual information.

Manuscript received October 25, 1999. This work was supported in part by the National Natural Science Foundation of China under Grants 59777006 and 59937160.
R. He is with Hebei University of Technology, Tianjin 300130, China, and the University of Texas Medical School at Houston, Houston, TX 77030 USA.
L. Rao is with Hebei University of Technology, Tianjin 300130, China, and Baylor College of Medicine, Houston, TX 77030 USA.
S. Liu and W. Yan are with Hebei University of Technology, Tianjin 300130, China.
P. A. Narayana is with the University of Texas Medical School at Houston, Houston, TX 77030 USA.
H. Brauer is with the Technical University of Ilmenau, D-98684 Ilmenau, Germany.
Publisher Item Identifier S 0018-9464(00)06738-8.

I. INTRODUCTION

THE main problems in solving biomedical electromagnetic inverse problems are their nonuniqueness and ill-posedness. Even when the nonuniqueness can be overcome, the ill-posedness is not easily settled, and various regularization methods are widely used to compromise between the stability and the correctness of the inverse solution.
This paper proposes a new idea for solving the biomedical electromagnetic inverse problem, based on maximizing the mutual information (MI) between the forward mapping results and the measured data. We study algorithms that adjust the solution of the inverse problem until the mutual information reaches its maximum. While direct algebraic formulations such as minimum-norm least squares have the advantage of computational speed, it is difficult to devise an algebraic algorithm for the MI method.
The application of information measures to biomedical inverse problems is not new; in [1], a method based on maximum entropy is proposed as a new regularization criterion. Other applications of entropy and cross-entropy appeared in [2]–[7].

Also, the use of information measures to quantify the distance between various recorded biomedical electromagnetic data sets is well understood [8]. The cross-entropy method in [4] is based on the Kullback-Leibler distance between two probability distributions, which is similar to our idea; however, it makes assumptions about the distribution of the current source, and the minimization of cross-entropy is carried out to adjust the probability distribution of the source. The entropy measure used in [1] works directly on the distribution of the recorded data on the surface, but it does not employ the more robust measures of mutual information or cross-entropy. In this paper we propose to use mutual information to measure the distance between the probability distributions of the recorded data and the test data on the surface, which provides a new general approach to biomedical electromagnetic inverse problems.

Mutual information has several major advantages over other measures. It is robust toward changes in intensity, position, and noise; it works well when only part of the data is available; and it can work directly on raw data without preprocessing. Detailed discussions of mutual information can be found in [9] and [10]. In essence, it measures how well one mapping data set explains the other. Based on the belief that mutual information provides a reliable measure of the distance between the mapping data sets, we develop the new method as follows.

II. THEORETICAL DEVELOPMENT

For the biomedical electromagnetic inverse problem, we seek an estimate of the neural electric sources $Q$ in the brain or heart that are related to the surface mapping data $B$ in the form of

$B = f(Q)$  (1)

where $f$ is the mapping function. By applying the concept of the lead field, (1) can be written as a matrix expression. Assuming that a proposed source $Q$ produces a test mapping data set $B_t$ on the surface, we require the mutual information between the reference (recorded) mapping data $B_r$ and the test mapping data $B_t$ to reach its maximum, that is,

$Q^{*} = \arg\max_{Q} I(B_r, B_t)$  (2)

where $I(B_r, B_t)$ is the mutual information. Entropy is defined as

$h(x) = -\int p(x)\,\ln p(x)\,dx$  (3)

while the joint entropy of $x$ and $y$ is

$h(x, y) = -\int\!\!\int p(x, y)\,\ln p(x, y)\,dx\,dy$  (4)

where $p$ is the corresponding probability density function (pdf). Defining the entropies of the reference mapping data and the test mapping data as $h(B_r)$ and $h(B_t)$, the mutual information is

$I(B_r, B_t) = h(B_r) + h(B_t) - h(B_r, B_t)$.  (5)

To apply this mathematics to discrete mapping data, we define the available intensity levels in the mapping data $B_r$ and $B_t$ as $\{a_i\}$ and $\{b_j\}$; the entropy of $B_r$ is then given by

$H(B_r) = -\sum_i P(a_i)\,\log P(a_i)$.  (6)

The mutual information between $B_r$ and $B_t$ is

$I(B_r, B_t) = \sum_{i,j} P(a_i, b_j)\,\log \frac{P(a_i, b_j)}{P(a_i)\,P(b_j)}$.  (7)

To estimate the probabilities of $B_r$ and $B_t$ using histograms, we define

$P(a_i) \approx \mathrm{Histogram}_{B_r}(a_i)/N, \qquad P(b_j) \approx \mathrm{Histogram}_{B_t}(b_j)/N$  (8)

$P(a_i, b_j) \approx \mathrm{Histogram}_{B_r, B_t}(a_i, b_j)/N$  (9)

where $N$ is the number of points in the mapping data sample space.
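To make the histogram estimate concrete, the following Python sketch (an illustrative reimplementation rather than the authors' IDL code; the function name and default number of levels are ours) computes the mutual information of (7) from the joint histogram of a recorded and a simulated surface map, skipping empty bins.

```python
import numpy as np

def histogram_mutual_information(b_ref, b_test, n_levels=64):
    """Estimate I(B_r, B_t) from a joint histogram, following (6)-(9).

    b_ref, b_test : 1-D arrays of surface mapping values at the same sensors.
    n_levels      : number of intensity levels after rebinning.
    """
    # Rebin each data set into discrete intensity levels over its own dynamic range.
    joint, _, _ = np.histogram2d(b_ref, b_test, bins=n_levels)
    p_joint = joint / joint.sum()                 # P(a_i, b_j), eq. (9)
    p_ref = p_joint.sum(axis=1)                   # P(a_i), eq. (8)
    p_test = p_joint.sum(axis=0)                  # P(b_j), eq. (8)

    # Sum only over nonzero joint-probability cells, as required by (7).
    nz = p_joint > 0
    outer = np.outer(p_ref, p_test)
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / outer[nz])))
```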

Alternatively, the underlying pdf can be estimated from a set of samples $A$ by the Parzen window method, giving [7]

$p(z) \approx \frac{1}{N_A} \sum_{z_j \in A} G_{\psi}(z - z_j)$  (10)

where $G_{\psi}$ is a Gaussian density with covariance $\psi$. Using another set of samples $B$ to approximate the expectation, the entropy of the random variable $z$ can be estimated as

$h(z) \approx -\frac{1}{N_B} \sum_{z_i \in B} \ln \frac{1}{N_A} \sum_{z_j \in A} G_{\psi}(z_i - z_j)$.  (11)

The derivative of the entropy with respect to $Q$ can then be expressed as

$\frac{d}{dQ} h(z) \approx \frac{1}{N_B} \sum_{z_i \in B} \sum_{z_j \in A} W_z(z_i, z_j)\, \frac{d}{dQ}\!\left[\tfrac{1}{2} d_{\psi}(z_i, z_j)\right]$  (12)

where $W_z(z_i, z_j)$ is between 0 and 1, and is defined as

$W_z(z_i, z_j) = \frac{G_{\psi}(z_i - z_j)}{\sum_{z_k \in A} G_{\psi}(z_i - z_k)}$  (13)

and

$d_{\psi}(z_i, z_j) = (z_i - z_j)^{T} \psi^{-1} (z_i - z_j)$  (14)

is the Mahalanobis distance.
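The entropy estimate in (10)-(11) can be sketched in a few lines. The code below is a minimal one-dimensional illustration under the assumption of a scalar Gaussian kernel; the function and variable names are our own choices, not part of the paper.

```python
import numpy as np

def parzen_entropy(samples_a, samples_b, psi):
    """Parzen-window entropy estimate of a scalar random variable, following (10)-(11).

    samples_a : 1-D array, samples used to build the density estimate (set A).
    samples_b : 1-D array, samples used to approximate the expectation (set B).
    psi       : variance of the Gaussian Parzen kernel.
    """
    diff = samples_b[:, None] - samples_a[None, :]            # z_i - z_j for all pairs
    gauss = np.exp(-0.5 * diff**2 / psi) / np.sqrt(2.0 * np.pi * psi)
    density_at_b = gauss.mean(axis=1)                         # (1/N_A) sum_j G_psi(z_i - z_j)
    return float(-np.log(density_at_b).mean())                # eq. (11)
```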

To maximize the mutual information between $B_r$ and $B_t$, the above results can be applied to the derivative of the mutual information $I(B_r, B_t)$ with respect to $Q$. Since $B_r$ does not depend on $Q$, we have

$\frac{dI}{dQ} \approx \frac{1}{N_B} \sum_{i \in B} \sum_{j \in A} \left\{ W_t(t_i, t_j)\, \frac{d}{dQ}\!\left[\tfrac{1}{2} d_{\psi_t}(t_i, t_j)\right] - W_w(w_i, w_j)\, \frac{d}{dQ}\!\left[\tfrac{1}{2} d_{\psi_w}(w_i, w_j)\right] \right\}$  (15)

where $t_i$ and $t_j$ denote samples of the test data $B_t$ drawn in the sets $B$ and $A$, respectively, $w_i = [r_i, t_i]^T$ denotes a joint sample of $(B_r, B_t)$, and we define

$W_t(t_i, t_j) = \frac{G_{\psi_t}(t_i - t_j)}{\sum_{t_k \in A} G_{\psi_t}(t_i - t_k)}$  (16)

$W_w(w_i, w_j) = \frac{G_{\psi_w}(w_i - w_j)}{\sum_{w_k \in A} G_{\psi_w}(w_i - w_k)}$.  (17)

Here $\psi_w$ is the covariance matrix of the joint distribution of $B_r$ and $B_t$ and can be assumed to be block diagonal,

$\psi_w = \begin{bmatrix} \psi_{rr} & 0 \\ 0 & \psi_{tt} \end{bmatrix}$.  (18)

Since $B_r$ does not depend on $Q$, only the $B_t$ block of $d_{\psi_w}$ contributes to the derivative, and we obtain the estimate of the derivative of the mutual information in the form

$\frac{dI}{dQ} \approx \frac{1}{N_B} \sum_{i \in B} \sum_{j \in A} (t_i - t_j)^T \left[ W_t(t_i, t_j)\, \psi_t^{-1} - W_w(w_i, w_j)\, \psi_{tt}^{-1} \right] \frac{d}{dQ}(t_i - t_j)$.  (19)

Evaluating $d(t_i - t_j)/dQ$ is especially convenient if the mapping operator (the forward problem) is linear, which is usually satisfied by the definition of most biomedical electromagnetic inverse problems based on the lead field concept, since the effects of the sources on the surface are additive.
The local maximum can be reached by a stochastic gradient approximation algorithm using (19), well known in signal processing and control as the LMS algorithm. The whole procedure iteratively executes the following steps: drawing samples from the surface mapping data to form the set $A$, drawing another set of samples to form the set $B$, and performing the update $Q \leftarrow Q + \lambda\, dI/dQ$. The careful design of uniform sampling methods is very important for this algorithm. Other issues include the estimation of the covariances, the adjustment of the step size $\lambda$, and the global search for the maximum. The last problem can usually be mitigated by using multi-scale methods [11]. Many related studies on stochastic gradient approximation algorithms can be found in the literature of signal processing, control, and neural networks.
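A minimal sketch of one LMS-style update built on (19) is given below, assuming a linear lead field forward model so that $d(t_i - t_j)/dQ$ reduces to a difference of lead field rows (cf. (22) below). The function name, sample sizes, kernel variances, and step size are illustrative placeholders, not values from the paper.

```python
import numpy as np

def mi_gradient_step(q, leadfield, b_ref, rng, lam=0.01,
                     n_a=40, n_b=40, psi_t=1.0, psi_rr=1.0, psi_tt=1.0):
    """One stochastic (LMS-style) update of the source vector q, ascending the
    MI derivative estimate (19) for a linear forward model B_t = L q."""
    m, _ = leadfield.shape
    b_test = leadfield @ q                                  # current test mapping data

    # Draw two independent sample sets of sensor indices (sets A and B).
    idx_a = rng.choice(m, size=n_a, replace=False)
    idx_b = rng.choice(m, size=n_b, replace=False)

    gauss = lambda d, var: np.exp(-0.5 * d**2 / var)        # constants cancel in the weights

    dt = b_test[idx_b][:, None] - b_test[idx_a][None, :]    # t_i - t_j
    dr = b_ref[idx_b][:, None] - b_ref[idx_a][None, :]      # r_i - r_j

    # Weights (16)-(17); the joint Gaussian factorizes under the block-diagonal (18).
    g_t = gauss(dt, psi_t)
    g_w = gauss(dr, psi_rr) * gauss(dt, psi_tt)
    w_t = g_t / g_t.sum(axis=1, keepdims=True)
    w_w = g_w / g_w.sum(axis=1, keepdims=True)

    # d(t_i - t_j)/dq = L_i - L_j for the linear forward model (cf. (22)).
    dl = leadfield[idx_b][:, None, :] - leadfield[idx_a][None, :, :]

    coeff = dt * (w_t / psi_t - w_w / psi_tt)               # scalar factor in (19)
    grad = (coeff[:, :, None] * dl).sum(axis=(0, 1)) / n_b
    return q + lam * grad                                   # LMS-style update
```

A typical call would pass a NumPy random generator, e.g. `q = mi_gradient_step(q, L, b_ref, np.random.default_rng(0))`, and iterate until the MI estimate stops improving.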

Fig. 1. (a) and (b) are the computational models of MEG inverse problems. (c) and (d) show the reconstructed dominant components (greater than 5% of the dipole intensity) and minor components (less than 1% of the dipole intensity) respectively corresponding to the source mesh shown in (a). (e) and (f) show the reconstructed dominant components (greater than 20% of the dipole intensity) and minor components (less than 2% of the dipole intensity) respectively corresponding to the source mesh shown in (a).

However, while traditional regularization can be avoided, the price of this method is a heavy computational load.
Based on the histogram formulation of (7)–(9), traditional optimization methods like Powell's minimization or Nelder and Mead's downhill simplex method can also be applied directly [12]. However, because of local minima, the global optimization should be carried out in a multi-scale form. Furthermore, stochastic global search algorithms such as simulated annealing or evolutionary algorithms can be used to deal with the local minima.

III. CASE STUDIES—INVERSE PROBLEM OF MEG

Our studies assume two basic computational structures as in [1]. The outer hemisphere of the MEG geometry model has a radius of 12 cm and carries 127 sensors, arranged in six rings plus one sensor at the pole; the distribution of sensors over these rings is (1, 6, 12, 18, 24, 30, 36). The sensor sites are distributed on the hemisphere to simulate the surface of the scalp. The first source mesh is an L-shaped wall consisting of planar surfaces (at two fixed azimuthal angles) with mesh nodes distributed uniformly on each plane, as shown in Fig. 1(a). Placing dipoles on and perpendicular to these planes serves as an elementary model for brain activity in deep sulci.

The second source mesh contains 127 sites distributed uniformly on a hemisphere of radius 8 cm, concentric with the hemispherical sensor surface, as shown in Fig. 1(b). Here the assumed dipoles have only theta and phi components. We assume that the sensors respond only to the radial component of the magnetic field and that the return currents are neglected, so that the fields at the sensors follow directly from the Biot-Savart law. It is an important exception that the MEG forward problem for a spherically symmetric head model can be solved directly: the radial component of the field at a sensor location $r$ can be computed as $B_r(r) = B(r) \cdot e_r$, and for a spherically symmetric conductor the unit normal to the surface can be expressed as $e_r = r/|r|$ for all points on all surfaces. In this case the contribution of the passive (volume) currents to $B_r$ vanishes, and $B_r$ can be computed simply from the well-known primary current model [13]

$B_r(r) = \frac{\mu_0}{4\pi}\, \frac{\left[\,Q \times (r - r_q)\,\right] \cdot r}{|r - r_q|^{3}\, |r|}$  (20)

where $r - r_q$ is the vector from the source point $r_q$ to the observation point $r$, with magnitude $|r - r_q|$. The primary current source exists only at discrete points, each being a dipole with moment $Q$ located at $r_q$.
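The following Python sketch (our own illustrative code with hypothetical function names, not the authors' IDL implementation) evaluates (20) for a single current dipole and assembles a lead field matrix of the kind used in (21) below.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def radial_field(sensor_pos, dipole_moment, dipole_pos):
    """Radial magnetic field component at one sensor from a single current
    dipole, following the primary-current model of (20).

    sensor_pos, dipole_pos : 3-vectors (m); dipole_moment : 3-vector (A*m).
    """
    d = sensor_pos - dipole_pos                       # r - r_q
    num = np.dot(np.cross(dipole_moment, d), sensor_pos)
    return MU0 / (4.0 * np.pi) * num / (np.linalg.norm(d) ** 3
                                        * np.linalg.norm(sensor_pos))

def lead_field_matrix(sensor_positions, source_positions, source_orientations):
    """Assemble a lead field matrix: one row per sensor, one column per
    unit-moment source component (cf. (21) below)."""
    m, n = len(sensor_positions), len(source_positions)
    L = np.empty((m, n))
    for j in range(n):
        q = source_orientations[j]                    # unit moment of component j
        for i in range(m):
            L[i, j] = radial_field(sensor_positions[i], q, source_positions[j])
    return L
```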

The reconstructed source distribution is restricted to a discrete grid of sites. The forward problem is

$B = L\,Q$  (21)

where $Q$ is the vector of unknown source components and $L$ is the lead field matrix computed from (20). With this linear forward model, the derivative required in (19) can be easily derived as

$\frac{d}{dQ}(t_i - t_j) = L_i - L_j$  (22)

where $L_i$ denotes the $i$-th row of $L$. In this case the computation is greatly reduced. However, we carried out our studies using a histogram-based method because it is easier to program. Unlike the stochastic gradient approximation algorithm, the histogram-based method needs to rebin the intensity distribution on the surface before applying (8), (9), and (7), and the zero-probability entries must be excluded from the summation in (7). In our studies, the recorded mapping data and the forward mapping data are each scaled by the dynamic range of the data and symmetrically rebinned into 64 intensity levels. IDL 5.2 is used for programming; it provides two efficient optimization routines, AMOEBA and POWELL. AMOEBA performs the downhill simplex method, while POWELL performs Powell's minimization. A simple multi-scale scheme samples the mapping data in the ratio 3:2:1, producing the ring-wise sample distributions (1, 2, 4, 6, 8, 10, 13), (1, 3, 4, 9, 12, 15, 18), and (1, 6, 12, 18, 24, 30, 36); the samples in each scale are blurred by averaging their neighboring samples. Optimization of the MI is performed in each scale using Powell's minimization, and the result is propagated to the next scale as the initial estimate of the inverse solution.

The noise in the recorded mapping data is treated in the same manner as in [1], and the prominent reconstructed sources are shown in Fig. 1(c) and (d). In Fig. 1(c), a single dipole located at (2.7, 0, 6.7) is applied, 5% Gaussian noise is added to the recorded mapping data, and the prominent components are selected as those with intensity greater than 5% of the source intensity; the minor components are displayed in Fig. 1(d). In Fig. 1(e), a single dipole located midway between mesh sites is applied (indicated by the light gray arrow), 20% Gaussian noise is added to the recorded mapping data, and the prominent components, selected as those with intensity greater than 20% of the source intensity, are indicated by the darker arrows. The minor components are shown in Fig. 1(f).
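As a compact stand-in for the IDL procedure described above (which relied on the POWELL and AMOEBA routines), the sketch below optimizes the rebinned histogram MI with SciPy's Powell method; the function names are ours, and the multi-scale sampling and dynamic-range rescaling steps are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def neg_mutual_information(q, leadfield, b_ref, n_levels=64):
    """Negative histogram MI between the recorded map and the forward map L q,
    after rebinning both into n_levels intensity levels (cf. (7)-(9))."""
    b_test = leadfield @ q
    joint, _, _ = np.histogram2d(b_ref, b_test, bins=n_levels)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return -float(np.sum(p[nz] * np.log(p[nz] / np.outer(px, py)[nz])))

def reconstruct_sources(leadfield, b_ref, q0):
    """Maximize the MI over the source components with Powell's method."""
    res = minimize(neg_mutual_information, q0, args=(leadfield, b_ref),
                   method="Powell")
    return res.x
```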

Optimization using Powell’s method is time-consuming for the high dimensional problems, in both cases, the program will run over 8 hours on a Pentium III 450 MHz PC with 256 MB RAM in IDL environment. However, its efficiency may be improved by programming using compliable language like C/C , and careful design of the multi-scale and optimization scheme would inevitably improve the speed. IV. CONCLUSION We present in this paper a new method for biomedical electromagnetic inverse problems that will not resort to regularization method. The theoretical development is described, with case studies on MEG inverse problem being demonstrated to provide experimental support. The main disadvantage of the new method is its inefficiency in computation, which would be an interesting topic for further researches. REFERENCES [1] M. Huang, R. Aaron, and C. A. Shiffman, “Maximum entropy method for magnetoencephalography,” IEEE Transactions on Biomedical Engineering, vol. 44, pp. 98–102, 1997. [2] C. J. S. Clarke and B. S. Janday, “The solution of biomagnetic inverse problem by maximum statistical entropy,” Inverse Problems, vol. 5, pp. 483–500, 1989. [3] A. A. Ioannidest, J. P. R. Bolton, and C. J. S. Clarke, “Continuous probabilistic solution to the biomagnetic inverse problem,” Inverse Problem, vol. 6, pp. 523–542, 1990. [4] F. N. Alavi, J. G. Taylor, and A. A. Ioannidest, “Estimates of current density distributions: I. Applying the principle of cross-entropy minimization to electrographic recodings,” Inverse Problem, vol. 9, pp. 623–639, 1993. [5] R. E. Greenblatt, “Probabilistic reconstruction of multiple sources in the bioelectromagnetic inverse problem,” Inverse Problem, vol. 9, pp. 271–284, 1993. [6] T. R. Knosche, E. M. Berends, H. R. A. Jagers, and M. J. Peters, “Determining the number of independent sources of the EEG: A simulation study on information criterion,” Brain Tomography, vol. 11, pp. 111–124, 1998. [7] P. Viola and W. M. Wells III, “Alignment by maximization of mutual information,” in Proc. International Conference on Computer Vision, Boston, MA, USA, June 1995, pp. 16–23. [8] T. Inouye, S. Toi, Y. Matsumoto, K. Shinosaki, A. Iyama, and N. Hosaka, “The 3-dimensional representation of EEG distance by use of ShannonGelfand-Yaglom information measure during mental arithmetric,” Brain Tomography, vol. 8, pp. 379–384, 1996. [9] T. M. Cover and J. A. Thomas, Elements in Information Theory. New York, NY: Wiley, 1991. [10] S. W. Golomb, R. E. Peile, and R. A. Scholtz, Basic Concepts in Information Theory and Coding. New York, NY: Plenum, 1994. [11] K. Wang, H. Begieifer, and B. Projesz, “Spatial enhancement of eventrelated potentials using multiresolution analysis,” Brain Tomography, vol. 10, pp. 191–200, 1998. [12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed: Cambridge University Press, 1992. [13] J. C. Mosher, R. M. Leahy, and P. Lewis, “Matrix Kernels for the Forward Problem in EEG and MEG”, Los Alamos Technical Report: LA-UR-97-3812, 1997.