Logistic Discrimination Between Classes with Nearly Equal Spectral Response in High Dimensionality

Helio Radke Bittencourt
Departamento de Estatística – Faculdade de Matemática, Pontifícia Universidade Católica – PUCRS, Porto Alegre, RS, Brazil
[email protected]

Robin Thomas Clarke
Centro Estadual de Pesquisas em Sensoriamento Remoto, Universidade Federal do Rio Grande do Sul – UFRGS, Porto Alegre, RS, Brazil
[email protected]
Abstract— Logistic discrimination can be regarded as a partially parametric approach to pattern recognition. The method is quite general and robust: it assumes nothing about the probability distribution of the variables and requires the estimation of fewer parameters than some better-known procedures such as the Gaussian maximum likelihood discriminator. This paper describes the logistic discrimination model and gives results obtained when using it to classify an AVIRIS image with classes that are spectrally very similar.

Keywords— digital image classification, pattern recognition, logistic discrimination, AVIRIS sensor, high dimensional data.
I. INTRODUCTION

Statistical approaches to the problem of pattern recognition have been extensively discussed in the scientific literature and have been implemented in a large number of commercial systems. In such approaches, each pattern is regarded as a p-dimensional random vector, where p is the number of characteristics used in classification. A survey published in IEEE Transactions on Pattern Analysis and Machine Intelligence [1] of papers that have appeared since 1979 found 350 articles on pattern recognition, of which 300 were concerned with aspects of statistical approaches to the problem. There are now many statistical methods for classification, each presenting advantages and disadvantages. Techniques requiring assumptions on the functional shape of the variables composing the feature space involve parameter estimation, and are therefore termed parametric. The number of parameters to be estimated can vary a great deal according to the classifying method used and the number of classes to be discriminated.

Logistic discrimination, derived from multinomial logistic regression, is a partially parametric method for discriminating between classes which has some useful characteristics and also certain advantages over the better-known Gaussian maximum-likelihood classification procedure, widely used for classifying digital images. Like the Gaussian classification procedure, logistic discrimination is a supervised classification procedure that requires training data from which model parameters are estimated. However, in the Gaussian maximum-likelihood method the underlying probability distributions are assumed to be multivariate Normal, and the number of parameters to be estimated can be very large, since with k classes, k mean vectors (of dimension p×1) and k covariance matrices (of dimension p×p, symmetric) must be estimated. Reference [2] states that the estimation of the within-class covariance matrices is one of the most difficult problems in dealing with high-dimensional data. Fortunately, the assumptions required for logistic discrimination are considerably weaker and the number of parameters to be estimated is smaller, as shown in the following sections.

II. LOGISTIC DISCRIMINATION
The logistic regression model for binary (0,1) response variables is widely used in the medical and biological sciences, especially in epidemiology. Reference [3] describes an extension of the logistic model in which the response variable may take one of k discrete values (k > 2), so that the model can be used to discriminate between k classes. In logistic discrimination, the probability that a pixel with feature vector x belongs to class w_i is estimated directly from expression (1).
$$P(w_i \mid \mathbf{x}) = \frac{\exp\left(\beta_{i0} + \boldsymbol{\beta}_i^{T}\mathbf{x}\right)}{1 + \sum_{j=1}^{k-1} \exp\left(\beta_{j0} + \boldsymbol{\beta}_j^{T}\mathbf{x}\right)}, \qquad 1 \le i \le k-1 \qquad (1)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_p]^{T}$ is the feature vector and $\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{ip}]^{T}$.
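Expression (1) can be written as a few lines of code. The sketch below is illustrative only (the parameter values shown are made up); it computes the posterior probabilities for the k−1 modeled classes and for the base class w_k, whose linear predictor is implicitly zero:

```python
import math

def posteriors(intercepts, coefs, x):
    """Posterior probabilities of expression (1).

    intercepts: list of k-1 values beta_{i0}
    coefs:      list of k-1 lists, each with p values beta_{ij}
    x:          feature vector of length p
    Returns k probabilities; the last one is the base class w_k.
    """
    # Linear predictors beta_{i0} + beta_i^T x for the k-1 modeled classes
    logits = [b0 + sum(b * xj for b, xj in zip(beta, x))
              for b0, beta in zip(intercepts, coefs)]
    denom = 1.0 + sum(math.exp(g) for g in logits)
    probs = [math.exp(g) / denom for g in logits]
    probs.append(1.0 / denom)        # base class: exp(0) / denom
    return probs

# Two classes modeled against a base class (k = 3), p = 2 features
p = posteriors([0.5, -0.2], [[1.0, -0.5], [0.3, 0.8]], [1.2, 0.7])
```

Note that the k probabilities always sum to one, and setting every parameter to zero makes the k classes equally likely, as expected from (1).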
The quantities $\beta_{i0}$ and $\boldsymbol{\beta}_i$ are model parameters, with $\beta_{i0}$ termed the intercept and $\boldsymbol{\beta}_i$ a vector of parameters associated with the p characteristics of the vector x. The logistic model requires the estimation of k-1 vectors of parameters $\boldsymbol{\beta}_i$, corresponding to k-1 of the k classes present in the image. The k-th class is taken as a base, relative to which the natural log of the ratio
0-7803-7930-6/$17.00 (C) 2003 IEEE
of the two probabilities (in words, ‘the log of the probability of belonging to class w_i divided by the probability of belonging to class w_k’) becomes a linear function of the parameters. This logarithm is the logit function. Hence, in a model for discriminating between k classes, there are k-1 logit functions g_i(x), as in (2).

$$g_i(\mathbf{x}) = \ln\frac{P(w_i \mid \mathbf{x})}{P(w_k \mid \mathbf{x})} = \beta_{i0} + \boldsymbol{\beta}_i^{T}\mathbf{x}, \qquad 1 \le i \le k-1 \qquad (2)$$
The author in [4] states that the assumption of linearity is fundamental to the logistic approach and, for that reason, calls it a partially parametric model, since only the logit functions are modeled. Despite the apparent strength of this assumption, authors such as [5] and [6] hold that the logistic model can be used with a wide range of probability distributions and that, in theory at least, logistic discrimination has greater robustness than the method of Gaussian maximum likelihood. The classification rule in logistic discrimination is very simple, given by expression (3).

$$\mathbf{x} \in w_i \quad \text{if} \quad P(w_i \mid \mathbf{x}) > P(w_j \mid \mathbf{x}) \quad \forall\, j \neq i \qquad (3)$$

III. PARAMETER ESTIMATION

The procedure for estimating the logistic model parameters is based on maximization of the likelihood function $\ell(\mathbf{x}, \boldsymbol{\beta})$. For this to be possible, n training pixels $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are required, with correct knowledge of the classes w_i to which they belong. The estimates of the k-1 vectors of parameters $\boldsymbol{\beta}_i$ are those that maximize the log-likelihood function. As this expression is non-linear, numerical methods are needed to find its maximum. These methods are iterative and are available in a number of statistical software packages. For the present paper we used the CATMOD procedure of the SAS software, in which the Newton-Raphson iterative procedure gave fairly rapid convergence.

According to [7], the number of parameters to be estimated in logistic discrimination is considerably smaller than that needed in the Gaussian maximum likelihood (GML) method, which is a considerable advantage. While logistic discrimination requires (p+1) parameters for each of the (k-1) classes, in the Gaussian case p parameters are necessary for each mean vector and, in general, p(p+1)/2 for each of the k covariance matrices. Table I shows the expressions for the number of parameters in each case.

TABLE I. NUMBER OF PARAMETERS TO BE ESTIMATED USING LOGISTIC DISCRIMINATION AND GAUSSIAN MAXIMUM LIKELIHOOD MODELS

    Logistic Discrimination                     (k − 1)(p + 1)
    Gaussian Maximum Likelihood (equal Σ)       kp + p(p + 1)/2
    Gaussian Maximum Likelihood (unequal Σ)     k(p + p(p + 1)/2)

Fig. 1 shows that for logistic classification the two quantities dimensionality (p) and number of classes (k) define a plane, whilst for the Gaussian classifier they define a quadratic surface. For example, to discriminate between three classes in 16 dimensions, a logistic model requires 34 parameters to be estimated; the Gaussian classifier requires 456 parameters when the three variance-covariance matrices are distinct, although this number falls to 184 when they can be considered equal. An AVIRIS image, with 220 spectral bands, requires almost 50 thousand parameters to discriminate between just two classes with different variance-covariance matrices; for the logistic classifier, this number falls to 221.

Figure 1. Number of parameters to be estimated by logistic discrimination and by Gaussian maximum likelihood (covariance matrices unequal), plotted against dimensionality (p) for varying numbers of classes (k).

Figure 2. Spectral signature (spectral response across bands B1–B210) for the three classes: Corn-notill (Class 1), Soybean-notill (Class 2) and Soybean-minimum (Class 3).
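The expressions in Table I, and the counts quoted in the text (34 versus 456/184 for k = 3 and p = 16; 221 versus nearly 50 thousand for k = 2 and p = 220), can be checked with a short script:

```python
def n_logistic(k, p):
    """Logistic discrimination: (k-1) intercepts plus coefficient vectors."""
    return (k - 1) * (p + 1)

def n_gml_equal(k, p):
    """Gaussian ML, common covariance: k mean vectors + one symmetric p x p matrix."""
    return k * p + p * (p + 1) // 2

def n_gml_unequal(k, p):
    """Gaussian ML, class-specific covariances: k means + k symmetric matrices."""
    return k * (p + p * (p + 1) // 2)

# Three classes in 16 dimensions (the AVIRIS experiment in this paper)
print(n_logistic(3, 16), n_gml_equal(3, 16), n_gml_unequal(3, 16))  # 34 184 456
# Two classes using all 220 AVIRIS bands
print(n_logistic(2, 220), n_gml_unequal(2, 220))  # 221 49060
```

The linear growth of (k−1)(p+1) against the quadratic growth of the covariance terms is what produces the plane and the quadratic surface described for Fig. 1.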
IV. RESULTS
An AVIRIS image with spectrally very similar classes was classified by logistic discrimination, with the following results. A small sample of 86 pixels was used to fit the logistic discriminator, and a validation sample of 56 pixels, for which ground truth is known, was used to assess accuracy. Fig. 2 shows the spectral signature of the three classes that were discriminated: Corn-notill (w1), Soybean-notill (w2) and Soybean-minimum (w3).
The logistic discrimination model used 16 spectral bands out of the 220 available from AVIRIS. The 16 bands were systematically chosen, and a total of 34 parameters was estimated. Table II shows the results of the estimation process performed by the SAS System, as well as the spectral bands used. Note that class w3 (Soybean-minimum) was used as the base class.

The results presented in Table III indicate a clear distinction between the classes despite their very similar spectral responses. Since the logit functions are linear, the boundaries between classes are also linear, so that hyperplanes can satisfactorily separate the three classes. The total accuracy was estimated as 96.4%.

Logistic discrimination does not require specific forms for the probability distributions of the variables, and has greater generality than some better-known classification methods such as Gaussian maximum likelihood. Furthermore, the number of parameters that must be estimated is relatively small, so that the number of training samples can also be smaller.
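The parameters reported here were estimated with the CATMOD procedure in SAS. As a toy illustration of the Newton-Raphson iteration that such procedures rely on, the sketch below fits a binary logistic model to a single hypothetical band (it is not the authors' code or data, and the training values are invented):

```python
import math

def fit_binary_logistic(xs, ys, iters=25):
    """Newton-Raphson for a one-feature binary logistic model.

    Maximizes the log-likelihood of y ~ Bernoulli(1 / (1 + exp(-(b0 + b1*x)))).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient (score) of the log-likelihood
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Observed information X^T W X with weights w = p(1-p)
        w = [p * (1.0 - p) for p in ps]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, xs))
        s2 = sum(wi * x * x for wi, x in zip(w, xs))
        det = s0 * s2 - s1 * s1
        # Newton step: beta <- beta + (X^T W X)^{-1} * gradient
        b0 += (s2 * g0 - s1 * g1) / det
        b1 += (s0 * g1 - s1 * g0) / det
    return b0, b1

# Non-separable toy training sample (hypothetical single-band values)
b0, b1 = fit_binary_logistic([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 1, 0, 1, 1])
```

At convergence the score equations are satisfied, which is the stopping criterion iterative routines such as CATMOD check in the multi-class, multi-band case as well.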
TABLE II. RESULTS OF THE ESTIMATION PROCESS

Parameters / Bands   g1(x), Class w1      g2(x), Class w2
                     (Corn-notill)        (Soybean-notill)
Intercept            -1151.478            -584.5830
Band 008                 0.1180              0.0043
Band 019                 0.0824             -0.0679
Band 030                -0.1160              0.0422
Band 041                 0.0022             -0.0274
Band 052                -0.0007              0.1790
Band 063                 0.3130              0.2390
Band 074                -0.0368             -0.0050
Band 085                -0.4000             -0.6810
Band 096                -0.0025              0.1530
Band 118                 0.5240              0.2930
Band 129                -0.1420             -1.0570
Band 140                -0.0766              0.5270
Band 173                -0.6420             -0.3150
Band 184                 0.5460              1.3570
Band 195                 0.1650              0.6520
Band 206                 0.2240             -0.7530
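Given the estimates in Table II, a pixel is classified by computing g1(x), g2(x) and g3(x) = 0 (the base class) and applying rule (3). The sketch below does this for a hypothetical 16-band pixel (the reflectance values are invented for illustration); subtracting the maximum logit keeps the exponentials numerically stable:

```python
import math

# Estimated logit coefficients from Table II (bands in the order listed there)
INTERCEPTS = [-1151.478, -584.5830]
COEFS = [
    [0.1180, 0.0824, -0.1160, 0.0022, -0.0007, 0.3130, -0.0368, -0.4000,
     -0.0025, 0.5240, -0.1420, -0.0766, -0.6420, 0.5460, 0.1650, 0.2240],
    [0.0043, -0.0679, 0.0422, -0.0274, 0.1790, 0.2390, -0.0050, -0.6810,
     0.1530, 0.2930, -1.0570, 0.5270, -0.3150, 1.3570, 0.6520, -0.7530],
]

def classify(x):
    """Return (class label 1..3, posterior probabilities) per rule (3)."""
    logits = [b0 + sum(b * xj for b, xj in zip(beta, x))
              for b0, beta in zip(INTERCEPTS, COEFS)]
    logits.append(0.0)                       # base class w3: g3(x) = 0
    m = max(logits)                          # stabilize the exponentials
    exps = [math.exp(g - m) for g in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)) + 1, probs

# Hypothetical pixel: 16 identical reflectance values, purely illustrative
label, probs = classify([900.0] * 16)
```

Shifting by the maximum logit leaves the probabilities of (1) unchanged while avoiding overflow, which matters here because the intercepts are on the order of minus one thousand.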
TABLE III. TABLE OF CLASSIFICATION BASED ON 56 TEST PIXELS

                          Classification by Logistic Discrimination
True class                Corn-notill (w1)   Soybean-notill (w2)   Soybean-minimum (w3)
Corn-notill (w1)          21 (100.0%)         0 (0.0%)              0 (0.0%)
Soybean-notill (w2)        0 (0.0%)          17 (94.4%)             1 (5.6%)
Soybean-minimum (w3)       0 (0.0%)           1 (5.6%)             17 (94.4%)

Overall percentage correctly classified = 96.4%
V. FINAL REMARKS
The assumption that the logit functions are linear in the parameters is less critical than the assumption, in Gaussian discrimination, of multivariate Normality. In consequence, logistic discrimination is applicable in a wide variety of situations. A disadvantage of the method is that iterative methods are required to estimate its parameters, and in some cases computational problems are encountered due to collinearity among the variables. This causes problems where significance tests for the parameters are required. Despite these problems, we consider the logistic discrimination model a viable alternative procedure for classifying digital images. Other experimental studies with digital images [7] show that, even when it is not possible to make inferences about the significance of parameter values, the classification results can be very satisfactory.

REFERENCES

[1] A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
[2] V. Haertel and D. Landgrebe, “On the classification of classes with nearly equal spectral response in remote sensing hyperspectral image data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, pp. 2374-2386, Sep. 1999.
[3] D. Hosmer and S. Lemeshow, Applied Logistic Regression. New York: John Wiley & Sons, 1989.
[4] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992.
[5] J. A. Anderson, “Logistic discrimination,” in Handbook of Statistics, vol. 2, P. R. Krishnaiah and L. Kanal, Eds. Amsterdam: North-Holland, 1982, pp. 169-191.
[6] B. Efron, “The efficiency of logistic regression compared to normal discriminant analysis,” Journal of the American Statistical Association, vol. 70, no. 352, pp. 892-898, 1975.
[7] H. R. Bittencourt and R. T. Clarke, “Use of logistic discrimination to classify remotely-sensed digital images,” in Proc. 12th Portuguese Conference on Pattern Recognition, Aveiro, Portugal, 2002.