Logistic Discrimination Between Classes with Nearly Equal Spectral Response in High Dimensionality

Helio Radke Bittencourt
Departamento de Estatística – Faculdade de Matemática, Pontifícia Universidade Católica – PUCRS, Porto Alegre, RS, Brazil
[email protected]

Robin Thomas Clarke
Centro Estadual de Pesquisas em Sensoriamento Remoto, Universidade Federal do Rio Grande do Sul – UFRGS, Porto Alegre, RS, Brazil
[email protected]
Abstract— Logistic discrimination can be regarded as a partially parametric approach to pattern recognition. The method is quite general and robust: it assumes nothing about the probability distribution of the variables and requires the estimation of fewer parameters than some better-known procedures such as the Gaussian maximum likelihood discriminator. This paper describes the logistic discrimination model and gives results obtained when using it to classify an AVIRIS image with classes that are spectrally very similar.

Keywords— digital image classification, pattern recognition, logistic discrimination, AVIRIS sensor, high dimensional data.
I. INTRODUCTION

Statistical approaches to the problem of pattern recognition have been extensively discussed in the scientific literature and have been implemented in a large number of commercial systems. In such approaches, each pattern is regarded as a p-dimensional random vector, where p is the number of characteristics used in classification. A survey published in IEEE Transactions on Pattern Analysis and Machine Intelligence [1] of papers that have appeared since 1979 found 350 articles on pattern recognition, of which 300 were concerned with aspects of statistical approaches to the problem. There are now many statistical methods for classification, each presenting advantages and disadvantages. Techniques requiring assumptions on the functional shape of the variables composing the feature space involve parameter estimation, and are therefore termed parametric. The number of parameters to be estimated can vary a great deal according to the classifying method used and the number of classes to be discriminated.

Logistic discrimination, derived from multinomial logistic regression, is a partially parametric method for discriminating between classes which has some useful characteristics and also certain advantages over the better-known Gaussian maximum-likelihood classification procedure, widely used for classifying digital images. Like the Gaussian classification procedure, logistic discrimination is a supervised classification procedure that requires training data from which model parameters are estimated. However, in the Gaussian maximum-likelihood method the underlying probability distributions are assumed to be multivariate Normal, and the number of parameters to be estimated can be very large, since with k classes, k mean vectors (of dimension p×1) and k covariance matrices (of dimension p×p, symmetric) must be estimated. Reference [2] states that the estimation of the within-class covariance matrices is one of the most difficult problems in dealing with high-dimensional data. Fortunately, the assumptions required for logistic discrimination are considerably weaker and the number of parameters to be estimated is smaller, as shown in the following sections.

II. LOGISTIC DISCRIMINATION
The logistic regression model for binary (0,1) response variables is widely used in the medical and biological sciences, especially in epidemiology. Reference [3] describes an extension of the logistic model in which the response variable may take one of k discrete values (k > 2), so that the model can be used to discriminate between k classes. In logistic discrimination, the probability that a pixel with feature vector x belongs to class w_i is estimated directly from expression (1).
$$P(w_i \mid \mathbf{x}) = \frac{\exp\left(\beta_{i0} + \boldsymbol{\beta}_i^{T}\mathbf{x}\right)}{1 + \sum_{j=1}^{k-1} \exp\left(\beta_{j0} + \boldsymbol{\beta}_j^{T}\mathbf{x}\right)}, \qquad 1 \le i \le k-1 \qquad (1)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_p]^{T}$ is the feature vector and $\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{ip}]^{T}$.
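Expression (1) can be written as a few lines of code. The sketch below is illustrative only (the parameter values shown are made up); it computes the posterior probabilities for the k−1 modeled classes and for the base class w_k, whose linear predictor is implicitly zero:

```python
import math

def posteriors(intercepts, coefs, x):
    """Posterior probabilities of expression (1).

    intercepts: list of k-1 values beta_{i0}
    coefs:      list of k-1 lists, each with p values beta_{ij}
    x:          feature vector of length p
    Returns k probabilities; the last one is the base class w_k.
    """
    # Linear predictors beta_{i0} + beta_i^T x for the k-1 modeled classes
    logits = [b0 + sum(b * xj for b, xj in zip(beta, x))
              for b0, beta in zip(intercepts, coefs)]
    denom = 1.0 + sum(math.exp(g) for g in logits)
    probs = [math.exp(g) / denom for g in logits]
    probs.append(1.0 / denom)        # base class: exp(0) / denom
    return probs

# Two classes modeled against a base class (k = 3), p = 2 features
p = posteriors([0.5, -0.2], [[1.0, -0.5], [0.3, 0.8]], [1.2, 0.7])
```

Note that the k probabilities always sum to one, and setting every parameter to zero makes the k classes equally likely, as expected from (1).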
The quantities $\beta_{i0}$ and $\boldsymbol{\beta}_i$ are model parameters, with $\beta_{i0}$ termed the intercept and $\boldsymbol{\beta}_i$ a vector of parameters associated with the p characteristics of the vector x. The logistic model requires the estimation of k-1 vectors of parameters $\boldsymbol{\beta}_i$, corresponding to k-1 of the k classes present in the image. The k-th class is taken as a base, relative to which the natural log of the ratio
0-7803-7930-6/$17.00 (C) 2003 IEEE
of the two probabilities (in words, ‘the log of the probability of belonging to class w_i divided by the probability of belonging to class w_k’) becomes a linear function of the parameters. This logarithm is the logit function. Hence, in a model for discriminating between k classes, there are k-1 logit functions g_i(x), as in (2).

$$g_i(\mathbf{x}) = \ln\frac{P(w_i \mid \mathbf{x})}{P(w_k \mid \mathbf{x})} = \beta_{i0} + \boldsymbol{\beta}_i^{T}\mathbf{x}, \qquad 1 \le i \le k-1 \qquad (2)$$
The author in [4] states that the assumption of linearity is fundamental to the logistic approach and, for that reason, calls it a partially parametric model, since only the logit functions are modeled. Despite the apparent strength of this assumption, authors such as [5] and [6] hold that the logistic model can be used with a wide range of probability distributions and that, in theory at least, logistic discrimination has greater robustness than the method of Gaussian maximum likelihood. The classification rule in logistic discrimination is very simple, given by expression (3).

$$\mathbf{x} \in w_i \quad \text{if} \quad P(w_i \mid \mathbf{x}) > P(w_j \mid \mathbf{x}) \quad \forall\, j \neq i \qquad (3)$$

III. PARAMETER ESTIMATION

The procedure for estimating the logistic model parameters is based on maximization of the likelihood function $\ell(\mathbf{x}, \boldsymbol{\beta})$. For this to be possible, n training pixels $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are required, with correct knowledge of the classes w_i to which they belong. The estimates of the k-1 vectors of parameters $\boldsymbol{\beta}_i$ are those that maximize the log-likelihood function. As this expression is non-linear, numerical methods are needed to find its maximum. These methods are iterative and are available in a number of statistical software packages. For the present paper we used the CATMOD procedure of the SAS software, in which the Newton-Raphson iterative procedure gave fairly rapid convergence.

According to [7], the number of parameters to be estimated in logistic discrimination is considerably smaller than that needed in the Gaussian maximum likelihood (GML) method, which is a considerable advantage. While logistic discrimination requires (p+1) parameters for each of the (k-1) classes, in the Gaussian case p parameters are necessary for each mean vector and, in general, p(p+1)/2 for each of the k covariance matrices. Table I shows the expressions for the number of parameters in each case.

TABLE I. NUMBER OF PARAMETERS TO BE ESTIMATED USING LOGISTIC DISCRIMINATION AND GAUSSIAN MAXIMUM LIKELIHOOD MODELS

    Logistic Discrimination                     (k − 1)(p + 1)
    Gaussian Maximum Likelihood (equal Σ)       kp + p(p + 1)/2
    Gaussian Maximum Likelihood (unequal Σ)     k(p + p(p + 1)/2)

Fig. 1 shows that for logistic classification the two quantities dimensionality (p) and number of classes (k) define a plane, whilst for the Gaussian classifier they define a quadratic surface. For example, to discriminate between three classes in 16 dimensions, a logistic model requires 34 parameters to be estimated; the Gaussian classifier requires 456 parameters when the three variance-covariance matrices are distinct, although this number falls to 184 when they can be considered equal. An AVIRIS image, with 220 spectral bands, requires almost 50 thousand parameters to discriminate between just two classes with different variance-covariance matrices; for the logistic classifier, this number falls to 221.

Figure 1. Number of parameters to be estimated by logistic discrimination and by Gaussian maximum likelihood (covariance matrices unequal), plotted against dimensionality (p) for varying numbers of classes (k).

Figure 2. Spectral signature (spectral response across bands B1–B210) for the three classes: Corn-notill (Class 1), Soybean-notill (Class 2) and Soybean-minimum (Class 3).
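The expressions in Table I, and the counts quoted in the text (34 versus 456/184 for k = 3 and p = 16; 221 versus nearly 50 thousand for k = 2 and p = 220), can be checked with a short script:

```python
def n_logistic(k, p):
    """Logistic discrimination: (k-1) intercepts plus coefficient vectors."""
    return (k - 1) * (p + 1)

def n_gml_equal(k, p):
    """Gaussian ML, common covariance: k mean vectors + one symmetric p x p matrix."""
    return k * p + p * (p + 1) // 2

def n_gml_unequal(k, p):
    """Gaussian ML, class-specific covariances: k means + k symmetric matrices."""
    return k * (p + p * (p + 1) // 2)

# Three classes in 16 dimensions (the AVIRIS experiment in this paper)
print(n_logistic(3, 16), n_gml_equal(3, 16), n_gml_unequal(3, 16))  # 34 184 456
# Two classes using all 220 AVIRIS bands
print(n_logistic(2, 220), n_gml_unequal(2, 220))  # 221 49060
```

The linear growth of (k−1)(p+1) against the quadratic growth of the covariance terms is what produces the plane and the quadratic surface described for Fig. 1.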
IV. RESULTS
An AVIRIS image with spectrally very similar classes was classified by logistic discrimination, with the following results. A small sample of 86 pixels was used to fit the logistic discriminator, and a validation sample of 56 pixels, for which ground truth is known, was used to assess accuracy. Fig. 2 shows the spectral signature of the three classes that were discriminated: Corn-notill (w1), Soybean-notill (w2) and Soybean-minimum (w3).
The logistic discrimination model used 16 spectral bands out of the 220 available from AVIRIS. The 16 bands were systematically chosen, and a total of 34 parameters was estimated. Table II shows the results of the estimation process performed by the SAS System, as well as the spectral bands used. Note that class w3 (Soybean-minimum) was used as the base class.

The results presented in Table III indicate a clear distinction between the classes despite their very similar spectral responses. Since the logit functions are linear, the boundaries between classes are also linear, so that hyperplanes can satisfactorily separate the three classes. The total accuracy was estimated as 96.4%.

Logistic discrimination does not require specific forms for the probability distributions of the variables, and has greater generality than some better-known classification methods such as Gaussian maximum likelihood. Furthermore, the number of parameters that must be estimated is relatively small, so that the number of training samples can also be smaller.
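The parameters reported here were estimated with the CATMOD procedure in SAS. As a toy illustration of the Newton-Raphson iteration that such procedures rely on, the sketch below fits a binary logistic model to a single hypothetical band (it is not the authors' code or data, and the training values are invented):

```python
import math

def fit_binary_logistic(xs, ys, iters=25):
    """Newton-Raphson for a one-feature binary logistic model.

    Maximizes the log-likelihood of y ~ Bernoulli(1 / (1 + exp(-(b0 + b1*x)))).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient (score) of the log-likelihood
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Observed information X^T W X with weights w = p(1-p)
        w = [p * (1.0 - p) for p in ps]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, xs))
        s2 = sum(wi * x * x for wi, x in zip(w, xs))
        det = s0 * s2 - s1 * s1
        # Newton step: beta <- beta + (X^T W X)^{-1} * gradient
        b0 += (s2 * g0 - s1 * g1) / det
        b1 += (s0 * g1 - s1 * g0) / det
    return b0, b1

# Non-separable toy training sample (hypothetical single-band values)
b0, b1 = fit_binary_logistic([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 1, 0, 1, 1])
```

At convergence the score equations are satisfied, which is the stopping criterion iterative routines such as CATMOD check in the multi-class, multi-band case as well.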
TABLE II. RESULTS OF THE ESTIMATION PROCESS

Parameters / Bands   g1(x), Class w1      g2(x), Class w2
                     (Corn-notill)        (Soybean-notill)
Intercept            -1151.478            -584.5830
Band 008                 0.1180              0.0043
Band 019                 0.0824             -0.0679
Band 030                -0.1160              0.0422
Band 041                 0.0022             -0.0274
Band 052                -0.0007              0.1790
Band 063                 0.3130              0.2390
Band 074                -0.0368             -0.0050
Band 085                -0.4000             -0.6810
Band 096                -0.0025              0.1530
Band 118                 0.5240              0.2930
Band 129                -0.1420             -1.0570
Band 140                -0.0766              0.5270
Band 173                -0.6420             -0.3150
Band 184                 0.5460              1.3570
Band 195                 0.1650              0.6520
Band 206                 0.2240             -0.7530
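Given the estimates in Table II, a pixel is classified by computing g1(x), g2(x) and g3(x) = 0 (the base class) and applying rule (3). The sketch below does this for a hypothetical 16-band pixel (the reflectance values are invented for illustration); subtracting the maximum logit keeps the exponentials numerically stable:

```python
import math

# Estimated logit coefficients from Table II (bands in the order listed there)
INTERCEPTS = [-1151.478, -584.5830]
COEFS = [
    [0.1180, 0.0824, -0.1160, 0.0022, -0.0007, 0.3130, -0.0368, -0.4000,
     -0.0025, 0.5240, -0.1420, -0.0766, -0.6420, 0.5460, 0.1650, 0.2240],
    [0.0043, -0.0679, 0.0422, -0.0274, 0.1790, 0.2390, -0.0050, -0.6810,
     0.1530, 0.2930, -1.0570, 0.5270, -0.3150, 1.3570, 0.6520, -0.7530],
]

def classify(x):
    """Return (class label 1..3, posterior probabilities) per rule (3)."""
    logits = [b0 + sum(b * xj for b, xj in zip(beta, x))
              for b0, beta in zip(INTERCEPTS, COEFS)]
    logits.append(0.0)                       # base class w3: g3(x) = 0
    m = max(logits)                          # stabilize the exponentials
    exps = [math.exp(g - m) for g in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)) + 1, probs

# Hypothetical pixel: 16 identical reflectance values, purely illustrative
label, probs = classify([900.0] * 16)
```

Shifting by the maximum logit leaves the probabilities of (1) unchanged while avoiding overflow, which matters here because the intercepts are on the order of minus one thousand.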
TABLE III. TABLE OF CLASSIFICATION BASED ON 56 TEST PIXELS

                          Classification by Logistic Discrimination
True class                Corn-notill (w1)   Soybean-notill (w2)   Soybean-minimum (w3)
Corn-notill (w1)          21 (100.0%)         0 (0.0%)              0 (0.0%)
Soybean-notill (w2)        0 (0.0%)          17 (94.4%)             1 (5.6%)
Soybean-minimum (w3)       0 (0.0%)           1 (5.6%)             17 (94.4%)

Overall percentage correctly classified = 96.4%
V. FINAL REMARKS
The assumption that the logit functions are linear in the parameters is less critical than the assumption, in Gaussian discrimination, of multivariate Normality. In consequence, logistic discrimination is applicable in a wide variety of situations. A disadvantage of the method is that iterative methods are required to estimate its parameters, and in some cases computational problems are encountered due to collinearity among the variables. This causes problems where significance tests for the parameters are required. Despite these problems, we consider the logistic discrimination model a viable alternative procedure for classifying digital images. Other experimental studies with digital images [7] show that, even when it is not possible to make inferences about the significance of parameter values, the classification results can be very satisfactory.

REFERENCES

[1] A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
[2] V. Haertel and D. Landgrebe, “On the classification of classes with nearly equal spectral response in remote sensing hyperspectral image data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, pp. 2374-2386, Sep. 1999.
[3] D. Hosmer and S. Lemeshow, Applied Logistic Regression. New York: John Wiley & Sons, 1989.
[4] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992.
[5] J. A. Anderson, “Logistic discrimination,” in Handbook of Statistics, vol. 2, P. R. Krishnaiah and L. Kanal, Eds. Amsterdam: North-Holland, 1982, pp. 169-191.
[6] B. Efron, “The efficiency of logistic regression compared to normal discriminant analysis,” Journal of the American Statistical Association, vol. 70, no. 352, pp. 892-898, 1975.
[7] H. R. Bittencourt and R. T. Clarke, “Use of logistic discrimination to classify remotely-sensed digital images,” in Proc. 12th Portuguese Conference on Pattern Recognition, Aveiro, Portugal, 2002.