SLAC-PUB-3215
STAN-ORION-002
August 1983
Rev. March 1984

PROJECTION PURSUIT DENSITY ESTIMATION*

JEROME H. FRIEDMAN and WERNER STUETZLE
Statistics Department and Stanford Linear Accelerator Center
Stanford University, Stanford, California 94305

ANNE SCHROEDER
Institut National de Recherche en Informatique et en Automatique
Le Chesnay, France

ABSTRACT

The projection pursuit methodology is applied to the multivariate density estimation problem. The resulting nonparametric procedure is often less biased than kernel and near-neighbor methods. In addition, graphical information is produced that can be used to help gain geometric insight into the multivariate data distribution.

Submitted to Journal of the American Statistical Association, Theory and Methods

* Work supported by the Department of Energy under contracts DE-AC03-76SF00515 and DE-AT03-81-ER10843, the Office of Naval Research under contract number ONR N00014-81-K-0340, and the Army Research Office under contract number DAAG29-82-K-0056.

1. Introduction

The formal goal of nonparametric density estimation is to estimate the probability density of a p-dimensional random vector X ∈ R^p on the basis of i.i.d. observations x_1, ..., x_N, without making the assumption that the density belongs to a particular parametric family. Often in practice a more important objective is to gain geometric insight into the data distribution in R^p.

Nonparametric estimation of univariate probability density functions has been extensively studied. Examples of successful methods are the related techniques of kernel estimates (Parzen, 1962; Rosenblatt, 1971), near-neighbor estimates (Loftsgaarden and Quesenberry, 1965), and splines (Boneva, Kendall, and Stefanov, 1971). A good overview is given by Tapia and Thompson (1978). The direct extension of these methods to multivariate settings, however, has not been as successful in practice. This can partly be attributed to their deteriorating statistical performance caused by the so-called "curse of dimensionality" (Bellman, 1961), which requires very large spans (radii of neighborhoods) in order to achieve sufficient counts. The resulting estimates are then highly biased. In addition, these methods do not provide any comprehensible information about the structure of the multivariate point cloud.

Our approach to multivariate density estimation is based on the notion of projection pursuit (Friedman and Tukey, 1974; Friedman and Stuetzle, 1981). It attempts to overcome the curse of dimensionality by extending the classical univariate density estimation methods to higher dimensions in a manner that involves only univariate estimation. As a by-product, graphical information is produced that can be quite helpful in exploring and understanding the multivariate data distribution.

2. Overview

The goal of projection pursuit methods is to estimate multivariate functions by combinations of smooth univariate (ridge) functions of carefully selected linear combinations of the variables.


Our projection pursuit density estimation (PPDE) method constructs estimates of the form

    p_M(x) = p_0(x) \prod_{m=1}^{M} f_m(\theta_m \cdot x),        (1)

where:
- p_M is the density estimate (or current model) after M iterations of the procedure.
- p_0 is a given multivariate density function to be used as the initial model.
- \theta_m is a unit vector specifying a direction in R^p; thus

      \theta_m \cdot x = \sum_{i=1}^{p} \theta_{mi} x_i

  is a linear combination of the original coordinate measurements.
- f_m is a univariate function.
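The model form (1) can be sketched directly in code. In this toy example the initial model p_0 is a Gaussian with the sample mean and covariance, as the text below suggests; the particular direction theta_1 and ridge function f_1 are illustrative placeholders, not fitted values, and the function names are our own.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    p = len(mean)
    d = x - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))
    return norm * np.exp(-0.5 * d @ inv @ d)

def ppde_pdf(x, mean, cov, thetas, fs):
    """Evaluate p_M(x) = p_0(x) * prod_m f_m(theta_m . x), per (1)."""
    value = gaussian_pdf(x, mean, cov)
    for theta, f in zip(thetas, fs):
        value *= f(theta @ x)          # univariate ridge factor
    return value

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
mean, cov = data.mean(axis=0), np.cov(data.T)
thetas = [np.array([1.0, 0.0])]                # unit direction theta_1
fs = [lambda t: 1.0 + 0.1 * np.tanh(t)]        # placeholder ridge function
density = ppde_pdf(np.zeros(2), mean, cov, thetas, fs)
```

Because the f_m depend on x only through the scalar projections theta_m . x, each factor is a one-dimensional function, which is what lets PPDE sidestep direct multivariate estimation.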

From (1), PPDE is seen to approximate the multivariate density by an initially proposed density p_0, multiplied (augmented) by a product of univariate functions f_m of linear combinations \theta_m \cdot x of the coordinates. The choice of the initial density is left to the user and should reflect his best a priori knowledge of the data. A Gaussian density with sample mean and sample covariance matrix is often a natural choice. The purpose of PPDE is to choose the directions \theta_m and construct the corresponding functions f_m(\theta_m \cdot x). The product of these functions estimates the ratio of the data density to the initial model density. From (1) we obtain the recursion relation

    p_M(x) = p_{M-1}(x) f_M(\theta_M \cdot x).        (2)

Since f_M is used to modify p_{M-1} to obtain p_M, we refer to the f_m as "augmenting functions". The recursive definition of the model (2) suggests a stepwise approach for its construction.
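The recursion (2) amounts to maintaining a list of (theta_m, f_m) pairs and multiplying in one more at each step. A minimal sketch, with `make_model` and `augment` as hypothetical helper names (the actual f_M is constructed as a marginal ratio, discussed later):

```python
import numpy as np

def make_model(p0_pdf):
    """Hold the current model p_M as p_0 times accumulated ridge terms."""
    terms = []                        # accumulated (theta_m, f_m) pairs

    def pM(x):                        # evaluate the current model p_M(x)
        val = p0_pdf(x)
        for theta, f in terms:
            val *= f(theta @ x)
        return val

    def augment(theta, f):            # recursion (2): p_M = p_{M-1} * f
        terms.append((theta, f))

    return pM, augment

# usage: start from a standard normal p_0 in R^2 and add one term
p0 = lambda x: np.exp(-0.5 * x @ x) / (2 * np.pi)
pM, augment = make_model(p0)
before = pM(np.zeros(2))
augment(np.array([1.0, 0.0]), lambda t: 1.0 + 0.2 * np.tanh(t))
after = pM(np.zeros(2))               # f_1(0) = 1, so p_1(0) = p_0(0)
```

Each call to `augment` turns the model p_{M-1} into p_M without revisiting earlier terms, which is exactly the stepwise construction the recursion suggests.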

At the M-th iteration, there is a current model p_{M-1}(x) constructed in the previous steps. (For the first step, M = 1, the current model is the initial model p_0(x) specified by the user.) Given p_{M-1}(x), we seek a new model p_M(x) to serve as a better approximation to the data density p(x). Thus a direction \theta_M and its corresponding augmenting function f_M(\theta_M \cdot x) are chosen to maximize the goodness-of-fit of p_M(x).

We measure relative goodness-of-fit by the cross-entropy term of the Kullback-Leibler distance

    W = \int p(x) \log p_M(x) \, dx.        (3)

From (2) we see that W achieves its maximum at the same location as

    W(\theta_M, f_M) = \int p(x) \log f_M(\theta_M \cdot x) \, dx.        (4)

This is to be maximized under the constraint that p_M(x) be properly normalized, i.e. \int p_M(x) \, dx = 1. For a given direction \theta_M and known p(x),

    f_M(\theta_M \cdot x) = \frac{p_{\theta_M}(\theta_M \cdot x)}{p^{M-1}_{\theta_M}(\theta_M \cdot x)}        (5)

is seen to maximize (4). Here p_{\theta_M} and p^{M-1}_{\theta_M} represent the data and current-model marginal densities along the (one-dimensional) subspace spanned by \theta_M. Using this f_M for given \theta_M, it remains to find the direction \theta_M for which (4) achieves the maximum value. The optimal \theta_M and its corresponding augmenting function f_M(\theta_M \cdot x) define the new model through (2).

In actual applications the data density p(x) is unknown. We have, instead, a sample of N i.i.d. observations x_1, x_2, ..., x_N from p(x). The cross-entropy W is estimated by the log-likelihood

    \hat{W} = \frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_M(x_i).        (6)

Analogously, W(\theta_M, f_M) is estimated by

    \hat{W}(\theta_M, f_M) = \frac{1}{N} \sum_{i=1}^{N} \log \hat{f}_M(\theta_M \cdot x_i),        (7)

where \hat{f}_M(\theta_M \cdot x) is an estimate for the ratio of data and model marginals along \theta_M. The optimal value \hat{\theta}_M maximizing \hat{W}(\theta_M, f_M), and thus the log-likelihood \hat{W} of the new model, is determined by numerical optimization.
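The sample criteria (6) and (7) are one-line averages. A minimal sketch, with `w_hat` and `w_hat_directional` as our own illustrative names; the direction search over theta that the paper performs by numerical optimization is not shown here:

```python
import numpy as np

def w_hat(pM, X):
    """Log-likelihood estimate of the cross-entropy W, eq. (6)."""
    return np.mean([np.log(pM(x)) for x in X])

def w_hat_directional(f, theta, X):
    """Sample estimate of W(theta_M, f_M), eq. (7):
    mean log augmenting-function value over the projected data."""
    return np.mean(np.log(f(X @ theta)))

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
theta = np.array([1.0, 0.0, 0.0])
f = lambda t: np.ones_like(t)          # trivial augmenting function f = 1
val = w_hat_directional(f, theta, X)   # log 1 = 0 for every observation
```

With the trivial choice f = 1 the criterion (7) is exactly zero, reflecting that an augmenting function of 1 leaves the model, and hence its log-likelihood, unchanged.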


3. Estimation Procedures

We now discuss the estimation of f(\theta \cdot x), the ratio of data and model marginals along a direction \theta. First consider the current model marginal p^{M-1}(\theta \cdot x). Without loss of generality we let \theta be the first coordinate axis, that is, \theta \cdot x = x_1. Then

    p^{M-1}(x_1) = \int p_{M-1}(x_1, x_2, \ldots, x_p) \, dx_2 \cdots dx_p.        (8)

If p^{M-1}(x_1) is continuous, then

    p^{M-1}(x_1) = \lim_{h \to 0} \frac{1}{2h} \int_{x_1 - h}^{x_1 + h} p^{M-1}(z) \, dz        (9)

                 = \lim_{h \to 0} \frac{1}{2h} \int_{-\infty}^{\infty} I(x_1 - h \le z \le x_1 + h) \, p^{M-1}(z) \, dz,        (10)

where

    I(\cdot) = 1 if its argument is true, and 0 otherwise.        (11)

From (8), one has

    p^{M-1}(x_1) = \lim_{h \to 0} \frac{1}{2h} \int I(x_1 - h \le y_1 \le x_1 + h) \, p_{M-1}(y) \, dy
                 = \lim_{h \to 0} \frac{1}{2h} E_{p_{M-1}} [ I(x_1 - h \le Y_1 \le x_1 + h) ].        (12)

Our estimate of p^{M-1}(x_1) is obtained from (12) by using a small finite value for h and employing a Monte Carlo method to estimate the expected value. A Monte Carlo sample y_1, y_2, ..., y_{N_S} of size N_S is generated with density p_{M-1}(x), and

    \hat{p}^{M-1}(x_1) = \frac{1}{2hN_S} \sum_{j=1}^{N_S} I(x_1 - h \le y_{j1} \le x_1 + h)        (13)

is taken as our estimate of p^{M-1}(x_1). Since the choice of x_1 as the direction \theta was arbitrary, (13) can equally well be written


    \hat{p}^{M-1}(\theta \cdot x) = \frac{1}{2hN_S} \sum_{j=1}^{N_S} I(\theta \cdot x - h \le \theta \cdot y_j \le \theta \cdot x + h)        (14)

for any \theta. Note that the same Monte Carlo sample can be used for all \theta and x. In Appendix 2, we discuss in detail procedures for generating a Monte Carlo sample from the density p_{M-1}(x). By assumption, the data represent a sample from p(x) that can be used, in analogy with (14), to estimate the data marginal p_\theta(\theta \cdot x) by

    \hat{p}_\theta(\theta \cdot x) = \frac{1}{2hN} \sum_{i=1}^{N} I(\theta \cdot x - h \le \theta \cdot x_i \le \theta \cdot x + h).        (15)
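The window-count estimates (13)-(15) share one form: project a sample onto theta, count points within ±h of the evaluation point, and divide by 2h times the sample size. A minimal sketch, assuming a standard normal toy example; `marginal_hat` and the value of h are our own illustrative choices:

```python
import numpy as np

def marginal_hat(t, theta, sample, h):
    """Window-count marginal density estimate at t = theta . x,
    as in eqs. (13)-(15): fraction of projected sample points in
    [t - h, t + h], divided by the window width 2h."""
    proj = sample @ theta
    count = np.sum((t - h <= proj) & (proj <= t + h))
    return count / (2.0 * h * len(sample))

rng = np.random.default_rng(2)
theta = np.array([1.0, 0.0])
data = rng.normal(size=(2000, 2))     # plays the role of the data sample
mc = rng.normal(size=(5000, 2))       # plays the role of the Monte Carlo
                                      # sample from p_{M-1}
p_data = marginal_hat(0.0, theta, data, h=0.2)    # data marginal, eq. (15)
p_model = marginal_hat(0.0, theta, mc, h=0.2)     # model marginal, eq. (14)
ratio = p_data / p_model              # pointwise estimate of f_M at t = 0
```

Both marginals use the same routine; only the sample changes, which is why the Monte Carlo sample can be reused for every direction theta and evaluation point x.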
