Simulating from the posterior density of Bayesian wavelet regression estimates

Stuart Barber
Department of Mathematics, University of Bristol, University Walk, Bristol, U.K.
Email: [email protected]

Abstract

Several authors have considered Bayesian approaches to wavelet shrinkage and thresholding, but relatively few of the resulting nonparametric curve estimates have associated uncertainty bands. We consider a simulation-based approach to computing uncertainty bands based on the BayesThresh rule of Abramovich, Sapatinas & Silverman (1998).

1 Introduction

We consider estimating an unknown function $f$ when noisy data $y_1, \ldots, y_n$ are available:
\[ y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n = 2^J, \]
with $x_i = i/n$, and where the $\varepsilon_i$ are assumed to be independent and identically $N(0, \sigma^2)$ distributed.

One approach is by wavelet thresholding. The vector of function values $f = (f(x_1), \ldots, f(x_n))$ can be represented by its discrete wavelet transform (DWT) $d = (d_{j,k};\ j = 0, \ldots, J-1;\ k = 0, \ldots, 2^j - 1)$. To estimate $d$, we compute the discrete wavelet transform $\hat{d}$ of the data vector $y$ and threshold these values or shrink them towards zero. The inverse DWT can be applied to the resulting estimate of $d$, providing an estimate of $f$. Numerous wavelet thresholding rules have been proposed. We consider forming interval estimates of $f(x_i)$ for $i = 1, \ldots, n$ to supplement the point estimates already available, using the Bayesian approach of Abramovich, Sapatinas & Silverman (1998).
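The threshold-and-invert pipeline just described can be sketched as follows. This is a minimal illustration, not the Bayesian rule discussed below: it uses a hand-rolled Haar DWT and the universal hard threshold $\sigma\sqrt{2\log n}$ with $\sigma$ assumed known, all of which are illustrative choices.

```python
import numpy as np

def haar_dwt(y):
    """Orthonormal Haar DWT of a length-2^J vector: coarse coefficient plus details by level."""
    details = []
    approx = np.asarray(y, dtype=float)
    while len(approx) > 1:
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    return approx, details[::-1]  # details ordered coarse -> fine

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    approx = approx.copy()
    for d in details:
        up = np.empty(2 * len(approx))
        up[0::2] = (approx + d) / np.sqrt(2)
        up[1::2] = (approx - d) / np.sqrt(2)
        approx = up
    return approx

# noisy observations of a step function at n = 2^J points
rng = np.random.default_rng(0)
n = 256
f = np.where(np.arange(n) < n // 2, 0.0, 1.0)
y = f + rng.normal(0.0, 0.1, n)

# hard-threshold the detail coefficients and invert
approx, details = haar_dwt(y)
lam = 0.1 * np.sqrt(2 * np.log(n))  # universal threshold with sigma = 0.1 assumed known
details = [np.where(np.abs(d) > lam, d, 0.0) for d in details]
f_hat = haar_idwt(approx, details)
```

Here almost all noise-only detail coefficients fall below the threshold and are zeroed, so the reconstruction `f_hat` is much closer to `f` than the raw data are.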


2 BayesThresh

Several authors have proposed Bayesian thresholding rules. One such rule is given by Abramovich, Sapatinas & Silverman (1998) and referred to as BayesThresh. A prior is placed upon each of the wavelet coefficients $d_{j,k}$ and updated by the observed data $y$ to form a posterior for each coefficient. We then estimate each coefficient by the median of its posterior distribution, and the inverse DWT is used upon the resulting estimates to form an estimate of $f$. The priors for the wavelet coefficients are mixtures of a normal distribution and a point mass at zero:
\[ d_{j,k} \sim \pi_j N(0, \tau_j^2) + (1 - \pi_j)\,\delta(0), \qquad j = 0, \ldots, J-1;\ k = 0, \ldots, 2^j - 1. \tag{1} \]
Here, the $\pi_j$ lie within $[0, 1]$ and $\delta(0)$ is a point mass at zero. The hyperparameters $\tau_j^2$ and $\pi_j$ are assumed to take the form
\[ \tau_j^2 = 2^{-\alpha j} C_1 \quad \text{and} \quad \pi_j = \min\{1, 2^{-\beta j} C_2\}, \qquad j = 0, \ldots, J-1, \]
where $\alpha$ and $\beta$ are chosen and $C_1$ and $C_2$ are determined empirically from the data. In practice, $\sigma^2$ is also estimated from the data. It is assumed that the coefficients are independent. This prior is a limiting case of the prior model proposed by Chipman, Kolaczyk & McCulloch (1997), which is formed as a mixture of two normal distributions.
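The level-dependent hyperparameters can be computed directly; a small sketch with illustrative values of $C_1$, $C_2$, $\alpha$ and $\beta$ (in BayesThresh, $C_1$ and $C_2$ would be estimated from the data):

```python
import numpy as np

# hypothetical values chosen for illustration, not estimated from data
J, alpha, beta, C1, C2 = 8, 0.5, 1.0, 4.0, 16.0

j = np.arange(J)
tau2 = 2.0 ** (-alpha * j) * C1                 # tau_j^2 = 2^{-alpha j} C1
pi = np.minimum(1.0, 2.0 ** (-beta * j) * C2)   # pi_j = min{1, 2^{-beta j} C2}

# both the prior variances and the nonzero-probabilities decay with level j,
# so fine-scale coefficients are shrunk towards zero more aggressively
print(tau2[:3], pi[:6])
```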

3 Analytic credible bands

The posterior distribution for the value of $f$ at some point $x_i$ is a weighted sum of posteriors of wavelet coefficients $d_{j,k}$. Thus, it is conceptually possible to find these posteriors analytically. However, the complexity of these posteriors makes an analytic solution to the problem prohibitive. Consider the prior of Chipman et al. (1997):
\[ d_{j,k} \sim p_j N(0, c_j^2 \tau_j^2) + (1 - p_j) N(0, \tau_j^2), \qquad j = 0, \ldots, J-1;\ k = 0, \ldots, 2^j - 1. \tag{2} \]
Here, the hyperparameters $p_j$, $c_j$ and $\tau_j$ are all found empirically from the data, making this procedure more “automatic” than the BayesThresh rule.


The prior used by Abramovich et al. (1998), given in equation (1), can be viewed as a limiting case of (2) in which $\tau_j^2 \to 0$ and $c_j \to \infty$ in such a way that $c_j^2 \tau_j^2$ tends to the variance of the nonzero component in (1). For either of these priors, the posterior distribution of $f(x_i)$ given the observed data $y$ is
\[ [f(x_i) \mid y] \sim \sum_j \sum_k \psi_{j,k}(x_i)\,[d_{j,k} \mid y]. \]
Now, for the prior model (2), the posterior of each $[d_{j,k} \mid y]$ is a mixture distribution with two normal components. When two of these distributions are added, the resulting mixture distribution has 4 normal components. Continuing to add terms to this distribution, taking the convolution of all $n$ coefficient posteriors would give a posterior distribution of $f(x_i)$ with $2^n$ components. Using the BayesThresh prior (1), many of these components would combine to form a point mass at zero. However, there would still be a total of $2^J + 1$ components in the posterior distribution of $f(x_i)$. This is computationally prohibitive, so we turn to a simulation approach instead, detailed in section 4.
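The combinatorial growth can be seen directly: convolving independent normal mixtures multiplies the weights and adds the means and variances componentwise, one output component per choice of component from each input mixture. A purely illustrative sketch:

```python
from itertools import product

def convolve(mixtures):
    """Convolution of independent normal mixtures, each a list of
    (weight, mean, variance) components: weights multiply, means and
    variances add, giving one component per combination."""
    out = []
    for combo in product(*mixtures):
        w, m, v = 1.0, 0.0, 0.0
        for (wi, mi, vi) in combo:
            w *= wi
            m += mi
            v += vi
        out.append((w, m, v))
    return out

# a hypothetical two-component coefficient posterior, as under prior (2)
coeff_posterior = [(0.3, 0.0, 1.0), (0.7, 2.0, 0.5)]
n = 5
components = convolve([coeff_posterior] * n)
print(len(components))  # 2^n = 32 components for just 5 coefficients
```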

4 Simulation

Using the prior distribution (1) for the wavelet coefficients, we have posterior distributions for each $d_{j,k}$ given the observed data via the estimated wavelet coefficients $\hat{d}_{j,k}$, as given in equation (3):
\[ F(d_{j,k} \mid \hat{d}_{j,k}) = \frac{1}{1 + \omega_{j,k}}\,\Phi\left\{ \frac{d_{j,k} - \hat{d}_{j,k}\tau_j^2/(\sigma^2 + \tau_j^2)}{\sigma\tau_j/\sqrt{\sigma^2 + \tau_j^2}} \right\} + \frac{\omega_{j,k}}{1 + \omega_{j,k}}\,I(d_{j,k} \geq 0), \tag{3} \]
where
\[ \omega_{j,k} = \frac{1 - \pi_j}{\pi_j}\,\frac{\sqrt{\sigma^2 + \tau_j^2}}{\sigma}\,\exp\left\{ -\frac{\tau_j^2 \hat{d}_{j,k}^2}{2\sigma^2(\sigma^2 + \tau_j^2)} \right\}. \]
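Equation (3) and the weight $\omega_{j,k}$ translate directly into code; a sketch using only the standard library, with illustrative parameter values rather than quantities estimated from data:

```python
from statistics import NormalDist
from math import exp, sqrt

def omega(d_hat, sigma, tau2, pi):
    """Posterior odds omega_{j,k} on d_{j,k} = 0, as in the text."""
    s2 = sigma ** 2
    return ((1 - pi) / pi) * sqrt(s2 + tau2) / sigma * \
        exp(-tau2 * d_hat ** 2 / (2 * s2 * (s2 + tau2)))

def posterior_cdf(d, d_hat, sigma, tau2, pi):
    """F(d | d_hat) from equation (3): a normal part with weight 1/(1+w)
    plus a jump of size w/(1+w) at zero."""
    s2 = sigma ** 2
    w = omega(d_hat, sigma, tau2, pi)
    mean = d_hat * tau2 / (s2 + tau2)
    sd = sigma * sqrt(tau2) / sqrt(s2 + tau2)
    cont = NormalDist(mean, sd).cdf(d)
    return cont / (1 + w) + (w / (1 + w)) * (1.0 if d >= 0 else 0.0)

# a large observed coefficient gives smaller posterior odds of being zero
print(omega(5.0, 1.0, 4.0, 0.8) < omega(0.5, 1.0, 4.0, 0.8))  # True
```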

Note that the posterior distributions of the wavelet coefficients are independent given the data. We sample a set of wavelet coefficients $d^s = (d^s_{j,k};\ j = 0, \ldots, J-1;\ k = 0, \ldots, 2^j - 1)$ and invert the DWT to produce a sample from the posterior distribution of $f$. We note that the posterior of $d_{j,k}$ has a discontinuity at zero, and define

\[ u' = \lim_{d_{j,k} \uparrow 0} \left\{ F(d_{j,k} \mid \hat{d}_{j,k}) \right\} \quad \text{and} \quad u'' = \lim_{d_{j,k} \downarrow 0} \left\{ F(d_{j,k} \mid \hat{d}_{j,k}) \right\}. \]

We also define $F_u^{-1}(d_{j,k} \mid \hat{d}_{j,k})$ and $F_l^{-1}(d_{j,k} \mid \hat{d}_{j,k})$ to be the inverse cumulative distribution functions of $d_{j,k}$ given $\hat{d}_{j,k}$ with $d_{j,k}$ greater than and less than zero respectively. That is,
\[ F_u(d_{j,k} \mid \hat{d}_{j,k}) = \frac{1}{1 + \omega_{j,k}}\,\Phi\left\{ \frac{d_{j,k} - \hat{d}_{j,k}\tau_j^2/(\sigma^2 + \tau_j^2)}{\sigma\tau_j/\sqrt{\sigma^2 + \tau_j^2}} \right\} + \frac{\omega_{j,k}}{1 + \omega_{j,k}}, \qquad d_{j,k} > 0, \]
\[ F_l(d_{j,k} \mid \hat{d}_{j,k}) = \frac{1}{1 + \omega_{j,k}}\,\Phi\left\{ \frac{d_{j,k} - \hat{d}_{j,k}\tau_j^2/(\sigma^2 + \tau_j^2)}{\sigma\tau_j/\sqrt{\sigma^2 + \tau_j^2}} \right\}, \qquad d_{j,k} < 0. \]

We sample a value $d^s_{j,k}$ from the posterior density of $d_{j,k}$ given $\hat{d}_{j,k}$ using the following algorithm.

- Generate $u \sim U[0, 1]$.
- If $u \in [u', u'']$, set $d^s_{j,k} = 0$.
- If $u < u'$, set $d^s_{j,k} = F_l^{-1}(u \mid \hat{d}_{j,k})$.
- If $u > u''$, set $d^s_{j,k} = F_u^{-1}(u \mid \hat{d}_{j,k})$.
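This inversion sampler can be sketched with standard-library tools and illustrative parameters; `NormalDist.inv_cdf` plays the role of $\Phi^{-1}$ inside $F_l^{-1}$ and $F_u^{-1}$:

```python
from statistics import NormalDist
from math import exp, sqrt
import random

def sample_coefficient(d_hat, sigma, tau2, pi, rng=random):
    """Draw d^s_{j,k} from the posterior of d_{j,k} given d_hat by inverting F in (3)."""
    s2 = sigma ** 2
    w = ((1 - pi) / pi) * sqrt(s2 + tau2) / sigma * \
        exp(-tau2 * d_hat ** 2 / (2 * s2 * (s2 + tau2)))
    g = NormalDist(d_hat * tau2 / (s2 + tau2), sigma * sqrt(tau2 / (s2 + tau2)))
    u_lo = g.cdf(0.0) / (1 + w)       # u'  = limit of F from below zero
    u_hi = u_lo + w / (1 + w)         # u'' = limit of F from above zero
    u = rng.random()
    if u < u_lo:                      # negative branch: invert F_l
        return g.inv_cdf(u * (1 + w))
    if u > u_hi:                      # positive branch: invert F_u
        return g.inv_cdf(u * (1 + w) - w)
    return 0.0                        # u in [u', u'']: the point mass at zero

random.seed(1)
draws = [sample_coefficient(0.1, 1.0, 1.0, 0.5) for _ in range(1000)]
print(sum(d == 0.0 for d in draws) / len(draws))  # a sizeable fraction is exactly zero
```

Inverting $F_l$ as $\Phi$-quantile of $u(1+\omega)$ and $F_u$ as $\Phi$-quantile of $u(1+\omega) - \omega$ follows by solving the two displayed CDF branches for $d_{j,k}$.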

We use this algorithm to resample a value for each wavelet coefficient, generating a full set of values $d^s$. This set of coefficients can be transformed via the inverse discrete wavelet transform to give a sample from the posterior density of $f$ given the observed data. We sample a total of $K$ such sets of values. Pointwise $100(1-\alpha)\%$ posterior credible intervals can then be found for each $f(x_i)$ by ordering the sampled values and taking the central $100(1-\alpha)\%$ as the credible interval.

We show several examples, the first using the piecewise polynomial function of Nason & Silverman (1994). This function is defined to be
\[ y_1(x) = \begin{cases} 4x^2(3 - 4x) & x \in [0, \tfrac{1}{2}] \\ \tfrac{4}{3}x(4x^2 - 10x + 7) - \tfrac{3}{2} & x \in [\tfrac{1}{2}, \tfrac{3}{4}] \\ \tfrac{16}{3}x(x - 1)^2 & x \in [\tfrac{3}{4}, 1], \end{cases} \]

and is shown in the topmost plot of figure 1. The centre plot shows the same function but with the addition of an $N(0, 0.1)$ error value at each of 512 equally spaced points. The lower plot in figure 1 shows the BayesThresh estimate of $f_1(x)$, with the dotted lines indicating the pointwise 95% credible intervals when 200 samples were taken. The computation for this figure required approximately one minute, running in Splus.

Further examples are shown in figures 2 and 3, with pointwise 95% credible intervals for the test functions of Donoho & Johnstone. As with figure 1, 200 simulations were used. In all cases, the default BayesThresh choices of $\alpha = 0.5$ and $\beta = 1$ were employed; superior performance can be gained by tuning the choice of these parameters.

We note that the intervals are not symmetric about the BayesThresh estimate, and thus the approach of estimating the variance of each $\hat{f}(x_i)$ and reporting an interval of $\hat{f}(x_i)$ plus or minus two standard errors is potentially misleading. For this reason, Barber, Nason & Silverman (2001) consider a pointwise approximation to the posterior distribution of $f(x_i)$ at each point $x_i$ given the observed data which takes the first four moments into account. This results in interval estimates which incorporate information on the skewness and kurtosis of the posterior distributions of the wavelet coefficients.
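The test function $y_1$ can be implemented directly; evaluating the pieces at the breakpoints is a quick sanity check on the definition (the function has a jump of $\tfrac{1}{2}$ at $x = \tfrac{1}{2}$, while the second and third pieces agree at $x = \tfrac{3}{4}$):

```python
import numpy as np

def y1(x):
    """Piecewise polynomial of Nason & Silverman (1994) on [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(
        x <= 0.5, 4 * x**2 * (3 - 4 * x),
        np.where(x <= 0.75,
                 (4 / 3) * x * (4 * x**2 - 10 * x + 7) - 1.5,
                 (16 / 3) * x * (x - 1) ** 2))

# breakpoint behaviour: jump at x = 1/2, continuity at x = 3/4
left = 4 * 0.5**2 * (3 - 4 * 0.5)
right = (4 / 3) * 0.5 * (4 * 0.25 - 5 + 7) - 1.5
print(left - right)  # 0.5, the size of the jump at x = 1/2
```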

Bibliography

Abramovich, F., Sapatinas, T. & Silverman, B.W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society, Series B 60, 725–749.

Barber, S., Nason, G.P. & Silverman, B.W. (2001). Posterior probability intervals for wavelet thresholding. To appear in the Journal of the Royal Statistical Society, Series B.

Chipman, H.A., Kolaczyk, E.D. & McCulloch, R.E. (1997). Adaptive Bayesian wavelet shrinkage. Journal of the American Statistical Association 92, 1413–1421.

Nason, G.P. & Silverman, B.W. (1994). The discrete wavelet transform in S. Journal of Computational and Graphical Statistics 3, 163–191.


[Figure 1 appears here: panels “True function”, “Noisy data”, and “95% CIs”, each plotting y against x on [0, 1].]

Figure 1: Pointwise 95% credible intervals for the piecewise polynomial function of Nason & Silverman (1994).


[Figure 2 appears here: “blocks” and “bumps” — noisy data and estimates with 95% credible intervals.]

Figure 2: Pointwise 95% credible intervals for the “blocks” and “bumps” test functions of Donoho & Johnstone.


[Figure 3 appears here: “heavi” and “doppler” — noisy data and estimates with 95% credible intervals.]

Figure 3: Pointwise 95% credible intervals for the “Heavisine” and “doppler” test functions of Donoho & Johnstone.
