A NEW STRATEGY FOR EFFECTIVE LEARNING IN POPULATION MONTE CARLO SAMPLING

Mónica F. Bugallo, Víctor Elvira, and Luca Martino

Dept. of Electrical and Computer Engineering, Stony Brook University, New York (USA)
Télécom Lille, CRIStAL (UMR 9189) (France)
Dept. of Electrical Engineering, Universidad de Valencia (Spain)

ABSTRACT

In this work, we focus on advancing the theory and practice of a class of Monte Carlo methods, population Monte Carlo (PMC) sampling, for dealing with inference problems with static parameters. We devise a new method for efficient adaptive learning from past samples and weights to construct improved proposal functions. It is based on assuming that, at each iteration, there is an intermediate target and that this target gradually gets closer to the true one. Computer simulations confirm the improvement of the proposed strategy over the traditional PMC method in a simple scenario.

Index Terms— Importance sampling, Monte Carlo methods, population Monte Carlo.

This work has been supported by the National Science Foundation under Award CCF-1617986 and by the European Research Council (ERC) through the ERC Consolidator Grant SEDAL ERC-2014-CoG 647423.

1. INTRODUCTION

In Bayesian signal processing, all the information about the unknowns of interest is contained in their posterior distributions. The unknowns can be parameters of a model, or a model and its parameters. In many important problems, these distributions are impossible to obtain in analytic form. An alternative is to generate their approximations by Monte Carlo-based methods [1, 2]. In this work, we focus on population Monte Carlo (PMC) sampling methods [3, 4] for dealing with problems with static parameters. The PMC methodology belongs to the adaptive importance sampling (AIS) family [5, 6, 7, 8, 9, 10] and has been an active area of research for more than a decade since the publication of [3]. Since then, several variants have been proposed: the D-kernel algorithm [11, 12], the mixture population Monte Carlo algorithm [13], the nonlinear PMC [14], and the DM-PMC, GR-PMC and LR-PMC [4]. Other sophisticated AIS schemes have been recently proposed in the literature, e.g., the AMIS [15, 16, 17], APIS [18, 19] and LAIS [20, 21, 22] methods.

PMC is based on importance sampling, which amounts to generating samples from a selected distribution called the instrumental, importance, or proposal probability density function (pdf).


More specifically, a set of proposal pdfs is considered. These distributions are different from the target distribution because we assume that direct sampling from the target distribution is infeasible. Good proposal functions are the ones that are close to the target distribution. Once the samples are generated from the set of proposal pdfs, they are assigned weights. The samples with their associated weights represent an approximation of the target distribution. This approximation is used for adapting the location parameters of the proposal pdfs. The process repeats and the proposal functions keep adapting as the iterations proceed, which is why the PMC methodology belongs to the AIS family. In this process, learning takes place from samples and weights obtained in previous iterations.

In this paper, we devise a new method for efficient adaptive learning from past samples and weights to construct improved proposal functions. For the adaptation step, tempered targets and weights are considered in order to improve the exploration of the state space. For the construction of the estimators, the standard importance weights are applied (differently from [23, 24]), so that consistency is not jeopardized. The new method will be particularly useful for current signal processing challenges concerning complex systems. For instance, with large numbers of unknowns and/or data, the resulting target distributions are extremely peaky and very difficult to approximate. The novel PMC methodology uses a sequence of modified targets, which facilitates the adaptability of the algorithm and thus improves its performance.

2. PROBLEM STATEMENT AND BASICS OF PMC

Suppose that we want to approximate a target distribution by a set of samples and weights. Here $x \in \mathbb{R}^{d_x}$ is the unknown state of the system. The distribution of interest is most often a posterior, $\bar{\pi}(x) = p(x|y_{1:N_y})$, where $y_n \in \mathbb{R}^{d_y}$ represents observations with information about $x$, and $y_{1:N_y} \equiv \{y_1, y_2, \cdots, y_{N_y}\}$. In general, we know $\bar{\pi}(x)$ only up to a normalizing constant, i.e., we can evaluate $\pi(x) \propto \bar{\pi}(x)$. More precisely, we have


$$\pi(x) = \ell(y_{1:N_y}|x)\, h(x), \qquad (1)$$


where $\ell$ is the likelihood function and $h$ represents the prior pdf.

2.1. The basics of AIS

First, we review the concept of importance sampling [25, 1, 2]. Let the samples used for approximation be drawn from $\pi(x)$ itself and let them be denoted by $x^{(m)}$, $m = 1, 2, \cdots, M$. The approximating distribution is

$$\pi^M(x) = \frac{1}{M}\sum_{m=1}^{M} \delta(x^{(m)}), \qquad (2)$$

where $\delta(x^{(m)})$ is the unit delta measure centered at $x^{(m)}$.

When it is difficult or impossible to draw samples from $\pi(x)$, the alternative is to use a proposal function $q(x)$, with a shape as close as possible to $\pi(x)$ and support larger than that of $\pi(x)$. If now $x_0^{(m)} \sim q_0(x)$, then the approximation of $\pi(x)$ is

$$\hat{\pi}(x) \propto \sum_{m=1}^{M} \hat{w}_0^{(m)}\, \delta(x_0^{(m)}), \qquad (3)$$

$$= \sum_{m=1}^{M} \bar{w}_0^{(m)}\, \delta(x_0^{(m)}), \qquad (4)$$

where $\hat{w}_0^{(m)} = \dfrac{\pi(x_0^{(m)})}{q_0(x_0^{(m)})}$ and $\bar{w}_0^{(m)} = \dfrac{\hat{w}_0^{(m)}}{\sum_{j=1}^{M} \hat{w}_0^{(j)}}$.
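To make the use of these self-normalized weights concrete, the following Python sketch approximates an expectation under an unnormalized target using samples from a Gaussian proposal, as in Eqs. (3)-(4); the function names, the target, and the proposal are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def snis_approximation(target_unnorm, q_sample, q_pdf, M, rng=None):
    """Self-normalized importance sampling (Eqs. (3)-(4)): draw M samples from
    the proposal q, weight them by pi(x)/q(x), and normalize the weights."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = q_sample(rng, M)                       # x_0^(m) ~ q_0(x), m = 1, ..., M
    w = target_unnorm(x) / q_pdf(x)            # unnormalized weights pi(x)/q(x)
    return x, w / np.sum(w)                    # samples and normalized weights

# Illustrative example: unnormalized target ~ N(3, 0.5^2), proposal N(0, 2^2)
target = lambda x: np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
q_sample = lambda rng, M: rng.normal(0.0, 2.0, size=M)
q_pdf = lambda x: np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

x, w_bar = snis_approximation(target, q_sample, q_pdf, M=5000)
print("IS estimate of E[x]:", np.sum(w_bar * x))   # close to the target mean, 3
```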


2.2. Implementation of AIS: PMC sampling

The PMC algorithm constitutes a possible implementation of the AIS methodology. In its most basic implementation, it considers $M$ proposals (as many as samples) that are iteratively adapted ($i = 1, \cdots, I$). In particular, the location parameters $\{\mu_i^{(m)}\}_{m=1}^{M}$ are adapted, while the parameters $\{C^{(m)}\}_{m=1}^{M}$ are static (e.g., with Gaussian proposal densities, the location and the static parameters are the mean vectors and the covariance matrices, respectively). At each iteration, exactly one sample is drawn from each proposal, and its weight is computed as

$$w_i^{(m)} = \frac{\pi(x_i^{(m)})}{q_i(x_i^{(m)}|\mu_i^{(m)}, C^{(m)})}, \qquad (5)$$

with the normalized weights given by $\bar{w}_i^{(m)} = \dfrac{w_i^{(m)}}{\sum_{j=1}^{M} w_i^{(j)}}$, with $m = 1, \ldots, M$. The location parameters for the next iteration are then resampled according to these normalized weights. Finally, the weighted samples $\{x_i^{(m)}, w_i^{(m)}\}$, for $m = 1, \ldots, M$ and $i = 1, \ldots, I$, are returned and used to build the IS estimators.
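As a rough illustration of this adaptation loop (one draw per proposal, weighting as in Eq. (5), and resampling of the location parameters), here is a minimal Python sketch with isotropic Gaussian proposals; the function signature and variable names are our own, and the constant terms of the Gaussian proposal are dropped since they cancel in the normalized weights.

```python
import numpy as np

def pmc(log_target, mu, sigma, num_iters, rng=None):
    """Basic PMC sketch: M isotropic Gaussian proposals with adaptive means and
    a fixed scale sigma. Each iteration draws one sample per proposal, weights it
    by pi(x)/q_i(x | mu_i, C) as in Eq. (5), and resamples the means.
    log_target maps an (M, dim) array to a length-M array of log pi values."""
    rng = np.random.default_rng(0) if rng is None else rng
    M, dim = mu.shape
    all_x, all_logw = [], []
    for _ in range(num_iters):
        x = mu + sigma * rng.standard_normal((M, dim))            # one draw per proposal
        log_q = -0.5 * np.sum(((x - mu) / sigma) ** 2, axis=1)    # log q_i up to a constant
        log_w = log_target(x) - log_q                             # log importance weights
        w_bar = np.exp(log_w - log_w.max())
        w_bar /= w_bar.sum()                                      # normalized weights of this iteration
        mu = x[rng.choice(M, size=M, p=w_bar)]                    # resample the location parameters
        all_x.append(x)
        all_logw.append(log_w)
    log_w = np.concatenate(all_logw)
    w = np.exp(log_w - log_w.max())                               # globally stabilized weights
    return np.concatenate(all_x), w / w.sum()
```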

3. PMC SAMPLING WITH GRADUAL LEARNING

We investigate a novel strategy for effective adaptive learning based on assuming that the proposal function belongs to a family of kernels whose parameters are learned from the generated samples and their weights as the iterations proceed. In the problems that we address, the likelihoods will be much more peaked than the priors, which implies that the posteriors will also be much more peaked. This is especially the case when the number of data is large and/or the dimension of $x$ is high, which will most likely entail that there will be no samples in parts of the space with large probability masses. To avoid that, we use the concept of gradual learning.

Let us consider the first iteration, $i = 1$. The underlying idea is that, for the adaptation (i.e., for the resampling step), we weight the generated samples by assuming a tempered target proportional to the product $\ell(y_{1:N_y}|x)^{\lambda_1} h(x)$ (this would be an "intermediate posterior" that we pursue), where $\lambda_1$ is some small positive number less than one. The weights are then given by Eq. (7). At the next iteration, the weights for the adaptation are computed by using as a target distribution $\ell(y_{1:N_y}|x)^{\lambda_2} h(x)$, where $0 < \lambda_1 < \lambda_2 < 1$. By choosing a larger $\lambda$, we allow for more influence of the likelihood in the computation of the samples' weights.

Algorithm 2 summarizes the proposed methodology. The key point of the novel technique is the joint use of two importance weights:

• The standard importance weights,
$$\hat{w}_i^{(m)} = \frac{\pi(x_i^{(m)})}{q_i(x_i^{(m)}|\mu_i^{(m)}, C^{(m)})}, \qquad (6)$$
are used for inference purposes, i.e., for building the IS estimators.

• The modified-tempered weights,
$$w_i^{(m)} = \frac{\left[\ell(y_{1:N_y}|x_i^{(m)})\right]^{\lambda_\ell} h(x_i^{(m)})}{q_i^{(m)}(x_i^{(m)}|\mu_i^{(m)}, C^{(m)})}, \qquad (7)$$
are used for adaptation purposes. They are normalized as $\bar{w}_i^{(m)}$, $m = 1, \ldots, M$, and used at the resampling step of each iteration.

A sequence of positive and increasing values is used to temper the target, i.e., $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_i < \ldots < 1$, as shown in Fig. 2. When $\lambda = 1$, the target distribution is the posterior and the corresponding samples and weights represent its approximation. We conjecture that in this way we can obtain a good proposal function for any dimension of $x$. Obviously, the higher the dimension of $x$ is, the more steps in learning the proposal function we need.

Algorithm 2: PMC Sampling with Gradual Learning

[Initialization]: Select the adaptive parameters $P_1 = \{\mu_1^{(1)}, \ldots, \mu_1^{(M)}\}$, the static parameters $\{C^{(m)}\}_{m=1}^{M}$ of the $M$ proposals, and the sequence $[\lambda_1 < \ldots < \lambda_L]$.

[For $\ell = 1$ to $L$]: [For $i = 1$ to $I$]:

(a) Draw one sample from each proposal pdf, $x_i^{(m)} \sim q_i^{(m)}(x|\mu_i^{(m)}, C^{(m)})$, with $m = 1, \ldots, M$.

(b) Compute the modified-tempered weights of Eq. (7) and normalize them, $\bar{w}_i^{(m)} = \dfrac{w_i^{(m)}}{\sum_{j=1}^{M} w_i^{(j)}}$, with $m = 1, \ldots, M$; they are used for adaptation purposes at the resampling step. Compute also the non-modified weights of Eq. (6), which are used for inference purposes, i.e., for building the IS estimators.

(c) Draw $M$ independent parameters $\mu_{i+1}^{(m)}$ from the discrete random measure $\hat{\pi}_i^M(x) = \sum_{m=1}^{M} \bar{w}_i^{(m)} \delta(x - x_i^{(m)})$, obtaining the new set of parameters $P_{i+1} = \{\mu_{i+1}^{(1)}, \ldots, \mu_{i+1}^{(M)}\}$.

[Output, $i = I$]: Return the pairs $\{x_i^{(m)}, \hat{w}_i^{(m)}\}$ for $m = 1, \ldots, M$ and $i = 1, \ldots, I$.

We recall that the returned weights used for building the IS estimators, $\hat{w}_i^{(m)}$, are computed using the true unmodified target, so that the resulting IS estimators are always consistent. Note also that their calculation is almost free (only a new re-normalization is needed), since the modified weights of Eq. (7) have $\hat{w}_i^{(m)}$ as an intermediate result (i.e., the likelihood function is always evaluated for the tempered weights).
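To illustrate the joint use of the two weight sets in Algorithm 2, the following Python sketch runs the gradual-learning loop with isotropic Gaussian proposals: the tempered weights (Eq. (7)) drive the resampling of the means, while the standard weights (Eq. (6)) are accumulated for the IS estimators. The function and argument names are our own illustrative choices, not from the paper.

```python
import numpy as np

def gradual_learning_pmc(log_lik, log_prior, mu, sigma, lambdas, iters_per_stage, rng=None):
    """Sketch of gradual-learning PMC: adaptation uses the tempered target
    l(y|x)^lambda * h(x) (Eq. (7)); the returned weights use the true target (Eq. (6)).
    log_lik and log_prior map an (M, dim) array to a length-M array."""
    rng = np.random.default_rng(0) if rng is None else rng
    M, dim = mu.shape
    all_x, all_logw = [], []
    for lam in lambdas:                                            # 0 < lambda_1 < ... < lambda_L = 1
        for _ in range(iters_per_stage):
            x = mu + sigma * rng.standard_normal((M, dim))         # one draw per proposal
            log_q = -0.5 * np.sum(((x - mu) / sigma) ** 2, axis=1) # log q up to a constant
            ll, lp = log_lik(x), log_prior(x)
            log_w_std = ll + lp - log_q                            # Eq. (6): standard weights
            log_w_tmp = lam * ll + lp - log_q                      # Eq. (7): tempered weights
            w_tmp = np.exp(log_w_tmp - log_w_tmp.max())
            mu = x[rng.choice(M, size=M, p=w_tmp / w_tmp.sum())]   # adapt with tempered weights
            all_x.append(x)
            all_logw.append(log_w_std)
    log_w = np.concatenate(all_logw)
    w = np.exp(log_w - log_w.max())
    return np.concatenate(all_x), w / w.sum()                      # samples, normalized standard weights
```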


4. NUMERICAL RESULTS

In this section, we tackle the problem of estimating the frequencies of a weighted sum of sinusoids. The data are given by

$$y_c(\tau) = A_0 + \sum_{i=1}^{S} A_i \cos(2\pi f_i \tau + \phi_i) + r(\tau), \qquad \tau \in \mathbb{R},$$

Fig. 3. Sequence of different targets as λ changes (λ = 0.001, 0.05, 0.1, 0.2, 0.5, and 1). The black points show the evolution of the mean of the proposals.

where $S$ is the number of sinusoids, $A_0$ is a constant term, $\{A_i\}_{i=1}^S$ is the set of amplitudes of the sinusoids, $\{f_i\}_{i=1}^S$ represents the set of frequencies, $\{\phi_i\}_{i=1}^S$ are the phases, and $r(\tau)$ are i.i.d. zero-mean Gaussian samples with variance $\sigma_w^2$. Let us assume that we have $N_y$ equally spaced samples of $y_c(\tau)$ with period $T_s < \frac{\pi}{\max_{1\le i \le S} 2\pi f_i}$, i.e., fulfilling the sampling theorem [26],

$$y[p] = A_0 + \sum_{i=1}^{S} A_i \cos(\Omega_i p + \phi_i) + r[p], \qquad p = 1, \ldots, d_y,$$

where $y[p] = y_c(pT_s)$ for $p = 0, 1, \ldots, d_y - 1$, $\Omega_i = 2\pi f_i T_s$ for $i = 1, \ldots, S$, and $r[p] \sim \mathcal{N}(0, \sigma_w^2)$. The goal consists of inferring the set of unknown frequencies $\{f_i\}_{i=1}^S$. We set a uniform prior on $D$, where $D = \left[0, \frac{1}{2}\right]^S$ is the hypercube considered as the domain of the target (which is periodic outside $D$). Then the posterior given the data is $\bar{\pi}(x) \propto \exp(-V(x))$, where

$$V(x) = \frac{1}{2\sigma_w^2} \sum_{k=1}^{K} \left( y[k] - A_0 - \sum_{i=1}^{S} A_i \cos(x_i k + \phi_i) \right)^2 \mathbb{I}_D(x).$$
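As a sketch of how this target can be evaluated in practice, the snippet below builds the unnormalized log-posterior $-V(x)$ for candidate frequency vectors. It assumes, as an illustrative reading of the model, that $T_s = 1$ so that the components $x_i$ are normalized frequencies in $D = [0, 1/2]^S$ and the cosine argument is $2\pi x_i k$; all names and numeric values are ours.

```python
import numpy as np

def make_log_target(y, A0, A, phi, sigma_w):
    """Return log pi(x) = -V(x) for the sum-of-sinusoids model, with a uniform
    prior on D = [0, 1/2]^S (log-density is -inf outside D). The argument x is a
    (num_candidates, S) array of frequency vectors; y is the length-N_y data
    record; A and phi are length-S arrays of amplitudes and phases."""
    k = np.arange(1, len(y) + 1)                              # time indices of the observations
    def log_target(x):
        x = np.atleast_2d(x)
        inside = np.all((x >= 0.0) & (x <= 0.5), axis=1)      # indicator of the hypercube D
        # predicted signal for every candidate frequency vector (broadcast over k)
        pred = A0 + np.sum(
            A[None, :, None] * np.cos(2.0 * np.pi * x[:, :, None] * k[None, None, :]
                                      + phi[None, :, None]),
            axis=1)
        V = np.sum((y[None, :] - pred) ** 2, axis=1) / (2.0 * sigma_w ** 2)
        return np.where(inside, -V, -np.inf)
    return log_target

# Example with the setup of the experiment (S = 2, A0 = A1 = A2 = 1, phi = 0);
# the noise level sigma_w = 0.1 is an arbitrary illustrative value:
# log_pi = make_log_target(y, 1.0, np.ones(2), np.zeros(2), sigma_w=0.1)
```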

We address the two-dimensional problem with $S = 2$ sinusoids where the true frequencies, $\{f_i\}_{i=1}^2 = [0.27, 0.43]$, are unknown. For generating $N_y = 20$ observations, we set $A_0 = A_1 = A_2 = 1$ and $\phi_1 = \phi_2 = 0$. Therefore, the problem consists in characterizing the posterior pdf of the frequencies $\{f_i\}_{i=1}^2$ given the data. We use the DM-PMC method of [4], and we implement the proposed gradual learning strategy. We use a sequence of $L = 8$ modified targets, where we have set $\lambda_i = \Lambda(i)$ for $i = 1, \ldots, 8$, with $\Lambda = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 1]$. As proposals, we use isotropic Gaussian distributions with $C_i = \sigma^2 I_2$ for $i = 1, \ldots, N$, where $\sigma = 0.05$, and with means initialized randomly and uniformly in $D$. We test the algorithms with $K \in \{1, 2, 5, 10\}$ and $N \in \{5, 10\}$. In all cases, for the sake of a fair comparison, we select $T$ in such a way that the number of target evaluations is fixed to $E = NKTL = 16000$ (note that for the standard PMC, $L = 1$).

Table 1. RMSE in the estimation of the parameters $\{f_i\}_{i=1}^2$.

Method                  N     K = 1    K = 2    K = 5    K = 10
Standard PMC            5     0.0521   0.0691   0.1504   0.1389
Standard PMC            10    0.0862   0.1507   0.1298   0.1448
Gradual learning PMC    5     0.0517   0.0513   0.0603   0.0574
Gradual learning PMC    10    0.0516   0.0632   0.0517   0.0687

Table 1 shows the relative mean square error (RMSE) in the estimation of the true parameters. Note that the proposed gradual learning PMC sampling method outperforms the standard PMC for all sets of parameters. Moreover, when $K$ is larger the number of iterations $T$ is smaller (since we keep $E = NKTL$ constant). Therefore, the new method is of special interest when few adaptive iterations can be performed. Finally, Fig. 3 depicts the evolution of the proposals as $\lambda$ changes for the new method. It is clear that when $\lambda$ is small, the target is less peaky and exploring the space of unknowns is easier, while as $\lambda$ tends to one, the target becomes more peaky. At the end, all proposals move to the mode (the true value is [0.27, 0.43]).
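As an illustration of this evaluation budget (under our reading that $T$ denotes the number of adaptive iterations per tempering stage): with $N = 10$, $K = 1$ and $L = 8$, the constraint $E = NKTL = 16000$ gives $T = 16000/(10 \cdot 1 \cdot 8) = 200$ iterations per stage, while the standard PMC ($L = 1$) runs $T = 1600$ iterations; for $K = 10$ these become $T = 20$ and $T = 160$, respectively.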

5. CONCLUSIONS

We discuss the implications of considering a new strategy for proposal generation in population Monte Carlo (PMC) sampling and compare the new method with the standard PMC in terms of performance. The new PMC approach is based on a gradual learning process that approximates the target more precisely as the iterations proceed. Computer simulations and comparisons reveal a good and advantageous performance of the proposed method when compared to the standard PMC. It is expected that in more challenging scenarios (higher dimension of the space of unknowns) the gain of the proposed algorithm will be even larger.


6. REFERENCES

[1] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
[2] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, 2004.
[3] O. Cappé, A. Guillin, J.-M. Marin, and C. P. Robert, "Population Monte Carlo," Journal of Computational and Graphical Statistics, vol. 13, no. 4, pp. 907–929, 2004.
[4] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, "Improving Population Monte Carlo: Alternative weighting and resampling schemes," Signal Processing, vol. 131, no. 12, pp. 77–91, 2017.
[5] M. F. Bugallo, L. Martino, and J. Corander, "Adaptive importance sampling in signal processing," Digital Signal Processing, vol. 47, pp. 36–49, 2015.
[6] V. Elvira, L. Martino, D. Luengo, and J. Corander, "A Gradient Adaptive Population Importance Sampler," in 23rd European Signal Processing Conference (EUSIPCO), 2015, pp. 1–5.
[7] G. H. Givens and A. E. Raftery, "Local adaptive importance sampling for multivariate densities with strong nonlinear relationships," Journal of the American Statistical Association, vol. 91, pp. 132–141, 1996.
[8] P. Zhang, "Nonparametric importance sampling," Journal of the American Statistical Association, vol. 91, pp. 1245–1253, 1996.
[9] L. Martino, V. Elvira, D. Luengo, and F. Louzada, "Adaptive population importance samplers: A general perspective," in IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2016, pp. 1–5.
[10] Z. I. Botev, P. L'Ecuyer, and B. Tuffin, "Markov chain importance sampling with applications to rare event probability estimation," Statistics and Computing, vol. 23, pp. 271–285, 2013.
[11] R. Douc, J.-M. Marin, and C. Robert, "Convergence of adaptive mixtures of importance sampling schemes," Annals of Statistics, vol. 35, pp. 420–448, 2007.
[12] R. Douc, J.-M. Marin, and C. Robert, "Minimum variance importance sampling via population Monte Carlo," ESAIM: Probability and Statistics, vol. 11, pp. 427–447, 2007.
[13] O. Cappé, R. Douc, A. Guillin, J.-M. Marin, and C. P. Robert, "Adaptive importance sampling in general mixture classes," Statistics and Computing, vol. 18, pp. 447–459, 2008.

[14] E. Koblents and J. Míguez, "A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models," Statistics and Computing, pp. 1–19, 2013.
[15] J.-M. Cornuet, J.-M. Marin, A. Mira, and C. P. Robert, "Adaptive multiple importance sampling," Scandinavian Journal of Statistics, vol. 39, no. 4, pp. 798–812, December 2012.
[16] L. Pozzi and A. Mira, "A R adaptive multiple importance sampling (ARAMIS)," Technical Report, 2012.
[17] J.-M. Marin, P. Pudlo, and M. Sedki, "Consistency of the adaptive multiple importance sampling," arXiv:1211.2548, pp. 1–25, 2012.
[18] L. Martino, V. Elvira, D. Luengo, and J. Corander, "An adaptive population importance sampler," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
[19] L. Martino, V. Elvira, D. Luengo, and J. Corander, "An adaptive population importance sampler: Learning from the uncertainty," IEEE Transactions on Signal Processing, vol. 63, no. 16, pp. 4422–4437, 2015.
[20] L. Martino, V. Elvira, D. Luengo, and J. Corander, "MCMC-driven adaptive multiple importance sampling," in Proceedings of the 12th Brazilian Meeting on Bayesian Statistics (EBEB), 2014.
[21] L. Martino, V. Elvira, D. Luengo, and J. Corander, "Layered adaptive importance sampling," Statistics and Computing, to appear, 2016.
[22] L. Martino, V. Elvira, D. Luengo, and J. Corander, "Interacting Parallel Markov Adaptive Importance Sampling," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 1–5.
[23] P. Del Moral, A. Doucet, and A. Jasra, "Sequential Monte Carlo samplers," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 3, pp. 411–436, 2006.
[24] R. M. Neal, "Annealed importance sampling," Statistics and Computing, vol. 11, no. 2, pp. 125–139, 2001.
[25] B. D. Ripley, Stochastic Simulation, John Wiley & Sons, 1987.
[26] J. G. Proakis, Digital Communications, McGraw-Hill, Singapore, 1995.
