arXiv:math/0611117v2 [math.ST] 14 Nov 2006
Minimax estimation of the Wigner function in quantum homodyne tomography with ideal detectors
Mădălin Guţă∗ and Luis Artiles†
Abstract We estimate the quantum state of a light beam from results of quantum homodyne measurements performed on identically prepared pulses. The state is represented through the Wigner function, a “quasiprobability density” on R2 which may take negative values and must respect intrinsic positivity constraints imposed by quantum physics. The data consists of n i.i.d. observations from a probability density equal to the Radon transform of the Wigner function. We construct an estimator for the Wigner function, and prove that it is minimax efficient for the pointwise risk over a class of infinitely differentiable functions. A similar result was previously derived by Cavalier in the context of positron emission tomography. Our work extends this result to the space of smooth Wigner functions, which is the relevant parameter space for quantum homodyne tomography.
1 Introduction
The phenomena occurring at the interface between the microscopic and the macroscopic worlds have an intrinsic probabilistic nature. When measuring properties of atoms and laser pulses we obtain a random result whose probability distribution is determined by the state, or preparation, of the quantum system. For example, if we count the number of photons coming from a laser source we observe a Poisson distributed random variable with mean equal to the intensity of the laser. The statistical inverse problem of inferring the state from results of measurements on many identically prepared systems is called quantum state estimation. Recently it has become possible to apply such a method to the reconstruction of the quantum state of a light beam. The measurement technique is called Quantum Homodyne Tomography [22] and is used to confirm the creation of new and exotic quantum states of light such as squeezed states [3], single-photon-added coherent states [24] and Schrödinger cat states [20]. As experiments become more and more complicated, the costs of running a measurement, in terms of money and time, rise, and one needs to apply more sophisticated statistical techniques to reconstruct the state from a limited amount of data. This paper makes a step in this direction by providing minimax convergence rates for a class of physical states. The object to be estimated is a real function of two variables called the Wigner function [23], which can be seen as a joint density of the electric and magnetic fields of the laser beam. However, since in quantum mechanics we cannot measure both electric and magnetic fields simultaneously, this function is in general not a probability density but has many features in common with the latter; for example, the marginals along any direction are bona fide probability densities.
In quantum optics the Wigner function is a preferred representation of the quantum state [9] because many interesting quantum patterns such as squeezing or oscillation between negative and positive values, can be easily identified from the shape of the function. The estimation methods used by the physicists involve a number of ad-hoc approximations, binnings and truncations, making it difficult to verify the reliability of the procedure. Moreover, the quantum features in which the experimenter is interested may be washed out in the resulting estimator.
∗ Mathematical Institute, University of Utrecht, Budapestlaan 6, 3584 CD Utrecht, The Netherlands, [email protected]
† Eurandom, University of Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, [email protected], http://euridice.tue.nl∼lartiles/
1 AMS 1991 subject classifications. Primary 62G05, 62G20; secondary 62C20. Key words and phrases. Nonparametric statistics, Wigner function, Quantum tomography, Kernel estimator, minimax rates
From the statistical viewpoint, we deal with an ill posed inverse problem which is closely related to the problem of estimating a bivariate probability density in Positron Emission Tomography. In both cases the parameter of interest is a density and the data consists of independent identically distributed samples from the Radon Transform [7] of that density, with uniformly distributed angles. However as we mentioned above, the Wigner function is not necessarily positive but must satisfy other positivity conditions dictated by the laws of quantum mechanics. This means that some of the existing statistical results are not directly applicable in this case, and the problem of estimating the Wigner function should be studied separately in order to identify the specific “quantum features” that could be exploited in designing new estimators. The problem of state estimation in Quantum Homodyne Tomography was investigated in the non-parametric setting in [1]. The paper provides sufficient conditions for consistency of various estimators for two different representations of the state, namely the density matrix and the Wigner function. The problem of estimating the Wigner function was further investigated in [4] in a set-up which takes into account the detection losses occurring in the measurement, leading to an additional additive Gaussian noise. Although both [4] and the present paper consider a family of very smooth Wigner functions, the estimation techniques are very different. In [4] the bias dominates the variance due to the presence of the Gaussian noise, and for deriving the lower bound it is enough to consider a “worst family” consisting of just two states. In this paper, the variance is dominating and for the lower bound we need to consider a one parameter family of states. The techniques that we use here are very similar to the one of Cavalier [5] who considers the same tomography problem in the context of smooth probability densities rather than Wigner functions. 
For the problem of Positron Emission Tomography and the related regression problem of X-ray Tomography we refer to [16, 17, 6] and the references therein. For an introduction to Quantum Statistical Inference we refer to [2, 1]. In Section 2 we give a short introduction to quantum mechanics, present some properties of Wigner functions and the relation to the density matrix which will play an important role in the proof of the lower bound. The statistical set-up and the main results are presented in Section 3. Following [5], we identify a class of very smooth Wigner functions
\[ \mathcal{W}(\beta, L) = \left\{ W : \frac{1}{4\pi^2} \int_{\mathbb{R}^2} |\widetilde{W}(w)|^2 \exp(2\beta|w|)\, dw \le L \right\}, \tag{1.1} \]
where β and L are positive constants. From the physical point of view, all the states which have been produced in the lab up to date belong to such a class, and a more detailed argument as to why this assumption is realistic, can be found in [4]. We consider a family of estimators depending on a bandwidth which is chosen according to the parameters of the class. The upper bound for the pointwise risk is proven in Theorem 1 and has the same almost parametric expression as the bound derived in [5]. For the lower bound we consider a family of Wigner functions which is different from the worst parametric family of probability densities of [5]. The latter cannot be used in our situation since it does not correspond to Wigner functions of physical states. Thus the main novelty of the paper is the derivation of the lower bound in the physical context of Quantum Homodyne Tomography rather than that of Positron Emission Tomography. The technical Lemmas are grouped in Section 4.
2 Quantum Homodyne Tomography and the Wigner function
In this section we briefly present some basic notions of quantum mechanics, the mathematical set-up of Quantum Homodyne Tomography, and some properties of Wigner functions. More information on the general set-up of quantum statistical inference can be found in the review paper [2] and the textbooks [13] and [14]. Quantum Homodyne Tomography is discussed in detail in [1, 4]. The state of a quantum system encodes all necessary information for computing the probability distribution of results of any given measurement. Mathematically, the state is described by a density matrix, which is a compact operator $\rho$ on a complex Hilbert space $\mathcal{H}$ having the following properties:
1. Selfadjoint: $\rho = \rho^*$, where $\rho^*$ is the adjoint of $\rho$.
2. Positive: $\rho \ge 0$, or equivalently $\langle \psi, \rho\psi \rangle \ge 0$ for all $\psi \in \mathcal{H}$.
3. Trace one: $\mathrm{Tr}(\rho) = 1$.
Positivity implies that the eigenvalues of $\rho$ are all nonnegative and, by the last property, they sum up to one. Notice that the above requirements parallel those defining probability densities. We have the following diagonal form
\[ \rho = \sum_{i=1}^{\dim \mathcal{H}} \lambda_i \rho_i \tag{2.1} \]
where $\rho_i$ is the projection onto the one dimensional space generated by the eigenvector $e_i \in \mathcal{H}$ of $\rho$ corresponding to the eigenvalue $\lambda_i$, i.e., $\rho e_i = \lambda_i e_i$. With respect to a fixed orthonormal basis $\{\psi_i\}_{i \ge 1}$ in $\mathcal{H}$, the operator $\rho$ can be represented as a matrix with elements $\rho_{i,j} = \langle \psi_i, \rho\psi_j \rangle$. Let us consider now the following problem. We are given a quantum system prepared in an unknown state $\rho$ and we would like to determine $\rho$. In order to obtain information about the system we have to measure its properties. The laws of quantum mechanics say that for any given measurement with space of outcomes given by the measure space $(\Omega, \Sigma)$, the result of the measurement performed on a system prepared in state $\rho$ is random and has probability distribution $\mathbb{P}_\rho$ over $(\Omega, \Sigma)$ such that the map
\[ \rho \mapsto \mathbb{P}_\rho, \tag{2.2} \]
is affine, i.e. it maps convex combinations of states into the corresponding convex combination of probability distributions. This has a natural interpretation: a system can be prepared in a mixture $\lambda\rho_1 + (1-\lambda)\rho_2$ of states by randomly choosing the preparation procedure according to the individual state $\rho_1$ with probability $\lambda$ and $\rho_2$ with probability $1-\lambda$. The distribution of the results will then reflect this randomized preparation as well. The most common measurement is that of an observable such as energy, position, spin, etc. Any given observable is described by some selfadjoint operator $\mathbf{X}$ on the Hilbert space $\mathcal{H}$ and we suppose for simplicity that it has a discrete spectrum, that is, it can be written in the diagonal form
\[ \mathbf{X} = \sum_{a=1}^{\dim \mathcal{H}} x_a \mathbf{P}_a, \tag{2.3} \]
with $x_a \in \mathbb{R}$ and $\mathbf{P}_a$ one dimensional projections onto the eigenvectors of $\mathbf{X}$. The result $X$ of the measurement of the observable $\mathbf{X}$ for a preparation given by the state $\rho$ is a random element of the set $\Omega = \{x_1, x_2, \dots\}$ of eigenvalues of $\mathbf{X}$ and has the probability distribution
\[ \mathbb{P}_\rho[X = x_a] = \mathrm{Tr}(\mathbf{P}_a \rho). \tag{2.4} \]
This measurement will give statistical information about the diagonal elements of the density matrix ρ with respect to the eigenbasis of X, and it suggests that in order to estimate all matrix elements of ρ one would have to probe the system from a number of “directions” by performing different measurements on identically prepared systems. This can be generalized to the case of infinite dimensional Hilbert space, and measurements with outcomes in arbitrary measure spaces. In the next section we will see that an infinite dimensional density matrix can be estimated by measuring a randomly chosen observable from a continuous family of non-commuting observables.
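As a small numerical illustration of (2.4), the sketch below computes the outcome probabilities Tr(P_a ρ) for a 2×2 density matrix measured in two different eigenbases. The matrix `rho`, the two bases and all function names are hypothetical choices made for this example only; they do not come from the paper.

```python
# Illustration of (2.4): P[X = x_a] = Tr(P_a rho) for a hypothetical 2x2 state.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(v):
    # One dimensional projection |v><v| onto a unit vector v.
    return [[vi * vj.conjugate() for vj in v] for vi in v]

def outcome_probabilities(rho, eigenvectors):
    # One probability per eigenvector of the measured observable.
    return [trace(matmul(projector(v), rho)).real for v in eigenvectors]

# Hypothetical state: selfadjoint, positive (det = 0.15 > 0), trace one.
rho = [[0.6, 0.3], [0.3, 0.4]]

s = 2 ** -0.5
first_basis = [[1.0, 0.0], [0.0, 1.0]]   # reads off the diagonal of rho
second_basis = [[s, s], [s, -s]]         # sensitive to the off-diagonal entries

probs_first = outcome_probabilities(rho, first_basis)
probs_second = outcome_probabilities(rho, second_basis)
print(probs_first, probs_second)
```

The first basis only sees the diagonal of `rho`, while the second one also depends on the off-diagonal entry, which is the sense in which the system has to be probed from several "directions".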
2.1 Quantum homodyne tomography and the Wigner function
An important example of quantum system is the monochromatic light in a cavity, described by density matrices on the Hilbert space of complex valued square integrable functions on the real line, $L^2(\mathbb{R})$. A distinguished orthonormal basis of this space is given by the vectors
\[ \psi_k(x) = H_k(x)\, e^{-x^2/2}, \qquad k = 0, 1, 2, \dots, \tag{2.5} \]
where $H_k$ are the Hermite polynomials normalized such that $\psi_k(x)$ is a unit vector representing the pure state of $k$ photons inside the cavity. We will denote the matrix elements of $\rho$ with respect to this basis by $\rho_{i,j}$. Notice that the diagonal of the density matrix is a probability distribution over the nonnegative integers, $p_k = \rho_{k,k}$. This is the probability distribution of results when measuring the number of photons in the cavity prepared in state $\rho$. Clearly this distribution does not contain information about the off-diagonal elements of $\rho$, thus it is not sufficient for identifying the state of the system. This is a typical situation in state estimation and one has to devise experiments in which the systems are looked at subsequently from "different directions", a broadly described methodology which in the physics literature goes by the name of quantum tomography. Quantum Homodyne Tomography is one such measurement method which is frequently used in quantum optics for the estimation of the quantum state of light [3, 20, 24]. We skip the measurement set-up, which is described in detail in [1, 4], and present the statistical problem associated to this measurement. We observe $(X_1, \Phi_1), \dots, (X_n, \Phi_n)$, i.i.d. random variables with values in $\mathbb{R} \times [0, \pi]$ and distribution $\mathbb{P}_\rho$ whose density with respect to the measure $\pi^{-1} d\phi \times dx$ is given by
\[ p_\rho(x, \phi) = \sum_{j,k=0}^{\infty} \rho_{j,k}\, \psi_k(x) \psi_j(x)\, e^{-i(j-k)\phi}. \tag{2.6} \]
Since ρ is a positive definite matrix of trace 1, and ψj form an orthonormal basis, it follows that pρ is a probability density: real, nonnegative, integrates to 1. The data (Xℓ , Φℓ ), ℓ = 1, . . . , n, come from independent measurements on identically prepared pulses of light escaping from the cavity, whose state is completely encoded in the matrix ρ. For each of the systems independently, we repeat the following experimental procedure: we first choose the angle Φ uniformly distributed over [0, π] and then measure a certain observable Xφ called quadrature, obtaining a real valued result with probability density pρ (x, φ). The quadrature is defined as the linear combination Xφ := cos φQ + sin φP, where Q and P are the electric and magnetic fields of the light beam given by the selfadjoint operators Qψ(x) = xψ(x),
and Pψ(x) = −i dψ/dx.
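The density (2.6) can be evaluated numerically for a finite density matrix. The following sketch assumes the standard normalized Hermite functions ψ_k, computed through the usual three-term recurrence, and checks that the resulting function of x integrates to one. The function names, the truncation of the integration range and the example state (an equal superposition of zero and one photon) are illustrative assumptions, not part of the paper.

```python
import cmath
import math

def hermite_functions(x, kmax):
    # psi_0, ..., psi_kmax at the point x, via the three-term recurrence
    # psi_{k+1} = sqrt(2/(k+1)) x psi_k - sqrt(k/(k+1)) psi_{k-1}.
    psi = [math.pi ** -0.25 * math.exp(-x * x / 2.0)]
    if kmax >= 1:
        psi.append(math.sqrt(2.0) * x * psi[0])
    for k in range(1, kmax):
        psi.append(math.sqrt(2.0 / (k + 1)) * x * psi[k]
                   - math.sqrt(k / (k + 1)) * psi[k - 1])
    return psi

def homodyne_density(rho, x, phi):
    # Direct evaluation of the double sum in (2.6) for a finite matrix rho.
    n = len(rho)
    psi = hermite_functions(x, n - 1)
    total = 0j
    for j in range(n):
        for k in range(n):
            total += rho[j][k] * psi[k] * psi[j] * cmath.exp(-1j * (j - k) * phi)
    return total.real

def integrate_over_x(rho, phi, x_max=10.0, steps=4000):
    # Trapezoidal rule on [-x_max, x_max]; the integrand has Gaussian tails.
    h = 2.0 * x_max / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * homodyne_density(rho, -x_max + i * h, phi)
    return total * h

# Hypothetical example: the pure state (psi_0 + psi_1) / sqrt(2).
rho_superposition = [[0.5, 0.5], [0.5, 0.5]]
total_mass = integrate_over_x(rho_superposition, 0.3)
print(total_mass)
```

For a diagonal density matrix the phase factor drops out and the density no longer depends on φ, which is the situation used later in the construction of the lower bound.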
The characteristic functions of these densities can be put together to define a function of two variables
\[ \widetilde{W}_\rho(u, v) := \mathrm{Tr}\big(\rho \exp(-itX_\phi)\big) = \mathcal{F}_1[p_\rho(\cdot, \phi)](t), \tag{2.7} \]
where we have used the polar coordinates $(u, v) = (t\cos\phi, t\sin\phi)$, and $\mathcal{F}_1$ is the Fourier transform with respect to the first variable, for fixed $\phi$. Our conventions for the Fourier transform and its inverse are the following:
\[ \mathcal{F}[f](t) = \int_{-\infty}^{\infty} f(x)\, e^{-ixt}\, dx, \qquad \mathcal{F}^{-1}[g](x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(t)\, e^{ixt}\, dt. \]
Equivalently, we can write
\[ \widetilde{W}_\rho(u, v) = \mathrm{Tr}\big(\rho \exp(-iuQ - ivP)\big), \tag{2.8} \]
which resembles a characteristic function of a bivariate probability density, namely the joint density of Q and P. However, since the operators Q and P do not commute with each other, we cannot speak of their joint probability distribution and the function $\widetilde{W}_\rho(u, v)$ is in general not the characteristic function of a probability density but rather of the so called Wigner function
\[ W_\rho(q, p) := \mathcal{F}_2^{-1}[\widetilde{W}_\rho](q, p), \tag{2.9} \]
a "quasi-distribution" which may take negative values but whose marginals are bona fide probability densities. As we will see below, the Wigner function $W_\rho$ is in one to one correspondence with the density matrix $\rho$, and in quantum optics one frequently uses the Wigner function as an alternative representation of the quantum state, having the advantage that it illustrates important "quantum features" such as squeezing and negative oscillations. From (2.7) and (2.9) we deduce that the probability density of the data $p_\rho(x, \phi)$ is the Radon transform of the Wigner function
\[ \mathcal{R}[W_\rho](x, \phi) = \int_{-\infty}^{\infty} W_\rho(x\cos\phi - t\sin\phi,\; x\sin\phi + t\cos\phi)\, dt, \]
adding Quantum Homodyne Tomography to the long list of applications of the Radon transform, ranging from computerized tomography to astronomy and geophysics [7]. Another important feature of the Wigner function is that it can be used as a computational tool: for any selfadjoint operator $\mathbf{X}$ there exists a function $W_{\mathbf{X}}$ from $\mathbb{R}^2$ to $\mathbb{R}$ such that the expectation of $\mathbf{X}$ is given by
\[ \mathrm{Tr}(\mathbf{X}\rho) = 2\pi \iint W_{\mathbf{X}}(q, p)\, W_\rho(q, p)\, dq\, dp. \tag{2.10} \]
In particular, the correspondence between the density matrix $\rho$ and the Wigner function $W_\rho$ is an $L^2$ isometry up to a constant:
\[ \|W_\rho - W_\tau\|_2^2 := \iint |W_\rho(q, p) - W_\tau(q, p)|^2\, dp\, dq = \frac{1}{2\pi} \sum_{i,j=0}^{\infty} |\rho_{i,j} - \tau_{i,j}|^2. \tag{2.11} \]
The space of Wigner functions has an overlap with that of probability distributions. For example, all Gaussian densities which are bounded from above by $1/\pi$ are Wigner functions, while the rest do not correspond to physical states. This is due to the celebrated Heisenberg uncertainty relations, which say that the non-commuting observables P and Q cannot have probability distributions such that the product of their variances is smaller than $\frac{1}{4}$. In general, a Wigner function cannot be too "peaked":
\[ |W_\rho(q, p)| \le \frac{1}{\pi}, \qquad \text{for all } (q, p) \in \mathbb{R}^2. \tag{2.12} \]
Some examples of quantum states which can be created in laboratory are given in Table 1 of [1]. Typically, the corresponding Wigner functions have a Gaussian tail but need not be positive. For example the state of one-photon in the cavity is described by the density matrix with $\rho_{1,1} = 1$ and all other elements zero which is equal to the orthogonal projection onto the vector $\psi_1$. The corresponding Wigner function is
\[ W(q, p) = \frac{1}{\pi} (2q^2 + 2p^2 - 1) \exp(-q^2 - p^2). \]
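The one-photon example can be checked numerically against the statements made in this section. The sketch below evaluates the Wigner function above, verifies that it is negative at the origin while respecting the bound (2.12) on a grid, and compares a trapezoidal approximation of its Radon transform with ψ_1(x)² = 2x² e^{−x²}/√π, which is what (2.6) gives for this state. Function names, grid sizes and the integration cut-off are assumptions chosen for the illustration.

```python
import math

def wigner_one_photon(q, p):
    # W(q, p) = (1/pi) (2 q^2 + 2 p^2 - 1) exp(-q^2 - p^2)
    r2 = q * q + p * p
    return (2.0 * r2 - 1.0) * math.exp(-r2) / math.pi

def radon_transform(wigner, x, phi, t_max=8.0, steps=4000):
    # Trapezoidal approximation of the line integral defining R[W](x, phi).
    h = 2.0 * t_max / steps
    c, s = math.cos(phi), math.sin(phi)
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * wigner(x * c - t * s, x * s + t * c)
    return total * h

def psi1_squared(x):
    # Square of the normalized first Hermite function.
    return 2.0 * x * x * math.exp(-x * x) / math.sqrt(math.pi)

value_at_origin = wigner_one_photon(0.0, 0.0)
largest_abs_value = max(abs(wigner_one_photon(i / 20.0, j / 20.0))
                        for i in range(-80, 81) for j in range(-80, 81))
marginal = radon_transform(wigner_one_photon, 0.7, 1.1)
print(value_at_origin, largest_abs_value, marginal, psi1_squared(0.7))
```

Because this Wigner function is rotation invariant, the computed marginal does not depend on the angle `phi`.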
In conclusion, although we deal with a problem which is similar to that of Positron Emission Tomography, the parameter space is different from the space of probability densities and special techniques have to be developed for this situation.
3 The main results
Our problem is that of estimating the Wigner function $W_\rho(z)$, defined on the plane $z = (q, p)$. In order to prove rates of convergence, some restrictions must be imposed on the class. We consider the class $\mathcal{W}(\beta, L)$ of Wigner functions which are continuous and whose Fourier transforms satisfy
\[ \frac{1}{4\pi^2} \int_{\mathbb{R}^2} |\widetilde{W}(w)|^2 \exp(2\beta|w|)\, dw \le L \tag{3.1} \]
for $\beta$ and $L$ positive constants. This condition implies that the function to be estimated is very smooth. Such classes appeared in the statistical literature in [15], and were subsequently used in the context of density estimation [11], functional estimation, regression problems [10], and tomography [5]. In [4] we have argued that from the physical point of view it is natural to assume that the Wigner function of a state which can be created in the lab belongs to such a class. We use a kernel-type estimator based on the following function called a band-limited filter
\[ K_{\delta_n}(u) = \frac{1}{4\pi} \int_{-1/\delta_n}^{1/\delta_n} |r|\, e^{iru}\, dr. \tag{3.2} \]
This filter has already been used in the context of tomography [19, 17, 5], and its Fourier transform is
\[ \widetilde{K}_{\delta_n}(t) = \frac{1}{2} |t|\, I_{\delta_n}(t), \tag{3.3} \]
where $I_{\delta_n}$ is the indicator function of $\{t : |t| \le 1/\delta_n\}$. The estimator we use is
\[ \widehat{W}_n(z) = \frac{1}{n} \sum_{i=1}^{n} K_{\delta_n}([z, \Phi_i] - X_i) \tag{3.4} \]
with i.i.d. observations $(X_i, \Phi_i)$, for $i = 1, \dots, n$, with density $p_\rho(x, \phi) = \mathcal{R}[W_\rho](x, \phi)$. Here $[z, \phi] := q\cos\phi + p\sin\phi$ for $z = (q, p)$. Following [19] we define the dual operator $\mathcal{R}^\#$ on $L^1(\mathbb{R} \times [0, \pi])$ by
\[ \mathcal{R}^\#[h](z) = \int_0^{2\pi} h([z, \phi], \phi)\, d\phi. \tag{3.5} \]
Then
\[ \mathcal{R}^\#\mathcal{R}[W](z) = \int_0^{2\pi} \mathcal{R}[W]([z, \phi], \phi)\, d\phi \tag{3.6} \]
represents the integral of $W$ over all lines passing through the point $z$. Note that in general $\mathcal{R}^\#\mathcal{R}[W](z) \ge 0$ for all Wigner functions $W$ and all $z \in \mathbb{R}^2$, and the number states $\psi_k$ with $k$ odd have the property that $\mathcal{R}^\#\mathcal{R}[W](0) = 0$. In [5] it is assumed that the probability distributions $f$ to be estimated are strictly positive, which implies that $\mathcal{R}^\#\mathcal{R}[f](z) > 0$ for all $z$. For the upper bound we will assume that the latter condition holds.
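The estimator (3.4) can be sketched in a few lines once the kernel is available in closed form. The sketch below takes the kernel to be the inverse Fourier transform of (3.3) under the conventions of Section 2.1, which gives K_δ(u) = (1/2π) ∫_0^{1/δ} r cos(ru) dr, and reads [z, φ] as q cos φ + p sin φ; both readings are assumptions of this example. The simulated data are vacuum-state quadratures (X distributed as N(0, 1/2), independent of a uniform angle), a test case chosen here because its Wigner function equals 1/π at the origin; it is not an experiment from the paper.

```python
import math
import random

def kernel(u, delta):
    # Closed form of (1/(2*pi)) * int_0^{1/delta} r cos(r u) dr.
    a = 1.0 / delta
    if abs(a * u) < 1e-3:
        # Second-order Taylor expansion near u = 0 avoids cancellation.
        return (a * a / 2.0 - a ** 4 * u * u / 8.0) / (2.0 * math.pi)
    return (a * math.sin(a * u) / u
            + (math.cos(a * u) - 1.0) / (u * u)) / (2.0 * math.pi)

def kernel_by_quadrature(u, delta, steps=20000):
    # Midpoint rule for the same integral, used only as a cross-check.
    a = 1.0 / delta
    h = a / steps
    total = sum((i + 0.5) * h * math.cos((i + 0.5) * h * u) for i in range(steps))
    return total * h / (2.0 * math.pi)

def wigner_estimate(z, samples, delta):
    # Average of K_delta([z, Phi_i] - X_i) over the observations (X_i, Phi_i).
    q, p = z
    total = 0.0
    for x, phi in samples:
        total += kernel(q * math.cos(phi) + p * math.sin(phi) - x, delta)
    return total / len(samples)

# Hypothetical test data: vacuum state, quadratures N(0, 1/2), angles uniform.
rng = random.Random(0)
samples = [(rng.gauss(0.0, math.sqrt(0.5)), rng.uniform(0.0, math.pi))
           for _ in range(20000)]
estimate_at_origin = wigner_estimate((0.0, 0.0), samples, delta=0.25)
print(estimate_at_origin, 1.0 / math.pi)
```

With 1/δ = 4 the expected value of the estimate at the origin is (1/π)(1 − e^{−4}) for this test state, so the printed value should be close to, and on average slightly below, 1/π.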
3.1 The upper bound
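Theorem 1 For any $W \in \mathcal{W}(\beta, L)$ and any fixed $z \in \mathbb{R}^2$ such that $\mathcal{R}^\#\mathcal{R}[W](z) > 0$ we have, as $n \to \infty$,
\[ E\Big[ \big(\widehat{W}_n(z) - W(z)\big)^2 \Big] = C^*\, \mathcal{R}^\#\mathcal{R}[W](z) \times \frac{(\log n)^3}{n}\, (1 + o(1)), \tag{3.7} \]
where $C^* = \dfrac{\pi}{3(4\pi\beta)^3}$.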
Proof. We will provide only the main steps of the proofs, pointing out where the assumption on the class of Wigner functions plays a role. For a more detailed proof of the bounds for the class of probability densities in $\mathcal{A}(\beta, L)$ we refer to [5]. The risk can be decomposed in two parts, the bias and the stochastic part:
\[ E\Big[ \big(\widehat{W}_n(z) - W(z)\big)^2 \Big] = \Big( E\big(\widehat{W}_n(z)\big) - W(z) \Big)^2 + E\Big[ \big(\widehat{W}_n(z) - E(\widehat{W}_n(z))\big)^2 \Big] := b_n^2(z) + \sigma_n^2(z). \tag{3.8} \]
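On one hand, using property (3.3) of the kernel and the inverse Fourier transform, the bias can be written as
\[ b_n(z) = \frac{1}{(2\pi)^2} \int \widetilde{W}(w)\, I\Big( |w| > \frac{1}{\delta_n} \Big)\, e^{-i\langle w, z\rangle}\, dw. \tag{3.9} \]
By using Cauchy-Schwarz and the fact that $W \in \mathcal{W}(\beta, L)$ we get
\[ b_n^2(z) \le \frac{1}{\delta_n}\, e^{-2\beta/\delta_n}\, (1 + o(1)), \tag{3.10} \]
as $\delta_n \to 0$. With the choice $1/\delta_n = \log n/(2\beta)$ the bias upper bound becomes
\[ b_n^2(z) \le c\, \frac{\log n}{n}\, (1 + o(1)), \qquad \text{as } n \to \infty. \tag{3.11} \]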
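On the other hand, the variance is equal to
\[ \sigma_n^2(z) = \frac{1}{n} \mathrm{Var}\, K_{\delta_n}([z, \Phi] - X) = \frac{1}{n} E\big[ K_{\delta_n}^2([z, \Phi] - X) \big] - \frac{1}{n} \big( E[K_{\delta_n}([z, \Phi] - X)] \big)^2, \tag{3.12} \]
where $(X, \Phi)$ is a random variable with probability density $p_\rho(x, \phi) = \mathcal{R}[W_\rho](x, \phi)$. The second term can be bounded as follows:
\[ \frac{1}{n} \big( E[K_{\delta_n}([z, \Phi] - X)] \big)^2 \le \frac{1}{(2\pi)^2 n} \int I_{\delta_n}(|w|)\, e^{-2\beta|w|}\, dw \times \int |\widetilde{W}(w)|^2 e^{2\beta|w|}\, dw = O\Big( \frac{1}{n} \Big), \tag{3.13} \]
with $O(\frac{1}{n})$ uniformly with respect to $W \in \mathcal{W}(\beta, L)$. The first term is
\[ E\big[ K_{\delta_n}^2([z, \Phi] - X) \big] = \int_0^\pi \int K_{\delta_n}^2([z, \phi] - y)\, p_\rho(y, \phi)\, d\phi\, dy. \tag{3.14} \]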
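Now denote
\[ G(u) = \frac{3}{\pi} \Big( \int_0^1 r\cos(ur)\, dr \Big)^2, \tag{3.15} \]
and let $G_\delta(u) = (1/\delta)\, G(u/\delta)$. We have
\[ \int_{\mathbb{R}} K_{\delta_n}^2([z, \phi] - y)\, p_\rho(y, \phi)\, dy = \frac{\pi^2}{3(2\pi)^4} \Big( \frac{1}{\delta_n} \Big)^3 \big( G_{\delta_n} * \mathcal{R}[W](\cdot, \phi) \big)([z, \phi]). \tag{3.16} \]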
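Using [5] Lemma 4, we have that as $\delta \to 0$,
\[ \int_0^\pi \big( G_\delta * \mathcal{R}[W](\cdot, \phi) \big)([z, \phi])\, d\phi = \mathcal{R}^\#\mathcal{R}[W](z)(1 + o(1)) + O(\delta^{1/3}). \]
With this the first term of the variance can be written as
\[ E\big[ K_{\delta_n}^2([z, \Phi] - X) \big] = \frac{\pi^2}{3(2\pi)^4} \left( \Big( \frac{1}{\delta_n} \Big)^3 \mathcal{R}^\#\mathcal{R}[W](z)(1 + o(1)) + O\Big( \Big( \frac{1}{\delta_n} \Big)^{4 - 4/3} \Big) \right), \]
as $\delta_n \to 0$. Thus as $n \to \infty$
\[ \sigma_n^2(z) \le C^*\, \mathcal{R}^\#\mathcal{R}[W](z)\, \frac{(\log n)^3}{n}\, (1 + o(1)) + O\Big( \frac{(\log n)^{4 - 4/3}}{n} \Big) = C^*\, \frac{(\log n)^3}{n} \Big( \mathcal{R}^\#\mathcal{R}[W](z)(1 + o(1)) + O(\log^{-1/3} n) \Big). \]
If $\mathcal{R}^\#\mathcal{R}[W](z) > 0$ then we obtain the claimed constant. Notice that if $\mathcal{R}^\#\mathcal{R}[W](z) = 0$ then the convergence rate is faster than $\frac{(\log n)^3}{n}$.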
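To make the bandwidth choice and the resulting rate concrete, the following sketch evaluates 1/δ_n = log n/(2β) and the leading term C* R#R[W](z) (log n)³/n of the risk. It reads the constant of Theorem 1 as C* = π/(3(4πβ)³), and the argument `back_projection` stands for the value R#R[W](z), which has to be supplied by the user; both the function names and this parametrization are assumptions of the example.

```python
import math

def bandwidth_inverse(n, beta):
    # 1 / delta_n = log(n) / (2 * beta), the choice made in the proof.
    return math.log(n) / (2.0 * beta)

def c_star(beta):
    # Constant of Theorem 1, read as pi / (3 * (4 * pi * beta)^3).
    return math.pi / (3.0 * (4.0 * math.pi * beta) ** 3)

def pointwise_rate(n, beta, back_projection):
    # Leading term of the pointwise risk; back_projection plays the role
    # of R#R[W](z) and must be positive.
    return c_star(beta) * back_projection * math.log(n) ** 3 / n

for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(n, bandwidth_inverse(n, 1.0), pointwise_rate(n, 1.0, 1.0))
```

With this bandwidth the exponential factor e^{−2β/δ_n} in the bias bound (3.10) equals 1/n exactly, which is why the squared bias is of order (log n)/n and is dominated by the variance term of order (log n)³/n.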
3.2 The lower bound
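In order to prove a lower bound result we consider the slightly modified class of Wigner functions
\[ \mathcal{W}(\beta, L, \alpha_n) = \{ W \in \mathcal{W}(\beta, L) : \mathcal{R}^\#\mathcal{R}[W](z) \ge \alpha_n \}, \tag{3.17} \]
for a sequence $\alpha_n$ such that $\lim_{n\to\infty} \alpha_n = 0$ and $\lim_{n\to\infty} (\alpha_n (\log n)^{1/3}) = \infty$. Let us denote
\[ r_n(W, z) = \Big( C^*\, \mathcal{R}^\#\mathcal{R}[W](z)\, \frac{(\log n)^3}{n} \Big)^{1/2}. \tag{3.18} \]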
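Theorem 2 For a fixed $z \in \mathbb{R}^2$, we have
\[ \liminf_{n\to\infty}\, \inf_{\widehat{W}_n}\, \sup_{W \in \mathcal{W}(\beta, L, \alpha_n)} E\left[ \Big( \frac{\widehat{W}_n(z) - W(z)}{r_n(W, z)} \Big)^2 \right] \ge 1, \]
where $\inf_{\widehat{W}_n}$ denotes the infimum over all estimators of $W(z)$.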
Proof. The proof is based on the standard procedure of building a hardest parametric subfamily for the class W(β, L) of Wigner functions of the form Wc = Wα + cga , (3.19) where c is a parameter in a neighborhood of the origin, Wα and ga are functions to be defined shortly. The essential point of the proof is that the family of probability densities f0 + cga used in [5] is not always contained in our parameter space consisting of Wigner functions. For illustration we will show that for some parameters β, the function f0 defined in equation (29) of [5] is not the Wigner function of a quantum state. Indeed, suppose for the moment that this was the case, i.e. f0 = Wρ for some state ρ. Then by the rotation symmetry of f0 , the density matrix ρ must be diagonal and pρ (x, φ) =
$\sum_{k=0}^{\infty} \rho_{k,k}\, \psi_k^2(x)$.
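Using the inequality [8] $\|\psi_k\|_\infty \le \mathbf{k}$, where $\mathbf{k}$ is a constant whose value is slightly bigger than 1, and the fact that $\sum_k \rho_{k,k} = 1$, we find that $\|p_\rho\|_\infty \le \mathbf{k}$. However, the Radon transform of the Wigner function $W_\rho$ is
\[ p_\rho(x, \phi) = \mathcal{R}[W_\rho](x, \phi) = \mathcal{R}[f_0](x, \phi) = \frac{1}{\pi}\, \frac{\beta}{x^2 + \beta^2}, \]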
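which violates the above bound for $\beta \le 1/(\pi \mathbf{k})$. We thus define a parametric subfamily of $\mathcal{W}(\beta, L)$ which is a suitable modification of the family considered in [5] in order to cope with this problem.
Construction of $W_\alpha$. Consider the Mehler formula (see [8], 10.13.22)
\[ \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{\pi}\, n!\, 2^n}\, H_n(x)^2\, e^{-x^2} = \frac{1}{\sqrt{\pi(1 - z^2)}} \exp\Big( -x^2\, \frac{1 - z}{1 + z} \Big). \tag{3.20} \]
Integrating both terms with $f_\alpha(z) = \alpha(1 - z)^\alpha$ we get
\[ p_\alpha(x, \phi) := \sum_{n=0}^{\infty} \psi_n(x)^2 \int_0^1 f_\alpha(z)\, z^n\, dz = \int_0^1 \frac{f_\alpha(z)}{\sqrt{\pi(1 - z^2)}} \exp\Big( -x^2\, \frac{1 - z}{1 + z} \Big)\, dz. \tag{3.21} \]
The Fourier transform of $p_\alpha$ is
\[ \widetilde{W}_\alpha(w) := \mathcal{F}[p_\alpha](w) = \int_0^1 \frac{f_\alpha(z)}{1 - z} \exp\Big( -|w|^2\, \frac{1 + z}{4(1 - z)} \Big)\, dz. \tag{3.22} \]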
Notice that the normalization condition $\int p_\alpha = 1$ is equivalent to $\widetilde{W}_\alpha(0) = 1$, which is satisfied for the chosen functions $f_\alpha$; thus $p_\alpha$ is a probability density corresponding to a diagonal density matrix $\rho^\alpha$ with elements
\[ \rho^\alpha_{k,k} = \int_0^1 z^k f_\alpha(z)\, dz. \tag{3.23} \]
We denote by $W_\alpha$ the Wigner function whose Fourier transform is defined in equation (3.22), with $\alpha > 0$ a parameter to be fixed later. This function is considered in a more general form in [4], and corresponds to $f^{\epsilon}_\alpha$ for $\epsilon = 0$.
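The diagonal elements (3.23) are Beta integrals, ρ^α_{k,k} = α B(k+1, α+1), so their normalization and their polynomial decay in k can be checked numerically. The sketch below uses the log-gamma function for stability; the closed form for the tail, Σ_{k≥N} ρ^α_{k,k} = α B(N+1, α), follows from summing the geometric series under the integral. Function names and the parameter values are assumptions of the example.

```python
import math

def rho_alpha(k, alpha):
    # alpha * B(k + 1, alpha + 1) = alpha * int_0^1 z^k (1 - z)^alpha dz.
    log_beta = (math.lgamma(k + 1) + math.lgamma(alpha + 1)
                - math.lgamma(k + alpha + 2))
    return alpha * math.exp(log_beta)

def partial_sum(n_terms, alpha):
    # Sum of the first n_terms diagonal elements.
    return sum(rho_alpha(k, alpha) for k in range(n_terms))

def tail_mass(n_terms, alpha):
    # Sum over k >= n_terms, equal to alpha * B(n_terms + 1, alpha).
    log_beta = (math.lgamma(n_terms + 1) + math.lgamma(alpha)
                - math.lgamma(n_terms + alpha + 1))
    return alpha * math.exp(log_beta)

alpha = 0.2
total = partial_sum(500, alpha) + tail_mass(500, alpha)
# Ratio of rho_alpha(k) to its asymptotic form alpha * Gamma(alpha + 1) * k^-(1 + alpha).
k_large = 10 ** 6
decay_ratio = (rho_alpha(k_large, alpha) * k_large ** (1.0 + alpha)
               / (alpha * math.gamma(alpha + 1.0)))
print(total, decay_ratio)
```

The slow decay of order k^{−(1+α)} is the property that later leaves room for the perturbation c g_a without destroying positivity of the diagonal elements.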
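Construction of $g_a$. Let [5]
\[ H_a(s) = \frac{1}{(2\pi)^2} \int_0^\infty \frac{r\cos(sr)}{1 + a^{-1}\sinh^2 \beta r}\, dr, \qquad s \in \mathbb{R}, \tag{3.24} \]
where $a > 0$ is a parameter which will depend on $n$ as $a = a_n = n^\eta$ with $0 < \eta < 1$. The Fourier transform of $H_a$ is
\[ \widetilde{H}_a(t) = \frac{1}{4\pi}\, \frac{|t|}{1 + a^{-1}\sinh^2 \beta t}, \qquad t \in \mathbb{R}. \tag{3.25} \]
Let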
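\[ g_a(z) = \frac{1}{2(2\pi)^3} \int_{\mathbb{R}^2} \frac{|w| \cos(\langle z, w\rangle)}{1 + a^{-1}\sinh^2 \beta|w|}\, dw, \qquad z \in \mathbb{R}^2, \tag{3.26} \]
and its Fourier transform
\[ \widetilde{g}_a(w) = \frac{1}{4\pi}\, \frac{|w|}{1 + a^{-1}\sinh^2 \beta|w|}, \qquad w \in \mathbb{R}^2. \tag{3.27} \]
Now, let us consider the family
\[ W_c = W_\alpha + c\, g_a \tag{3.28} \]
where the real parameter $c$ satisfies
\[ |c| \le C_a = \frac{q}{\sqrt{a}\, (\log a)^{3/2}}, \tag{3.29} \]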
with q > 0 sufficiently small. By translating with z in R2 we obtain our hardest family for pointwise estimation at the point z Wcz (ζ) = Wc (ζ − z). (3.30)
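We will check now that $W_c^z$ belongs to the class $\mathcal{W}(\beta, L)$ for an appropriate choice of $\alpha$ in $W_\alpha$, which means that $W_c^z$ is a Wigner function and
\[ \frac{1}{4\pi^2} \int \big| \widetilde{W}_c^z(w) \big|^2 e^{2\beta|w|}\, dw = \frac{1}{4\pi^2} \int \big| \widetilde{W}_c(w) \big|^2 e^{2\beta|w|}\, dw \le L. \tag{3.31} \]
By Lemma 5 we have $W_\alpha \in \mathcal{W}(\beta, L/4)$ for a small enough $\alpha > 0$, and from Lemma 5 of [5] we have that
\[ \frac{c^2}{4\pi^2} \int |\widetilde{g}_a(w)|^2 e^{2\beta|w|}\, dw \le L/4, \tag{3.32} \]
for all $|c| \le C_a$. The last two conditions together imply (3.31), since $|\widetilde{W}_\alpha + c\,\widetilde{g}_a|^2 \le 2|\widetilde{W}_\alpha|^2 + 2c^2|\widetilde{g}_a|^2$.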
Furthermore, $W_c^z$ has to be a Wigner function. As translations in the plane transform Wigner functions into Wigner functions, we need only to show this for $W_c$. This means that there exists a family of density matrices $\rho^c$ such that their corresponding Wigner functions are $W_c$. The invariance of $W_c$ under rotations in the plane translates into the fact that $\rho^c$ has all off-diagonal elements equal to zero, and thus we only need to show that all its diagonal elements are positive and add up to one. The relation between the diagonal matrix elements and the Wigner function is [18]
\[ \rho^c_{k,k} = \frac{1}{2\pi} \int_{\mathbb{R}^2} e^{-|w|^2/4}\, L_k(|w|^2/2)\, \widetilde{W}_c(w)\, dw, \tag{3.33} \]
where $L_k$ are the Laguerre polynomials. By linearity we have
\[ \rho^c_{k,k} = \rho^\alpha_{k,k} + c\, \tau^a_{k,k} \tag{3.34} \]
where
\[ \tau^a_{k,k} = \int_0^\infty t\, e^{-t^2/4}\, L_k(t^2/2)\, \widetilde{H}_a(t)\, dt, \tag{3.35} \]
and
\[ \rho^\alpha_{k,k} = \alpha \int_0^1 z^k (1 - z)^\alpha\, dz. \tag{3.36} \]
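Combining the result shown in Lemma 4,
\[ \tau^a_{kk} = O\big( k^{-5/4} (\log a)^4 \big), \qquad \text{as } a, k \to \infty, \]
with that of Lemma 2 in [4],
\[ \rho^\alpha_{k,k} \sim k^{-(1+\alpha)}, \]
we conclude that if $\alpha < 1/4$, then $\rho^c_{k,k} \ge 0$ for all $k \ge 0$ and $|c| \le C_a$, for $a$ sufficiently large. Now we can use the fact that for the family of translated functions, as defined in eq. (3.30), $\mathcal{R}^\#\mathcal{R}[W^z](z) = \mathcal{R}^\#\mathcal{R}[W](0)$. Indeed, using (2.7) we get
\[ \begin{aligned} \mathcal{R}[W^z](x, \phi) &= \frac{1}{2\pi} \int \widetilde{W}^z(t\cos\phi, t\sin\phi)\, e^{itx}\, dt \\ &= \frac{1}{2\pi} \int \widetilde{W}(t\cos\phi, t\sin\phi)\, e^{-it[z, \phi]}\, e^{itx}\, dt \\ &= \frac{1}{2\pi} \int \widetilde{W}(t\cos\phi, t\sin\phi)\, e^{it(x - [z, \phi])}\, dt \\ &= \mathcal{R}[W](x - [z, \phi], \phi). \end{aligned} \]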
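Now, from the definition of $\mathcal{R}^\#$ we get
\[ \mathcal{R}^\#\mathcal{R}[W^z](z) = \int_0^{2\pi} \mathcal{R}[W^z]([z, \phi], \phi)\, d\phi = \int_0^{2\pi} \mathcal{R}[W](0, \phi)\, d\phi = \mathcal{R}^\#\mathcal{R}[W](0). \tag{3.37} \]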
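Thus, $R_c(z) := \mathcal{R}^\#\mathcal{R}[W_c^z](z) = \mathcal{R}^\#\mathcal{R}[W_\alpha](0) + c\, \mathcal{R}^\#\mathcal{R}[g_a](0)$ for any $z$. Since in our case $W_\alpha$ and $g_a$ are invariant under rotations, we obtain
\[ R_0 := \mathcal{R}^\#\mathcal{R}[W_\alpha](0) = \int_0^{2\pi} \mathcal{R}[W_\alpha](0, \phi)\, d\phi = \int_0^{2\pi} p_\alpha(0, \phi)\, d\phi = 2\sqrt{\pi}\, \alpha \int_0^1 \frac{(1 - z)^{\alpha - 1/2}}{(1 + z)^{1/2}}\, dz > 0. \tag{3.38} \]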
For the second term, using eq. (3.24), Lemma 5 in [5], and the definition of $C_a$,
\[ \sup_{|c| \le C_a} |c\, \mathcal{R}^\#\mathcal{R}[g_a](0)| = \sup_{|c| \le C_a} |2\pi c\, H_a(0)| = o(1) \qquad \text{as } a \to \infty. \tag{3.39} \]
We conclude that $R_c = R_0 (1 + o(1)) \ge \alpha_n$ for $n \to \infty$, and thus, for $a$ large enough, $W_c^z \in \mathcal{W}(\beta, L, \alpha_n)$. Moreover, from (3.18),
\[ r_n(W_c^z, z)^2 = r_n(W_\alpha, 0)^2 (1 + o(1)) = C^* R_0\, \frac{(\log n)^3}{n}\, (1 + o(1)). \tag{3.40} \]
The rest of the proof is based on the Van Trees inequality and follows along the lines of [5]. The main difference is in the proof of Lemma 6 where the Fisher information of the family of densities $\frac{1}{\pi}\mathcal{R}[W_c^z]$ is approximated. Take a continuously differentiable probability density $\lambda_0(c)$, defined on the interval $[-1, 1]$, such that $\lambda_0(-1) = \lambda_0(1) = 0$, with a finite Fisher information $I_0$. The new density $\lambda(c) = \lambda_a(c) = C_a^{-1} \lambda_0(C_a^{-1} c)$ is a prior density with finite Fisher information $I(\lambda) = I_0 C_a^{-2}$. Finally let us denote by $I(c)$ the Fisher information of the family of densities $\frac{1}{\pi}\mathcal{R}[W_c^z]$. Using the Van Trees inequality
\[ \begin{aligned} \inf_{\widehat{W}_n}\, \sup_{W \in \mathcal{W}(\beta, L, \alpha_n)} E_W\Big[ \big(\widehat{W}_n(z) - W(z)\big)^2 \Big] &\ge \inf_{\widehat{W}_n}\, \sup_{|c| \le C_a} E_{W_c^z}\Big[ \big(\widehat{W}_n(z) - W_c^z(z)\big)^2 \Big] \\ &\ge \inf_{\widehat{W}_n} \int_{-C_a}^{C_a} E_{W_c^z}\Big[ \big(\widehat{W}_n(z) - W_c^z(z)\big)^2 \Big] \lambda_a(c)\, dc \\ &\ge \end{aligned} \]
|c| 1, the right hand side is bounded from above by $C x^{-(1+2\alpha)}$ and from below by $c\, x^{-(1+2\alpha)}$, for some positive constants $c$, $C$. For the first derivative one can see that
\[ p'_\alpha(x) = \frac{1}{x}\, p_\alpha(x) - \frac{\alpha(\alpha + 1)\, 2^{\alpha+2}\, x^2}{\sqrt{\pi}} \int_0^x \frac{u^{2\alpha} e^{-u^2}}{(u^2 + x^2)^{\alpha+2}}\, du + \frac{\alpha}{\sqrt{\pi}\, x} \exp(-x^2), \]
which can be bounded using the same argument as before to obtain
\[ |p^{(1)}_\alpha(x)| \le C_1\, x^{-(2+2\alpha)}, \]
for some $C_1$ and $|x| > 1$. For the second derivative the procedure is the same.
Lemma 4 Let $\tau^a$ be the diagonal matrix with elements
\[ \tau^a_{k,k} = \frac{1}{4\pi} \int_0^\infty L_k(t^2/2)\, e^{-t^2/4}\, \frac{t^2}{1 + a^{-1}\sinh^2 \beta t}\, dt. \tag{4.6} \]
Then
\[ \tau^a_{kk} = O\big( k^{-5/4} (\log a)^4 \big). \tag{4.7} \]
Proof. We analyze first the dependence on $a$ for a fixed $k$. The functions $\{L_k(u) e^{-u/2}\}_{k \ge 0}$ form an orthonormal basis of $L^2(\mathbb{R}_+)$. By using Cauchy-Schwarz followed by Lemma 5 from [5] we get
\[ |\tau^a_{k,k}| \le \frac{1}{4\pi} \Big( \int_0^\infty \frac{t^3}{(1 + a^{-1}\sinh^2 \beta t)^2}\, dt \Big)^{1/2} \le C (\log a)^{3/2}, \]
for some positive constant $C$.
Let now $a$ be fixed and look at the asymptotic behavior of $\tau^a_{k,k}$ as $k \to \infty$. We use the differential equation of the Laguerre polynomials, [12] 8.979:
\[ L_n(x) = \frac{1}{n} \big( (x - 1) L'_n(x) - x L''_n(x) \big). \]
Thus
\[ \frac{d}{dt} L_n(t^2/2) = t\, L'_n(t^2/2), \tag{4.8} \]
\[ \frac{d^2}{dt^2} L_n(t^2/2) = L'_n(t^2/2) + t^2 L''_n(t^2/2), \tag{4.9} \]
which implies
\[ \frac{t^2}{2}\, L''_n(t^2/2) = \frac{1}{2} \frac{d^2}{dt^2} L_n(t^2/2) - \frac{1}{2}\, t^{-1} \frac{d}{dt} L_n(t^2/2) \]
and
\[ L_n(t^2/2) = \frac{1}{2n} \Big( (t^2 - 1)\, t^{-1} \frac{d}{dt} L_n(t^2/2) - \frac{d^2}{dt^2} L_n(t^2/2) \Big). \]
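The Laguerre identity L_n(x) = ((x − 1) L'_n(x) − x L''_n(x))/n used in this step is easy to confirm numerically. The sketch below builds L_n from its explicit coefficients (−1)^m C(n, m)/m!, differentiates the polynomial exactly, and evaluates the residual of the identity at a few points. All function names are illustrative and not taken from the paper.

```python
import math

def laguerre_coefficients(n):
    # Coefficients c_0, ..., c_n of L_n(x) = sum_m (-1)^m C(n, m) x^m / m!.
    return [(-1) ** m * math.comb(n, m) / math.factorial(m) for m in range(n + 1)]

def polyval(coeffs, x):
    # Horner evaluation; an empty coefficient list is the zero polynomial.
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative(coeffs):
    # Exact derivative of a polynomial given by its coefficient list.
    return [m * c for m, c in enumerate(coeffs)][1:]

def identity_residual(n, x):
    # L_n(x) - ((x - 1) L_n'(x) - x L_n''(x)) / n, which should vanish.
    c = laguerre_coefficients(n)
    d1 = derivative(c)
    d2 = derivative(d1)
    rhs = ((x - 1.0) * polyval(d1, x) - x * polyval(d2, x)) / n
    return polyval(c, x) - rhs

worst = max(abs(identity_residual(n, x))
            for n in range(1, 9) for x in (0.0, 0.5, 1.0, 2.3, 5.0))
print(worst)
```

The printed residual is at the level of floating point rounding, consistent with the identity holding exactly for every n ≥ 1.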
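Using integration by parts we obtain the formula
\[ \frac{1}{4\pi} \int_0^\infty L_k(t^2/2)\, e^{-t^2/4}\, \frac{t^2}{1 + a^{-1}\sinh^2 \beta t}\, dt = \frac{1}{k} \int_0^\infty L_k(t^2/2)\, e^{-t^2/4}\, f(t)\, dt, \]
where the function $f$ is given by
\[ f(t) = \frac{P_1(t)}{1 + a^{-1}\sinh^2 \beta t} + \frac{a^{-1} \big( P_2(t) \sinh 2\beta t + P_3(t) \cosh 2\beta t \big)}{(1 + a^{-1}\sinh^2 \beta t)^2} + \frac{P_3(t)\, a^{-2} \sinh^2 2\beta t}{(1 + a^{-1}\sinh^2 \beta t)^3}, \]
with $P_i(t)$ polynomials with degree at most four, whose coefficients do not depend on $a$. We split the integral into $\int_0^1$ and $\int_1^\infty$ and use the following bounds for the behavior of Laguerre polynomials in the two intervals (see [21] Theorem 8.9.12 and Theorem 7.6.4):
\[ \max_{x \in [1, \infty)} e^{-x/2} |L_n(x)| = O(n^{-1/4}), \]
and $L_n(x) = x^{-1/4}\, O(n^{-1/4})$, uniformly on $(0, 1]$. Thus using Lemma 5 of [5],
\[ \frac{1}{4\pi} \int_0^\infty L_k(t^2/2)\, e^{-t^2/4}\, \frac{t^2}{1 + a^{-1}\sinh^2 \beta t}\, dt = O\big( (\log a)^4\, k^{-5/4} \big). \]
Lemma 5 For any $(\beta, L)$ there exists an $\alpha > 0$ such that $W_\alpha$ belongs to the class $\mathcal{W}(\beta, L)$.
Proof. By using the Minkowski inequality we get
\[ \begin{aligned} \int e^{2\beta|w|} \big| \widetilde{W}_\alpha(w) \big|^2 dw &= 2\pi \int_0^\infty \left[ \int_0^1 \frac{f_\alpha(z)}{1 - z}\, \sqrt{r}\, \exp\Big( -r^2\, \frac{1 + z}{4(1 - z)} + \beta r \Big)\, dz \right]^2 dr \\ &\le 2\pi \left[ \int_0^1 \frac{f_\alpha(z)}{1 - z} \Big( \int_0^\infty r \exp\Big( -r^2\, \frac{1 + z}{2(1 - z)} + 2\beta r \Big)\, dr \Big)^{1/2} dz \right]^2. \end{aligned} \]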
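The interior integral satisfies the bound
\[ \int_0^\infty r \exp\Big( -r^2\, \frac{1 + z}{2(1 - z)} + 2\beta r \Big)\, dr \le C(\beta)(1 - z), \]
for some positive constant $C(\beta)$. Thus
\[ \int e^{2\beta|w|} \big| \widetilde{W}_\alpha(w) \big|^2 dw \le C(\beta) \Big( \int_0^1 \alpha (1 - z)^{\alpha - 1/2}\, dz \Big)^2 = C(\beta) \Big( \frac{\alpha}{\alpha + 1/2} \Big)^2 \to 0, \]
as $\alpha \to 0$.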
Lemma 6 For $\alpha \le 1/2$, the Fisher information of the family of densities $\mathcal{R}W_c^z$ satisfies
\[ I(c) = C^* (\log a)^3 R_0^{-1} (1 + o(1)) \tag{4.10} \]
where $R_0$ is defined in (3.38).
Proof. We sketch the proof following the line of [5] and pointing out where the differences appear. After some transformations, the Fisher information of the family can be brought to the form
\[ I(c) = \frac{1}{\pi} \int_0^\pi d\phi \int \frac{H_a^2(u)}{\mathcal{R}[W_c^z]([z, \phi] - u, \phi)}\, du. \tag{4.11} \]
By expanding $\mathcal{R}[W_c^z]([z, \phi] - u, \phi)^{-1}$ up to the second order and bounding the second derivative one can show that
\[ \left| \int \frac{H_a^2(u)}{\mathcal{R}[W_c^z]([z, \phi] - u, \phi)}\, du - \frac{1}{\mathcal{R}[W_c^z]([z, \phi], \phi)} \int H_a^2(u)\, du \right| = O\Big( \int u^2 H_a^2(u)\, du \Big) = O\big( (\log a)^2 \big), \qquad \text{as } a \to \infty. \tag{4.12} \]
Recall that $\mathcal{R}[W_c^z]([z, \phi], \phi) = \mathcal{R}[W_c](0, \phi)$ does not depend on $\phi$, cf. (3.37). Thus we can write
\[ I(c) = \frac{1}{\mathcal{R}[W_c](0, \phi)} \int H_a^2(u)\, du + O\big( (\log a)^2 \big). \]
From [5] we have
\[ \int H_a^2(u)\, du = \frac{1}{3 \cdot 2 \cdot (4\pi\beta)^2}\, (\log a)^3 (1 + o(1)) \qquad \text{as } a \to \infty. \tag{4.13} \]
From equations (4.11)–(4.13) one obtains the desired result. The main difference in the proof appears when deriving the bound (4.12). To derive this, one needs a bound on the absolute value of the second derivative
\[ \left| \frac{\partial^2}{\partial t^2} \frac{1}{\mathcal{R}[W_c^z](t, \phi)} \right|_{t = [z, \phi] - u} = O(1), \qquad \text{as } a \to \infty, \]
which is uniform in $u$. We have
\[ \left| \frac{\partial^2}{\partial t^2} \frac{1}{\mathcal{R}[W_c^z](t, \phi)} \right|_{t = [z, \phi] - u} \le \frac{|p''_c(u, \phi)|}{p_c(u, \phi)^2} + \frac{2\, p'_c(u, \phi)^2}{p_c(u, \phi)^3}, \tag{4.14} \]
where
\[ p_c(u, \phi) = \mathcal{R}[W_c^z]([z, \phi] - u, \phi) = \mathcal{R}[W_c](-u, \phi) = \mathcal{R}[W_c](u, \phi) = p_\alpha(u, \phi) + c H_a(u), \]
and the derivatives are with respect to the first argument. If u is in a compact interval, the two terms on the right hand side of (4.14) are O(1) as a → ∞ since in that case sup|c|