Kernel entropy estimation for linear processes Hailin Sanga , Yongli Sangb and Fangjun Xuc
arXiv:1712.00196v1 [math.ST] 1 Dec 2017
a
Department of Mathematics, University of Mississippi, University, MS 38677, USA
[email protected]
b
Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
[email protected]
c
School of Statistics, East China Normal University, Shanghai 200241, China and NYU-ECNU Institute of Mathematical Sciences at NYU Shanghai, Shanghai 200062, China
[email protected];
[email protected] Abstract Let {Xn : n ∈ N} be a linear process with bounded probability density function f (x). We R study the estimation of the quadratic functional R f 2 (x) dx. With a Fourier transform on the kernel function and the projection method, it is shown that, under certain mild conditions, the estimator X 2 Xi − Xj K n(n − 1)hn hn 1≤i m, and Rd |λ|2γ |φε (λ)|2 dλ < ∞ for some γ ∈ (0, 1]. Assumption 2 det(a0 ) 6= 0, det(ap ) 6= 0 and det(aq ) 6= 0 with p = min{i ∈ N : det(ai ) 6= 0} and ∞ P q = min{i ∈ N : det(ai ) 6= 0 and i > p} < ∞, |ai |γ < ∞, E |eιλε1 − φε (λ)|4 ≤ cγ,4 |λ|4γ ∧ 1 i=0 R and Rd |λ|2γ |φε (λ)|2 dλ < ∞ for some γ ∈ (0, 1], ∃ two indices ir s.t. det(air +r − air ) 6= 0 for each r ∈ {1, 2, · · · , q}. Assumption 3 det(a0 ) 6= 0, det(ap ) 6= 0 with p = min{i ∈ N : det(ai ) 6= 0} < ∞, ∃ indice ir such ∞ P that det(air − air +r ) 6= 0 for each r ∈ {1, · · · , p}, |ai |γ < ∞, E |eιλε1 − φε (λ)|2 ≤ cγ,2 |λ|2γ ∧ 1 i=0 R and Rd |λ|2γ |φε (λ)|2 dλ < ∞ for some γ ∈ (0, 1]. 16
Using similar arguments as in the proof of Theorems 2.1 and 2.2 with some proper modifications, we can get the following results. Theorem 6.1 Under the assumption 1 or 2, we further assume that f is bounded. Then there exist positive constants c1 and c2 such that Z 1 2 2γ f (x) dx ≤ c1 + det(hn ) , (23) E Tn (hn ) − n Rd n ηn,γ 1 det(hn )2γ 1 X 2 + 2 + E Tn (hn ) − E Tn (hn ) − Yi ≤ c2 2 n n det(hn ) n n
(24)
i=1
and, if additionally n det(hn ) → ∞ as n → ∞, i √ h L n Tn (hn ) − E Tn (hn ) −→ N (0, 4σ 2 ), where Yi = 2 f (Xi ) −
R
Rd
(25)
n P ∞ P f 2 (x) dx , ηn,γ = |ai |γ and σ 2 = lim n−1 Var (Sn ) with n→∞
`=0 i=`
Sn =
n X
Z f (Xi ) −
2
f (x) dx . Rd
i=1
Theorem 6.2 Under the assumption 3, we further assume that f is bounded and ∞. Then there exist positive constants c3 and c4 such that Z 1 2 2γ + det(hn ) , f (x) dx ≤ c3 E Tn (hn ) − n Rd n 1 X 1 det(hn )γ E Tn (hn ) − E Tn (hn ) − Yi ≤ c4 + √ n n det(hn ) n
R
Rd
b |K(λ)| dλ
i1 c17 n2 |λ1 − λ2 |γ |φε ((λ1 − λ2 )ap1 )| |φε ((λ1
− λ2 )ap2 )| .
Similarly, |I2 (λ1 , λ2 )| ≤ c18 n2 |λ1 − λ2 |γ |φε ((λ1 − λ2 )ap1 )||φε ((λ1 − λ2 )ap2 )|. Finally, |I3 (λ1 , λ2 )| ≤ ≤
X
E [P H(X )(λ )P H(X )(−λ )P H(X )(−λ )P H(X )(λ )] i i 1 i i 2 j j 1 j j 2
j−i≥p2 +1 c19 n2 |φε ((λ1
− λ2 )ap1 )||φε ((λ1 − λ2 )ap2 )|.
So I0,p2 +1
Z c20 b 1 hn )||K(λ b 2 hn )|(1 + |λ1 − λ2 |γ )|φε ((λ1 − λ2 )ap )||φε ((λ1 − λ2 )ap )| dλ1 dλ2 ≤ 2 |K(λ 1 2 n R2 c21 ≤ 2 . n hn
24
Therefore, I02 ≤ I0,1 + I0,2 + · · · + I0,p2 + I0,p2 +1 ≤
c22 . n2 hn
(38)
Using similar arguments, for 1 ≤ p ≤ p1 , we can obtain Ip2 ≤
c23 . n 2 hn
(39)
We now estimate Ip1 +1 . Note that 1 (2π)2 n2 (n − 1)2
Ip21 +1 =
Z R2
b 1 hn )K(−λ b K(λ 2 hn ) Jn (λ1 , λ2 ) dλ1 dλ2 ,
where Jn (λ1 , λ2 ) =
i h E Pk1 H(Xi1 )(λ1 )Pk2 H(Xj1 )(−λ1 )Pk3 H(Xi2 )(−λ2 )Pk4 H(Xj2 )(λ2 ) .
X i1 6=j1 , i2 6=j2 k1 ≤i1 , k2 ≤j1 , k3 ≤i2 , k4 ≤j2 |k1 −i1 |+|k2 −j1 |≥p1 +1 |k3 −i2 |+|k4 −j2 |≥p1 +1
Recall properties of the projection operator Pk defined in (29). By Cauchy-Schartz inequality and the condition (3), we can obtain |Jn (λ1 , λ2 )| ≤ c24 |λ1 |2γ |λ2 |2γ |φε (λ1 a0 )|2 + |φε (λ1 ap1 )|2 |φε (λ2 a0 )|2 + |φε (λ2 ap1 )|2 X |ai1 −k1 |γ |aj1 −k2 |γ |ai2 −k3 |γ |aj2 −k4 |γ . (40) × i1 6=j1 , i2 6=j2 k1 ≤i1 , k2 ≤j1 , k3 ≤i2 k4 ≤j2 |k1 −i1 |+|k2 −j1 |≥p1 +1, |k3 −i2 |+|k4 −j2 |≥p1 +1 at least two k indices are identical, identical k indices are greater than the others
There are only four possibilities for the k indices in the above summation: 1) all the k indices are identical; 2) three k indices are identical and strictly larger than the last one; 3) two pairs of different identical k indices; 4) one pair of identical k indices and strictly larger than the remaining two different indices. For the first three possibilities, the summation in (40) is less than a constant ∞ P multiple of n2 since |ai |γ < ∞. For the last possibility, the summation in (40) is less than a i=0
constant multiple of n2 ηn,γ where ηn,γ =
n P ∞ P
|ai |γ . Therefore,
`=0 i=`
Ip21 +1 ≤
c25 ηn,γ n2
2 2γ 2 2 b |K(λh )| |λ| |φ (λa )| + |φ (λa )| dλ . n ε 0 ε p1
Z R
Putting inequalities (37), (38), (39) and (41) together gives ηn,γ 1 2 + 2 . kDn k2 ≤ c26 n2 hn n Step 4. Here we estimate E |Nn − N n |2 , where Nn =
1 2π
Z
b K(0) φn (λ) − φ(λ) φ(−λ) dλ.
R
25
(41)
Note that Z 2 1 2 b b K(λh E |Nn − N n | = E n ) − K(0) φn (λ) − φ(λ) φ(−λ) dλ 2π R Z 2 ιλX1 1 b b ≤ K(λhn ) − K(0) e φ(−λ) dλ E 2 (2π) n R i2 1 h = E Kn ∗ f (X1 ) − f (X1 ) , n where Kn (x) = h1n K( hxn ). Since f is bounded, 2
E |Nn − N n |
Z 2 c27 Kn ∗ f (x) − f (x) dx ≤ n R Z c27 2 2 b b = |K(λh n ) − K(0)| |φ(λ)| dλ (2π)2 n R Z c27 2 2 b b |K(λh ≤ n ) − K(0)| |φε (a0 λ)| dλ (2π)2 n R c28 h2γ n ≤ , n
where in the first equality we used the Plancherel theorem. It is easy to see that Z n 1X 2 Nn = f (Xi ) − f (x) dx . n R i=1
Combining all the above results gives n 1 X 2 E Tn (hn ) − E Tn (hn ) − Yi ≤ c29 n i=1
ηn,γ h2γ 1 n + + 2 2 n hn n n
! .
Step 5. We show the central limit theorem for n
1 X √ f (Xi ) − n i=1
Z
f 2 (x) dx .
R
It suffices to consider the case when the Xi are dependent. Note that Z P0 f (Xi ) − f 2 (x) dx R
= E [f (Xi )|F0 ] − E [f (Xi )|F−1 ] i−1 ∞ P P Z i λaj εi−j +ι λaj εi−j h ι 1 ιλai ε0 ιλai ε00 j=i+1 j=0 = E φ(λ) e e −e dλ F0 , 2π R where {ε0i : i ∈ Z} is an independent copy of {εi : i ∈ Z}.
26
Then ∞
i−1
P P Z
Z
ι λaj εi−j +ι λaj εi−j 0
2 j=i+1 j=0 f (x) dx ≤ φ(λ) e eιλai ε0 − eιλai ε0 dλ
P0 f (Xi ) − 2 2 R R
Z
0
≤ |φ(λ)||eιλai ε0 − eιλai ε0 | dλ 2 R
Z
≤ 2 |φ(λ)||eιλai ε0 − φε (λai )| dλ 2 RZ |λ|γ |φ(λ)| dλ ≤ 2|ai |γ RZ |λ|γ |φε (λ)|2 dλ. ≤ c30 |ai |γ R
Now by Lemma 1 in Wu (2006), we can easily obtain n
1 X √ f (Xi ) − n i=1
Z
d f 2 (x) dx −→ N (0, σ 2 )
R
for some σ 2 ∈ (0, ∞). Using similar arguments as in Step 3, we can show that sup n
where Sn = Hence,
n P
i=1 σ2 =
f (Xi ) −
R
Rf
2 (x) dx
1 E |Sn |4 < ∞, n2
.
lim n−1 Var (Sn ). This completes the proof.
n→∞
Proof of Theorem 2.2. The proof of this theorem is similar to that of Theorem 2.1. We only need to modify the Step 3 in the proof of Theorem 2.1 and consider the dependent case. Using the same notations as in the proof of Theorem 2.1 and then Lemma 7.2, Z √ √ 2 b n E |Dn | ≤ n |K(λh dλ n )|E |φn (λ) − φ(λ)| R Z i h 1 ιλX1 ιλX1 2 b +√ |K(λh )|E |e − E e | dλ n n R Z Z c1 c1 b ≤√ |K(λ)| dλ + √ |λ|2γ |φε (λa0 )|2 dλ. nhn R n R This completes the proof. At the end, we give two useful lemmas which are related to the sufficient and necessary conditions for the assumptions in Theorems 2.1 and 2.2. Lemma 7.3 Let φε (λ) = E [eιλε1 ] for all λ ∈ R. For any γ ∈ [0, 1], if E |ε1 |2γ < ∞, then there exists a positive constant cγ,2 such that E |eιλε1 − φε (λ)|2 ≤ cγ,2 |λ|2γ . Moreover, if E |ε1 |4γ < ∞, there exists a positive constant cγ,4 such that E |eιλε1 − φε (λ)|4 ≤ cγ,4 |λ|4γ . 27
Proof. This first inequality follows from E |eιλε1 − φε (λ)|2 = 1 − |φε (λ)|2 ≤ 1 − [E cos(λε1 )]2 ≤ c1 |λ|2γ . For the second inequality, E |eιλε1 − φε (λ)|4 2 = E 1 − eιλε1 φε (−λ) − e−ιλε1 φε (λ) + |φε (λ)|2 = 1 + φε (2λ)φ2ε (−λ) + φε (−2λ)φ2ε (λ) − 3|φε (λ)|4 = 1 + 2E cos(2λε1 ) E 2 cos(λε1 ) − E 2 sin(λε1 ) + 4E sin(2λε1 )E cos(λε1 )E sin(λε1 ) 2 − 3 E 2 cos(λε1 ) + E 2 sin(λε1 ) λε1 2 = 1 + 2 1 − 2E sin2 (λε1 ) (1 − 2E sin2 ( )) − E 2 sin(λε1 ) 2 λε λε1 1 + 8E sin(λε1 )(1 − 2 sin2 ( )) E sin(λε1 )[1 − 2E sin2 ( )] 2 2 2 λε1 2 − 3 (1 − 2E sin2 ( )) + E 2 sin(λε1 ) 2 4γ ≤ c2 |λ| . This completes the proof. Lemma 7.4 Let φε (λ) = E [eιλε1 ] for all λ ∈ R. If ε1 has non-degenerate distribution, i.e., it is not equal to a constant almost surely, then in the inequalities E |eιλε1 − φε (λ)|2 ≤ cγ,2 (|λ|2γ ∧ 1) and E |eιλε1 − φε (λ)|4 ≤ cγ,4 (|λ|4γ ∧ 1), the range γ ∈ (0, 1] is optimal. Proof. By Cauchy-Schwarz inequality, we only need to show that the range γ ∈ (0, 1] is optimal for the first inequality. Suppose E |eιλε1 −φε (λ)|2 = 1−|φε (λ)|2 ≤ c(|λ|2γ ∧1) for some γ > 1. Since ε2 is an independent copy of ε1 , let φε1 −ε2 (λ) = E [eιλ(ε1 −ε2 ) ], then 1 − φε1 −ε2 (λ) = 1 − |φε (λ)|2 ≤ c(|λ|2γ ∧ 1) for some γ > 1. Note that E |eιλ(ε1 −ε2 ) − 1|2 = 1 − φε1 −ε2 (λ) + 1 − φε1 −ε2 (−λ) = 2(1 − φε1 −ε2 (λ)). Therefore, for λ close enough to 0, 1 2 |φε1 −ε2 (x + λ) − φε1 −ε2 (x)| ≤ E |eιλε1 −ε2 − 1| ≤ E |eιλε1 −ε2 − 1|2 ≤ c1 |λ|γ . Since γ > 1, φ0ε1 −ε2 (x) = 0 for all x ∈ R. That is, |φε (x)|2 = φε1 −ε2 (x) = 1 for all x ∈ R or ε1 equals a constant almost surely. So the γ must be less than or equal to 1.
Acknowledgement We would like to thank the two referees and the Associate Editor for their valuable comments. We would also like to thank Professor Xia Chen for very helpful discussions. F. Xu is partially supported by National Natural Science Foundation of China (Grant No.11401215), Shanghai Pujiang Program (14PJ1403300), 111 Project (B14019) and Natural Science Foundation of Shanghai (16ZR1409700).
28
References [1] Ahmad IA. 1979. Strong consistency of density estimation by orthogonal series methods for dependent variables with applications. Ann. Inst. Statist. Math. 31: 279–288. [2] Bailey WN. 1935. General Hypergeometric Series. Cambridge Tracts in Mathematics and Mathematical Physics. [3] Bickel JP and Ritov Y. 1988. Estimating integrated squared density derivatives: Sharp best order of convergence estimates. Sankhy¯ a Ser. A 50: 381–393. [4] Bondon P and Palma W. 2007. A class of antipersistent processes. J. Time Series Anal. 28: 261-273. [5] Devroye L and Gy¨ orfi L. 1985. Nonparametric Density Estimation: The L1 View. New York: Wiley. [6] Fa¨ y G, Moulines E, Roueff F and Taqqu MS. 2009. Estimators of long-memory: Fourier versus wavelets. J. Econometrics 151: 159-177. [7] Gauss F. 1866. Disquisitiones generales circa sericm infinitam, Ges. Werke 3: 123-163 and 207-229. [8] Gin´e E and Nickl R. 2008. A simple adaptive estimator of the integrated square of a density. Bernoulli 14: 47–61. [9] Gretton A, Borgwardt KM, Rasch MJ, Sch¨olkopf B and Smola A. 2012. A kernel two-sample test. J. Mach. Learn. Res. 13: 723–773. [10] Hall P and Marron JS. 1987. Estimation of integrated squared density derivatives. Statist. Probab. Lett. 6: 109–115. [11] Hipel KW and McLeod AI. 1994. Time Series Modelling of Water Resources and Environmental Systems. Elsevier. [12] Honda T. 2000. Nonparametric density estimation for a long-range dependent linear process. Ann. Inst. Statist. Math. 52: 599–611. [13] Hsing T. 1999. On the asymptotic distributions of partial sums of functionals of infinitevariance moving averages. Ann. Probab. 27: 1579–1599. [14] K¨allberg D, Leonenko N and Seleznjev O. 2012. Statistical inference for R´enyi entropy functionals. Lecture Notes in Comput. Sci. 7260: 36–51. [15] K¨allberg D, Leonenko N and Seleznjev O. 2014. Statistical estimation of quadratic R´enyi entropy for a stationary m-dependent sequence. Journal of Nonparametric Statistics 26: 385– 411. [16] Krishnamurthy A, Kandasamy K, Pocz´os B and Wasserman L. 2015. On estimating L22 divergence. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), San Diego, CA, USA. JMLR: W&CP 38. [17] Laurent B. 1996. Efficient estimation of integral functionals of a density. Ann. Statist. 24: 659–681. [18] Laurent B. 1997. Estimation of integral functionals of a density and its derivatives. Bernoulli 3: 181–211. 29
[19] Leonenko N, Pronzato L and Savani V. 2008. A class of R´enyi information estimators for multidimensional densities. Ann. Statist. 36: 2153–2182. Corrections, Ann. Statist. 38 (2010), 3837–3838. [20] Leonenko N and Seleznjev O. 2010. Statistical inference for the ε-entropy and the quadratic R´enyi entropy. J. Multivariate Anal. 101: 1981–1994. [21] Mittnik S, Rachev ST and Paolella MS. 1998. Stable paretian modeling in finance: Some empirical and theoretical aspects. In : Adler, R. J., Feldman, R.E., Taqqu, M.S., eds. A Practical Guide to Heavy Tails. Birkh¨auser: 79–110. [22] Nadaraya EA. 1989. Nonparametric Estimation of Probability Densities and Regression Curves. Kluwer Academic Pub. [23] Parzen E. 1962. On estimation of a probability density and mode. Ann. Math. Stat. 31: 1065– 1079. [24] Penrose M and Yukich JE. 2013. Limit theory for point processes in manifolds. Ann. Appl. Probab. 23: 2161–2211. [25] R´enyi A. 1970. Probability Theory Amsterdam: North-Holland. [26] Rosenblatt M. 1956. Remarks on some nonparametric estimates of a density function, Ann. Math. Statist. 27: 832–837. [27] Sang H and Sang Y. 2017. Memory properties of transformations of linear processes. Stat. Inference Stoch. Process. 20: 79-103. [28] Schimek MG. 2000. Smoothing and Regression : Approaches, Computation, and Application. John Wiley & Sons. [29] Scott DW. 2015. Multivariate Density Estimation: Theory, Practice, and Visualization 2. John Wiley & Sons. [30] Shannon CE. 1948. A mathematical theory of communication. Bell Syst. Tech. Jour. 27: 379–423, 623–656. [31] Silverman BW. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall. [32] Tran LT. 1992. Kernel density estimation for linear processes. Stochastic Process. Appl. 41: 281–296. [33] Wand MP and Jones MC. 1995. Kernel Smoothing. London: Chapman and Hall. [34] Wu WB. 2006. Unit root testing for functionals of linear processes. Econometric Theory 22: 1–14. [35] Wu WB, Huang Y and Huang Y. 2010. Kernel estimation for time series: An asymptotic theory. Stochastic Process. Appl. 120: 2412–2431. [36] Wu WB and Mielniczuk J. 2002. Kernel density estimation for linear processes. Ann. Statist. 30: 1441–1459. [37] Zhou Z. 2014. Inference of weighted V-statistics for non stationary time series and its applications. Ann. Statist. 42: 87–114.
30