2011/15 Semiparametrically Efficient Inference ... - Semantic Scholar

1 downloads 0 Views 834KB Size Report
A pleasant corollary from this invariance structure, which follows from the fact that the group .... Corollary 3.1. ...... 50, Avenue F.D. Roosevelt, CP114/04. B-1050 ...
2011/15 Semiparametrically Efficient Inference Based on Signed Ranks in Symmetric Independent Component Models Paulina ILMONEN Davy PAINDAVEINE

SEMIPARAMETRICALLY EFFICIENT INFERENCE BASED ON SIGNED RANKS IN SYMMETRIC INDEPENDENT COMPONENT MODELS

By Pauliina Ilmonen∗ (University of Tampere) and Davy Paindaveine† (Université Libre de Bruxelles)

We consider semiparametric location-scatter models for which the p-variate observation is obtained as X = ΛZ + µ, where µ is a p-vector, Λ is a full-rank p × p matrix, and the (unobserved) random p-vector Z has marginals that are centered and mutually independent but are otherwise unspecified. As in blind source separation and independent component analysis (ICA), the parameter of interest throughout the paper is Λ. On the basis of n i.i.d. copies of X, we develop, under a symmetry assumption on Z, signed-rank one-sample testing and estimation procedures for Λ. The proposed procedures enjoy all usual nice properties of rank procedures. Besides, we exploit the uniform local and asymptotic normality (ULAN) of the model to define signed-rank procedures that are semiparametrically efficient under correctly specified densities. Yet, as usual in rank-based inference, the proposed procedures remain valid (correct asymptotic size under the null, for hypothesis testing, and root-n consistency, for point estimation) under a very broad range of densities. We derive the asymptotic properties of the proposed procedures and investigate their finite-sample behavior through simulations.

∗ Supported by the Academy of Finland.
† Supported by an A.R.C. contract of the Communauté Française de Belgique. Davy Paindaveine is also member of ECORE, the association between CORE and ECARES.

AMS 2000 subject classifications: Primary 62G05, 62G10; secondary 62G20, 62H99.
Keywords and phrases: Independent component analysis, Invariance principle, Local asymptotic normality, Rank-based inference, Semiparametric efficiency, Signed ranks.

1. Introduction. In multivariate statistics, concepts of location and scatter are usually defined through affine transformations of a noise vector. To be more specific, assume that the observation X is obtained through (1)

X = ΛZ + µ,

where µ is a p-vector, Λ is a full-rank p × p matrix, and Z is some standardized random vector. The exact nature of the resulting location parameter µ and scatter parameter Σ = ΛΛ′—or, equivalently, mixing matrix parameter Λ, say—crucially depends on the standardization adopted. The most classical assumption on Z specifies that Z is standard p-normal. Then µ and Σ simply coincide with the mean vector E[X] and variance-covariance matrix Var[X] of X, respectively. In robust statistics, it is often rather assumed that Z is spherically symmetric about the origin of R^p—in the sense that the distribution of OZ does not depend on the orthogonal p × p matrix O. The resulting model in (1) is then called the elliptical model. If Z has finite second-order moments, then µ = E[X] and Σ = cVar[X] for some c > 0, but this also defines µ and Σ in the absence of any moment assumption. This paper focuses on an alternative standardization of Z, stating that Z has mutually independent marginals with common median zero. The resulting model in (1)—the independent component (IC) model, say—is more flexible than the elliptical model, even if one restricts, as we will do, to vectors Z with symmetrically distributed marginals. The IC model indeed allows for heterogeneous marginal distributions for X, whereas, in contrast, marginals in the elliptical model all share—up to location and scale—the
same distribution, hence also the same tail weight. This severely affects the relevance of elliptical models for practical applications, particularly so for moderate to large dimensions, since it is then very unlikely that all variables share, e.g., the same tail weight. The IC model provides the most standard setup for independent component analysis (ICA), in which the mixing matrix Λ is to be estimated on the basis of n independent copies X1, . . . , Xn of X, the objective being to recover (up to a translation) the original unobservable independent signals Z1, . . . , Zn—by premultiplying the Xi's with the resulting Λ̂^{−1}. It is well-known in ICA, however, that the mixing matrix Λ is severely unidentified: for any p × p permutation matrix P and any full-rank diagonal matrix D, one can always write (2)

X = ΛP D (P D)^{−1} Z + µ = Λ̃ Z̃ + µ,

where Z̃ still has independent marginals with median zero. Provided that Z has at most one Gaussian marginal, two matrices Λ1 and Λ2 lead to the same distribution for X in (1) if and only if they are equivalent (we will write Λ1 ∼ Λ2) in the sense that Λ2 = Λ1 P D for some matrices P and D as in (2); see, e.g., Theis (2004). In other words, under the assumption that Z has at most one Gaussian marginal, permutations (P), sign changes, and scale transformations (D) of the independent components are the only sources of unidentifiability for Λ. This paper considers inference on the mixing matrix Λ. More precisely, because of the identifiability issues above, we rather consider a normalized version L of Λ, where L is a well-defined representative of the class of mixing matrices that are equivalent to Λ. This parameter L is actually the parameter of interest in ICA: an estimate of L indeed allows one to recover the independent signals Z1, . . . , Zn equally well as an estimate of any other Λ with Λ ∼ L. Interestingly, the situation is extremely similar when considering
inference on Σ in the elliptical model. There, Σ is only identified up to a positive scalar factor, and it is often enough to focus on inference about the well-defined shape parameter V = Σ/(det Σ)^{1/p} (in PCA, e.g., principal directions, proportions of explained variance, etc. can be computed from V). Just as L is a normalized version of Λ in the IC model, V is similarly a normalized version of Σ in the elliptical model, and in both classes of models, the normalized parameters actually are the natural parameters of interest in many inference problems. The similarities further extend to the semiparametric nature of both models: just as the density g_{‖·‖} of ‖Z‖ in the elliptical model, the pdf g_r of the various independent components Z_r, r = 1, . . . , p, in the IC model can hardly be assumed to be known in practice. These strong similarities motivate the approach we adopt in this paper: we plan to conduct inference on L in the IC model by adopting the methodology that proved extremely successful in Hallin and Paindaveine (2006) and Hallin et al. (2006) for inference on V in the elliptical model. This methodology combines semiparametric efficiency and invariance arguments. In the IC model, the fixed-(µ, Λ) nonparametric submodels (indexed by g1, . . . , gp) indeed enjoy a strong invariance structure that is parallel to the one of the corresponding elliptical submodels (indexed by g_{‖·‖}). As in Hallin and Paindaveine (2006) and Hallin et al. (2006), we exploit this invariance structure through a general result from Hallin and Werker (2003) that allows one to derive invariant versions of efficient central sequences, on the basis of which one can define semiparametrically efficient (at fixed target densities g_r = f_r, r = 1, . . . , p) invariant procedures. As the maximal invariant associated with the invariance structure considered is actually the vector of marginal signed ranks of the residuals, the proposed procedures are of a signed-rank nature. Semiparametric inference actually has already been considered in Chen
and Bickel (2006). Interestingly, this paper—which focuses on estimation of the mixing matrix—classifies estimation methods into two categories: (i) the first class of methods specifies the densities of the independent components (g_r = f_r, r = 1, . . . , p, say), and allows one to achieve parametric efficiency under correctly specified densities, but may also lead to inconsistent estimates under misspecified densities; (ii) the second class of methods, recommended in Chen and Bickel (2006), adopts a semiparametric approach. These methods require estimating the densities of the independent components, and are uniformly semiparametrically efficient (that is, they achieve semiparametric efficiency irrespective of the underlying densities). The methods we propose in this paper actually belong to a third group. They require specifying a p-tuple of densities for the independent components, but, unlike in (i), root-n consistency holds even under misspecified densities. Unlike in (ii), however, they achieve semiparametric efficiency under correctly specified densities only. It turns out that their performances do not depend much on the target p-tuple of densities, so that the overall behavior of the proposed procedures is close to achieving uniform semiparametric efficiency. Most importantly, by construction, our procedures are invariant ones and do not require estimating densities. As such, they enjoy all nice properties usually associated with rank-based methods: distribution-freeness (for hypothesis testing), robustness, ease of computation, etc. The paper is organized as follows. In Section 2, we describe the model and fix the notation (Section 2.1), and then state the uniform local and asymptotic normality (ULAN) property that allows us to determine semiparametric efficiency bounds (Section 2.2). In Section 3, we discuss the invariance structure of IC models (Section 3.1) and study the asymptotic properties of invariant versions of central sequences (Section 3.2). In Section 4, we focus
on hypothesis testing, and derive and study the proposed one-sample signed-rank tests for the mixing matrix. In Section 5, we introduce one-step signed-rank estimators and study their asymptotic properties (Section 5.1). Our estimators actually require the delicate estimation of 2p(p − 1) "cross-information coefficients"; we solve this issue in Section 5.2 by generalizing the method recently developed in Cassart et al. (2010), which, in its original form, allows for estimating a single cross-information coefficient only. To illustrate the theory, a simulation study is provided in Section 6. Finally, the Appendix collects technical proofs.

2. The model, ULAN property, and semiparametric efficiency.

2.1. The model. As already explained, the IC model above—obtained from (1) by imposing that Z has mutually independent marginals with common median zero—suffers from severe identifiability issues for Λ, which should be solved before considering inference on the mixing matrix. We achieve this by mapping each Λ onto a unique representative L = Π(Λ) of the collection of mixing matrices Λ̃ that satisfy Λ̃ ∼ Λ (the equivalence class of Λ with respect to ∼). We propose the mapping Λ ↦ Π(Λ) = Λ D_1^+ P D_2, where D_1^+ is the positive definite diagonal matrix that makes each column of Λ D_1^+ have Euclidean norm one, P is the permutation matrix for which the matrix B = (b_{ij}) = Λ D_1^+ P satisfies |b_{ii}| > |b_{ij}| for all i < j, and D_2 is the diagonal matrix such that all diagonal entries of Π(Λ) = Λ D_1^+ P D_2 are equal to one. If one restricts to the collection M_p of mixing matrices Λ for which no ties occur in the permutation step above, it can easily be shown that, for any Λ1, Λ2 ∈ M_p, we have Λ1 ∼ Λ2 iff Π(Λ1) = Π(Λ2), so that this mechanism succeeds in identifying a unique representative in each class of
equivalence (this is ensured by the double scaling scheme, which may seem a bit complicated at first). Besides, Π is then a continuously differentiable mapping from M_p onto M_p^1 := Π(M_p). While ties may always be taken care of in some way (e.g., by basing the ordering on subsequent rows of B above), they may prevent the mapping Π from being continuous, hence would cause severe problems and would prevent us from using the Delta method in the sequel. It is clear, however, that the restriction to M_p only gets rid of a few particular mixing matrices, and will not have any implications in practice. The parametrization of the IC model we consider is then associated with (3)

X = LZ + µ,

where µ ∈ R^p, L ∈ M_p^1, and Z has independent marginals with common median zero. Throughout, we further assume that Z admits a density with respect to the Lebesgue measure on R^p, and that it has p symmetrically distributed marginals, among which at most one is Gaussian (as explained in the Introduction, this limitation on the number of Gaussian components is needed for L to be identifiable). We will denote by F the resulting collection of densities for Z. Of course, any g ∈ F naturally factorizes into g(z) = ∏_{r=1}^{p} g_r(z_r), where g_r is the symmetric density of Z_r.
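To fix ideas, here is a minimal numerical sketch (not from the paper) of how a sample from model (3) can be generated; the marginal densities, the mixing matrix, and the location below are illustrative assumptions only.

```python
import numpy as np

# Sketch: n observations X_i = L Z_i + mu under the symmetric IC model (3),
# with independent, symmetrically distributed marginals of median zero.
rng = np.random.default_rng(42)
n, p = 500, 3
Z = np.column_stack([
    rng.standard_t(df=6, size=n),      # heavy-tailed component
    rng.laplace(size=n),               # double-exponential component
    rng.logistic(size=n),              # logistic component
])
L = np.array([[ 1.0, 0.5, -0.3],
              [ 0.2, 1.0,  0.4],
              [-0.1, 0.3,  1.0]])      # an assumed mixing matrix with unit diagonal
mu = np.array([1.0, -2.0, 0.5])
X = Z @ L.T + mu                       # rows of X are the observations X_i
```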

The hypothesis under which n mutually independent observations X_i, i = 1, . . . , n, are obtained from (3), where Z has density g ∈ F, will be denoted as P_{ϑ,g}^{(n)}, with ϑ = (µ′, (vecd◦ L)′)′ ∈ Θ = R^p × vecd◦(M_p^1), or, alternatively, as P_{µ,L,g}^{(n)}; for any p × p matrix A, we write vecd◦ A for the p(p − 1)-vector obtained by removing the p diagonal entries of A from its usual vectorized form vec A (the diagonal entries of L are all equal to one, hence should not be included in the parameter). The resulting semiparametric model is then

(4)   P^{(n)} := ∪_{g∈F} P_g^{(n)} := ∪_{g∈F} ∪_{ϑ∈Θ} {P_{ϑ,g}^{(n)}}.
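The representative L = Π(Λ) entering this parametrization can be computed along the lines of Section 2.1; the sketch below is our own illustration (the greedy column search stands in for the permutation step, which it reproduces when there are no ties), and it checks that equivalent mixing matrices are mapped to the same representative.

```python
import numpy as np

def normalize_mixing(Lambda):
    """Sketch of the normalization Lambda -> Pi(Lambda) of Section 2.1."""
    B = Lambda / np.linalg.norm(Lambda, axis=0)     # Lambda D1+: unit-norm columns
    p = B.shape[0]
    order, remaining = [], list(range(p))
    for i in range(p):                              # permutation step: in row i, the
        j = max(remaining, key=lambda c: abs(B[i, c]))  # chosen column dominates the rest
        order.append(j)
        remaining.remove(j)
    B = B[:, order]                                 # Lambda D1+ P
    return B / np.diag(B)                           # Lambda D1+ P D2: unit diagonal

rng = np.random.default_rng(0)
Lam = rng.normal(size=(3, 3))
PD = np.diag([2.0, -0.5, 3.0])[:, [2, 0, 1]]        # column permutation and rescaling
print(np.allclose(normalize_mixing(Lam), normalize_mixing(Lam @ PD)))  # True
```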


As usual, performing semiparametrically efficient (at a fixed f ∈ F) inference on ϑ typically requires the corresponding parametric model P_f^{(n)} to be uniformly locally and asymptotically normal (ULAN).

2.2. ULAN property and semiparametric efficiency. As always, the ULAN property requires technical regularity conditions on f. In the present context, we need that each corresponding univariate pdf f_r, r = 1, . . . , p, is absolutely continuous, with a derivative f_r′ that satisfies

σ_{f_r}^2 := ∫_{−∞}^{∞} y^2 f_r(y) dy < ∞,   I_{f_r} := ∫_{−∞}^{∞} ϕ_{f_r}^2(y) f_r(y) dy < ∞,

and

J_{f_r} := ∫_{−∞}^{∞} y^2 ϕ_{f_r}^2(y) f_r(y) dy < ∞,

where we let ϕ_{f_r} := −f_r′/f_r.
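As a concrete check of these regularity quantities, the following sketch (our own illustration, under an assumed standard logistic f_r, for which ϕ_{f_r}(y) = tanh(y/2)) evaluates σ²_{f_r}, I_{f_r}, and J_{f_r} by numerical integration.

```python
import numpy as np

# Numerical illustration of sigma_{f_r}^2, I_{f_r} and J_{f_r} for a logistic f_r.
y = np.linspace(-40.0, 40.0, 400001)
f = np.exp(-y) / (1.0 + np.exp(-y)) ** 2      # logistic pdf
phi = np.tanh(y / 2.0)                        # location score -f'/f

sigma2 = np.trapz(y**2 * f, y)                # variance, approx. pi^2 / 3
info_loc = np.trapz(phi**2 * f, y)            # I_{f_r}, approx. 1/3
info_scale = np.trapz(y**2 * phi**2 * f, y)   # J_{f_r}
print(sigma2, info_loc, info_scale)
```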

In the sequel, we denote by F_ulan the corresponding collection of pdfs f. For any f ∈ F_ulan, let γ_{rs}(f) := I_{f_r} σ_{f_s}^2, define the optimal p-variate location score function ϕ_f : R^p → R^p through z = (z_1, . . . , z_p)′ ↦ ϕ_f(z) = (ϕ_{f_1}(z_1), . . . , ϕ_{f_p}(z_p))′, and denote by I_f the diagonal matrix with diagonal entries I_{f_r}, r = 1, . . . , p. Further write I_ℓ for the ℓ-dimensional identity matrix and define

C := ∑_{r=1}^{p} ∑_{s=1}^{p−1} (e_r e_r′ ⊗ u_s e_{s+δ_{s≥r}}′),

where e_r and u_s stand for the rth and sth vectors of the canonical bases of R^p and R^{p−1}, respectively, δ_{s≥r} is equal to one if s ≥ r and to zero otherwise, and ⊗ is the usual Kronecker product. The following ULAN result then easily follows from Proposition 2.1 in Oja et al. (2010) by using a simple chain rule argument.
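For concreteness, the matrix C maps vec A to vecd◦ A by deleting the diagonal entries of A; the construction below is our own minimal sketch of this definition and verifies that identity numerically.

```python
import numpy as np

def build_C(p):
    """Sketch of the p(p-1) x p^2 matrix C: C @ vec(A) = vecd(A) (off-diagonal entries,
    stacked column by column)."""
    C = np.zeros((p * (p - 1), p * p))
    row = 0
    for r in range(p):               # column index of A
        for s in range(p):           # row index of A, skipping the diagonal
            if s != r:
                C[row, r * p + s] = 1.0   # vec(A) is column-major: (s, r) sits at r*p + s
                row += 1
    return C

p = 3
A = np.arange(1.0, p * p + 1).reshape(p, p)
vecA = A.flatten(order="F")                                   # column-major vec
vecd = np.concatenate([np.delete(A[:, r], r) for r in range(p)])
print(np.allclose(build_C(p) @ vecA, vecd))                   # True
```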


Proposition 2.1. Fix f ∈ F_ulan. Then the collection of probability distributions P_f^{(n)} is ULAN, with central sequence

(5)   ∆_{ϑ,f}^{(n)} = ( (∆_{ϑ,f;1}^{(n)})′, (∆_{ϑ,f;2}^{(n)})′ )′, where
      ∆_{ϑ,f;1}^{(n)} = n^{−1/2} (L^{−1})′ ∑_{i=1}^{n} ϕ_f(Z_i)  and
      ∆_{ϑ,f;2}^{(n)} = n^{−1/2} C (I_p ⊗ L^{−1})′ ∑_{i=1}^{n} vec( ϕ_f(Z_i) Z_i′ − I_p ),

where Z_i = Z_i(ϑ) = L^{−1}(X_i − µ), and full-rank information matrix

Γ_{L,f} = diag( Γ_{L,f;1}, Γ_{L,f;2} )   (block-diagonal),

where Γ_{L,f;1} := (L^{−1})′ I_f L^{−1} and

Γ_{L,f;2} := C (I_p ⊗ L^{−1})′ [ ∑_{r=1}^{p} (J_{f_r} − 1)(e_r e_r′ ⊗ e_r e_r′) + ∑_{r,s=1, r≠s}^{p} ( γ_{sr}(f)(e_r e_r′ ⊗ e_s e_s′) + (e_r e_s′ ⊗ e_s e_r′) ) ] (I_p ⊗ L^{−1}) C′.

More precisely, for any ϑ_n = ϑ + O(n^{−1/2}) (with ϑ = (µ′, (vecd◦ L)′)′) and any bounded sequence (τ_n) in R^{p^2}, we have that, under P_{ϑ_n,f}^{(n)}, as n → ∞,

log( dP_{ϑ_n + n^{−1/2} τ_n, f}^{(n)} / dP_{ϑ_n, f}^{(n)} ) = τ_n′ ∆_{ϑ_n,f}^{(n)} − (1/2) τ_n′ Γ_{L,f} τ_n + o_P(1),

and ∆_{ϑ_n,f}^{(n)} converges in distribution to a p^2-variate normal distribution with mean zero and covariance matrix Γ_{L,f}.

This ULAN result allows one to derive parametric efficiency bounds at f and to construct the corresponding parametrically optimal inference procedures. When testing H_0 : L = L_0 against H_1 : L ≠ L_0, parametrically efficient tests are based on the asymptotic normal distribution—under P_{ϑ_0,f}^{(n)}, ϑ_0 = (µ′, (vecd◦ L_0)′)′—of ∆_{ϑ_0,f;2}, and reject the null at asymptotic level α whenever ∆_{ϑ̂_0,f;2}′ Γ_{L_0,f;2}^{−1} ∆_{ϑ̂_0,f;2} exceeds the α-upper quantile χ_{p(p−1),1−α}^2 of the chi-square distribution with p(p − 1) degrees of freedom (here, ϑ̂_0 stands for the vector obtained by replacing µ with an appropriate estimate in ϑ_0).

Under alternatives of the form ∪_{µ∈R^p} P_{µ, L_0 + n^{−1/2} τ_2, f}^{(n)}, these tests have asymptotic power

(6)   1 − Ψ_{p(p−1)}( χ_{p(p−1),1−α}^2 ; τ_2′ Γ_{L_0,f;2}^{−1} τ_2 ),

where Ψ_{p(p−1)}( · ; δ) stands for the cumulative distribution function of the non-central χ_{p(p−1)}^2 distribution with non-centrality parameter δ. This settles the parametrically optimal (at f) performance for hypothesis testing. As for point estimation, an estimator L̂ is parametrically efficient at f iff

(7)   n^{1/2} vecd◦(L̂ − L) →_L N_{p(p−1)}( 0, Γ_{L,f;2}^{−1} ).
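The asymptotic power in (6) is straightforward to evaluate numerically; below is a hypothetical sketch in which the noncentrality value standing for τ_2′ Γ_{L_0,f;2}^{−1} τ_2 is an assumed number, not one computed from the paper.

```python
from scipy.stats import chi2, ncx2

# Evaluate 1 - Psi_{p(p-1)}(chi-square critical value; noncentrality delta), cf. (6).
p, alpha = 3, 0.05
df = p * (p - 1)
delta = 4.0                              # assumed noncentrality tau_2' Gamma^{-1} tau_2
crit = chi2.ppf(1 - alpha, df)           # alpha-upper chi-square quantile
power = 1 - ncx2.cdf(crit, df, delta)    # asymptotic local power
print(crit, power)
```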

In practice, of course, the underlying density f is unspecified, which leads to considering the semiparametric model P^{(n)}. In this model, semiparametrically efficient (still at f) inference procedures are based on the efficient central sequence ∆_{ϑ,f}^{*(n)} resulting from ∆_{ϑ,f}^{(n)} by performing adequate tangent space projections; see Bickel et al. (1993). Under P_{ϑ,f}^{(n)}, the efficient central sequence ∆_{ϑ,f}^* typically is still asymptotically normal with mean zero, but now with covariance matrix Γ_{L,f;2}^* (the efficient information matrix at f). This information matrix Γ_{L,f;2}^* is smaller than or equal to its parametric counterpart Γ_{L,f;2} (as usual, in the sense that Γ_{L,f;2} − Γ_{L,f;2}^* is positive semi-definite), and the difference measures the cost of the unspecification of f when performing inference on L. In relation with this, Γ_{L,f;2}^* determines semiparametric efficiency bounds; more precisely, a test is semiparametrically efficient at f if its asymptotic powers under the same alternatives as above are given by

(8)   1 − Ψ_{p(p−1)}( χ_{p(p−1),1−α}^2 ; τ_2′ (Γ_{L_0,f;2}^*)^{−1} τ_2 ),

whereas an estimator L̂ is semiparametrically efficient at f if and only if

(9)   n^{1/2} vecd◦(L̂ − L) →_L N_{p(p−1)}( 0, (Γ_{L,f;2}^*)^{−1} ).

The semiparametrically optimal behavior in (8)-(9) is to be compared to the parametrically optimal one in (6)-(7). 3. Invariance, ranks, and semiparametric efficiency. As already mentioned, semiparametrically optimal procedures are based on the efficient central sequence ∆∗ϑ,f . Classically, ∆∗ϑ,f is obtained by performing tangent space computations, which is the approach adopted in Chen and Bickel (2006). When, however, the semiparametric model at hand enjoys a strong invariance structure, the efficient central sequence ∆∗ϑ,f —or more precisely, a version of this central sequence, since central sequences are always defined up to oP (1)’s only—can alternatively be obtained by conditioning the original central sequence ∆ϑ,f with respect to the corresponding maximal invariant quantities; see Hallin and Werker (2003). When derived in this way, efficient central sequences are themselves invariant, and thus comply with the invariance principle. As we will show, the resulting inference procedures are of a rank-based nature, hence can be expected to enjoy all nice properties usually associated with rank-based methods : distribution-freeness (for hypothesis testing), robustness, ease of computation, etc. 3.1. The relevant invariance structure. In order to describe the invariance structure that is relevant in the present context, consider the collection H of all transformations h : Rp → Rp of the form h((z1 , . . . , zp )0 ) = (h1 (z1 ), . . . , hp (zp ))0 , where each hr , r = 1, . . . , p, is continuous, odd, monotone increasing, and fixes +∞. Then the (nonparametric) fixed-ϑ submodels (10)

P_ϑ^{(n)} := ∪_{g∈F} {P_{ϑ,g}^{(n)}},   ϑ ∈ Θ,

are invariant under the group G_ϑ = {g_h^{ϑ}, h ∈ H}, ∘, of componentwise monotone increasing transformations

g_h^{ϑ} : R^p × · · · × R^p → R^p × · · · × R^p : (x_1, . . . , x_n) ↦ (L h(z_1(ϑ)) + µ, . . . , L h(z_n(ϑ)) + µ),


where we let z_i(ϑ) := L^{−1}(x_i − µ). In such a situation, the invariance principle suggests that inference should be based on the corresponding maximal invariant, which, in the present setup, is

(11)   ( S_1(ϑ), . . . , S_n(ϑ), R_1^+(ϑ), . . . , R_n^+(ϑ) ),

with S_i(ϑ) = (S_{i1}(ϑ), . . . , S_{ip}(ϑ))′ and R_i^+(ϑ) = (R_{i1}^+(ϑ), . . . , R_{ip}^+(ϑ))′, where S_{ir}(ϑ) is the sign of Z_{ir}(ϑ) = (L^{−1}(X_i − µ))_r and R_{ir}^+(ϑ) is the rank of |Z_{ir}(ϑ)|
among |Z_{1r}(ϑ)|, . . . , |Z_{nr}(ϑ)|. This is what leads to considering signed-rank inference procedures when performing inference on the mixing matrix L. A pleasant corollary from this invariance structure, which follows from the fact that the group G_ϑ actually generates the corresponding submodel P_ϑ^{(n)} in (10), states that signed-rank statistics are distribution-free in P_ϑ^{(n)}.

3.2. A rank-based efficient central sequence. The main result of Hallin and Werker (2003) shows that conditioning the original parametric efficient central sequence on signed ranks provides a version of the efficient central sequence at f. More precisely, denoting by E_{ϑ,f}^{(n)} expectation under P_{ϑ,f}^{(n)},

E_{ϑ,f}^{(n)}[ ∆_{ϑ,f;2}^{(n)} | S_1(ϑ), . . . , S_n(ϑ), R_1^+(ϑ), . . . , R_n^+(ϑ) ] = ∆_{ϑ,f}^* + o_{L^2}(1)

as n → ∞, under P_{ϑ,f}^{(n)}. In order to obtain an explicit expression for this rank-based central sequence (in the sequel, we often write rank-based instead of the heavier signed-rank-based), define, for any ϑ ∈ Θ and f ∈ F_ulan,

T_{ϑ,f} := (1/√n) ∑_{i=1}^{n} odiag[ ( S_i(ϑ) ⊙ ϕ_f( F_+^{−1}( R_i^+(ϑ)/(n+1) ) ) ) ( S_i(ϑ) ⊙ F_+^{−1}( R_i^+(ϑ)/(n+1) ) )′ ],
where odiag(A) denotes the matrix obtained from A by replacing all diagonal entries with zeros, ⊙ is the Hadamard (i.e., entrywise) product of two vectors, and where z ↦ F_+(z) = (F_{+1}(z_1), . . . , F_{+p}(z_p))′, with F_{+r}(t) := P_{ϑ,f}^{(n)}[ |Z_r(ϑ)| < t ] = 2( ∫_{−∞}^{t} f_r(s) ds ) − 1 for t ≥ 0. We then have the following
result.

Theorem 3.1. Fix ϑ = (µ′, (vecd◦ L)′)′ ∈ Θ and f ∈ F_ulan. Then

∆_{ϑ,f;2}^* := C (I_p ⊗ L^{−1})′ vec T_{ϑ,f}
            = E_{ϑ,f}^{(n)}[ ∆_{ϑ,f;2}^{(n)} | S_1(ϑ), . . . , S_n(ϑ), R_1^+(ϑ), . . . , R_n^+(ϑ) ] + o_{L^2}(1)
            = ∆_{ϑ,f}^* + o_{L^2}(1)
as n → ∞, under P_{ϑ,f}^{(n)}. As explained above, invariant semiparametrically optimal (at f) inference on L can be based on statistics measurable in ∆_{ϑ,f;2}^*. To choose the appropriate statistics and to investigate the asymptotic properties of the resulting invariant procedures, we will need Theorem 3.2 below, whose statement requires introducing the following notation. Define

(12)   Γ_{L,f,g;2}^* := C (I_p ⊗ L^{−1})′ G_{f,g} (I_p ⊗ L^{−1}) C′
       := C (I_p ⊗ L^{−1})′ [ ∑_{r,s=1, r≠s}^{p} ( γ_{sr}(f, g)(e_r e_r′ ⊗ e_s e_s′) + ρ_{rs}(f, g)(e_r e_s′ ⊗ e_s e_r′) ) ] (I_p ⊗ L^{−1}) C′,

where we let

(13)   γ_{rs}(f, g) := ∫_0^1 ϕ_{f_r}(F_r^{−1}(u)) ϕ_{g_r}(G_r^{−1}(u)) du × ∫_0^1 F_s^{−1}(u) G_s^{−1}(u) du

and

(14)   ρ_{rs}(f, g) := ∫_0^1 F_r^{−1}(u) ϕ_{g_r}(G_r^{−1}(u)) du × ∫_0^1 ϕ_{f_s}(F_s^{−1}(u)) G_s^{−1}(u) du.
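For specified target and actual densities, the cross-information coefficients (13)–(14) can be evaluated by numerical integration over u ∈ (0, 1). The sketch below is our own illustration under assumed choices: a logistic target density f_r (score tanh(y/2), quantile log(u/(1−u))) against a standard normal g_r (score y, quantile norm.ppf(u)), with the same pair used for components r and s.

```python
import numpy as np
from scipy.stats import norm

# Midpoint-rule evaluation of the one-dimensional integrals entering (13)-(14).
N = 200_000
u = (np.arange(N) + 0.5) / N
Finv = np.log(u / (1 - u))             # logistic quantile function F^{-1}
Ginv = norm.ppf(u)                     # normal quantile function G^{-1}
phi_f = np.tanh(Finv / 2)              # logistic location score at F^{-1}(u)
phi_g = Ginv                           # normal location score at G^{-1}(u)

gamma_rs = np.mean(phi_f * phi_g) * np.mean(Finv * Ginv)     # gamma_{rs}(f, g)
rho_rs = np.mean(Finv * phi_g) * np.mean(phi_f * Ginv)       # rho_{rs}(f, g)
print(gamma_rs, rho_rs)
```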

We also let Γ∗L,f ;2 := Γ∗L,f,f ;2 and Gf := Gf,f , that involve γrs (f, f ) = γrs (f ) and ρrs (f, f ) = 1. We then have the following result.


Fix ϑ = (µ0 , (vecd◦ L)0 )0 ∈ Θ and f ∈ Fulan . Then (i) for

Theorem 3.2. any g ∈ F,

∆∗ϑ,f ;2 = ∆∗ϑ,f,g;2 + oL2 (1) (n)

Pn √1 i=1 n (n) Pϑ+n−1/2 τ,g ,

as n → ∞, under Pϑ,g , where ∆∗ϑ,f,g;2 := C(Ip ⊗ L−1 )0 vec odiag 

(Si ϕf (F+−1 (G+ (|Zi |))))(Si F+−1 (G+ (|Zi |)))0 . (ii) Under 

with τ = (τ10 , τ20 )0 ∈ Rp × Rp(p−1) and g ∈ Fulan , L

∆∗ϑ,f ;2 → Np(p−1) (Γ∗L,f,g;2 τ2 , Γ∗L,f ;2 ), as n → ∞ (for τ = 0, the result only requires that g ∈ F). (iii) Still with τ = (τ10 , τ20 )0 ∈ Rp × Rp(p−1) and g ∈ Fulan , ∆∗ϑ+n−1/2 τ,f ;2 − ∆∗ϑ,f ;2 = (n)

−Γ∗L,f,g;2 τ2 + oP (1) as n → ∞, under Pϑ,g . Both for hypothesis testing and point estimation, we will need estimating the parameter ϑ by some root-n consistent estimator ϑˇ(n) and controlling the asymptotic behavior of the resulting aligned central sequence ∆∗ϑˇ(n) ,f ;2 . Such a control readily follows from Theorem 3.2(iii), provided that one restricts (n) to locally asymptotically discrete sequences of estimators (recall that (ϑˇ ) #

is said to be locally asymptotically discrete if the number of possible values (n) of ϑˇ in balls with O(n−1/2 ) radius centered at ϑ is bounded as n → ∞). #

For such estimators, Lemma 4.4 from Kreiss (1987) indeed allows to replace τ in Theorem 3.2(iii) with the random vector n1/2 (ϑˇ(n) − ϑ). More precisely, we have the following result. Corollary 3.1.

Fix ϑ = (µ0 , (vecd◦ L)0 )0 ∈ Θ and f, g ∈ Fulan . Let ϑˇ# =

(n) ˇ # )0 )0 be a locally asymptotically discrete sequence of ranϑˇ# = (ˇ µ0# , (vecd◦ L (n) dom vectors satisfying n1/2 (ϑˇ# − ϑ) = OP (1) as n → ∞, under P . Then ϑ,g

∆∗ϑˇ ,f ;2 # under

− ∆∗ϑ,f ;2 (n) Pϑ,g .

=

ˇ# −Γ∗L,f,g;2 n1/2 vecd◦ (L

− L) + oP (1), still as n → ∞,


Note that a sequence of estimators ϑˇ(n) can always be discretized by let(n) ting (ϑˇ )` := (cn1/2 )−1 sign((ϑˇ(n) )` )dcn1/2 |(ϑˇ(n) )` |e, ` = 1, . . . , p2 , for some #

arbitrary constant c > 0. Subscripts

#

in the sequel are used for estimators

that are locally asymptotically discrete. It should be noted, however, that this discreteness property has no implications in practice, where n is fixed, as c can be chosen arbitrarily large. 4. Hypothesis testing. In this section, we consider the problem of testing the null hypothesis H0 : L = L0 against the alternative H1 : L 6= L0 , with unspecified underlying density g. 4.1. The proposed test statistics. The objective here is to define a test that is semiparametrically efficient at some target density f , yet that remains valid—in the sense that it meets asymptotically the level constraint—under a very broad class of densities g. As we will show, this objective is achieved by the signed-rank test— φf , say—that rejects H0 at asymptotic level α ∈ (0, 1) whenever Qf := (∆∗ϑˆ

(15)

0# ,f ;2

)0 (Γ∗L0 ,f ;2 )−1 ∆∗ϑˆ

0# ,f ;2

> χ2p(p−1),1−α ,

ˆ# that where ϑˆ0# = (ˆ µ0# , (vecd◦ L0 )0 )0 is based on a sequence of estimators µ is root-n consistent (under the null) and locally asymptotically discrete. Possible choices for µ ˆ# include (discretized versions of) the sample mean ¯ := X

1 n

Pn

i=1 Xi

or the transformation-retransformation componentwise me-

−1 dian µ ˆMed := L0 Med[L−1 0 X1 , . . . , L0 Xn ], where Med[·] returns the vector

of univariate medians. We favor the sign estimator µ ˆMed , since it is very much in line with the signed-rank test φf and enjoys good robustness properties. However, we stress that, provided that µ ˆ# is indeed root-n consistent and locally asymptotically discrete, the asymptotic properties of φf , at any g ∈ Fulan , are not affected by the choice of µ ˆ# . This actually fol-


lows from Theorem 4.2 below, that states the asymptotic properties of the proposed tests. Before considering asymptotic properties, we first derive a simple and explicit expression of the proposed signed-rank statistic Qf , which actually requires the following result. Lemma 4.1.

Fix ϑ = (µ0 , (vecd◦ L)0 )0 ∈ Θ and f, g ∈ Fulan . Then

(Ip ⊗ L−1 )C 0 (Γ∗L,f,g;2 )−1 C(Ip ⊗ L−1 )0 = p X

n

αrs (f, g) er e0r ⊗ L2rs er e0r + es e0s − Lrs er e0s − Lrs es e0r



r,s=1,r6=s

+βrs (f, g) er e0s ⊗ Lrs Lsr er e0s − Lrs er e0r − Lsr es e0s + es e0r

o

,

where we let  γrs (f, g)   α (f, g) :=    rs γrs (f, g)γsr (f, g) − ρrs (f, g)ρsr (f, g)

(16)

     βrs (f, g) :=

−ρrs (f, g) , γrs (f, g)γsr (f, g) − ρrs (f, g)ρsr (f, g)

and where Lrs denotes the entry (r, s) of L. As announced above, this lemma allows to rewrite the proposed test statistics in a more explicit way. Theorem 4.1. Qf (17)

Fix f ∈ Fulan . Then the test statistic Qf rewrites

= (vec T ϑˆ0# ,f )0 Mf (vec T ϑˆ0# ,f ) =

p X

αrs (f )(T ϑˆ0# ,f )2sr + βrs (f )(T ϑˆ0# ,f )rs (T ϑˆ0# ,f )sr , 

r,s=1,r6=s

where we let Mf = Mf,f , αrs (f ) = αrs (f, f ), and βrs (f ) = βrs (f, f ). It can easily be checked that, if one lets fσ (z) :=

Qp

−1 r=1 σr fr (zr /σr ),

p with σ = (σ1 , . . . , σp ) ∈ (R+ 0 ) , then we have φf = φf . Hence, describing a σ


particular test φf in the class of proposed signed-rank tests {φf , f ∈ Fulan } may be done by providing only the p corresponding target density types— that is, the p densities fr , r = 1, . . . , p up to their scales. This will be used in the simulations of Section 6, and has implications on the optimality properties of the proposed tests (see below). 4.2. Asymptotic properties. The following result states the main asymptotic properties of the proposed signed-rank tests. (n)

(n) −1/2 τ,g , 0 +n

Theorem 4.2. Fix f ∈ Fulan . Then (i) under Pϑ0 ,g and under Pϑ

with ϑ0 = (µ0 , (vecd◦ L0 )0 )0 , τ = (τ10 , τ20 )0 ∈ Rp × Rp(p−1) , and g ∈ Fulan , L

L

Qf → χ2p(p−1) and Qf → χ2p(p−1) (τ20 (Γ∗L0 ,f,g;2 )0 (Γ∗L0 ,f ;2 )−1 Γ∗L0 ,f,g;2 τ2 ), (n)

respectively, as n → ∞. (ii) The sequence of tests φf (n)

has asymptotic level α (n)

under ∪µ∈Rp ∪g∈Fulan {Pµ,L0 ,g }. (iii) The sequence of tests φf

is semipara-

metrically efficient, still at asymptotic level α, when testing H0 : L = L0 against H1f : L 6= L0 with noise density f (i.e., when testing ∪µ∈Rp ∪g∈Fulan (n)

(n)

{Pµ,L0 ,g } against ∪µ∈Rp ∪L∈M1p \{L0 } {Pµ,L,f }). We stress that the test φf does not assume that the joint density of the vector of independent components is known to be f , as the notation might suggest. Here, f is only the density at which optimality is sought but the test φf remains asymptotically valid under a very broad range of densities, namely under any g ∈ Fulan . Actually, the validity can even be extended to any g ∈ F, which then allows to get rid of any finite moment condition—this can be achieved along the same lines as in Oja et al. (2010) (Lemma 4.2) and merely requires to adopt f -score rank-based estimators for µ ˆ# . As pointed out in Section 4.1, we have that φf = φf for any fσ (z) := σ

Qp

−1 r=1 σr fr (zr /σr ).

Of course, this implies that φf achieves optimality at any


such fσ : optimality therefore does not require to specify correctly densities, but rather only density types. Still, the choice of f , of course, might seem quite arbitrary. This is actually quite standard in rank-based inference: most procedures based on (signed) ranks involve score functions to be chosen by the practitioner, and that will generate sign tests, Wilcoxon (signed)-rank tests, etc. The choice of the target densities is based on the practitioner’s prior belief on the underlying densities. If he/she has no such prior belief, estimated (e.g., by kernel) densities fˆr can also be used as “target densities”. It is important to realize that using such data-driven target densities will not affect the asymptotic properties stated in Theorem 4.2 above, because kernel density estimators are measurable with respect to the order statistics of the |Zir (ϑˆ0# )|’s, that, asymptotically, are stochastically independent of the signed ranks Sir (ϑˆ0# ), R+ (ϑˆ0# ) used in φ ; see Hallin and Werker (2003) ir

f

for details. In the present context, one should not worry too much about the choice of f , since the performances of φf do not depend much on f ; see Section 6. Our tests are therefore close to achieving uniform semiparametric efficiency (note that, as shown in Amari (2002), strictly achieving uniform semiparametric efficiency does require to estimate the densities of the independent components). We end this section by mentioning that Theorem 4.2 allows to measure the asymptotic powers of φf under sequences of local alternatives of the (n)

form Pµ,L

0 +n

−1/2 H,g

, where H is an arbitrary p × p matrix with zero diago-

nal entries (only such a H provides a perturbed mixing matrix L0 + n−1/2 H that belongs—for n large enough—to the parameter space M1p ). The corresponding asymptotic powers are given by 1 − Ψp(p−1) χ2p(p−1),1−α ; (vecd◦ H)0 (Γ∗L0 ,f,g;2 )0 (Γ∗L0 ,f ;2 )−1 Γ∗L0 ,f,g;2 (vecd◦ H) , 

where Ψ` ( · ; δ) was defined in page 10. By using the fact that C 0 (vecd◦ H) =

19

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS

(vec H) and then applying Lemma 4.1, this noncentrality parameter, after painful yet straightforward computations, simplifies to p X

(18)

−1 −1 2 ξrs (f, g) ((L−1 0 H)sr ) + ηrs (f, g) (L0 H)rs (L0 H)sr ,



r,s=1,r6=s

with ξrs (f, g) =

2 (f, g) + ρ2 (f, g)γ (f ) − 2ρ (f, g)γ (f, g) γrs (f )γsr sr rs sr rs γrs (f )γsr (f ) − 1

and ηrs (f, g) =

ρsr (f, g)(γrs (f )γsr (f, g) − ρrs (f, g)) γrs (f )γsr (f ) − 1 +

At g = f , this reduces to

γrs (f, g)(γsr (f )ρrs (f, g) − γsr (f, g)) . γrs (f )γsr (f ) − 1

Pp

−1 −1 −1 2 r,s=1,r6=s (γsr (f ) ((L0 H)sr ) +(L0 H)rs (L0 H)sr ).

In the simulations of Section 6, we will compare the ranking of finite-sample rejection frequencies associated with various tests φf with the corresponding theoretical ranking derived from (18). Finally, note that, for any f ∈ Fulan , the test φf remains valid even if there are more than one Gaussian independent components (the target density f should contain at most one Gaussian density fr , though). 5. Point estimation. We turn to the problem of estimating the mixing matrix L, which is more complicated—but also more important—than the hypothesis testing problem considered in the previous section. Denoting by Qf = Qf (L0 ) the signed-rank test statistic for H0 : L = L0 in (15), a natural signed-rank estimator of L is obtained by “inverting the corresponding test”: ˆ f ;argmin = arg min Q (L). L f L∈M1p

This estimator, however, is not satisfactory: as any signed-rank-based quantity, the objective function L 7→ Qf (L) is piecewise constant, hence discon-

20

ILMONEN AND PAINDAVEINE

tinuous and non-convex, which makes it very difficult to derive the asympˆ f ;argmin . It is also virtually impossible to compute L ˆ f ;argmin totic properties of L in practice, since this lack of smoothness and convexity essentially forces computing the estimator by simply running over a grid of possible values of the p(p−1)-dimensional parameter L—a strategy that cannot provide a reaˆ f ;argmin , even for moderate values of p. Finally, sonable approximation of L ˆ f ;argmin , there is no way to estimate the asymptotic covariance matrix of L which rules out the possibility to derive confidence zones for L, hence drastically restricts the practical relevance of this estimator. We therefore propose an alternative solution that can be thought as a onestep version of the estimator above, and that does not suffer any of the aforementioned drawbacks. Our one-step signed-rank estimators—in the sequel, we simply speak of one-step rank estimators or one-step R-estimators—can easily be computed in practice, their asymptotic properties can be derived explicitly, and their asymptotic covariance matrix can be estimated consistently. 5.1. One-step R-estimators of L. To initiate the proposed one-step procedure, a preliminary estimator of the parameter is needed. In the present context, we will assume that a root-n consistent and locally asymptotically ˜ # )0 )0 is available. As we will show, discrete estimator ϑ˜# = (˜ µ0 , (vecd◦ L #

the asymptotic properties of the proposed one-step R-estimators will not be affected by the choice of ϑ˜# . Practical choices will be provided in Section 6. Describing our one-step R-estimators requires Assumption (A). For all r 6= s ∈ {1, . . . , p}, we dispose of sequences of estimators γˆrs# (f ) and ρˆrs# (f ) that (i) are locally asymptotically discrete and (ii) satisfy γˆrs# (f ) = γrs (f, g) + oP (1) and ρˆrs# (f ) = ρrs (f, g) + oP (1) (n)

as n → ∞, under ∪ϑ∈Θ ∪g∈Fulan {Pϑ,g }.

21

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS

Sequences of estimators fulfilling this assumption will be provided in Section 5.2 below. At this point, just note that plugging in (12) the estimators ˜ # , defines a statistic— from Assumption (A) and the preliminary estimator L ˆ∗ Γ ˜ L

(n)

# ,f ;2

, say—that consistently estimates Γ∗L,f,g;2 under ∪ϑ∈Θ {Pϑ,g }. Inci-

dentally, the estimators γˆrs# (f ) and ρˆrs# (f )—through (16)—similarly yield consistent estimators α ˆ rs# (f ) and βˆrs# (f ) for the quantities αrs (f, g) and βrs (f, g), respectively. We are now ready to describe our one-step R-estimators. Again, let us ˆ that is semiparametrically assume that we want to define an estimator L efficient at f ∈ Fulan , which (see Section 2.2) means that √

(19)

L ˆ − L) → n vecd◦ (L Np(p−1) 0, (Γ∗L,f ;2 )−1



(n)

as n → ∞, under ∪µ∈Rp {Pµ,L,f }. The one-step R-estimator we propose is ˆ f # (with values in M1p ) defined through then the statistic L ˜ # ) + n−1/2 (Γ ˆ ∗˜ ˆ f # = (vecd◦ L vecd◦ L L

(20)

# ,f ;2

ˆ∗ where Γ ˜ L

# ,f ;2

)−1 ∆∗ϑ˜

# ,f ;2

,

is the consistent estimate of Γ∗L,f,g;2 just defined. The following

result states the asymptotic properties of this estimator. (n)

Theorem 5.1. Fix f ∈ Fulan . Then (i) under Pϑ,g , with ϑ = (µ0 , (vecd◦ L)0 )0 ∈ Θ and g ∈ Fulan , we have that, as n → ∞, √

=

C 0 (Γ∗L,f,g;2 )−1 ∆∗ϑ,f ;2 + oP (1)

(22)

=

C 0 (Γ∗L,f,g;2 )−1 ∆∗ϑ,f,g;2 + oP (1)

(23)

→ Np(p−1) 0, C 0 (Γ∗L,f,g;2 )−1 Γ∗L,f ;2 (Γ∗L,f,g;2 )−10 C .

(21)

ˆ f # − L) n vec(L

L



ˆ f # is semiparametrically efficient at f . (ii) The estimator L ˆ f # an R-estimator since it shows The result in (21) justifies calling L ˆ f # − L) is asymptotically equivalent to a random matrix that that n1/2 (L

22

ILMONEN AND PAINDAVEINE

is measurable with respect to the signed ranks Si (ϑ), Ri+ (ϑ) in (11). The asymptotic equivalence in (22) gives a Bahadur-type representation result ˆ f # with summands that are independent and identically distributed, for L hence leads trivially to the asymptotic normality result in (23). Recalling ˆ∗ that Γ ˜ L

# ,f ;2

(n)

consistently estimates Γ∗L,f,g;2 under ∪ϑ∈Θ {Pϑ,g }, it is clear that

asymptotic (signed-rank) confidence zones for L may easily be obtained from this asymptotic normality result. Parallel to hypothesis testing, Lemma 4.1 allows for a simple and explicit expression of the proposed estimators. ˆf # = (Aˆ0 T ˜ ) + (Bˆ0 Fix f ∈ Fulan . Let N f# f# ϑ# ,f ˆ ˆ ˆ ), where we let Af # = (ˆ αrs# (f )) and Bf # = (βrs# (f )), with α ˆ rr# (f ) :=

Theorem 5.2. T 0ϑ˜

# ,f

ˆ f # rewrites 0 =: βˆrr# (f ), r = 1, . . . , p. Then the estimator L (24)

  ˆf # ) , ˆf# = L ˆf # − diag(L ˜ #N ˜ # + √1 L ˜# N L n

where diag(A) = A − odiag(A) stands for the diagonal matrix with the same diagonal entries as A. The simple expression in (24) shows even more clearly that the proˆ f # is a one-step improvement of the preliminary estiposed estimator L ˜ # . It can be checked straightforwardly that the role of the term mator L ˜ # diag(L ˜ #N ˆf # ) in the one-step correction − √1n L

˜ √1 L n #

ˆf # −diag(L ˜ #N ˆf # ) N





ˆ f # remain equal to one, is merely to ensure that the diagonal entries of L ˆ f # takes values in M1p (for n large enough). hence that L ˆ f # enjoy very nice properties: their As shown above, the estimators L asymptotic behavior is completely characterized, they are semiparametrically efficient under correctly specified densities, yet remain root-n consistent and asymptotically normal under a broad range of densities g, their asymptotic covariance matrix can easily be estimated consistently, etc. However,

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS

23

their implementation requires to define estimates γˆrs# (f ) and ρˆrs# (f ) that fulfill Assumption (A). We now provide such estimates. 5.2. Estimation of cross-information coefficients. Of course, it is always possible to estimate consistently the cross-information coefficients γrs (f, g) and ρrs (f, g) by replacing g in (13)-(14) with appropriate window or kernel density estimates—this can be achieved since the residuals Zir (ϑ˜# ), i = 1, . . . , n typically are asymptotically i.i.d. with density gr . Rank-based methods, however, intend to eliminate—through invariance arguments—the nuisance g without estimating it, so that density estimation methods simply are antinomic to the spirit of rank-based methods : if estimated densities are to be used, indeed, using them all the way by basing semiparametrically efficient estimates on estimated scores as in Chen and Bickel (2006) seems more coherent than considering ranks. Therefore, we rather propose a solution that is based on ranks and avoids estimating the underlying nuisance g. The method, that relies on the asymptotic linearity—under g—of an appropriate rank-based statistic S ϑ,f , was first used in Hallin et al. (2006), where there is only one cross-information coefficient J(f, g) to be estimated. There, it is crucial that J(f, g) is involved as a scalar factor in the asymptotic covariance matrix, under g, between the rank-based central sequence ∆ϑ,f and the parametric central sequence ∆ϑ,g . Cassart et al. (2010) extended the method to allow for the estimation of a cross-information coefficient that appears as a scalar factor in the linear term of the asymptotic linearity, under g, of an arbitrary (possibly vector-valued) rank-based statistic S ϑ,f . In all cases, thus, this method was only used to estimate a single crossinformation coefficient that appears as a scalar factor in some structural— typically, cross-information—matrix. In this respect, our problem, which requires to estimate 2p(p − 1) cross-information quantities appearing in var-

24

ILMONEN AND PAINDAVEINE

ious entries of the cross-information matrix Γ∗L,f,g;2 , is much more complex. Yet, as we now show, it allows for a solution relying on the same basic idea of exploiting the asymptotic linearity, under g, of an appropriate f -score rank-based statistic. ˜ # )0 )0 at hand, Based on the preliminary estimator ϑ˜# := (˜ µ0# , (vecd◦ L ˜ γrs )0 )0 , λ ≥ 0, with define ϑ˜γrs := (˜ µ0 , (vecd◦ L λ#

#

λ#

˜ # (er e0s − diag(L ˜ # er e0s )), ˜ γrs := L ˜ # + n−1/2 λ(T ˜ )rs L L λ# ϑ# ,f rs ˜ ρrs )0 )0 , λ ≥ 0, with and ϑ˜ρλ# := (˜ µ0# , (vecd◦ L λ#

˜ ρrs := L ˜ # + n−1/2 λ(T ˜ )sr L ˜ # (er e0 − diag(L ˜ # er e0 )); L s s λ# ϑ# ,f rs rs note that, at λ = 0, ϑ˜γλ# = ϑ˜ρλ# = ϑ˜# We then have the following result,

that is crucial for the construction of our estimators γˆrs# (f ) and ρˆrs# (f ). Fix ϑ ∈ Θ, f, g ∈ Fulan , and r 6= s ∈ {1, . . . , p}. Then

Lemma 5.1.

hγ#rs (λ) := (T ϑ˜# ,f )rs (T ϑ˜γrs ,f )rs = (1 − λγrs (f, g)) ((T ϑ˜# ,f )rs )2 + oP (1) and λ#

hρ#rs (λ) := (T ϑ˜# ,f )sr (T ϑ˜ρrs ,f )sr = (1−λρrs (f, g)) ((T ϑ˜# ,f )sr )2 +oP (1) as n → λ#

∞, under

(n) Pϑ,g .

The mappings λ 7→ hγ#rs (λ) and λ 7→ hρ#rs (λ) assume a positive value in λ = 0, and, as shown by Lemma 5.1, are—up to oP (1)’s as n → ∞ (n)

under Pϑ,g —monotone decreasing functions that become negative at λ = (γrs (f, g))−1 and λ = (ρrs (f, g))−1 , respectively. Restricting to a grid of values of the form λj = j/c for some large discretization constant c (which is needed to achieve the required discreteness), this naturally leads—via linear interpolation—to the estimators γˆrs# (f ) and ρˆrs# (f ) defined through (25)

−1

(ˆ γrs# (f ))

:= λγrs # :=

λ− γrs #

=

λ− γrs #

+

+

γrs − − (λ+ γrs # − λγrs # )h# (λγrs # ) γrs + hγ#rs (λ− γrs # ) − h# (λγrs # )

c−1 hγ#rs (λ− γrs # ) γrs + hγ#rs (λ− γrs # ) − h# (λγrs # )

,

25

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS γrs + − 1 with λ− γrs # := inf{j ∈ N : h# (λj+1 ) < 0} and λγrs # := λγrs # + c , and

(26)

−1

(ˆ ρrs# (f ))

:= λρrs # :=

λ− ρrs #

+

c−1 hρ#rs (λ− ρrs # ) ρrs + hρ#rs (λ− ρrs # ) − h# (λρrs # )

,

ρrs + − 1 with λ− ρrs # := inf{j ∈ N : h# (λj+1 ) < 0} and λρrs # := λρrs # + c . Provided that the preliminary estimator ϑ˜# asymptotically leaves all off-diagonal en-

tries of T ϑ˜# ,f bounded away from zero (recall that the diagonal entries of T ϑ˜# ,f are exactly equal to zero), the estimators γˆrs# (f ) and ρˆrs# (f ) in (25)-(26) then satisfy Assumption (A). More precisely, we have the following result. Theorem 5.3. Fix ϑ ∈ Θ and f, g ∈ Fulan . Assume that the preliminary estimator ϑ˜# is such that, for all ε > 0, there exist δε and Nε such that (27)

(n) 



Pϑ,g (T ϑ˜# ,f )rs ≥ δε ≥ 1 − ε,

for all n ≥ Nε , r 6= s ∈ {1, . . . , p}. Then, for any such r, s, γˆrs# (f ) = (n)

γrs (f, g) + oP (1) and ρˆrs# (f ) = ρrs (f, g) + oP (1), as n → ∞ under Pϑ,g . Theorem 5.3 confirms the consistency of the estimators γˆrs# (f ) and ρˆrs# (f ) defined above. A finite-sample illustration in the bivariate case p = 2 for the 2p(p − 1) = 4 cross-information coefficients γrs (f, g) and ρrs (f, g) to be estimated will be given in Figure 3; see Section 6. We point out that the assumption in (27) is extremely mild, as it only requires that there is no couple (r, s), r 6= s, for which (T ϑ˜# ,f )rs asymptotically has an atom in zero. It therefore rules out preliminary estimators ϑ˜# defined as the solution of the (rank-based) f -likelihood equation (T ϑ,f )rs = 0. The preliminary estimators we use in Section 6 satisfy this technical assumption. 6. Simulations. We performed simulations for both hypothesis testing and point estimation.

26

ILMONEN AND PAINDAVEINE

6.1. Hypothesis testing. We considered the trivariate case p = 3 and concentrated on the particular case for which the null value of L is L0 = Ip . For three trivariate densities of the form z 7→ g(z) = g (d) (z) = {1, 2, 3}, we generated M = 5, 000 independent random (d,m)

(Zi1

(d,m)

, Zi2

(d,m) 0 ),

, Zi3

(d) r=1 gr (zr ), d ∈ (d,m) samples Zi =

Q3

i = 1, . . . , n, m = 1, . . . , M, of size n = 500. The

pdfs g (d) have the following marginals: (d)

(d)

(i) In Setup d = 1, g1 , g2

(d)

and g3

are the pdfs of the standard normal

distribution (N ), the Student distribution with 6 degrees of freedom (t6 ), and the beta distribution with parameters 3 and 3 (β3,3 ), respectively; (d)

(d)

(d)

(ii) In Setup d = 2, g1 is t6 , g2 is β3,3 , and g3 is the pdf of the doubleexponential distribution with scale parameter one (d-exp); (d)

(d)

(d)

(iii) In Setup d = 3, g1 is t6 , g2 is d-exp, and g3 is the pdf of the logistic distribution with scale parameter one (log). We then generated samples of n observations X1 , . . . , Xn according to (d,m)

(28)

Xi

(d,m)

= (L0 + a ξ (d) H) Zi

+ µ,

with a = 0, 1, 2, 3, 4, ξ (1) ! = ξ (2) (3) ξ

.002 ! .007 , H = .0025

0 1 2 ! 1 0 3 , and µ = 2 2 0

0 ! 0 . 0

Clearly, these samples correspond to the null hypothesis for a = 0 and to increasingly severe alternatives for a = 1, 2, 3, 4. The quantities ξ (d) were chosen in such a way that the rejection frequencies obtained for a = 4 were approximately .95 for all d. All samples were subjected, at asymptotic level α = 5%, to the signed-rank tests φf (j) , j = 1, 2, 3, 4, where f (j) = g (j) for j = 1, 2, 3, and where f (4) uses a t3 pdf for each marginal density. The first three tests therefore achieve asymptotic optimality in Setups 1 to 3,

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS

27

respectively. In all tests, the location estimate µ ˆ# is the componentwise median defined in page 15. Rejection frequencies are plotted against a in the first column of Figure 1. These rejection frequencies indicate that, when based on their asymptotic chi-square critical values, the signed-rank tests are conservative and significantly biased at the sample size considered. In order to remedy this, we also implemented versions of each of the signed-rank procedures based on estimations of the (distribution-free) quantile of the test statistic under known parameter values µ and L0 . These estimations, just as the asymptotic chisquare quantile, are consistent approximations of the corresponding exact quantiles under the null, and were obtained, for each of the four tests above, (n)

as the empirical 0.05-upper quantiles q.95 of each signed-rank test statistic (n)

in a collection of 106 simulated multinormal samples, yielding q.95 = 10.34, 11.56, 10.88, and 9.74, respectively. These bias-corrected critical values are all smaller than the asymptotic chi-square one χ26;.95 = 12.60, so that the resulting tests are uniformly less conservative than the original ones. The resulting rejection frequencies are plotted in the second column of Figure 1, where it is readily seen that all tests now are roughly unbiased. At the sample size n = 500, the asymptotic properties derived in Section 4 do not show so clearly in the simulation results, not only because the signedrank tests are biased, but also because the test φf (d) does not seem to be the most powerful one in Setup d. To question correctness of our asymptotic results, we reran the same simulation as above, but now with n = 10, 000 and with (ξ (1) , ξ (2) , ξ (3) )0 divided by

p

10, 000/500. The resulting simulated

(n)

critical values are given by q.95 = 11.59, 12.38, 11.83, and 11.46, respectively, and are all much closer to the asymptotic one χ26;.95 = 12.60, so that the signed-rank tests, in their asymptotic versions, may only suffer a small bias for this large sample size. Consequently, it is justified to restrict to these

28

ILMONEN AND PAINDAVEINE

asymptotic versions. The corresponding rejection frequencies are plotted in the last column of Figure 1 and confirm, under any g (d) , d = 1, 2, 3, both the optimality of φf (d) and—more generally—the whole ranking of the local asymptotic powers of φf (j) , j = 1, 2, 3, 4, which can be obtained from (18). Finally, we point out that, for each fixed sample size, setup, and type of critical values considered, the performances of the various signed-rank tests are very similar. This implies that one should not worry to much about the choice of the target density f . 6.2. Point estimation. Before describing the simulations conducted for ˜ we the point estimation problem, we present the preliminary estimators L used there, which are based on Oja et al. (2006) and Tyler et al. (2009). Assuming that p-variate observations Xi , i = 1, . . . , n, are available, consider two scatter matrices Sa = Sa (X1 , . . . , Xn ) and Sb = Sb (X1 , . . . , Xn )— here, we call scatter matrix a p × p symmetric and positive definite matrixvalued statistic S = S(X1 , . . . , Xn ) such that S(AX1 + b, . . . , AXn + b) = AS(X1 , . . . , Xn )A0 for all invertible p × p matrices A and all p-vectors b. ˜ 0 whose columns are all eigenvecForm then an arbitrary invertible matrix Γ ˜ =Γ ˜ −1 into L ˜ by using the scheme described tors of Sb−1 Sa , and standardize Λ in Section 2.1. Then it follows from Oja et al. (2006) and the Delta method √ ˜ (n) − L) = OP (1) as n → ∞, under Pϑ,g , ϑ = (µ0 , (vecd◦ L)0 )0 ∈ Θ, that n(L provided that Sa and Sb are root-n consistent for different quantities un(n) ˜ = L(S ˜ a , Sb ). It is easy to check that this der Pϑ,g . Below we write L

consistency result also holds if Sa and/or Sb is only a shape matrix— in the sense that, for all invertible p × p matrices A and all p-vectors b, there exists c ∈ R (that may depend on A, b, and the sample) such that S(AX1 + b, . . . , AXn + b) = cAS(X1 , . . . , Xn )A0 . In the present simulation, we focused on the bivariate case p = 2, and, quite similarly to hypothesis testing, we generated, for three different setups

29

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS (d,m)

indexed by d ∈ {1, 2, 3}, M = 2, 000 independent random samples Zi (d,m) (d,m) (Zi1 , Zi2 )0 , i (d) (d) g1 (z1 )g2 (z2 ) the

=

= 1, . . . , n, of size n = 4, 000. Denoting by g (d) (z) = (d,m)

common pdf of Zi

m = 1, . . . , M , the marginals densities (d)

(i) In Setup d = 1, g1

(d)

and g2

(d) g1

(d,m)

= (Zi1 and

(d) g2

(d,m) 0 ),

, Zi2

i = 1, . . . , n,

are as follows.

are the pdfs of the standard normal dis-

tribution (N ) and the Student distribution with 5 degrees of freedom (t5 ), respectively; (d)

(ii) In Setup d = 2, g1

is the pdf of the logistic distribution with scale (d)

parameter one (log), and g2 (iii) In Setup d = 3,

(d) g1

is t5 ; (d)

is t8 and g2

is t5 . (d,m)

We chose to use L = Ip and µ = (0, 0)0 , so that Xi (d,m) Zi

(d,m)

= LZi

+µ =

are the observations themselves (other values of L and µ led to ex-

tremely similar results). For each sample, we computed three preliminary estimators from the pro˜ = L(S ˜ cov , Scov4 ), cedure described in the beginning of this section, namely L ˜ = L(S ˜ cov , SHOP ), and L ˜ = L(S ˜ MCD , SHOP ), where Scov := L

1 n

Pn

i=1 (Xi



¯ ¯ 0 X)(X i − X) is the regular empirical covariance matrix, Scov4

n 1X −1 ¯ 0 Scov ¯ (Xi − X)(X ¯ ¯ 0 (Xi − X) (Xi − X) := i − X) n i=1

is a fourth-order scatter matrix, SHOP is the van der Waerden rank-based shape matrix estimator from Hallin et al. (2006), and SMCD is the minimum covariance determinant scatter matrix from Rousseeuw (1984, 1985). Unlike Scov and Scov4 , for which root-n consistency requires finite fourth-order moments and finite eight-order moments, respectively, root-n consistency of SHOP and SMCD does not require any moment assumption (other shape matrices that avoid any moment assumption, and that might have been used here, are those defined in Tyler (1987) and D¨ umbgen (1998)). The estima˜ = L(S ˜ cov , Scov4 ) is known as the FOBI estimator in the literature. tor L

30

ILMONEN AND PAINDAVEINE

Starting from each of these three preliminary estimators, we computed ˆ (j) , j = 1, 2, 3, achieving semiparametric efficiency the three R-estimators L f in the three setups considered (that is, we took f (j) = g (j) , j = 1, 2, 3). We therefore considered 9 R-estimators. Again, the location estimate µ ˆ# used in ˜ Med[L ˜ −1 X1 , . . . , L ˜ −1 Xn ] each case is the componentwise median µ ˆMed := L ˜ used to initiate the one-step procedure). (based on the L For each of the 12 estimators of L considered (the three preliminary es˜ and the resulting 9 R-estimators) and each setup d, Figure 2 timators L reports a boxplot for the M observed squared errors (29) (d,m)

ˆ kL(X 1

, . . . , Xn(d,m) ) − Lk2Fr =

p X

ˆ rs (X (d,m) , . . . , X (d,m) ) − Lrs 2 , L n 1 

r,s=1,r6=s

where kAkFr = (Trace AA0 )1/2 is the Frobenius norm of the p × p matrix A. 



The results show that, for each target density f (j) , setup d, and pre˜ our one-step R-estimators provide a clear improveliminary estimator L, ˜ The improvement may be quite dramatic, particularly so for ment over L. ˜= very robust (hence poorly efficient) preliminary estimators, that is, for L ˜ MCD , SHOP ). As expected, for any fixed setup d, the distribution of L ˆ (j) L(S f does not seem to depend on the preliminary estimator used (with the pos˜ = L(S ˜ MCD , SHOP ) in setup 2, where that preliminary sible exception of L ˆ (d) in Setup d is conestimate actually behaves poorly). Also, optimality of L f firmed in most cases at the sample size considered. It should be noted that the results are remarkably good, even in the cases for which the preliminary ˜ = L(S ˜ cov , Scov4 )). estimator used is not root-n consistent (that is, for L Finally, we illustrate the method we proposed in Section 5.2 for the estimation of the cross-information coefficients. We focused on the first 50 replications in Setup 1 above (g = g (1) ) and on the target density f = f (3) (6= g (1) ). The cross-information coefficients to be estimated then are γ12 (f, g) ≈ 1.478, γ21 (f, g) ≈ 0.862, ρ12 (f, g) ≈ 1.149, and ρ21 (f, g) ≈ 0.887. The upper left

RANK-BASED INFERENCE IN SYMMETRIC IC MODELS

31

APPENDIX

A.1. Proofs of Theorems 3.1 and 3.2. The proofs of this section make use of the Hájek projection theorem for linear signed-rank statistics (see, e.g., Puri and Sen (1985), Chapter 3), which states that, if $Y_i=\mathrm{Sign}(Y_i)|Y_i|$, $i=1,\ldots,n$, are i.i.d. with (absolutely continuous) cdf $G$ and if $K:(0,1)\to\mathbb{R}$ is a continuous and square-integrable score function that can be written as the difference of two monotone increasing functions, then
$$
\frac{1}{\sqrt n}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,K(G_+(|Y_i|))
=\frac{1}{\sqrt n}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,K\Big(\frac{R_i^+}{n+1}\Big)+o_{L^2}(1) \tag{30}
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,E\big[K(G_+(|Y_i|))\,\big|\,R_i^+\big]+o_{L^2}(1) \tag{31}
$$
as $n\to\infty$, where $G_+$ stands for the common cdf of the $|Y_i|$'s and $R_i^+$ denotes the rank of $|Y_i|$ among $|Y_1|,\ldots,|Y_n|$. The quantities in (30) and (31) are linear signed-rank quantities that are said to be based on approximate and exact scores, respectively.
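The asymptotic equivalence in (30) is easy to check numerically. The sketch below is a toy illustration, not taken from the paper: the standard normal observations and the identity score $K(u)=u$ are arbitrary choices made only to compare the statistic based on the true values $G_+(|Y_i|)$ with its approximate-score counterpart based on $R_i^+/(n+1)$.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(2)
n = 5000
Y = rng.standard_normal(n)                  # i.i.d. symmetric observations
signs = np.sign(Y)
absY = np.abs(Y)

G_plus = 2.0 * norm.cdf(absY) - 1.0         # cdf of |Y_i| when Y_i is standard normal
ranks_plus = rankdata(absY)                 # ranks of |Y_1|, ..., |Y_n|
K = lambda u: u                             # any continuous, square-integrable monotone score

stat_true = signs @ K(G_plus) / np.sqrt(n)                   # n^{-1/2} sum Sign(Y_i) K(G_+(|Y_i|))
stat_approx = signs @ K(ranks_plus / (n + 1)) / np.sqrt(n)   # approximate-score version, as in (30)
print(stat_true, stat_approx)               # the difference is o_{L^2}(1) as n grows
```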


Fig 1. Rejection frequencies (out of M = 5,000 replications), under the null (a = 0) and increasingly severe alternatives (a = 1, 2, 3, 4), of the signed-rank tests $\phi_{f^{(j)}}$, $j=1,2,3,4$; see Section 6.1 for details. The sample size is n = 500 in the first two columns and n = 10,000 in the third one. In the first and third columns, tests are based on their asymptotic distribution, whereas the second column uses simulated critical values, obtained from $10^6$ standard multinormal samples.


Fig 2. Boxplots of the squared errors $\|\hat L-L\|_{\mathrm{Fr}}^2$ (see (29)) obtained in M = 2,000 replications from setups d = 1, 2, 3 (associated with underlying distributions $g^{(d)}$, d = 1, 2, 3), for the preliminary estimators $\tilde L=\tilde L(S_{\mathrm{cov}},S_{\mathrm{cov4}})$, $\tilde L=\tilde L(S_{\mathrm{cov}},S_{\mathrm{HOP}})$, and $\tilde L=\tilde L(S_{\mathrm{MCD}},S_{\mathrm{HOP}})$, and the nine R-estimators $\hat L_f$ resulting from all combinations of a target density $f^{(j)}=g^{(j)}$, $j=1,2,3$, and one of the three preliminary estimators available; see Section 6.2 for details. The sample size is n = 4,000.


Fig 3. Top left: 150 plots of the mapping $\lambda\mapsto h^{\gamma_{12}}_\#(\lambda)$ based on $f=f^{(3)}$, associated with the first 50 replications from Setup 1 ($g=g^{(1)}$) in Figure 2: the 50 curves in pink, green, and blue are based on the preliminary estimators $\tilde L=\tilde L(S_{\mathrm{cov}},S_{\mathrm{cov4}})$, $\tilde L=\tilde L(S_{\mathrm{cov}},S_{\mathrm{HOP}})$, and $\tilde L=\tilde L(S_{\mathrm{MCD}},S_{\mathrm{HOP}})$, respectively. Top right, bottom left, and bottom right: the corresponding plots for the mappings $\lambda\mapsto h^{\gamma_{21}}_\#(\lambda)$, $\lambda\mapsto h^{\rho_{12}}_\#(\lambda)$, and $\lambda\mapsto h^{\rho_{21}}_\#(\lambda)$, respectively.


In the rest of this section, we fix $\vartheta\in\Theta$, $f\in\mathcal{F}_{\mathrm{ULAN}}$, and $g\in\mathcal{F}$. We write throughout $Z_i$, $S_i$, and $R_i^+$ for $Z_i(\vartheta)$, $S_i(\vartheta)$, and $R_i^+(\vartheta)$, respectively. We also write $E_h$ instead of $E^{(n)}_{\vartheta,h}$, with $h=f,g$. We then start with the proof of Theorem 3.2(i).

Proof of Theorem 3.2(i). Fix $r\neq s\in\{1,\ldots,p\}$ and two score functions $K_a,K_b:(0,1)\to\mathbb{R}$ with the same properties as $K$ above. Then, by using (i) $E_g[S_{ir}]=0$, (ii) the independence (under $P^{(n)}_{\vartheta,g}$) between the $S_{ir}$'s and the $(R_{ir},|Z_{ir}|)$'s, and (iii) the independence between the $Z_{ir}$'s and the $Z_{is}$'s, we obtain
$$
E_g\Big[\Big(\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\Big\{K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|))-K_a\Big(\frac{R^+_{ir}}{n+1}\Big)K_b\Big(\frac{R^+_{is}}{n+1}\Big)\Big\}\Big)^2\Big]
$$
$$
=\frac{1}{n}\sum_{i=1}^n E_g\Big[\Big(K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|))-K_a\Big(\frac{R^+_{ir}}{n+1}\Big)K_b\Big(\frac{R^+_{is}}{n+1}\Big)\Big)^2\Big]
$$
$$
\leq 2\,E_g\Big[\Big(K_a(G_{+r}(|Z_{ir}|))-K_a\Big(\frac{R^+_{ir}}{n+1}\Big)\Big)^2\Big]\,E_g\big[K_b^2(G_{+s}(|Z_{is}|))\big]
+2\,E_g\Big[K_a^2\Big(\frac{R^+_{ir}}{n+1}\Big)\Big]\,E_g\Big[\Big(K_b(G_{+s}(|Z_{is}|))-K_b\Big(\frac{R^+_{is}}{n+1}\Big)\Big)^2\Big].
$$
Consequently, the square-integrability of $K_a$, $K_b$, and the convergence to zero of both $E_g[(K_a(G_{+r}(|Z_{ir}|))-K_a(\frac{R^+_{ir}}{n+1}))^2]$ and $E_g[(K_b(G_{+s}(|Z_{is}|))-K_b(\frac{R^+_{is}}{n+1}))^2]$ (which directly follows from (30)) entail
$$
\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|))
=\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,K_a\Big(\frac{R^+_{ir}}{n+1}\Big)K_b\Big(\frac{R^+_{is}}{n+1}\Big)+o_{L^2}(1)
$$
as $n\to\infty$, under $P^{(n)}_{\vartheta,g}$. Theorem 3.2(i) follows by taking $K_a=\varphi_{f_r}\circ F_{+r}^{-1}$ and $K_b=F_{+s}^{-1}$.


We go on with the proof of Theorem 3.1, for which it is important to note that, by proceeding as in the proof of Theorem 3.2(i) but with (31) instead of (30), we further obtain that
$$
\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|))
=\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,K_a\Big(\frac{R^+_{ir}}{n+1}\Big)K_b\Big(\frac{R^+_{is}}{n+1}\Big)+o_{L^2}(1)
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,E\big[K_a(G_{+r}(|Z_{ir}|))\,\big|\,R^+_{ir}\big]\,E\big[K_b(G_{+s}(|Z_{is}|))\,\big|\,R^+_{is}\big]+o_{L^2}(1), \tag{32}
$$
still as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$.

Proof of Theorem 3.1. We have to show that, for any $r,s\in\{1,\ldots,p\}$,
$$
E_f\Big[\frac{1}{\sqrt n}\sum_{i=1}^n\big(\varphi_f(Z_i)Z_i'-I_p\big)_{rs}\,\Big|\,S_1,\ldots,S_n,R_1^+,\ldots,R_n^+\Big]=(T_{\vartheta,f})_{rs}+o_{L^2}(1) \tag{33}
$$
as $n\to\infty$, under $P^{(n)}_{\vartheta,f}$. Now, the left-hand side of (33) rewrites
$$
E_f\Big[\frac{1}{\sqrt n}\sum_{i=1}^n\big(\varphi_f(Z_i)Z_i'-I_p\big)_{rs}\,\Big|\,S_1,\ldots,S_n,R_1^+,\ldots,R_n^+\Big]
=\frac{1}{\sqrt n}\sum_{i=1}^n E_f\big[S_{ir}S_{is}\varphi_{f_r}(|Z_{ir}|)|Z_{is}|-\delta_{rs}\,\big|\,S_1,\ldots,S_n,R_1^+,\ldots,R_n^+\big]
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n\Big(S_{ir}S_{is}\,E_f\big[\varphi_{f_r}(|Z_{ir}|)|Z_{is}|\,\big|\,R^+_{1r},\ldots,R^+_{nr},R^+_{1s},\ldots,R^+_{ns}\big]-\delta_{rs}\Big). \tag{34}
$$
For $r\neq s$, this yields
$$
E_f\Big[\frac{1}{\sqrt n}\sum_{i=1}^n\big(\varphi_f(Z_i)Z_i'-I_p\big)_{rs}\,\Big|\,S_1,\ldots,S_n,R_1^+,\ldots,R_n^+\Big]
=\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,E_f\big[\varphi_{f_r}(|Z_{ir}|)\,\big|\,R^+_{1r},\ldots,R^+_{nr}\big]\,E_f\big[|Z_{is}|\,\big|\,R^+_{1s},\ldots,R^+_{ns}\big]
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n S_{ir}S_{is}\,\varphi_{f_r}\Big(F_{+r}^{-1}\Big(\frac{R^+_{ir}}{n+1}\Big)\Big)\,F_{+s}^{-1}\Big(\frac{R^+_{is}}{n+1}\Big)+o_{L^2}(1)
=(T_{\vartheta,f})_{rs}+o_{L^2}(1)
$$


as $n\to\infty$, under $P^{(n)}_{\vartheta,f}$, where we have used (32), still with $K_a=\varphi_{f_r}\circ F_{+r}^{-1}$ and $K_b=F_{+s}^{-1}$, but this time at $g=f$. This establishes (33) for $r\neq s$.

As $r=s$, (34) now entails (writing $K_{ab}(u):=\varphi_{f_r}(F_{+r}^{-1}(u))\times F_{+r}^{-1}(u)$ for all $u$)
$$
E_f\Big[\frac{1}{\sqrt n}\sum_{i=1}^n\big(\varphi_f(Z_i)Z_i'-I_p\big)_{rs}\,\Big|\,S_1,\ldots,S_n,R_1^+,\ldots,R_n^+\Big]
=\frac{1}{\sqrt n}\sum_{i=1}^n E_f\big[\varphi_{f_r}(|Z_{ir}|)|Z_{ir}|\,\big|\,R^+_{1r},\ldots,R^+_{nr}\big]-\sqrt n
$$
$$
=E_f\Big[\frac{1}{\sqrt n}\sum_{i=1}^n K_{ab}(F_{+r}(|Z_{ir}|))\,\Big|\,R^+_{1r},\ldots,R^+_{nr}\Big]-\sqrt n
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n K_{ab}\Big(\frac{R^+_{ir}}{n+1}\Big)-\sqrt n+o_{L^2}(1) \tag{35}
$$
$$
=\frac{1}{\sqrt n}\sum_{i=1}^n K_{ab}\Big(\frac{i}{n+1}\Big)-\sqrt n+o_{L^2}(1)
$$
$$
=\sqrt n\int_0^1 K_{ab}(u)\,du-\sqrt n+o_{L^2}(1) \tag{36}
$$
$$
=o_{L^2}(1), \tag{37}
$$

still as $n\to\infty$, under $P^{(n)}_{\vartheta,f}$, where (35), (36), and (37) follow from the Hájek projection theorem for linear rank (not signed-rank) statistics (see, e.g., Puri and Sen (1985), Chapter 2), the square-integrability of $K_{ab}(\cdot)$ (see the proof of Proposition 3.2(i) in Hallin et al. (2006)), and integration by parts, respectively. This further proves (33) for $r=s$, hence also the result.

Proof of Theorem 3.2(ii)-(iii). (ii) In view of Theorem 3.2(i), it is sufficient to show that both asymptotic normality results hold for $\Delta^*_{\vartheta,f,g;2}$. The result under $P^{(n)}_{\vartheta,g}$ then straightforwardly follows from the multivariate CLT. As for the result under local alternatives, it is obtained as usual, by establishing the joint normality, under $P^{(n)}_{\vartheta,g}$, of $\log\big(dP^{(n)}_{\vartheta+n^{-1/2}\tau,f}/dP^{(n)}_{\vartheta,g}\big)$ and $\Delta^*_{\vartheta,f,g;2}$, then applying Le Cam's third Lemma; the required joint normality follows from a routine application of the classical Cramér-Wold device. (iii) The proof, though long and tedious, is a quite straightforward adaptation of the proof of Proposition A.1 in Hallin et al. (2006). We therefore omit it.

A.2. Proofs of Theorems 4.1 and 4.2.

Proof of Theorem 4.1. By using Lemma 4.1 with $g=f$ in (15), we obtain
$$
Q_f=(\mathrm{vec}\,T_{\hat\vartheta_{0\#},f})'\Bigg[\sum_{r,s=1,\,r\neq s}^{p}\Big\{\alpha_{rs}(f)\,e_re_r'\otimes\big(L_{0rs}^2\,e_re_r'+e_se_s'-L_{0rs}\,e_re_s'-L_{0rs}\,e_se_r'\big)
+\beta_{rs}(f)\,e_re_s'\otimes\big(L_{0rs}L_{0sr}\,e_re_s'-L_{0rs}\,e_re_r'-L_{0sr}\,e_se_s'+e_se_r'\big)\Big\}\Bigg](\mathrm{vec}\,T_{\hat\vartheta_{0\#},f}),
$$

which, as all diagonal entries of $T_{\hat\vartheta_{0\#},f}$ are equal to zero, indeed yields $Q_f=(\mathrm{vec}\,T_{\hat\vartheta_{0\#},f})'\,M_f\,(\mathrm{vec}\,T_{\hat\vartheta_{0\#},f})$. The equality (17) then easily follows from the identity $(C'\otimes A)(\mathrm{vec}\,B)=\mathrm{vec}(ABC)$.

Proof of Theorem 4.2. (i) Applying Corollary 3.1, with $\check\vartheta_\#:=\hat\vartheta_{0\#}=(\hat\mu_\#',(\mathrm{vecd}^\circ L_0)')'$ and $\vartheta:=\vartheta_0=(\mu',(\mathrm{vecd}^\circ L_0)')'$, entails that
$$
\Delta^*_{\hat\vartheta_{0\#},f}=\Delta^*_{\vartheta_0,f}+o_P(1)\quad\text{as }n\to\infty\text{ under }P^{(n)}_{\vartheta_0,g}. \tag{38}
$$
Consequently, we have that
$$
Q_f=(\mathrm{vec}\,\Delta^*_{\vartheta_0,f;2})'\,(\Gamma^*_{L_0,f;2})^{-1}\,(\mathrm{vec}\,\Delta^*_{\vartheta_0,f;2})+o_P(1),
$$
still as $n\to\infty$, under $P^{(n)}_{\vartheta_0,g}$, hence also, from contiguity, under $P^{(n)}_{\vartheta_0+n^{-1/2}\tau,g}$. The result then follows from Theorem 3.2(ii). (ii) It directly follows from (i) that, under the sequence of local alternatives $P^{(n)}_{\vartheta_0+n^{-1/2}\tau,f}$, $\phi^{(n)}_f$ has asymptotic power $1-\Psi_{p(p-1)}\big(\chi^2_{p(p-1),1-\alpha};\,\tau_2'(\Gamma^*_{L_0,f;2})^{-1}\tau_2\big)$. This establishes the result, since these local powers coincide with the semiparametrically optimal (at $f$) powers in (8).
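As a computational aside (not part of the proofs), once the signed-rank matrix $T_{\hat\vartheta_{0\#},f}$ and the weighting matrix $M_f$ of Theorem 4.1 have been computed, carrying out the test $\phi_f$ amounts to comparing the quadratic form $Q_f$ with a chi-square quantile. A minimal Python sketch, in which the arrays `T` (the $p\times p$ signed-rank statistic, with zero diagonal) and `M_f` (the $p^2\times p^2$ matrix of Theorem 4.1) are assumed to be supplied by the user:

```python
import numpy as np
from scipy.stats import chi2

def signed_rank_test(T, M_f, alpha=0.05):
    """Evaluate Q_f = (vec T)' M_f (vec T) and compare it with the chi-square
    quantile of order 1 - alpha with p(p - 1) degrees of freedom, the asymptotic
    null distribution used by the test phi_f."""
    p = T.shape[0]
    vecT = np.asarray(T, dtype=float).flatten(order="F")   # vec stacks the columns of T
    Q_f = vecT @ M_f @ vecT
    critical = chi2.ppf(1.0 - alpha, p * (p - 1))
    return Q_f, critical, bool(Q_f > critical)
```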


A.3. Proofs of Lemma 5.1 and Theorems 5.1, 5.2, and 5.3.

Proof of Theorem 5.1. (i) Fix $\vartheta\in\Theta$ and $g\in\mathcal{F}_{\mathrm{ULAN}}$. From (20), the fact that $\hat\Gamma^*_{\tilde L_\#,f;2}-\Gamma^*_{L,f,g;2}=o_P(1)$ as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$, and Theorem 3.2(iii), we obtain
$$
\sqrt n\,\mathrm{vecd}^\circ(\hat L_{f\#}-L)
=\sqrt n\,\mathrm{vecd}^\circ(\tilde L_\#-L)+(\hat\Gamma^*_{\tilde L_\#,f;2})^{-1}\Delta^*_{\tilde\vartheta_\#,f;2}
=\sqrt n\,\mathrm{vecd}^\circ(\tilde L_\#-L)+(\Gamma^*_{L,f,g;2})^{-1}\Delta^*_{\tilde\vartheta_\#,f;2}+o_P(1)
$$
$$
=(\Gamma^*_{L,f,g;2})^{-1}\Delta^*_{\vartheta,f;2}+o_P(1) \tag{39}
$$
as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$. Consequently, Theorem 3.2(i)-(ii) entails that, still as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$,
$$
\sqrt n\,\mathrm{vecd}^\circ(\hat L_{f\#}-L)=(\Gamma^*_{L,f,g;2})^{-1}\Delta^*_{\vartheta,f,g;2}+o_P(1) \tag{40}
$$
$$
\stackrel{\mathcal L}{\to}\;N_{p(p-1)}\big(0,\,(\Gamma^*_{L,f,g;2})^{-1}\Gamma^*_{L,f;2}\,((\Gamma^*_{L,f,g;2})^{-1})'\big). \tag{41}
$$
Now, by using the fact that $C'(\mathrm{vecd}^\circ H)=(\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, we have that $\sqrt n\,\mathrm{vec}(\hat L_{f\#}-L)=\sqrt n\,C'\,\mathrm{vecd}^\circ(\hat L_{f\#}-L)$, so that (21), (22), and (23) follow from (39), (40), and (41), respectively. (ii) The asymptotic covariance matrix of $\sqrt n\,\mathrm{vecd}^\circ(\hat L_{f\#}-L)$, under $P^{(n)}_{\vartheta,f}$, reduces to $(\Gamma^*_{L,f;2})^{-1}$ (let $g=f$ in (41)), which establishes the result.

Proof of Theorem 5.2. By using again the fact that $C'(\mathrm{vecd}^\circ H)=(\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, and then


Lemma 4.1, we obtain
$$
\mathrm{vec}(\hat L_{f\#}-\tilde L_\#)=C'\,\mathrm{vecd}^\circ(\hat L_{f\#}-\tilde L_\#)
=\frac{1}{\sqrt n}\,C'(\hat\Gamma^*_{\tilde L_\#,f;2})^{-1}C\,(I_p\otimes\tilde L_\#^{-1})'\,\mathrm{vec}\,T_{\tilde\vartheta_\#,f}
$$
$$
=\frac{1}{\sqrt n}(I_p\otimes\tilde L_\#)\Bigg[\sum_{r,s=1,\,r\neq s}^{p}\Big\{\hat\alpha_{rs\#}(f)\,e_re_r'\otimes\big(\tilde L_{\#rs}^2\,e_re_r'+e_se_s'-\tilde L_{\#rs}\,e_re_s'-\tilde L_{\#rs}\,e_se_r'\big)
+\hat\beta_{rs\#}(f)\,e_re_s'\otimes\big(\tilde L_{\#rs}\tilde L_{\#sr}\,e_re_s'-\tilde L_{\#rs}\,e_re_r'-\tilde L_{\#sr}\,e_se_s'+e_se_r'\big)\Big\}\Bigg]\mathrm{vec}\,T_{\tilde\vartheta_\#,f}.
$$
Since all diagonal entries of $T_{\tilde\vartheta_\#,f}$ are zeros, we have that
$$
\mathrm{vec}(\hat L_{f\#}-\tilde L_\#)=\frac{1}{\sqrt n}(I_p\otimes\tilde L_\#)\Bigg[\sum_{r,s=1,\,r\neq s}^{p}\Big\{\hat\alpha_{rs\#}(f)\,e_re_r'\otimes\big(e_se_s'-\tilde L_{\#rs}\,e_re_s'\big)
+\hat\beta_{rs\#}(f)\,e_re_s'\otimes\big(e_se_r'-\tilde L_{\#rs}\,e_re_r'\big)\Big\}\Bigg]\mathrm{vec}\,T_{\tilde\vartheta_\#,f}. \tag{42}
$$
The identity $(C'\otimes A)(\mathrm{vec}\,B)=\mathrm{vec}(ABC)$ then yields
$$
\mathrm{vec}(\hat L_{f\#}-\tilde L_\#)=\frac{1}{\sqrt n}(I_p\otimes\tilde L_\#)\,\mathrm{vec}\Bigg[\sum_{r,s=1,\,r\neq s}^{p}(\hat N_{f\#})_{sr}\big(e_se_r'-\tilde L_{\#rs}\,e_re_r'\big)\Bigg].
$$
Hence, we have
$$
\hat L_{f\#}-\tilde L_\#=\frac{1}{\sqrt n}\,\tilde L_\#\sum_{r,s=1,\,r\neq s}^{p}(\hat N_{f\#})_{sr}\big(e_se_r'-\tilde L_{\#rs}\,e_re_r'\big)
=\frac{1}{\sqrt n}\,\tilde L_\#\sum_{r,s=1}^{p}\big((\hat N_{f\#})_{sr}\,e_se_r'-\tilde L_{\#rs}(\hat N_{f\#})_{sr}\,e_re_r'\big)
$$
$$
=\frac{1}{\sqrt n}\,\tilde L_\#\Big(\hat N_{f\#}-\sum_{r,s=1}^{p}\tilde L_{\#rs}(\hat N_{f\#})_{sr}\,e_re_r'\Big)
=\frac{1}{\sqrt n}\,\tilde L_\#\Big(\hat N_{f\#}-\sum_{r=1}^{p}(\tilde L_\#\hat N_{f\#})_{rr}\,e_re_r'\Big)
=\frac{1}{\sqrt n}\,\tilde L_\#\big(\hat N_{f\#}-\mathrm{diag}(\tilde L_\#\hat N_{f\#})\big),
$$


which proves the result.

Proof of Lemma 5.1. In this proof, all stochastic convergences are as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$. First note that, if $\check\vartheta_\#:=(\check\mu_\#',(\mathrm{vecd}^\circ\check L_\#)')'$ is an arbitrary locally asymptotically discrete root-$n$ consistent estimator for $\vartheta=(\mu',(\mathrm{vecd}^\circ L)')'$, we then have that
$$
\mathrm{vec}(T_{\check\vartheta_\#,f}-T_{\vartheta,f})=-G_{f,g}\,(I_p\otimes\check L_\#^{-1})\,C'\,\sqrt n\,\mathrm{vecd}^\circ(\check L_\#-L)+o_P(1) \tag{43}
$$
(compare with Corollary 3.1). Incidentally, note that (43) implies that $\mathrm{vec}\,T_{\check\vartheta_\#,f}$ is $O_P(1)$ (by proceeding exactly as in the proof of Theorem 3.2(i)-(ii), we can indeed show that, under $P^{(n)}_{\vartheta,g}$, $\mathrm{vec}\,T_{\vartheta,f}$ is asymptotically multinormal, hence stochastically bounded). Now, from (43), we obtain
$$
\mathrm{vec}(T_{\tilde\vartheta^{\gamma_{rs}}_{\lambda\#},f}-T_{\tilde\vartheta_\#,f})
=-G_{f,g}\,(I_p\otimes\tilde L_\#^{-1})\,C'\,\sqrt n\,\mathrm{vecd}^\circ(\tilde L^{\gamma_{rs}}_{\lambda\#}-\tilde L_\#)+o_P(1)
=-\lambda(T_{\tilde\vartheta_\#,f})_{rs}\,G_{f,g}\,(I_p\otimes\tilde L_\#^{-1})\,C'\,\mathrm{vecd}^\circ\big(\tilde L_\#e_re_s'-\tilde L_\#\,\mathrm{diag}(\tilde L_\#e_re_s')\big)+o_P(1),
$$
which, by using the fact that $C'(\mathrm{vecd}^\circ H)=(\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, leads to
$$
\mathrm{vec}(T_{\tilde\vartheta^{\gamma_{rs}}_{\lambda\#},f}-T_{\tilde\vartheta_\#,f})
=-\lambda(T_{\tilde\vartheta_\#,f})_{rs}\,G_{f,g}\,(I_p\otimes\tilde L_\#^{-1})\,\mathrm{vec}\big(\tilde L_\#e_re_s'-\tilde L_\#\,\mathrm{diag}(\tilde L_\#e_re_s')\big)+o_P(1)
=-\lambda(T_{\tilde\vartheta_\#,f})_{rs}\,G_{f,g}\,\mathrm{vec}\big(e_re_s'-\mathrm{diag}(\tilde L_\#e_re_s')\big)+o_P(1).
$$
This yields
$$
\mathrm{vec}(T_{\tilde\vartheta^{\gamma_{rs}}_{\lambda\#},f}-T_{\tilde\vartheta_\#,f})
=-\lambda(T_{\tilde\vartheta_\#,f})_{rs}\,G_{f,g}\,\mathrm{vec}(e_re_s')+o_P(1)
=-\lambda(T_{\tilde\vartheta_\#,f})_{rs}\big(\gamma_{rs}(f,g)\,\mathrm{vec}(e_re_s')+\rho_{rs}(f,g)\,\mathrm{vec}(e_se_r')\big)+o_P(1).
$$
Premultiplying by $(T_{\tilde\vartheta_\#,f})_{rs}(e_s\otimes e_r)'$, we then obtain
$$
(T_{\tilde\vartheta_\#,f})_{rs}\,(T_{\tilde\vartheta^{\gamma_{rs}}_{\lambda\#},f})_{rs}-\big((T_{\tilde\vartheta_\#,f})_{rs}\big)^2=-\lambda\big((T_{\tilde\vartheta_\#,f})_{rs}\big)^2\,\gamma_{rs}(f,g)+o_P(1)
$$
(recall indeed that $T_{\tilde\vartheta_\#,f}=O_P(1)$), which establishes the $\gamma$-part of the lemma. The proof of the $\rho$-part follows along the exact same lines, but for the fact that the premultiplication is by $(T_{\tilde\vartheta_\#,f})_{sr}(e_r\otimes e_s)'$.

Proof of Theorem 5.3. We fix $\vartheta\in\Theta$, $f,g\in\mathcal{F}_{\mathrm{ULAN}}$, and $r\neq s\in\{1,\ldots,p\}$, and concentrate on establishing that $\hat\gamma_{rs\#}(f)=\gamma_{rs}(f,g)+o_P(1)$ as $n\to\infty$ under $P^{(n)}_{\vartheta,g}$ (again, the proof of the $\rho$-result is entirely similar). In the sequel, we stress the dependence on $n$ of the various statistics with superscripts $^{(n)}$.

Let us first show that, under $P^{(n)}_{\vartheta,g}$, $\lambda^{(n)-}_{\gamma_{rs}\#}$, hence also $\lambda^{(n)+}_{\gamma_{rs}\#}$, is $O_P(1)$ as $n\to\infty$. Assume therefore it is not: then, there exist $\epsilon>0$ and a sequence $n_i\nearrow\infty$ such that, for all $\ell\in\mathbb{R}$ and $i$, $P^{(n_i)}_{\vartheta,g}[\lambda^{(n_i)-}_{\gamma_{rs}\#}>\ell]>\epsilon$. This implies, for arbitrarily large $\ell$, that $P^{(n_i)}_{\vartheta,g}[h^{(n_i)\gamma_{rs}}_\#(\ell)>0]>\epsilon$, hence, in view of Lemma 5.1,
$$
P^{(n_i)}_{\vartheta,g}\big[(1-\ell\gamma_{rs}(f,g))\,h^{(n_i)\gamma_{rs}}_\#(0)+\zeta^{(n_i)}>0\big]>\epsilon
$$
for all $i$, where $\zeta^{(n)}$, $n\in\mathbb{N}$, is some $o_P(1)$ sequence. For $\ell>(\gamma_{rs}(f,g))^{-1}$, this entails, for all $i$,
$$
P^{(n_i)}_{\vartheta,g}\big[0<h^{(n_i)\gamma_{rs}}_\#(0)<(\ell\gamma_{rs}(f,g)-1)^{-1}|\zeta^{(n_i)}|\big]>\epsilon,
$$
which contradicts (27). It follows that $\lambda^{(n)-}_{\gamma_{rs}\#}$ is $O_P(1)$ under $P^{(n)}_{\vartheta,g}$. By using again (27), there exist, for all $\eta>0$, a positive real number $\delta_\eta$ and an integer $N_\eta$ such that
$$
P^{(n)}_{\vartheta,g}\big[h^{(n)\gamma_{rs}}_\#(0)\geq\delta_\eta\big]\geq 1-\frac{\eta}{2}
$$
for all $n\geq N_\eta$. Since $\lambda^{(n)-}_{\gamma_{rs}\#}$ and $\lambda^{(n)+}_{\gamma_{rs}\#}$ are $O_P(1)$, Lemma 5.1 implies that, for all $\eta>0$ and $\varepsilon>0$, there exists an integer $N_{\varepsilon,\delta}\geq N_\eta$ such that, for all $n\geq N_{\varepsilon,\delta}$ (with $\lambda^{(n)\pm}_{\gamma_{rs}\#}$ standing for either $\lambda^{(n)-}_{\gamma_{rs}\#}$ or $\lambda^{(n)+}_{\gamma_{rs}\#}$),
$$
P^{(n)}_{\vartheta,g}\big[(1-\lambda^{(n)\pm}_{\gamma_{rs}\#}\gamma_{rs}(f,g))\,h^{(n)\gamma_{rs}}_\#(0)\in[h^{(n)\gamma_{rs}}_\#(\lambda^{(n)\pm}_{\gamma_{rs}\#})\pm\varepsilon]\big]\geq 1-\frac{\eta}{2}.
$$
It follows that for all $\eta>0$, $\varepsilon>0$ and $n\geq N_{\varepsilon,\delta}$, letting $\delta=\delta_\eta$,
$$
P^{(n)}_{\vartheta,g}\big[A^{(n)}_{\varepsilon,\delta}\big]:=P^{(n)}_{\vartheta,g}\big[(1-\lambda^{(n)\pm}_{\gamma_{rs}\#}\gamma_{rs}(f,g))\,h^{(n)\gamma_{rs}}_\#(0)\in[h^{(n)\gamma_{rs}}_\#(\lambda^{(n)\pm}_{\gamma_{rs}\#})\pm\varepsilon]\ \text{and}\ h^{(n)\gamma_{rs}}_\#(0)\geq\delta\big]\geq 1-\eta.
$$
Next, denote by $\hat D^{(n)}$, $D^{(n)}$, and $D^{(n)}_\pm$ the graphs of the mappings
$$
\lambda\mapsto h^{(n)\gamma_{rs}}_\#(\lambda^{(n)-}_{\gamma_{rs}\#})-c\,(\lambda-\lambda^{(n)-}_{\gamma_{rs}\#})\big(h^{(n)\gamma_{rs}}_\#(\lambda^{(n)-}_{\gamma_{rs}\#})-h^{(n)\gamma_{rs}}_\#(\lambda^{(n)+}_{\gamma_{rs}\#})\big),
$$
$$
\lambda\mapsto(1-\lambda\gamma_{rs}(f,g))\,h^{(n)\gamma_{rs}}_\#(0),
$$
and
$$
\lambda\mapsto(1-\lambda\gamma_{rs}(f,g))\,h^{(n)\gamma_{rs}}_\#(0)\pm\varepsilon,
$$
respectively. These graphs take the form of four random straight lines, intersecting the horizontal axis at $\lambda^{(n)}_{\gamma_{rs}\#}$ (our estimator of $(\gamma_{rs}(f,g))^{-1}$), $\lambda_0:=(\gamma_{rs}(f,g))^{-1}$, $\lambda^{(n)+}_0$, and $\lambda^{(n)-}_0$, respectively. Since $D^{(n)}_\pm$ and $D^{(n)}$ are parallel, with a negative slope, we have that $\lambda^{(n)-}_0\leq\lambda_0\leq\lambda^{(n)+}_0$. Under $A^{(n)}_{\varepsilon,\delta}$, that common slope has absolute value at least $\delta\gamma_{rs}(f,g)$, which implies that $\lambda^{(n)+}_0-\lambda^{(n)-}_0\leq\frac{2\varepsilon}{\delta\gamma_{rs}(f,g)}$. Still under $A^{(n)}_{\varepsilon,\delta}$, for $\lambda$ values between $\lambda^{(n)-}_{\gamma_{rs}\#}$ and $\lambda^{(n)+}_{\gamma_{rs}\#}$, $\hat D^{(n)}$ is lying between $D^{(n)}_-$ and $D^{(n)}_+$, which entails $\lambda^{(n)-}_0\leq\lambda^{(n)}_{\gamma_{rs}\#}\leq\lambda^{(n)+}_0$.

Summing up, for all $\eta>0$ and $\varepsilon>0$, there exist $\delta=\delta_\eta>0$ and $N=N_{\varepsilon\gamma_{rs}(f,g)\delta/2,\delta}$ such that, for any $n\geq N$, with $P^{(n)}_{\vartheta,g}$ probability larger than $1-\eta$, $|\lambda^{(n)}_{\gamma_{rs}\#}-\lambda_0|\leq\lambda^{(n)+}_0-\lambda^{(n)-}_0\leq\varepsilon$.


REFERENCES

Amari, S. (2002). Independent component analysis and method of estimating functions. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences E85-A 540–547.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Statistical Inference for Semiparametric Models. Johns Hopkins University Press, Baltimore.
Cassart, D., Hallin, M. and Paindaveine, D. (2010). On the estimation of cross-information quantities in R-estimation. In J. Antoch, M. Hušková and P. K. Sen, Eds: Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in Honor of Professor Jana Jurečková, I.M.S. Monographs-Lecture Notes, 35–45.
Chen, A. and Bickel, P. J. (2006). Efficient independent component analysis. Ann. Statist. 34 2825–2855.
Dümbgen, L. (1998). On Tyler's M-functional of scatter in high dimension. Ann. Inst. Statist. Math. 50 471–491.
Hallin, M., Oja, H. and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape II: Optimal R-estimation of shape. Ann. Statist. 34 2757–2789.
Hallin, M. and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape I: Optimal rank-based tests for sphericity. Ann. Statist. 34 2707–2756.
Hallin, M. and Paindaveine, D. (2008). Optimal rank-based tests for homogeneity of scatter. Ann. Statist. 36 1261–1298.
Hallin, M., Vermandele, C. and Werker, B. J. M. (2006). Serial and nonserial sign-and-rank statistics: asymptotic representation and asymptotic normality. Ann. Statist. 34 254–289.
Hallin, M. and Werker, B. J. M. (2003). Semiparametric efficiency, distribution-freeness, and invariance. Bernoulli 9 55–65.
Kreiss, J.-P. (1987). On adaptive estimation in stationary ARMA processes. Ann. Statist. 15 112–133.
Nordhausen, K., Oja, H. and Paindaveine, D. (2009). Signed-rank tests for location in the symmetric independent component model. J. Multivariate Anal. 100 821–834.
Oja, H., Sirkiä, S. and Eriksson, J. (2006). Scatter matrices and independent component analysis. Austrian Journal of Statistics 35 175–189.
Oja, H., Paindaveine, D. and Taskinen, S. (2010). Parametric and nonparametric tests for multivariate independence in IC models. Submitted.
Puri, M. L. and Sen, P. K. (1985). Nonparametric Methods in General Linear Models. J. Wiley, New York.
Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871–880.
Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In Mathematical Statistics and Applications (W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, Eds.), Vol. B, 283–297. Reidel, Dordrecht.
Sirkiä, S., Taskinen, S. and Oja, H. (2007). Symmetrised M-estimators of multivariate scatter. J. Multivariate Anal. 98 1611–1629.
Theis, F. J. (2004). A new concept for separability problems in blind source separation. Neural Comput. 16 1827–1850.
Tyler, D. E., Critchley, F., Dümbgen, L. and Oja, H. (2009). Invariant co-ordinate selection. Journal of the Royal Statistical Society, Series B 71 549–592.
Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. Ann. Statist. 15 234–251.

Pauliina Ilmonen
Tampere School of Health Sciences
University of Tampere
FIN-33014 University of Tampere
Finland
E-mail: [email protected]

Davy Paindaveine
E.C.A.R.E.S. and Département de Mathématique
Université libre de Bruxelles
50, Avenue F.D. Roosevelt, CP114/04
B-1050 Brussels, Belgium
E-mail: [email protected]
http://homepages.ulb.ac.be/~dpaindav
