arXiv:1101.4616v1 [math.ST] 24 Jan 2011

A note on the distribution of the partial correlation coefficient with nonparametrically estimated marginal regressions

Wicher Bergsma
London School of Economics and Political Science

January 25, 2011

Abstract

There has been much interest in the nonparametric testing of conditional independence in the econometric and statistical literature, but the simplest and potentially most useful method, based on the sample partial correlation, seems to have been overlooked, its distribution having been investigated only in some simple parametric instances. The present note shows that an easy-to-apply permutation test based on the sample partial correlation with nonparametrically estimated marginal regressions has good large and small sample properties.

1 Introduction

Various authors have developed tests of conditional independence without assuming normality of the variables. For example, Kendall (1942), Goodman (1959) and Gripenberg (1992) proposed partial versions of Kendall's tau. Recently, there has been a focus on incorporating modern nonparametric methods to test for conditional independence (Su & White, 2007, 2008; Song, 2009; Huang, 2010; Bouezmarni, Rombouts, & Taamouti, 2010). Conditional independence relations are the building blocks of graphical models, which can be used to investigate causal relations for economic and other data. Surprisingly, however, even though the partial correlation is very well known, little seems to be known about its sampling distribution unless very strong assumptions are made. The present note fills this gap in the literature and shows that tests based on the partial correlation are easy to apply, while simulations indicate good small sample properties.

Consider the random triple $(X, Y, Z)$, with $Y$ and $Z$ real and $X$ arbitrary, and suppose interest lies in the question whether $Y$ and $Z$ are conditionally independent given $X$, denoted $Y \perp\!\!\!\perp Z \mid X$. If

\[
Y = g(X) + \varepsilon_Y \quad\text{and}\quad Z = h(X) + \varepsilon_Z \tag{1}
\]

for certain functions $g$ and $h$ and with $E(\varepsilon_Y \mid X = x) = E(\varepsilon_Z \mid X = x) = 0$ for all $x$, the partial correlation coefficient is the correlation between the error terms, i.e.,

\[
\rho_{YZ.X} = \frac{E(\varepsilon_Y \varepsilon_Z)}{\sqrt{E(\varepsilon_Y^2)\, E(\varepsilon_Z^2)}}.
\]

If $Y \perp\!\!\!\perp Z \mid X$, then

\[
E(\varepsilon_Y \varepsilon_Z) = \int E(\varepsilon_Y \varepsilon_Z \mid X = x)\, dF_X(x) = \int E(\varepsilon_Y \mid X = x)\, E(\varepsilon_Z \mid X = x)\, dF_X(x) = 0.
\]

Hence, conditional independence implies that the partial correlation equals zero. This paper considers a conditional independence test for a sample $\{(X_i, Y_i, Z_i)\}$ of independent replications of $(X, Y, Z)$, based on the sample partial correlation $r_{YZ.X}$. If $g$ and $h$ are known, conditional independence can easily be tested by a permutation test using $r_{YZ.X}$. In practice, however, $g$ and $h$ are unknown, and our main result, Theorem 1 in Section 2, shows that replacing the regression functions $g$ and $h$ by appropriate estimates does not affect the asymptotic distribution of $r_{YZ.X}$. The small sample distribution of the resulting estimator is typically analytically intractable, but the simulation study in Section 3, with $n = 20$ and $n = 100$ and using cubic spline smoothers for estimating $g$ and $h$, shows close-to-nominal Type I error rates and little loss of power due to estimating the marginal regressions.

2 Large sample distribution of the sample partial correlation with estimated marginal regressions

Assume (1) holds and suppose $(X_1, Y_1, Z_1), \ldots, (X_n, Y_n, Z_n)$ are independent replications of $(X, Y, Z)$. Then with

\[
\varepsilon_{Y_i} = Y_i - g(X_i) \quad\text{and}\quad \varepsilon_{Z_i} = Z_i - h(X_i)
\]

the sample partial correlation coefficient is

\[
r_{YZ.X} = \frac{\sum_{i=1}^n \varepsilon_{Y_i} \varepsilon_{Z_i}}{\sqrt{\sum_{i=1}^n \varepsilon_{Y_i}^2 \sum_{i=1}^n \varepsilon_{Z_i}^2}}. \tag{2}
\]

If $g$ and $h$ are known, conditional independence can be tested using the permutation test of independence for the $\{(\varepsilon_{Y_i}, \varepsilon_{Z_i})\}$ based on $r_{YZ.X}$, and the power of such a test is evidently the same as for an unconditional test between directly observed $\varepsilon_Y$ and $\varepsilon_Z$. In practice, however, we need to estimate $g$ and $h$, say by estimators $\hat g$ and $\hat h$. This yields estimated errors

\[
\hat\varepsilon_{Y_i} = Y_i - \hat g(X_i) \quad\text{and}\quad \hat\varepsilon_{Z_i} = Z_i - \hat h(X_i)
\]

and an estimated sample partial correlation coefficient

\[
\hat r_{YZ.X} = \frac{\sum_{i=1}^n \hat\varepsilon_{Y_i} \hat\varepsilon_{Z_i}}{\sqrt{\sum_{i=1}^n \hat\varepsilon_{Y_i}^2 \sum_{i=1}^n \hat\varepsilon_{Z_i}^2}}. \tag{3}
\]
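For illustration, here is a minimal sketch of such a permutation test in Python/NumPy, assuming the residual pairs are already available; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def corr_stat(e_y, e_z):
    # Uncentered sample correlation of the residuals, as in (2)/(3).
    return np.sum(e_y * e_z) / np.sqrt(np.sum(e_y**2) * np.sum(e_z**2))

def permutation_pvalue(e_y, e_z, n_perm=9999, seed=None):
    # Permuting one residual vector breaks any dependence while keeping
    # both marginal distributions fixed, giving the null distribution.
    rng = np.random.default_rng(seed)
    observed = abs(corr_stat(e_y, e_z))
    exceed = sum(
        abs(corr_stat(e_y, rng.permutation(e_z))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```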

Very little appears to have been published about the large or small sample distribution of $\hat r_{YZ.X}$, except if $(X, Y, Z)$ has a normal distribution; then $g$ and $h$ are linear, and with $d$ the dimension of $X$ and $\hat g$ and $\hat h$ the least squares estimators of the regression planes,

\[
\frac{\hat r_{YZ.X} \sqrt{n - 2 - d}}{\sqrt{1 - \hat r_{YZ.X}^2}} \tag{4}
\]

has a $t$-distribution with $n - 2 - d$ degrees of freedom if conditional independence holds (Kendall & Stuart, 1973, Section 27.22). The corresponding unconditional test statistic has a $t$-distribution with $n - 2$ degrees of freedom under independence, so there is a loss of $d$ degrees of freedom, or $d$ observations, due to the conditioning on $X$. The distribution of $\hat r_{YZ.X}$ based on linear regressions under nonnormality was studied by Steiger and Browne (1984) and Boik and Haaland (2006); the asymptotic distribution of (4) was shown to be standard normal under broad conditions if conditional independence is true. If, however, at least one of $g$ or $h$ is nonlinear, a test based on (4) with linear $\hat g$ and $\hat h$ plugged in can break down entirely (see Section 3).

The main result of this paper is given by the following theorem, which shows that (2) and (3) have identical asymptotic behaviour for appropriately estimated regression curves. We will make use of the following assumptions:

A1: $E(\varepsilon_Y^2) < \infty$ and $E(\varepsilon_Z^2) < \infty$.

A2: For some $q_1, q_2 > 0$ and all $x$, $n^{q_1}(\hat g(x) - g(x)) = O_p(1)$ and $n^{q_2}(\hat h(x) - h(x)) = O_p(1)$, with uniform convergence on any compact set.

Theorem 1. Suppose A1 and A2 hold. Then

\[
n^{1/2}(\hat r_{YZ.X} - \rho_{YZ.X}) = n^{1/2}(r_{YZ.X} - \rho_{YZ.X}) + O_p(n^{-\min(q_1, q_2)}).
\]

Proof: By standard arguments,

\[
\hat\varepsilon_{Y_i} = \varepsilon_{Y_i}\bigl(1 + O_p[n^{-q_1}]\bigr)
\quad\text{and}\quad
\hat\varepsilon_{Z_i} = \varepsilon_{Z_i}\bigl(1 + O_p[n^{-q_2}]\bigr),
\]

so

\[
\hat\varepsilon_{Y_i} \hat\varepsilon_{Z_i} = \varepsilon_{Y_i} \varepsilon_{Z_i}\bigl(1 + O_p[n^{-\min(q_1, q_2)}]\bigr).
\]

Hence, using A1 and A2 and with $\mathrm{cov}_{YZ.X} = n^{-1} \sum_{i=1}^n \varepsilon_{Y_i} \varepsilon_{Z_i}$ and $\widehat{\mathrm{cov}}_{YZ.X} = n^{-1} \sum_{i=1}^n \hat\varepsilon_{Y_i} \hat\varepsilon_{Z_i}$,

\[
n^{1/2}(\widehat{\mathrm{cov}}_{YZ.X} - \mathrm{cov}_{YZ.X}) = O_p[n^{-\min(q_1, q_2)}].
\]


The theorem then follows because

\[
\Bigl(n^{-1} \sum_{i=1}^n \hat\varepsilon_{Y_i}^2 \times n^{-1} \sum_{i=1}^n \hat\varepsilon_{Z_i}^2\Bigr)^{-1/2}
= \Bigl(n^{-1} \sum_{i=1}^n \varepsilon_{Y_i}^2 \times n^{-1} \sum_{i=1}^n \varepsilon_{Z_i}^2 + O_p[n^{-2\min(q_1, q_2)}]\Bigr)^{-1/2}
= \Bigl(n^{-1} \sum_{i=1}^n \varepsilon_{Y_i}^2 \times n^{-1} \sum_{i=1}^n \varepsilon_{Z_i}^2\Bigr)^{-1/2} + O_p[n^{-\min(q_1, q_2)}]. \qquad \Box
\]

[Figure 1 here: four panels, labeled $\lambda = 0.1$, $\lambda = 0.3$, $\lambda = 0.5$ and $\lambda = 0.7$, each showing a randomly generated centered curve with data over $x \in [0, 1]$.]

Figure 1: Randomly generated data with different noise-to-signal ratios $\lambda$. Here, $\sigma_0 = 1$ and so $\sigma_\varepsilon = \lambda \sigma_0 = \lambda$.

Examples of nonparametric estimators of $g$ and $h$ are local polynomial or cubic spline estimators (see, e.g., Wasserman, 2006).

To summarize, Theorem 1 justifies the following procedure. First, estimate $\hat g$ and $\hat h$ from the sample, then plug the estimates into $r_{YZ.X}$ to obtain $\hat r_{YZ.X}$, and finally calculate a p-value for the conditional independence hypothesis using the permutation test applied to $\hat r_{YZ.X}$.
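To make the procedure concrete, the following sketch implements it in Python, with a generic cubic smoothing spline from SciPy standing in for $\hat g$ and $\hat h$; the smoother choice and all names are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline  # one possible cubic spline smoother

def partial_corr_permutation_test(x, y, z, n_perm=9999, seed=None):
    # Permutation test of "Y independent of Z given X" via the estimated
    # partial correlation (3); assumes the x values are distinct.
    order = np.argsort(x)
    x, y, z = x[order], y[order], z[order]
    # Step 1: estimate the marginal regressions g-hat and h-hat.
    e_y = y - UnivariateSpline(x, y, k=3)(x)
    e_z = z - UnivariateSpline(x, z, k=3)(x)
    # Step 2: permutation p-value for the residual correlation.
    def r(a, b):
        return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))
    rng = np.random.default_rng(seed)
    observed = abs(r(e_y, e_z))
    exceed = sum(
        abs(r(e_y, rng.permutation(e_z))) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```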


3 Simulation study

In our simulation study, the functions $g$ and $h$ were generated from an integrated Wiener process, i.e.,

\[
g(x) = \sigma_0 \int_0^x W_{1t}\, dt \quad\text{and}\quad h(x) = \sigma_0 \int_0^x W_{2t}\, dt, \tag{5}
\]

where $W_{1t}$ and $W_{2t}$ are independent Wiener processes. Note that, with probability one, $g$ and $h$ are once differentiable. The error pairs $\{(\varepsilon_Y(x), \varepsilon_Z(x))\}$ were generated from a bivariate normal distribution with correlation $\rho_{YZ.X}$ and equal marginal standard deviations, the common value denoted $\sigma_\varepsilon$. A noise-to-signal ratio can be defined as $\lambda = \sigma_\varepsilon / \sigma_0$. In Figure 1, randomly generated (centered) curves and data are plotted with $\sigma_0 = 1$ and $\lambda \in \{0.1, 0.3, 0.5, 0.7\}$.
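One way a single data set from this design could be generated is sketched below, approximating the integrated Wiener processes in (5) by cumulative sums on a fine grid; the uniform design for $X$, the grid size and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_dataset(n=100, lam=0.5, rho=0.3, sigma0=1.0, n_grid=1000, seed=None):
    # Draw (X_i, Y_i, Z_i) with g and h integrated Wiener paths as in (5).
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_grid
    def integrated_wiener():
        w = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_grid))  # Wiener path on [0, 1]
        return sigma0 * np.cumsum(w) * dt                    # integrate once
    g_path, h_path = integrated_wiener(), integrated_wiener()
    x = rng.uniform(0.0, 1.0, n)
    idx = np.minimum((x * n_grid).astype(int), n_grid - 1)
    # Bivariate normal errors with correlation rho, common sd sigma_eps = lam * sigma0.
    sig_eps = lam * sigma0
    cov = sig_eps**2 * np.array([[1.0, rho], [rho, 1.0]])
    errs = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return x, g_path[idx] + errs[:, 0], h_path[idx] + errs[:, 1]
```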

We first simulated data according to the above model and fitted linear regression curves using the least squares method, which is the standard (and in this case, wrong) method for calculating the partial correlation. Conditional independence was then tested using the permutation test for the estimated partial correlation. This procedure broke down completely: for example, with $n = 100$, $\alpha = 0.05$ and $\lambda = 0.5$, we found the Type I error probability to be 0.71.

We next simulated data again according to the above model (5) with normal errors, but now estimated $g$ and $h$ according to the same, and thus correct, model. Such estimators are cubic splines (Green & Silverman, 1994; Wahba, 1990). Only the estimates at the data points are required, which are given by

\[
\hat g = H_2 \bigl(H_2 + \hat\lambda_y^2 I\bigr)^{-1} y
\quad\text{and}\quad
\hat h = H_2 \bigl(H_2 + \hat\lambda_z^2 I\bigr)^{-1} z,
\]

where $\hat\lambda_y = \hat\sigma_{\varepsilon,y}/\hat\sigma_{0,y}$ and $\hat\lambda_z = \hat\sigma_{\varepsilon,z}/\hat\sigma_{0,z}$, and $H_2$ is the covariance matrix for the integrated Wiener process (e.g., Green & Silverman, 1994). For $\hat\sigma_0$ and $\hat\sigma_\varepsilon$ we used the true values by which the data were generated.
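For concreteness, this smoother can be written out directly using the integrated-Wiener covariance $\mathrm{Cov}(I(s), I(t)) = s^2 t/2 - s^3/6$ for $s \le t$ (a standard formula); the sketch below and its names are illustrative, not the author's code.

```python
import numpy as np

def iw_cov(x):
    # Covariance matrix H_2 of an integrated Wiener process at the points x:
    # Cov(I(s), I(t)) = s^2 t / 2 - s^3 / 6 for s <= t.
    s = np.minimum.outer(x, x)
    t = np.maximum.outer(x, x)
    return s**2 * t / 2.0 - s**3 / 6.0

def spline_fitted_values(x, y, lam_hat):
    # Fitted values g-hat = H2 (H2 + lam_hat^2 I)^{-1} y at the data points.
    h2 = iw_cov(x)
    return h2 @ np.linalg.solve(h2 + lam_hat**2 * np.eye(len(x)), y)
```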

We then did a permutation test for independence on the $\{(\hat\varepsilon_{Y_i}, \hat\varepsilon_{Z_i})\}$ using the estimated partial correlation $\hat r_{YZ.X}$.

Figure 2 shows power curves, Type I error rates, and the loss of power due to conditioning for several values of the partial correlation (based on 50,000 replications). It can be seen that for $n = 20$ and $n = 100$, unless the noise-to-signal ratio is very small, Type I error rates are close to nominal and there is very little loss of power due to conditioning. The method breaks down when the error standard deviation becomes very small, which is to be expected as it leads to strong overfitting.

Figure 3, based on 500,000 replications, shows the effect of over- or undersmoothing on the Type I error rate, and hence gives an indication of robustness.
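Rejection rates of this kind could be estimated along the following lines, composing the sketches above (which this snippet assumes are in scope); the replication counts here are kept small for illustration, whereas the paper used 50,000.

```python
import numpy as np

def rejection_rate(n=20, lam=0.5, rho=0.0, alpha=0.05, n_rep=1000, seed=1):
    # Monte Carlo estimate of P(reject H0) at level alpha; assumes
    # simulate_dataset and partial_corr_permutation_test defined above.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x, y, z = simulate_dataset(n=n, lam=lam, rho=rho,
                                   seed=int(rng.integers(2**31)))
        p = partial_corr_permutation_test(x, y, z, n_perm=999,
                                          seed=int(rng.integers(2**31)))
        rejections += (p <= alpha)
    return rejections / n_rep
```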

[Figure 2 here: two panels, (a) $n = 20$ and (b) $n = 100$, plotting the probability that $H_0$ is rejected against the noise-to-signal ratio $\lambda$, with curves for $\rho = 0.0, 0.2, 0.4, 0.6, 0.8$ in panel (a) and $\rho = 0.0, 0.1, 0.2, 0.3, 0.4$ in panel (b).]

Figure 2: Power curves and Type I error rates using $\alpha = 0.05$ for several values of the partial correlation $\rho$. The dotted lines give the probabilities that $H_0$ is rejected for the corresponding test of marginal independence and give an asymptote for the curves.


[Figure 3 here: probability of a Type I error (vertical axis, 0.04 to 0.12) against the value of the smoothness parameter $\lambda$ used for the model fit (horizontal axis, 0 to 3), with one curve for $n = 20$ and one for $n = 100$.]

Figure 3: The effect of undersmoothing ($\lambda < 0.5$) and oversmoothing ($\lambda > 0.5$) on Type I error rates for $\alpha = 0.05$, with data generated using $\lambda = 0.5$. The plot indicates strong robustness against oversmoothing up to a factor of three for $\lambda$ ($0.5 < \lambda < 1.5$). Severe under- or oversmoothing leads to breakdown of Type I error rates.

It is seen that undersmoothing has a negative effect on the Type I error rate, while oversmoothing can be done by a factor of about three for $\lambda$ without negatively affecting the error rate. (In fact, some oversmoothing has a positive effect on the error rate.) Note, however, that for $n = 100$, undersmoothing by a factor as large as three for $\lambda$ still gives a Type I error rate of 6% ($\alpha = 5\%$), which should still be acceptable for most practical purposes.

4 Conclusions

Theorem 1 proves that, for sufficiently large samples, the partial correlation coefficient with appropriately estimated marginal regressions can be used in a very simple way, using the permutation test, to test for conditional independence. Simulation studies show that this method also works well for samples as small as n = 20, and with little loss of efficiency; this is in line with what was known to be the case for normal distributions. Our method is very robust to oversmoothing of the marginal regressions, but severe under- or oversmoothing leads to a breakdown of Type I error rates.


References

Boik, R., & Haaland, B. (2006). Second-order accurate inference on simple, partial, and multiple correlations. Journal of Modern Applied Statistical Methods, 5, 283-308.

Bouezmarni, T., Rombouts, J. V., & Taamouti, A. (2010). A nonparametric copula based test for conditional independence with applications to Granger causality. Technical report.

Goodman, L. A. (1959). Partial tests for partial tau. Biometrika, 46, 425-432.

Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: A roughness penalty approach (Vol. 58). London: Chapman & Hall.

Gripenberg, G. (1992). Partial rank correlations. Journal of the American Statistical Association, 87, 546-551.

Huang, T. M. (2010). Testing conditional independence using maximal nonlinear conditional correlation. Annals of Statistics, 38, 2047-2091.

Kendall, M. G. (1942). Partial rank correlation. Biometrika, 32, 277-283.

Kendall, M. G., & Stuart, A. (1973). The advanced theory of statistics. Vol. 2: Inference and relationship (3rd ed.). New York: Hafner Publishing Co.

Song, K. (2009). Testing conditional independence via Rosenblatt transforms. Annals of Statistics, 37, 4011-4045.

Steiger, J. H., & Browne, M. W. (1984). The comparison of interdependent correlations between optimal linear composites. Psychometrika, 49, 11-24.

Su, L., & White, H. (2007). A consistent characteristic-function-based test for conditional independence. Journal of Econometrics, 141, 807-834.

Su, L., & White, H. (2008). A nonparametric Hellinger metric test for conditional independence. Econometric Theory, 24, 829-864.

Wahba, G. (1990). Spline models for observational data (Vol. 59). Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).

Wasserman, L. (2006). All of nonparametric statistics. New York: Springer.

