The Cube of the Correlation Coefficient

Yadolah Dodge¹ and Valentin Rousson¹,²

¹ Statistics Group, University of Neuchâtel, 2000 Neuchâtel, Switzerland [email protected]

² Centre for Mathematics and its Applications, Australian National University, Canberra, ACT 0200, Australia

Abstract: In this paper, we derive some simple formulas to express the correlation coefficient between two random variables in the case of a linear relationship. One of these representations, the cube of the correlation coefficient, is given as the ratio of the skewness of the response variable to the skewness of the explanatory variable. This result provides researchers with a criterion for choosing the response variable as well as for testing linearity in a simple regression problem.

Key words: Correlation; Least squares; Linear regression; Response variable; Direction dependence regression; Skewness

1. Introduction

The Galton-Pearson correlation coefficient is perhaps the most broadly applied coefficient in statistics. Thirteen ways to look at the correlation coefficient, along with a brief history of the development of correlation and regression, have been described by Rodgers and Nicewander (1988). In this article we introduce some other representations of the Galton-Pearson r in which an asymmetrical element appears in the correlation coefficient. In particular, one of these representations expresses the cube of the correlation coefficient as the ratio of the skewness of the response variable to that of the explanatory variable. Such an expression allows researchers to test the linearity of a regression line as well as the direction of the dependence of the response variable.

2. The Cube of the Correlation Coefficient

Let us consider two random variables $X$ and $Y$ that are related by

\[ Y = \alpha + \beta X + \varepsilon \tag{1} \]

where $\varepsilon$ is an error random variable independent of $X$. Then the covariance between $X$ and $Y$ is given by

\[ \mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = \beta \sigma_X^2 \]

and the correlation coefficient between $X$ and $Y$ is

\[ \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \beta \frac{\sigma_X}{\sigma_Y} \tag{2} \]

where $\sigma_X^2$ and $\sigma_Y^2$ are the variances of $X$ and $Y$ respectively. Other simple formulas involving the slope $\beta$ of the linear model (1) can be obtained by generalizing the notion of covariance from two to three random variables:

\[ \mathrm{Cov3}(X, Y, Z) = E[(X - E[X])(Y - E[Y])(Z - E[Z])]. \tag{3} \]
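A sample version of the three-variable moment in (3) can be sketched as follows. This is only an illustration: the function name `cov3` and the use of the plain (uncorrected) sample mean are assumptions made here, not part of the original text.

```python
import numpy as np

def cov3(x, y, z):
    """Sample analogue of Cov3(X, Y, Z) in (3).

    Hypothetical helper: averages the product of the three centred samples,
    with no small-sample bias correction.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    return np.mean((x - x.mean()) * (y - y.mean()) * (z - z.mean()))

# With all three arguments equal, cov3 reduces to the third central moment.
sample = [1.0, 2.0, 3.0, 10.0]
third_central_moment = cov3(sample, sample, sample)
```

For the small sample above, the centred values are -3, -2, -1 and 6, so the third central moment is (-27 - 8 - 1 + 216) / 4 = 45.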

We call this the covtri among $X$, $Y$ and $Z$. The analogue of the correlation coefficient for three random variables is then defined as

\[ \rho_{XYZ} = \frac{\mathrm{Cov3}(X, Y, Z)}{\sigma_X \sigma_Y \sigma_Z}. \]

In particular, $\rho_{XXX}$ is equal to $\gamma_X$, the skewness coefficient of $X$, that is,

\[ \gamma_X = \frac{E[(X - E[X])^3]}{\sigma_X^3}. \]

The covtri has properties similar to those of the covariance. It is a symmetric, tri-linear function of the three random variables involved, and is such that

\[ \mathrm{Cov3}(X, Y, Z) = \mathrm{Cov3}(X, X, Y) = \mathrm{Cov3}(X, Y, Y) = \mathrm{Cov3}(Y, Y, Z) = \mathrm{Cov3}(Z, Z, X) = 0 \]

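if $X$, $Y$ and $Z$ are independent. Using these properties, it can be shown that under (1) we have (as long as $\gamma_X \neq 0$ and the error $\varepsilon$ has zero third central moment, for example when its distribution is symmetric)

\[ \rho_{XY}^3 = \frac{\gamma_Y}{\gamma_X}. \tag{4} \]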
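The step from the covtri properties to (4) is not spelled out; one way to fill it in is sketched below. This is a reconstruction rather than the authors' own proof, and it makes explicit an additional assumption: the third central moment of $\varepsilon$, written here as $\mu_3(\varepsilon)$ (a symbol introduced only for this sketch), is zero.

```latex
% The constant \alpha drops out because every variable is centred.
\begin{align*}
\mathrm{Cov3}(Y, Y, Y)
  &= \mathrm{Cov3}(\beta X + \varepsilon,\ \beta X + \varepsilon,\ \beta X + \varepsilon) \\
  &= \beta^3\,\mathrm{Cov3}(X, X, X)
     + 3\beta^2\,\mathrm{Cov3}(X, X, \varepsilon)
     + 3\beta\,\mathrm{Cov3}(X, \varepsilon, \varepsilon)
     + \mathrm{Cov3}(\varepsilon, \varepsilon, \varepsilon) \\
  &= \beta^3 \gamma_X \sigma_X^3 + \mu_3(\varepsilon)
     \qquad \text{(mixed terms vanish since $X$ and $\varepsilon$ are independent)} \\
  &= \beta^3 \gamma_X \sigma_X^3
     \qquad \text{(assuming $\mu_3(\varepsilon) = 0$)}.
\end{align*}
% Divide by \sigma_Y^3 and use (2), \rho_{XY} = \beta \sigma_X / \sigma_Y:
\[
\gamma_Y = \frac{\mathrm{Cov3}(Y, Y, Y)}{\sigma_Y^3}
         = \left(\beta \frac{\sigma_X}{\sigma_Y}\right)^{3} \gamma_X
         = \rho_{XY}^3\, \gamma_X .
\]
```

If $\mu_3(\varepsilon)$ is not zero, the same expansion gives $\gamma_Y = \rho_{XY}^3 \gamma_X + \mu_3(\varepsilon) / \sigma_Y^3$, so the ratio in (4) no longer equals the cube of the correlation coefficient exactly.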

The representation (4) is of particular interest because it expresses the cube of the correlation coefficient as the ratio of two measures of skewness, those of $Y$ and $X$. This contrasts with the usual formula (2) for $\rho_{XY}$, the ratio of the covariance of $X$ and $Y$ to the product of their standard deviations, which is symmetric in $X$ and $Y$ and from which no additional information can be obtained. If a linear model such as (1) is assumed, it is impossible to distinguish the explanatory variable from the response variable with formula (2), while it is possible with expression (4). Dodge and Rousson (1999) use the sample version of this formula to test the validity of the linear model, and put forward a simple procedure based on the cube of the correlation coefficient to decide which variable should be the response variable in a simple linear regression problem.
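The relation (4) can be checked numerically with a small simulation. The sketch below is illustrative only: the exponential distribution for X, the normal error, the parameter values, and the helper names `skewness` and `suggest_response` are all assumptions made here. In particular, the decision rule (take as response the variable with the smaller absolute skewness, which follows from (4) because the correlation coefficient never exceeds 1 in absolute value) is not necessarily the procedure of Dodge and Rousson (1999).

```python
import numpy as np

def skewness(v):
    """Sample skewness: third central moment over the cubed standard deviation."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def suggest_response(x, y):
    """Hypothetical rule: return 'y' if y has the smaller absolute skewness, else 'x'."""
    return "y" if abs(skewness(y)) <= abs(skewness(x)) else "x"

# Simulate model (1) with a skewed X and a symmetric error (assumed values).
rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(scale=1.0, size=n)   # skewed explanatory variable
eps = rng.normal(0.0, 1.0, size=n)       # symmetric error, independent of x
y = 1.0 + 2.0 * x + eps                  # alpha = 1, beta = 2

r = np.corrcoef(x, y)[0, 1]              # sample correlation coefficient
ratio = skewness(y) / skewness(x)        # sample version of the right side of (4)

print("r cubed:        ", r**3)
print("skewness ratio: ", ratio)
print("suggested response:", suggest_response(x, y))
```

With these settings the theoretical correlation is 2/√5, so both printed quantities should be close to (2/√5)³ ≈ 0.716, up to sampling error. Sample skewness is a noisy estimate, so a large sample is used here; with small samples the two sides of (4) can differ noticeably.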

REFERENCES

[1] Dodge, Y. and Rousson, V. (1999). Asymmetric View of the Correlation Coefficient. Submitted for publication.

[2] Rodgers, J.L. and Nicewander, W.A. (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42, 59-66.

SUMMARY

In this article, we find certain expressions for the correlation coefficient between two random variables in the case of a linear relationship. One of these relations gives the cube of the correlation coefficient as the ratio of the skewness of the dependent variable to the skewness of the independent variable. This result gives researchers a criterion for choosing the dependent variable as well as for testing linearity in a simple linear regression problem.
