More Correlation Coefficients

05/09/12


Applied Statistics - Lesson 13

Lesson Overview

Why so many Correlation Coefficients
Point Biserial Coefficient
Phi Coefficient
Measures of Association: C, V, Lambda
Biserial Correlation Coefficient
Tetrachoric Correlation Coefficient
Rank-Biserial Correlation Coefficient
Coefficient of Nonlinear Relationship (eta)
Homework

Why so many Correlation Coefficients

We introduced in lesson 5 the Pearson product moment correlation coefficient and the Spearman rho correlation coefficient. There are more. Remember that the Pearson product moment correlation coefficient requires quantitative (interval or ratio) data for both x and y, whereas the Spearman rho correlation coefficient applies to ranked (ordinal) data for both x and y. You should review levels of measurement in lesson 1 before we continue.

It is often the case that the two variables are not at the same level of measurement, or that the data are categorical (nominal or ordinal) rather than quantitative. In addition to correlation coefficients based on the product moment, and thus related to the Pearson product moment correlation coefficient, there are coefficients in common use which are instead measures of association. For the purposes of correlation coefficients we can generally lump the interval and ratio scales together as just quantitative. In addition, the regression of x on y is closely related to the regression of y on x, and the same coefficient applies, so the table is symmetric. We list the common choices in the table below and then discuss them in turn.

Variable Y\X      Quantitative X        Ordinal X                         Nominal X
Quantitative Y    Pearson r             Biserial r_b                      Point Biserial r_pb
Ordinal Y         Biserial r_b          Spearman rho/Tetrachoric r_tet    Rank Biserial r_rb
Nominal Y         Point Biserial r_pb   Rank Biserial r_rb                Phi, L, C, Lambda

Before we go on we need to clarify the different types of nominal data. Specifically, nominal data with only two possible outcomes are called dichotomous.

www.andrews.edu/~calkins/math/edrm611/edrm13.htm

Point Biserial Coefficient


The point-biserial correlation coefficient, referred to as $r_{pb}$, is a special case of the Pearson in which one variable is quantitative and the other variable is dichotomous and nominal. The calculations simplify since typically the values 1 (presence) and 0 (absence) are used for the dichotomous variable. This simplification is sometimes expressed as follows:

$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\,\sqrt{pq}}{\sigma_Y}$$

where $\bar{Y}_0$ and $\bar{Y}_1$ are the Y score means for data pairs with an x score of 0 and 1, respectively, $q = 1 - p$ and $p$ are the proportions of data pairs with x scores of 0 and 1, respectively, and $\sigma_Y$ is the population standard deviation for the y data. An example usage might be to determine if one gender accomplished some task significantly better than the other gender.
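The claim that the point-biserial is just the Pearson in disguise can be checked numerically. The following is a minimal Python sketch, not part of the original lesson: the function names `point_biserial` and `pearson` and the data values are made up for illustration, and the population standard deviation (`statistics.pstdev`) is assumed, as in the formula above.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(x, y):
    """(mean of Y where x=1 minus mean of Y where x=0) * sqrt(p*q) / sigma_Y."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    p = len(y1) / len(y)  # proportion of pairs with x = 1
    q = 1 - p             # proportion of pairs with x = 0
    return (mean(y1) - mean(y0)) * sqrt(p * q) / pstdev(y)


def pearson(x, y):
    """Ordinary Pearson product moment correlation, for comparison."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: x is the dichotomous variable (0/1), y a quantitative score.
x = [0, 0, 0, 1, 1, 1, 1, 0]
y = [12, 15, 11, 18, 20, 17, 19, 14]

r_pb = point_biserial(x, y)
r = pearson(x, y)
print(r_pb, r)  # the two values agree up to rounding error
```

Note that the agreement depends on using the population standard deviation; the sample standard deviation (dividing by n - 1) would give a slightly different value.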

Phi Coefficient

If both variables instead are nominal and dichotomous, the Pearson simplifies even further. First, perhaps, we need to introduce contingency tables. A contingency table is a two dimensional table containing frequencies by category. For this situation it will be two by two since each variable can only take on two values, but a dimension will exceed two when the associated variable is not dichotomous. In addition, column and row headings and totals are frequently appended so that the contingency table ends up being n + 2 by m + 2, where n and m are the number of values each variable can take on. The label and total row and column typically are outside the gridded portion of the table, however. As an example, consider the following data organized by gender and employee classification (faculty/staff). (HTML doesn't provide the facility to grid only the table's interior.)

Class.\Gender    Female (0)    Male (1)    Totals
Staff            10            5           15
Faculty          5             10          15
Totals:          15            15          30

Contingency tables are often coded as below to simplify calculation of the Phi coefficient.

Y\X        0        1        Totals
1          A        B        A + B
0          C        D        C + D
Totals:    A + C    B + D    N

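With this coding:

$$\phi = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$

For this example we obtain $\phi = (25 - 100)/\sqrt{15 \cdot 15 \cdot 15 \cdot 15} = -75/225 = -0.33$, indicating a weak negative correlation. Please note that this is the Pearson correlation coefficient, just calculated in a simplified manner. However, the extreme values are not always reachable for a given set of totals: $\phi = +1$ requires the two variables to have the same proportion of 1s, and $\phi = -1$ requires the proportion of 1s in one variable to equal the proportion of 0s in the other. Both extremes of $|r| = 1$ can thus be realized only when the two row totals are equal and the two column totals are equal. There are ways of computing the maximal values for other totals, if desired.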
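To see that phi really is the Pearson coefficient, the gender by classification table can be expanded back into 30 pairs of 0/1 scores and run through the ordinary Pearson formula. This is a minimal Python sketch under stated assumptions: the function names `phi_from_table` and `pearson` are illustrative, and the coding Staff = 1, Faculty = 0 is inferred from the arithmetic of the worked example.

```python
from math import sqrt


def phi_from_table(a, b, c, d):
    """Phi from the A, B, C, D coding: row Y=1 holds A (X=0) and B (X=1),
    row Y=0 holds C (X=0) and D (X=1)."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson(x, y):
    """Ordinary Pearson product moment correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Cell counts from the example: A=10, B=5 (Staff row), C=5, D=10 (Faculty row).
a, b, c, d = 10, 5, 5, 10

# Rebuild the raw 0/1 data, one (x, y) pair per person in each cell.
x = [0] * a + [1] * b + [0] * c + [1] * d
y = [1] * a + [1] * b + [0] * c + [0] * d

phi = phi_from_table(a, b, c, d)  # (25 - 100) / 225
r = pearson(x, y)
print(round(phi, 2), round(r, 2))
```

Swapping which category is coded 1 for either variable flips the sign of phi but not its magnitude, so the sign is meaningful only relative to the chosen coding.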

Measures of Association: C, V, Lambda

As product moment correlation coefficients, the point biserial, phi, and Spearman rho are all special cases of the Pearson. However, there are correlation coefficients which are not. Many of these are more properly called measures of association, although they are usually termed coefficients as well. Three of these are similar to Phi in that they are for nominal against nominal data, but they do not require the data to be dichotomous. One is called Pearson's contingency coefficient and is termed C, whereas the second is called Cramér's V coefficient. Both utilize the chi-square statistic and so will be deferred to the next lesson. The third, the Goodman and Kruskal lambda coefficient, does not use chi-square but is another commonly used association measure. There are two flavors: one called symmetric, used when the researcher does not specify which variable is the dependent variable, and one called asymmetric, used when such a designation is made. We leave the details to any good statistics book.
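For readers who want a concrete starting point, here is a minimal Python sketch of both flavors of lambda, using the usual textbook definition as the proportional reduction in prediction error. It is not taken from this lesson: the function names are hypothetical, the asymmetric version assumes the row variable is the dependent one, and the sample table reuses the gender by classification counts from the phi section.

```python
def lambda_asymmetric(table):
    """Lambda for predicting the row variable from the column variable.

    table is a list of rows of cell frequencies. Undefined (division by
    zero) when every observation falls in a single row.
    """
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    # Best guess within each column: the largest cell in that column.
    sum_col_max = sum(max(col) for col in zip(*table))
    return (sum_col_max - max_row_total) / (n - max_row_total)


def lambda_symmetric(table):
    """Symmetric lambda: neither variable is designated as dependent."""
    cols = list(zip(*table))
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    max_col_total = max(sum(col) for col in cols)
    sum_col_max = sum(max(col) for col in cols)
    sum_row_max = sum(max(row) for row in table)
    numerator = sum_col_max + sum_row_max - max_row_total - max_col_total
    return numerator / (2 * n - max_row_total - max_col_total)


# Rows: Staff, Faculty; columns: Female, Male (counts from the phi example).
table = [[10, 5], [5, 10]]
print(lambda_asymmetric(table), lambda_symmetric(table))
```

Unlike phi, lambda ranges from 0 to 1 and carries no sign, since nominal categories have no direction.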

Biserial Correlation Coefficient

Another measure of association, the biserial correlation coefficient, termed $r_b$, is similar to the point biserial, but pits quantitative data against ordinal data that has an underlying continuity yet is measured discretely as two values (dichotomous). An example might be test performance vs anxiety, where anxiety is designated as either high or low. Presumably, anxiety can take on any value in between, perhaps beyond, but it may be difficult to measure. We further assume that anxiety is normally distributed. The formula is very similar to the point-biserial, yet different:

$$r_b = \frac{(\bar{Y}_1 - \bar{Y}_0)\,(pq / Y)}{\sigma_Y}$$

where $\bar{Y}_0$ and $\bar{Y}_1$ are the Y score means for data pairs with an x score of 0 and 1, respectively, $q = 1 - p$ and $p$ are the proportions of data pairs with x scores of 0 and 1, respectively, $\sigma_Y$ is the population standard deviation for the y data, and $Y$ (without a bar or subscript) is the height of the standardized normal distribution at the point z, where P(z'
