The Cronbach-Mesbah Curve for assessing the unidimensionality of ...

23 downloads 530 Views 327KB Size Report
Cronbach-Mesbah curve (CM curve) proposed in [1]. This is a simple but effective method, based on the Cronbach coefficient of reliability, that can be used for ...
Cameletti, M.

The R package CMC

THE CRONBACH-MESBAH CURVE FOR ASSESSING THE UNIDIMENSIONALITY OF AN ITEM SET: THE R PACKAGE CMC Cameletti, M. (*) and Caviezel, V. Affiliation: University of Bergamo, Via dei Caniana, 2 24127 Bergamo Email: [email protected] Phone: +39 035 205 2519 Corresponding author: [email protected]

Keywords: Cronbach reliability coefficient, R software. Topic area of the submission: Classical Test Theory.

Abstract This contribution describes an R package, CMC, that calculates and plots the step-by-step Cronbach-Mesbah curve (CM curve) proposed in [1]. This is a simple but effective method, based on the Cronbach coefficient of reliability, that can be used for checking the unidimensionality of a measurement scale adopted in a psychological or social science survey. So far this graphical tool was implemented only in a SAS macro called ANAQOL [2] and therefore its diffusion was somewhat limited to SAS users. To extent the use of this method to a wider scientific community the package is written in the open-source R language.

The Cronbach-Mesbah curve An objective measurement should satisfy some properties such as order, unidimensionality, local independence, linearity and specific objectivity. Unidimensionality can be theoretically defined as the property of an instrument to measure a single characteristic of the measured object. If an instrument is characterized by unidimensionality, then the difference between two subjects in responding to a set of items depends only on the single latent trait. To study the unidimensionality of a set of observable items parallel model [4] where j (j=1,…,k). The term

we consider the

is the observed score (response) of a subject i (i=1,…,n) on item can be written as

, where

, the true

Cameletti, M. score, and

The R package CMC , the error score, are two unknown random variables generally assumed to be

independent; variance

is a variable fixed effect and

, whereas

is a random effect with zero mean and

is a random effect with zero mean and variance

.

Under the assumption that the observed items follow the parallel model, the reliability of any item is equal to

. Moreover, it is possible to prove that

is also the

constant correlation between any two items. The reliability of the sum of k items is given by Spearman-Brown formula

and its maximum likelihood estimator is the well known

Cronbach alpha coefficient, denoted by !. In particular,

is an increasing function of the

item number k; thus, if every item follows the parallel model, then the reliability of the sum increases when the number of items increases. Therefore, if ! does not increase or even decreases as k increases, that means that the added items do not belong to the same dimension, i.e. do not follow the parallel model. Moreover, it is possible to show that there is a direct connection between ! and the percentage of variance explained by the first component of the Principal Component Analysis (PCA) performed on the k items. The PCA is usually based on the analysis of the roots of the correlation matrix R of the k variables which, under the hypothesis of a parallel model, contains only two different elements: “1” in the diagonal and ! elsewhere. The roots of this matrix are:

and

. Thus,

indicates that there is a monotonic relationship between

, estimated by

. This , and the first root

that gives the percentage of variance explained by the first principal component. Thus, is considered as a measure of unidimensionality. To assess the unidimensionality of a set of items, it is possible to plot a curve, called step-bystep CM curve, which reports the number of items (from 2 to k) on the x-axis and the corresponding maximum

coefficient on the y-axis obtained through the following steps:

1. first of all the Cronbach coefficient

is computed using all the k items.

Cameletti, M.

The R package CMC

2. One at a time, the i-th item (i = 1, …, k) is left out and the Cronbach coefficient, denoted by

, is computed using the remaining (k-1) items. All the coefficients are collected in a set

given by

where the apex refers to the number of item removed at

each time. Then, the maximum of

is detected and the corresponding item is taken out.

3. The procedure of step 2 is repeated conditionally on the item removed previously. Supposing that item j was removed, the remaining items are left out one at a time and the corresponding Cronbach coefficient is calculated using (k-2) items. This gives rise to the following set of (k-1) coefficients

, where the

apex refers to the fact that 2 items are taken out. The item corresponding to the maximum of is then removed definitely. This procedure is repeated until only 2 items remain. At each step the removed item is the one which leaves the scale with its maximum value of the Cronbach coefficient a poor item, the

. If we remove

coefficient will increase, whereas if we remove a good item

is expected

toe decrease. Thus, a decrease of the CM curve, after adding a variable, would suggest that the added variable does not constitute an unidimensional set with the other variables. On the other hand, if the CM curve increases monotonically, then all the items contribute to measure the same latent trait and the bank of items is characterized by unidimensionality.

The CMC package The code for computing the CM curve has been organized in a library developed using the open-source R programming environment [5]. The result is a package called CMC (from the acronym of CM curve) which is composed by two functions. The first one, called alpha.cronbach, computes the Cronbach reliability coefficient

for an object of class

data.frame (or matrix) with n subjects (rows), k items (columns) and no missing values. The second function included in the package is called alpha.curve and computes the CM curve following the procedure described previously. The function takes as input an object of class data.frame (or matrix) with n subjects (rows), k items (columns) and no missing values. The output of the function is given by 2 objects:

Cameletti, M.

The R package CMC

1. an object of class data.frame with 3 columns. The first column (N.Item) contains the number of item used for computing the Cronbach coefficients. The second one (Alpha.Max) refers to the maximums of the Cronbach coefficients calculated at each step of the procedure, that is

. The third column (Removed.Item)

reports the name of the item removed at each step. 2. The plot of the corresponding CM curve created using the columns N.Item and Alpha.Max of the data.frame described above. With regard to the cain dataset included in the package, we obtain the outputs reported in Table 1 and Figure 1, using the following code: R> alpha.curve(cain) N.Item 2 3 4 5 6 7 8 9 10 11 12

Alpha.Max 0.67 0.7 0.74 0.75 0.77 0.78 0.8 0.81 0.82 0.83 0.83

Removed.Item I20 I16 I13 I14 I9 I7 I25 I11 I23 I22 --

Tab. 1: The output of CMC package for the cain dataset

Fig. 1: CM curve for the cain dataset of CMC package.

References: [1] Mesbah, M. Statistical Quality of Life. In N. Balakrishnan (ed.), Method and Applications of Statistics in the Life and Health Sciences, Wiley, 2010; 839-864. [2] Hardouin, J.; Mesbah, M. The SAS Macro-Program AnaQol to Estimate the Parameters of Item Responses Theory Models. Communications in Statistics - Simulation and Computation, 2007, 36, 437-453. [3] Fischer, G.H.; Molenaar, I.W. (Eds.) Rasch Models: Foundations, Recent Developments and Applications, New York, Springer-Verlag, 1994. [4] Lord, F.; Novick, M. Statistical Theories of Mental Test Scores. Addison-Wesley Publishing Company, 1968. [5] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org, 2010.