
Article

Reliability Generalization as a Seal of Quality of Substantive Meta-Analyses: The Case of the VIA Inventory of Strengths (VIA-IS) and Their Relationships to Life Satisfaction

Psychological Reports 0(0) 1–22 © The Author(s) 2018 Reprints and permissions: sagepub.com/journalsPermissions.nav DOI: 10.1177/0033294118779198 journals.sagepub.com/home/prx

Mercedes Ovejero Bruna Department of Biometry, SERMES CRO, Madrid, Spain

Andreea C. Brabete Faculty of Nursing, Université de Montréal, Quebec, Canada

Jesús M. Alvarado Izquierdo Department of Psychobiology & Behavioral Sciences Methods/ Biofunctional Studies Institute, Universidad Complutense de Madrid, Spain

Abstract
Reliable test scores are essential for interpreting the results of statistical analyses correctly. In this study, we used the Values in Action Inventory of Strengths (VIA-IS) as an example of a widely applied assessment instrument and analyzed its metric quality by means of a reliability generalization (RG) meta-analysis. In addition, we conducted a meta-analysis of the correlations between character strengths and life satisfaction to examine the potential relationship between the reliability of test scores and the intensity of these correlations. The overall variability of alpha coefficients supports the argument that reliability is sample dependent. Indeed, there were statistically significant mean reliability differences for scores across the 24 scales, with the highest level of reliability observed for Creativity and the lowest for scores on Self-regulation. Significant moderators such as the standard deviation of the scores and the sample type help to explain the high variability observed in the reliability estimates. The second meta-analysis showed that Zest, Hope, Gratitude, Curiosity, and Love were the character strengths that were highly related to life satisfaction, while Modesty and Prudence were less related to life satisfaction. Furthermore, the high heterogeneity between samples might indicate a relationship between the variability of the reliability of character strengths’ scores and the intensity of their correlations with life satisfaction. Character strengths with high RG potential are either related or unrelated to life satisfaction, whereas character strengths with lower RG potential showed unstable correlation patterns. The results of both studies highlight the relationship between the reliability of test scores and substantive studies, such as meta-analyses of Pearson’s correlations.

Keywords
Character strengths, VIA-IS, life satisfaction, meta-analysis, reliability generalization, Pearson’s correlation

Corresponding Author: Mercedes Ovejero Bruna, Department of Biometry, SERMES CRO, Rufino González 14, Escalera 1a 2º Derecha, Madrid 28037, Spain. Email: [email protected]

Introduction
“What you measure affects what you do. If you don’t measure the right thing, you don’t do the right thing.” (Joseph Stiglitz)

Psychological tests are the main tools used by researchers and clinical professionals to quantify an individual’s level on a psychological construct. Although both validity and reliability are psychometric concepts that help to assess the quality of a psychological instrument (Kimberlin & Winterstein, 2008), the importance of reliability has often been overlooked (Vacha-Haase & Thompson, 2011). Reliability refers to the extent to which a test or any measuring procedure yields the same result on repeated trials (Carmines & Zeller, 1979). It is considered to be a necessary but not sufficient condition for validity (Sullivan, 2011; Warne, 2008). There are important reasons why reliability should be reported in each study. For instance, reliable instrument scores are essential for interpreting the results of a statistical analysis correctly. Conversely, unreliable test scores call into question the use of General Linear Model analyses (e.g., regression, analysis of variance, etc.; Vacha-Haase & Thompson, 2011). In other words, unreliable scores may call into question the validity of the statistical conclusions obtained. Therefore, the first step in any quantitative research study should be the assessment of the reliability of the instrument scores (Sullivan, 2011; Vacha-Haase & Thompson, 2011; Wilkinson & APA Task Force, 1999). Considering the significance and the possible negative consequences of unreliable test scores, Vacha-Haase (1998) proposed the reliability generalization (RG) meta-analysis. The RG meta-analysis was based on the validity generalization meta-analytic method developed by Schmidt and Hunter (1977).

Through a systematic exploration of the reliabilities reported for a specific test, an RG meta-analysis contributes to a better understanding of the factors involved in the variability of reliability and of the role they play in study results. Moreover, this type of study can be very helpful to applied researchers, test administrators, and decision makers (Vacha-Haase, Henson, & Caruso, 2002) because RG studies make it possible (a) to determine to what extent the accuracy of a measure is generalizable across samples and studies, (b) to establish the sources of variability (methodological vs. substantive sources) that may affect the reliability estimation, and (c) to define the potential generalizability of reliability by taking into account different sources of error (measurement error vs. estimation error) that might be confounded when interpreting a substantive result (Vacha-Haase, 1998). It is appropriate to conduct an RG study only if the test under study has been applied widely and if a reasonable number of empirical studies have estimated the reliability of its scores (Henson & Thompson, 2002). In the current study, we considered the Values in Action Inventory of Strengths (VIA-IS), a measure based on character strengths (Peterson & Seligman, 2004), because it meets these requirements. A single application of the VIA-IS (or any test) to a specific sample is not enough to establish the psychometric properties and the quality of the measure, because the results refer to the scores from that particular application, which may vary in a second application even when the test is administered to the same sample (Botella, Suero, & Gambara, 2010).

The VIA-IS
The model of strengths and virtues known as the VIA Classification of Character Strengths and Virtues was designed after reviewing several documents related to historical, literary, moral, and religious traditions (Dahlsgaard, Peterson, & Seligman, 2005; Peterson, 2006; Peterson & Seligman, 2004). As a result, Peterson and Seligman (2004) proposed 6 cross-cultural virtues and 24 character strengths (see Supplementary Table 1 for a description of each character strength). Character strengths are defined as positively valued trait-like individual differences with demonstrable generality across different situations and stability across time (Peterson & Seligman, 2004). They are fulfilling and intrinsically valuable in an ethical sense (gifts, skills, aptitudes, and expertise can be squandered, but character strengths and virtues cannot). They are nonrivalrous, and their opposite is not itself a desirable trait (a counterexample is steadfast and flexible, which are opposites but are both commonly seen as desirable). In addition, character strengths are trait-like (habitual patterns that are relatively stable over time) and personified (at least in the popular imagination) by people made famous through history; yet, they are not a combination of the other character strengths. Finally, they are nurtured by societal norms and institutions, which explains why they may be absent in some individuals (Peterson & Seligman, 2004). Character strengths were thus conceptualized as more personal prosocial trait variables compared with other positive traits such as talents (McGrath, 2016).

This elaboration served as the foundation for the creation of the VIA-IS. This self-assessment instrument is composed of 240 items that measure behaviors representative of the different character strengths (Peterson & Park, 2006, 2009; Peterson, Park, & Seligman, 2005; VIA Institute, 2012). There are 10 items for each of the 24 strengths in the VIA classification, and a 5-point Likert scale (1: Not at all like me to 5: Very much like me) is used to rate them. The VIA-IS was initially created as a universal measure of what Peterson and Seligman (2004) called the good character. The inclusion of cross-cultural character strengths allowed the translation of the instrument into several languages such as Spanish, Croatian, Chinese, Dutch, Hebrew, German, and Hindi (Azañedo, Fernández-Abascal, & Barraca, 2014; Brdar & Kashdan, 2010; Duan et al., 2011; Jónsdóttir, 2010; Littman-Ovadia & Lavi, 2012a; Ruch et al., 2010; Singh & Choubisa, 2009, 2010). Its wide application across populations, settings, and countries in psychological research supports the objective of examining whether its psychometric properties, and in particular the reliability of its scores, can be generalized across studies that have used this inventory. The study of character strengths occupies a central role in Positive Psychology research because the good character contributes to variables such as pleasure, flow, well-being, life satisfaction, and other positive experiences (Niemiec, 2013; Park & Peterson, 2009; Proyer, Gander, Wellenzohn, & Ruch, 2013).
Regarding the relationship between character strengths and life satisfaction, Hope, Zest, Gratitude, Curiosity, and Love have a consistent relationship with life satisfaction (Brdar & Kashdan, 2010; Ruch, Huber, Beermann, & Proyer, 2007; Shimai, Otake, Park, Peterson, & Seligman, 2006). In contrast, the associations between life satisfaction and Modesty, Creativity, Appreciation of beauty and excellence, Judgment, and Love of learning (Park, Peterson, & Seligman, 2004a, 2004b) are lower or even nonsignificant. However, the substantive interpretation of the correlations between variables might be affected if the reliability estimates are highly heterogeneous (Vacha-Haase, 1998). In other words, lower reliability of the test scores increases the probability of a Type II error (i.e., decreases statistical power) and directly attenuates the observed correlations.
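
The effect of unreliability on observed correlations can be illustrated with the classical test theory attenuation formula, in which the expected observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The following Python sketch is only an illustration and is not part of the original analyses; the function names and all numeric values are hypothetical.

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given the two score reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Correction for attenuation (the inverse of attenuated_r)."""
    return observed_r / math.sqrt(rel_x * rel_y)

# Hypothetical values: a true-score correlation of .50 between a character
# strength and life satisfaction, with life satisfaction measured at alpha = .85.
TRUE_R = 0.50
REL_SWLS = 0.85

for rel_strength in (0.90, 0.80, 0.70):
    r_obs = attenuated_r(TRUE_R, rel_strength, REL_SWLS)
    print(f"alpha = {rel_strength:.2f} -> expected observed r = {r_obs:.3f}")
```

Under these assumptions, the same underlying relationship yields smaller observed correlations as the reliability of the strength scores drops, so samples with heterogeneous reliabilities can show heterogeneous correlations even when the true relationship is constant.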

The present study
The aim of the present study is twofold. First, it investigates the reliability of the scores obtained with the VIA-IS in a diverse set of contexts. Second, after examining the heterogeneity of the reliability estimates (RG), we aim to determine how this heterogeneity influences the relationship between character strengths and life satisfaction. Therefore, this article aims to illustrate the need to perform an RG study before performing a meta-analytical study, since different sources of error may be confounded when interpreting the correlation patterns (measurement error vs. estimation error). For example, measurement error is associated with a lack of precision of the instrument, and this might increase the estimation errors of the correlations between two variables (Vacha-Haase, 1998). To achieve these aims, we present two studies: the RG meta-analysis (Study I) and the meta-analyses of correlations (Study II).

Study I: RG meta-analysis
Method
Literature search. We retrieved articles from the PsycINFO, PsycARTICLES, Psicodoc, PubMed, and Psychological Abstracts databases. The literature search encompassed articles published between January 2004 and December 2016. The articles were searched for by using the following two groups of search terms and keywords: Group 1: Values In Action Inventory of Strengths; VIA-IS; Character strengths; VIA; Group 2: Psychometric properties, reliability, internal consistency, alpha, Cronbach’s alpha, coefficient alpha. The truncation symbol was added to the most basic word stem for each keyword to ensure all associated terms were included in the search. To prevent the file drawer problem related to missing values, we contacted the VIA Institute on Character and seven investigators in the field, two of whom reported information about missing values that was then included in the current study. Authors were contacted if they had at least one publication that met the inclusion criteria (and none of the exclusion criteria) but did not report all the reliability data for the character strengths in the published article. The procedure was to write an e-mail explaining the aim of our research and asking for unpublished data. The e-mail content was the same in all cases. All articles were reviewed by three referees, and discrepancies were resolved through discussion (see Supplementary Table 2 for a description of the inclusion and exclusion criteria, and Supplementary Table 3 for the studies included in the current meta-analysis). Those discrepancies were mostly related to the number of items in each scale and to reliabilities not reported in the studies selected. Interrater reliability for Study I was .92. The flow diagram for Study I is reported in Figure 1.
Data analysis. To carry out the RG study, the Cronbach’s alpha of each character strength was collected from each sample.
The pooled estimate of the internal consistency of each trait was computed using Bonett’s transformation (Bonett, 2010), weighting each coefficient by the inverse of its sampling variance (Sánchez-Meca & López-Pina, 2008) and assuming a random-effects model. The Q test was used to determine whether there were any significant violations of homogeneity in the effect size distributions. Because the Q test has low power when the number of studies is small (Higgins & Thompson, 2002), I² was also calculated. I² values of 25%, 50%, and 75% can tentatively be labeled as low, moderate, and high heterogeneity, respectively (Huedo-Medina, Sánchez-Meca, Marín-Martínez, & Botella, 2006). Trim-and-fill procedures (Duval, 2005; Sterne et al., 2011) were used to account for potential publication bias (see Supplementary Table 5 for more details). This method estimates the number of studies missing from a meta-analysis due to the suppression of the most extreme results on one side of the funnel plot. The effect of the moderating variables (standard deviation, test version, and sample type) on the variability of the reliability estimates was evaluated by means of analyses of variance for the categorical variables and a regression model for the continuous variable, assuming a mixed-effects model. The data were analyzed using R (R Core Development Team, 2018) and the metafor package (Viechtbauer, 2010).

Figure 1. Flow chart for Study I.
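
The pooling steps described in the data analysis (transformation of alpha, inverse-variance weighting, a random-effects model, the Q test, and I²) can be sketched in Python as follows. This is a minimal illustration and not the authors’ R/metafor code. It assumes the transformation ln(1 − α) with approximate sampling variance 2J / ((J − 1)(n − 2)), where J is the number of items and n the sample size, and it assumes the DerSimonian-Laird estimator for the between-sample variance, which may differ from the estimator used in the article. The function names and the sample values are hypothetical.

```python
import math

def bonett_transform(alpha, n_items, n):
    """Return ln(1 - alpha) and its approximate sampling variance.

    Assumption: variance = 2J / ((J - 1)(n - 2)), J items, n respondents.
    """
    y = math.log(1.0 - alpha)
    v = 2.0 * n_items / ((n_items - 1) * (n - 2))
    return y, v

def pool_random_effects(ys, vs):
    """Random-effects pooling (DerSimonian-Laird, an assumption) with Q and I2."""
    k = len(ys)
    w = [1.0 / v for v in vs]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-sample variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]            # random-effects weights
    mean = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"mean": mean, "se": se, "Q": q, "tau2": tau2, "I2": i2}

def back_transform(y):
    """Map a pooled ln(1 - alpha) value back to the alpha metric."""
    return 1.0 - math.exp(y)

# Hypothetical (alpha, sample size) pairs for one 10-item scale;
# these are NOT data from the article.
samples = [(0.86, 300), (0.80, 150), (0.71, 500), (0.83, 220)]
N_ITEMS = 10

transformed = [bonett_transform(a, N_ITEMS, n) for a, n in samples]
ys = [y for y, _ in transformed]
vs = [v for _, v in transformed]
res = pool_random_effects(ys, vs)

pooled_alpha = back_transform(res["mean"])
# A larger ln(1 - alpha) means a lower alpha, so the interval bounds swap.
ci_low = back_transform(res["mean"] + 1.96 * res["se"])
ci_high = back_transform(res["mean"] - 1.96 * res["se"])
print(f"pooled alpha = {pooled_alpha:.3f} [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Q = {res['Q']:.2f}, I2 = {res['I2']:.1f}%")
```

The Q statistic is referred to a chi-square distribution with k − 1 degrees of freedom, and I² expresses the share of the total variability that exceeds what sampling error alone would produce.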

Results
Estimate of average reliability. After applying the inclusion and exclusion criteria, we found 24 samples for the RG meta-analysis. Supplementary Table 4 presents the studies included and the sample characteristics. Table 1 shows the main summary statistics for the coefficients alpha of each VIA-IS scale. For the transformed coefficients, the (weighted) mean coefficient alpha varies from .71 (Self-regulation) to .86 (Creativity). The coefficients alpha for each character strength showed statistically significant heterogeneity, with I² ranging from 87.91% (Social intelligence) to 98.51% (Spirituality). Almost all I² values exceed 90%, suggesting that several moderator variables might play a role in the variability of reliability.
Moderator analyses. Characteristics such as the standard deviation of VIA-IS scores, the test version (English vs. translated versions of the inventory), and the sample type (college vs. general population) were analyzed to explain the variability of the reliability. Detailed moderator analyses are presented as supplementary material (see Supplementary Tables 6 to 8). As can be seen in Table 2, the variability of the test scores (character strengths’ standard deviation) affected the reliability estimates. Of all the moderator variables tested here, it accounted for the highest proportion of explained variance. In those strengths in which sample type and test version are significant moderators, the explained variance is lower.
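
As a rough illustration of how a continuous moderator such as the score standard deviation can be related to the transformed reliabilities, the following Python sketch fits a weighted simple regression with weights 1 / (v + τ²). It is a simplified stand-in for a mixed-effects meta-regression, not the authors’ analysis; the transformation ln(1 − α), its variance formula, the fixed τ² value, and all sample values are assumptions made for the example.

```python
import math

def weighted_regression(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical (alpha, sample size, score SD) triples for one 10-item scale;
# these are NOT data from the article.
samples = [(0.86, 300, 0.62), (0.80, 150, 0.55), (0.71, 500, 0.48), (0.83, 220, 0.60)]
N_ITEMS = 10
TAU2 = 0.02  # hypothetical residual between-sample variance

ys = [math.log(1.0 - a) for a, _, _ in samples]                      # transformed alphas
vs = [2.0 * N_ITEMS / ((N_ITEMS - 1) * (n - 2)) for _, n, _ in samples]
ws = [1.0 / (v + TAU2) for v in vs]                                  # mixed-effects weights
xs = [sd for _, _, sd in samples]

intercept, slope = weighted_regression(xs, ys, ws)
print(f"intercept = {intercept:.3f}, slope for SD = {slope:.3f}")
```

Because the outcome here is ln(1 − α), a negative slope means that samples with more variable scores tend to show higher alpha coefficients. The sign would be reversed if the outcome were coded as −ln(1 − α).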

Study II: Applied meta-analysis
Method
Literature search. To identify studies for the correlational meta-analysis, we searched the electronic databases PsycINFO, PsycARTICLES, Psicodoc, PubMed, and Psychological Abstracts for empirical studies that computed Pearson’s correlations between character strengths’ scores and the Satisfaction With Life Scale (Diener, Emmons, Larsen, & Griffin, 1985). The following keywords and search terms were combined in the electronic search for the period between January 2002 and December 2016: Group 1: Values In Action Inventory of Strengths; VIA-IS; Character strengths; VIA; Group 2: Satisfaction with life; Life Satisfaction; Satisfaction With Life Scale (SWLS). Following the same procedure as in Study I, and to prevent the file drawer problem associated with unreported correlations, we contacted the VIA Institute on Character and investigators in the field using the same criteria as in Study I, except that the data requested were the correlations between character strengths and the SWLS. Three of the authors then sent information about computed correlations that had not been included in the published article.

Table 1. Pooled estimates of character strengths’ scores reliability coefficients.

                               Weighted analyses         Homogeneity
Character strength        N    α      95% CI             Q          I² (%)
Creativity                23   .86    [.78, .91]          441.50    96.78
Curiosity                 23   .80    [.75, .85]          208.70    89.58
Judgment                  23   .81    [.71, .88]         1126.67    96.66
Love of learning          23   .81    [.75, .85]          166.29    90.85
Perspective               23   .78    [.72, .83]          117.46    91.04
Bravery                   23   .75    [.64, .83]          416.53    95.29
Persistence               23   .84    [.71, .81]         1380.92    98.14
Honesty                   24   .76    [.56, .87]         2855.64    98.33
Zest                      24   .78    [.66, .86]         1027.31    96.58
Love                      23   .75    [.68, .81]          306.99    90.90
Kindness                  23   .75    [.67, .82]          327.38    92.77
Social intelligence       23   .76    [.70, .81]          231.37    87.91
Teamwork                  23   .76    [.68, .81]          178.76    91.32
Fairness                  23   .78    [.73, .83]          133.99    88.53
Leadership                22   .77    [.65, .85]          200.28    96.42
Forgiveness               24   .81    [.67, .89]         1047.78    97.99
Modesty                   24   .75    [.63, .84]          496.25    96.08
Prudence                  23   .74    [.65, .81]          138.06    93.19
Self-regulation           22   .71    [.59, .79]          304.71    94.35
Appreciation of beauty    24   .80    [.69, .87]          452.67    96.70
Gratitude                 24   .80    [.72, .85]          363.25    93.78
Hope                      24   .81    [.71, .87]          575.07    96.37
Humor                     24   .85    [.75, .91]          755.85    97.38
Spirituality              24   .84    [.69, .92]         1254.41    98.51

Note: Alpha back-transformed from Bonett’s procedure. All Q indices were significant (p < .001). N: samples included.

All articles were analyzed by three independent reviewers, and discrepancies were resolved through discussion (see Supplementary Table 3 for more details). Interrater reliability was .94 (see Supplementary Table 9 for a description of the inclusion and exclusion criteria and Supplementary Table 10 for the studies included in Study II). Figure 2 shows the flow diagram for Study II. Statistical analyses. Pearson’s correlation coefficients (r) were extracted from the included studies. Following Hafdahl and Williams’ (2009) recommendations, correlation coefficients were transformed using Fisher’s Zr. A random effects

Table 2. Moderator analyses model summary.

Character strength        SD         Test version   Sample type   Intercept   R²adj
Creativity                1.26       .20            .26*          0.98        .60
Curiosity                 2.03**     .11*           –             0.51        .79
Judgment                  1.35***    –              .17*          0.89***     .70
Love of learning          –          –              .15**         1.54***     .30
Bravery                   –          .36*           –             0.44        .44
Zest                      2.01**     .14            .08           0.32        .78
Love                      1.64*      .10            –             0.14        .73
Fairness                  1.63*      –              –             0.72        .33
Forgiveness               3.00**     –              –             0.08        .44
Modesty                   1.79**     –              .26***        0.27        .78
Prudence                  2.29***    –              –             0.09        .77
Self-regulation           1.33*      –              .17**         0.36        .85
Appreciation of beauty    –          .26*           –             1.60***     .19
Gratitude                 1.13       .18*           .08           0.89**      .67
Hope                      0.50       .13            .24**         1.25**      .54
Humor                     0.09       –              .40***        1.72*       .52
Spirituality              1.61*      –              –             0.50        .18

Note. Only character strengths with significant moderators are included. SD: standard deviation. Predictor’s raw weight is included. R²adj = proportion of variance explained; Test version: English (1), translated (0). Sample type: College (0), general population (1). *p