Acta Psychologica 154 (2015) 77–85


How does aging affect recognition-based inference? A hierarchical Bayesian modeling approach☆

Sebastian S. Horn a,⁎,1, Thorsten Pachur a,1, Rui Mata a,b,1

a Max Planck Institute for Human Development, Berlin, Germany
b University of Basel, Switzerland

Article info

Article history: Received 16 June 2014; received in revised form 16 October 2014; accepted 3 November 2014.
PsycINFO classification: 2340; 2860
Keywords: Aging; Recognition memory; Heuristics; Strategy selection; Hierarchical Bayesian modeling

Abstract

The recognition heuristic (RH) is a simple strategy for probabilistic inference according to which recognized objects are judged to score higher on a criterion than unrecognized objects. In this article, a hierarchical Bayesian extension of the multinomial r-model is applied to measure use of the RH on the individual participant level and to re-evaluate differences between younger and older adults' strategy reliance across environments. Further, it is explored how individual r-model parameters relate to alternative measures of the use of recognition and other knowledge, such as adherence rates and indices from signal-detection theory (SDT). Both younger and older adults used the RH substantially more often in an environment with high than low recognition validity, reflecting adaptivity in strategy use across environments. In extension of previous analyses (based on adherence rates), hierarchical modeling revealed that in an environment with low recognition validity, (a) older adults had a stronger tendency than younger adults to rely on the RH and (b) variability in RH use between individuals was larger than in an environment with high recognition validity; variability did not differ between age groups. Further, the r-model parameters correlated moderately with an SDT measure expressing how well people can discriminate cases where the RH leads to a correct vs. incorrect inference; this suggests that the r-model and the SDT measures may offer complementary insights into the use of recognition in decision making. In conclusion, younger and older adults are largely adaptive in their application of the RH, but cognitive aging may be associated with an increased tendency to rely on this strategy.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

A central tenet of Herbert Simon's (1956) time-honored concept of bounded rationality is that the use of simple mental tools—if attuned to the environmental structure—can often lead to surprisingly good decisions. Consider the following question: Which city has more residents, Nashville or Tulsa? As a prime example of a frugal inference strategy, the recognition heuristic (RH; Goldstein & Gigerenzer, 2002) assumes that individuals base their judgments in this sort of task solely on whether or not the options are recognized (and ignore any further knowledge). That is, the RH predicts that a recognized object (e.g., Nashville) has a higher value on the criterion (city population) than an unrecognized one (e.g., Tulsa). The success of this simple strategy depends on its ecological rationality: it exploits the phenomenon that known objects differ from unknown ones in systematic ways in many natural environments (e.g., larger cities, more successful athletes, and higher mountains tend to be recognized more often; Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2011, 2012).

One key issue surrounding research on the RH is to what extent it is used adaptively: When do people use the RH and how do they adjust their reliance across different situations (Gigerenzer & Goldstein, 2011; Pachur et al., 2011)? Research indicates that younger adults are largely sensitive to characteristics of the environment, such as the relation between recognition of an object and the criterion (the recognition validity),2 time pressure, or available cognitive resources (e.g., Pachur & Hertwig, 2006; Pohl, Erdfelder, Hilbig, Liebke, & Stahlberg, 2013). However, there is also considerable diversity across people in reliance on the RH, suggesting that individual-level variables may moderate its use (Marewski, Gaissmaier, Schooler, Goldstein, & Gigerenzer, 2010; Pachur, Bröder, & Marewski, 2008; cf. Hilbig & Pohl, 2008). Here, we investigate one potential source of individual differences in the use of the RH: cognitive aging. Aging is associated with significant changes on various psychological dimensions that may impact the use of the RH, including decrements in fluid cognitive abilities and increments in knowledge and experience (e.g., Baltes, Staudinger, & Lindenberger, 1999). Pachur, Mata, and Schooler (2009) examined this issue by asking younger and older adults to make inferences regarding pairs of cities ("Which city has more inhabitants?") and infectious diseases ("Which disease has a higher incidence rate?"). Referring to environment or task adaptivity as people's sensitivity to differences in recognition validity between domains (i.e., cities vs. diseases), they found that participants chose recognized objects more frequently over unrecognized ones in a domain with high recognition validity (cities) than in a domain with low recognition validity (diseases). The proportion of choices of the recognized object—the adherence rate—did not differ between younger and older adults: Both age groups showed a similarly lower adherence rate in the environment with low recognition validity than in the environment with high recognition validity. This result may seem surprising: Other research has found that older adults generally tend to rely more on simple strategies (e.g., the take-the-best heuristic; Gigerenzer & Goldstein, 1996) than younger adults, even in environments in which another strategy may be more appropriate (e.g., Mata, Schooler, & Rieskamp, 2007). Importantly, however, Hilbig, Erdfelder, and Pohl (2010) pointed out that adherence rates might be an inappropriate measure of people's reliance on the RH and proposed a multinomial processing tree (MPT) model—the r-model—as a more valid measurement approach to recognition-based inference.

☆ This research was supported by a fellowship from the Max Planck Society to Sebastian Horn.
⁎ Corresponding author at: Center for Adaptive Rationality (ARC), Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. Tel.: +49 30 82406 202. E-mail address: [email protected] (S.S. Horn).
1 Sebastian S. Horn, Thorsten Pachur, and Rui Mata, Center for Adaptive Rationality (ARC), Max Planck Institute for Human Development, Berlin, Germany. Rui Mata is now at the University of Basel, Institute for Psychology, Department for Cognitive and Decision Sciences, Basel, Switzerland.
2 The predictive power of recognition is usually quantified in terms of the recognition validity α (Goldstein & Gigerenzer, 2002). For a given environment, it is calculated as α = CRU / (CRU + IRU), where CRU and IRU are frequencies of correct and incorrect inferences, respectively, that the recognition heuristic would predict across all trials in which one of the objects is recognized (RU cases). Knowledge validity β is calculated as β = CRR / (CRR + IRR), where CRR and IRR are correct and incorrect inferences in cases in which both objects are recognized (RR cases).

http://dx.doi.org/10.1016/j.actpsy.2014.11.001 0001-6918/© 2014 Elsevier B.V. All rights reserved.
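As an aside, the validity measures defined in footnote 2 are simple proportions of correct inferences. The following minimal sketch (our own illustration; the function names and counts are hypothetical, not from the paper) makes the definitions concrete:

```python
def recognition_validity(correct_ru, incorrect_ru):
    """Recognition validity alpha = C_RU / (C_RU + I_RU): the proportion of
    RU trials (exactly one object recognized) on which choosing the
    recognized object yields a correct inference."""
    return correct_ru / (correct_ru + incorrect_ru)

def knowledge_validity(correct_rr, incorrect_rr):
    """Knowledge validity beta = C_RR / (C_RR + I_RR), computed over RR
    trials (both objects recognized)."""
    return correct_rr / (correct_rr + incorrect_rr)

# Hypothetical counts: 180 correct and 20 incorrect RU inferences
# give alpha = .90, in the range reported for the cities environment.
alpha = recognition_validity(180, 20)
```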
Our goal in this article is to re-evaluate possible age differences as well as individual variability in the use of the RH in the Pachur et al. (2009) study using the r-model. We extend previous applications of the r-model by implementing a hierarchical approach, which is particularly suitable for studying individual differences. A secondary goal is to explore how parameters of the r-model relate to measures of people's use of recognition and further knowledge derived from signal-detection theory (SDT; e.g., Macmillan & Creelman, 2005): Pachur et al. proposed a discriminability index to measure how well people can distinguish between cases where recognition leads to correct versus incorrect inference. On this SDT index, older adults showed lower discriminability than younger adults when inferring which of two diseases was more frequent in Germany.3 It is currently unclear to what extent SDT measures of recognition-based inference and the parameters of the r-model offer alternative or complementary perspectives on how younger and older adults use recognition to make inferences about the world. Next, we describe in greater detail the multinomial approach to measuring use of the RH, followed by the hierarchical extension of the r-model.

1.1. The r-model: a multinomial processing tree approach to measuring RH use

Many investigations of the RH—including Pachur et al.'s (2009) aging study—have relied on adherence rates as a measure of people's use of this strategy (e.g., Goldstein & Gigerenzer, 2002; McCloy, Beaman, Frosch, & Goddard, 2010). Yet, as Hilbig, Erdfelder, et al. (2010) have emphasized, adherence rates may lead to erroneous conclusions regarding the use of the RH: In many natural environments, not only recognition but also other knowledge is correlated with the criterion. The choice of a recognized object may therefore also be due to reliance on other knowledge, which according to the RH is ignored. Consequently, "the adherence or accordance rate … is not a valid measure of use of the RH versus incorporation of further knowledge, because recognition and knowledge are necessarily confounded" (p. 123). To address this issue, Hilbig and colleagues introduced the r-model, an MPT model, to disentangle pure reliance on the RH from the use of further information (for a critical discussion of the use of this model, see Pachur, 2011). The strength of the MPT modeling framework is that it provides a foundation for improved measurement of cognitive components underlying a task (cf. Jacoby, 1991) and a well-developed statistical machinery for model comparison and goodness-of-fit tests (for overviews, see Batchelder & Riefer, 1999; Erdfelder et al., 2009).

As is generally the case in MPT models (Batchelder & Riefer, 1999), the r-model accounts for the frequency of responses in different outcome categories through combinations of a set of latent parameters, assuming that the data follow a multinomial distribution. The parameters are estimated simultaneously in such a way that a loss function (e.g., G2) between the observed and predicted categorical frequencies is minimized. The r-model considers three possible cases in a comparative judgment task, represented by the J = 3 separate trees in Fig. 1: In the upper tree, a decision maker recognizes both objects (RR case) and therefore has to recruit further information beyond recognition, leading to a correct inference with probability b and to an incorrect inference with complementary probability 1 − b.

3 Note that discriminability between cases where the RH leads to a correct vs. incorrect inference (including the ability to suspend the RH on specific trials) implies the use of further information, beyond mere recognition. Discriminability has been assumed to correlate with fluid abilities (Pachur et al., 2009); moreover, RH-inconsistent decisions take longer than RH-consistent decisions (Pachur & Hertwig, 2006) and are associated with evaluative frontal brain activation (Volz et al., 2006). These findings suggest that discriminability may incur cognitive costs.
Parameter b thus indexes the validity of the decision maker's further knowledge (comparable to knowledge validity β in the original RH formulation, see footnote 2; a visualization of the relationship between the recognition and knowledge validities α and β and the estimates for the a and b parameters for the present data are provided in the online supplemental materials). The second tree represents the situation in which one of the two objects is recognized (RU case) and the RH can thus be applied. With probability r, the decision maker uses the RH and chooses the recognized item. This leads to a correct inference with probability a and to an incorrect inference with probability 1 − a. Parameter a is thus conceptually equivalent to the recognition validity α (footnote 2) and reflects the strength of association between recognition and the criterion variable (e.g., city size). Importantly, with complementary probability 1 − r, the RH is not applied and the inference is based on further information beyond recognition (or any other strategy). This leads to a correct inference with probability b. In this case, the recognized object is chosen with probability a and the unrecognized object is chosen with probability 1 − a. With probability 1 − b, the inference is incorrect. In this case, the unrecognized item is chosen with probability a and the recognized item is chosen with probability 1 − a. The model thus acknowledges that the observed choice of a recognized object (outcome categories C21 and C22) may result from the use of the RH (upper two branches of the RU tree) or, alternatively, from the use of further knowledge or another strategy (lower branches in the RU tree). 
In the bottom tree, neither of the objects is recognized (UU case) and the decision maker has to guess, leading to a correct inference with probability g and to an incorrect inference with probability 1 − g.4 Several studies have shown that estimated reliance on the RH is lower when using the r-model than when using adherence rates (Hilbig, Erdfelder, et al., 2010; Hilbig & Richter, 2011). In fact, the two measures sometimes even lead to different conclusions: Using the r-model, Pohl et al. (2013) found that cognitive depletion led to greater reliance on the RH. In contrast, they found no reliable difference in the adherence rates of depleted versus nondepleted participants. Such results call into question the use of adherence rates as a measure of RH use. For this reason, we applied the r-model to re-evaluate Pachur et al.'s (2009) finding of no age differences in the use of the RH, which was based on adherence rates. Importantly, we went beyond previous applications of the r-model by using a hierarchical Bayesian approach, which is described next.

4 In principle, unique values for a, b, and r could be obtained with a single-tree model for RU trials only; however, this approach would not provide any degrees of freedom for testing goodness of fit. By considering the other possible cases in the comparative judgment task and by constraining b to be equal across the RR and RU cases (i.e., by assuming that the probability of valid knowledge is the same across these situations, bRR = bRU; for further discussion, see Castela, Kellen, Erdfelder, & Hilbig, 2014), the model is testable with a χ2 statistic with df = 1 and comprises a set of four free parameters θ = (a, b, g, r) and eight outcome categories, five of which are independent.

Fig. 1. Illustration of the multinomial r-model (Hilbig, Erdfelder, et al., 2010). Observable events are represented by rectangles. In each of the J = 3 model trees, possible responses are assigned to one of K mutually exclusive categories (Cjk) distinguishing between choice accuracy (correct/false) and choice of recognized (+) versus unrecognized (−) objects. r = probability of applying the recognition heuristic (RH); a = recognition validity; b = validity of further knowledge; g = probability of a correct guess.
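For concreteness, the category probabilities implied by the three trees of Fig. 1 can be written out directly. This is our own illustrative sketch of the model equations, not code from the original study:

```python
def r_model_probs(a, b, g, r):
    """Predicted category probabilities of the r-model for the three trees
    in Fig. 1 (RR, RU, UU). Keys encode accuracy (correct/false) and, for
    RU trials, choice of the recognized (+) vs. unrecognized (-) object."""
    rr = {"correct": b, "false": 1 - b}
    ru = {
        # RH applied (prob. r) and valid, or RH not applied but valid
        # knowledge favors the recognized object
        "correct+": r * a + (1 - r) * b * a,
        "correct-": (1 - r) * b * (1 - a),
        "false+": r * (1 - a) + (1 - r) * (1 - b) * (1 - a),
        "false-": (1 - r) * (1 - b) * a,
    }
    uu = {"correct": g, "false": 1 - g}
    return rr, ru, uu

# Illustrative parameter values; each tree's probabilities sum to 1.
rr, ru, uu = r_model_probs(a=0.91, b=0.76, g=0.50, r=0.85)
```

Note how both "recognized chosen" categories receive contributions from the non-RH branch: this is exactly why the observed adherence rate can overestimate r.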

1.2. A hierarchical Bayesian implementation of the r-model

In MPT modeling (Batchelder & Riefer, 1999), responses in the different outcome categories are often aggregated across items and participants (e.g., Pohl et al., 2013; Schnitzspahn, Horn, Bayen, & Kliegel, 2012). It has frequently been noted, however, that data pooling ignores potential heterogeneity between individuals and can lead to severe problems when drawing conclusions with nonlinear models (e.g., Estes, 1956; Rouder, Morey, & Pratte, in press; Siegler, 1987). Simulations have indicated that the aggregation of data can yield distorted MPT parameter estimates: standard errors can be underestimated, leading to hypothesis tests (including tests of parameter equality or goodness of fit) with inflated Type I error rates (see Klauer, 2006, 2010, for further details). Further, as noted by Smith and Batchelder (2010, p. 179), two groups may differ only in their variances, "but … inappropriate aggregate analysis may lead the researcher to interpret this difference in terms of a mean difference in a parameter between groups." An alternative approach is to treat each participant as unique and to fit separate models to each individual. This approach can be problematic, however, when the number of observations is small; in this case, the data likely yield unreliable parameter estimates that are prone to measurement error5 (e.g., Chechile, 2009; Cohen, Sanborn, & Shiffrin, 2008).

5 The r-model can, in principle, be fitted separately to each individual participant (e.g., Hilbig & Richter, 2011). However, it has been shown that reliable parameter estimation requires sufficiently large numbers of trials on the individual level (e.g., Hilbig, Erdfelder, et al., 2010, p. 126, estimated that approximately 500 trials are required for the r-model) to accurately recover the "true" population parameters, which is sometimes not feasible.

To address these issues, we adopted a hierarchical implementation of the r-model. Hierarchical approaches provide a theoretically grounded, principled compromise between complete pooling and individual-level fitting and are becoming increasingly popular in cognitive modeling (Lee & Wagenmakers, 2013; Lee & Webb, 2005; Rouder et al., in press; Vandekerckhove, Tuerlinckx, & Lee, 2011). Hierarchical models explicitly account for both the similarities and the differences between individuals by specifying a group-level hyperdistribution, which is estimated along with individual-level parameters; the estimates within a group mutually inform each other by "borrowing strength" from the group-level structure (Gelman, Carlin, Stern, & Rubin, 2004). Consequently, hierarchical cognitive modeling is a powerful tool for the study of individual differences that is particularly useful when several participants provide only a limited number of observations (Busemeyer & Diederich, 2010) or when populations are likely to be heterogeneous, as in developmental or clinical studies (Batchelder & Riefer, 2007). Importantly, hierarchical modeling allows us to explicitly test specific hypotheses about variability. In cognitive aging research, for instance, it has been argued that individual differences in performance increase with age (e.g., Rabbitt, 2011; for reviews, see Li & Lindenberger, 1999; Nelson & Dannefer, 1992). In this study, we also examined whether there are age-related increases in variability between individuals when using the RH.
Bayesian graphical modeling (see Lee & Wagenmakers, 2013, for an overview) enables a relatively straightforward implementation of even complicated, hierarchical structures, due to recent advances in computational methodology; researchers can conveniently calculate and statistically evaluate indices of interest (e.g., effect sizes, differences in group means and in variances) in one step of analysis from the information provided in the posterior distribution, without the need for multiple testing (Kruschke, 2010). Moreover, the method circumvents several conceptual problems that have plagued frequentist statistical inference (Wagenmakers, Lee, Lodewyckx, & Iverson, 2008).


Fig. 2. Graphical model for the BUGS implementation of the hierarchical Beta-MPT approach. The graph structure represents dependencies (i.e., probabilistic and deterministic relations) among data and latent model parameters (see Lee & Newell, 2011, or Lee & Wagenmakers, 2013, for further details). Following conventional notation, observed variables are symbolized by shaded nodes, unobserved variables by unshaded nodes, continuous variables by circular nodes, and discrete variables by square nodes. The graph structure represents the multinomial r-model (for one experimental condition), with plates indicating replications over the S = 4 free model parameters and over I individuals who respond to nij trials (where j denotes the model tree), that is, to ni1 trials in which both objects are recognized (RR cases), to ni2 RU trials, and to ni3 UU trials. For each individual i, the response data xij (a vector with a participant's responses across categories in a tree, with length K = 2 for trees 1 and 3 and length K = 4 for tree 2) follow a multinomial distribution with category probabilities P(Cjk|θi) (where k denotes the response category; see also Fig. 1). θi1 = bi; θi2 = ri; θi3 = ai; θi4 = gi.

In the present analysis, we applied Smith and Batchelder's (2010) Beta-MPT approach (Fig. 2; for related hierarchical developments and applications for the class of MPT models, see Kellen, Singmann, & Klauer, 2014; Klauer, 2006, 2010; Matzke, Dolan, Batchelder, & Wagenmakers, 2013). In essence, the idea is that individual-level estimates for each parameter stem from independent group-level Beta distributions,6 whose variance reflects the variability between individuals. Beta-MPTs have several useful properties: The hierarchical hyperdistribution implements the plausible assumption that traits often vary in a continuous, unimodal fashion across individuals, and it naturally restricts parameters within the [0,1] interval, such that they are readily interpretable as probabilities. Finally, the method has fared well in empirical studies (see Arnold, Bayen, Kuhlmann, & Vaterrodt, 2013; Smith & Batchelder, 2010).
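The core Beta-MPT assumption can be illustrated with a small generative simulation (our own NumPy sketch with made-up group-level values, not the WinBUGS implementation used in the study; for brevity, only RH-consistent choices on RU trials are simulated, ignoring the accuracy branches of the full model):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_beta_mpt_r(alpha, beta, n_subj, n_ru):
    """Draw individual r_i parameters from a group-level Beta(alpha, beta)
    hyperdistribution, then simulate each subject's number of
    RH-consistent choices across n_ru RU trials."""
    r_i = rng.beta(alpha, beta, size=n_subj)   # individual-level parameters
    consistent = rng.binomial(n_ru, r_i)       # per-subject RH-consistent counts
    return r_i, consistent

# Illustrative hyperparameters producing a mean r of about .85
r_i, consistent = simulate_beta_mpt_r(alpha=9.0, beta=1.5, n_subj=20, n_ru=100)
```

Because every r_i is drawn from the same group-level Beta distribution, the variance of that distribution directly expresses the between-person heterogeneity that the hierarchical analysis estimates.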

1.3. SDT measures for recognition-based inference

Estimating the r-model parameters on the individual level also allows us to explore their empirical relationship with the SDT indices proposed by Pachur et al. (2009; Pachur & Hertwig, 2006; for a related index, see Hilbig & Pohl, 2008). In the comparative judgment paradigm, the SDT approach can separate a decision maker's tendency to follow recognition (the bias parameter c) from their accuracy in discriminating cases in which recognition leads to correct versus incorrect inference (the discriminability parameter d′).7 We examine how and how strongly these SDT indices are related to the parameters provided by the r-model.

To summarize, we use a hierarchical implementation of the r-model to re-examine conclusions that were based on adherence rates (which Hilbig, Erdfelder, et al., 2010, identified as problematic measures of RH use). Notably, the hierarchical approach also permits the direct investigation of age differences in variability of RH use and allows us to explore the relationship between individual-level parameters of the r-model and other indices of reliance on recognition (adherence rates and SDT indices).

2. Method

2.1. Participants and design

We used the Beta-MPT r-model to reanalyze data from Pachur et al. (2009; Study 1). The objective of that study was to compare older and younger participants in their use of the RH across environments with either high (U.S. cities) or low (infectious diseases) recognition validities. Forty younger adults (M = 25 years; range 19–33; 24 females) and 40 older adults (M = 71 years; range 65–86; 15 females) participated. Half of the participants in each age group were assigned to the cities environment and the other half to the diseases

6 Note that the lack of an a-priori correlational structure on the model parameters nevertheless allows for posterior estimates that are highly correlated. Therefore, approaches that do (Matzke et al., 2013) or do not (Smith & Batchelder, 2010) explicitly include correlations in the prior distributions can yield relatively similar results. We thank E. J. Wagenmakers for pointing this out.
7 The SDT measures are based on the definition of Hit Rate as the proportion of cases in which people follow recognition when one object is recognized that scores higher on the criterion, thus leading to a correct inference (see Pachur et al., 2009, Appendix A, for details). Based on computer simulations, Hilbig (2010) concluded that these SDT indices, in contrast to the r-model parameters, are limited measures of the proportion of RH users in a population. Note, however, that d′ and c were not originally proposed for measuring the proportion of RH users, as this would require additional assumptions (e.g., how to classify someone as a "user" of the RH, based on cutoff values on these indices).
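Under the standard equal-variance Gaussian model, the bias and discriminability indices can be computed from hit and false-alarm rates (with hits and false alarms defined as in footnote 7). A sketch using only the Python standard library; the rates below are made up for illustration:

```python
from statistics import NormalDist

def sdt_indices(hit_rate, fa_rate):
    """Equal-variance SDT indices (cf. Macmillan & Creelman, 2005):
    d' = z(H) - z(F) measures how well cases where following recognition
    is correct are discriminated from cases where it is not; c measures
    the overall bias toward following recognition."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c

# Hypothetical rates: recognition is followed on 90% of the cases where
# it yields a correct inference and on 60% of the cases where it does not.
d_prime, c = sdt_indices(0.90, 0.60)
```

A negative c here indicates a liberal tendency to follow recognition regardless of whether doing so is valid on a given trial.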


Table 1
Posterior parameter estimates of hierarchical beta distributions (group level) as a function of environment (cities, diseases) and age group (young, old).

                          Cities                                            Diseases
Parameter  Age group      μ̂              σ̂              α̂        β̂         μ̂              σ̂              α̂        β̂
r          Young          .85 [.79–.90]  .11 [.07–.16]  8.92     1.50      .30 [.22–.40]  .21 [.16–.25]  1.18     2.83
r          Old            .85 [.79–.90]  .10 [.06–.15]  13.20    2.21      .41 [.32–.51]  .24 [.20–.28]  1.27     1.84
a          Young          .91 [.90–.93]  .02 [.01–.04]  345.22   33.14     .61 [.59–.63]  .02 [.01–.03]  626.80   404.93
a          Old            .92 [.89–.94]  .05 [.03–.07]  33.66    2.89      .59 [.57–.62]  .01 [.01–.02]  706.96   481.53
b          Young          .76 [.73–.79]  .05 [.03–.08]  53.44    17.08     .67 [.64–.71]  .07 [.05–.10]  33.82    16.50
b          Old            .78 [.75–.80]  .03 [.01–.06]  291.23   82.89     .67 [.64–.70]  .05 [.02–.08]  87.98    43.55
g          Young          .50 [.47–.54]  .02 [.01–.04]  629.75   627.18    .46 [.41–.50]  .05 [.01–.11]  226.10   269.90
g          Old            .53 [.50–.57]  .02 [.01–.03]  675.59   592.56    .49 [.45–.53]  .01 [.01–.03]  650.82   677.54

Note. Bayesian 95% confidence intervals are in brackets; μ̂ and σ̂ are the estimated posterior mean and standard deviation of the Beta hyperdistribution; α̂ and β̂ are the corresponding parameters that define the Beta hyperdistribution (see also Fig. 2); r = probability of applying the RH; a = recognition validity; b = validity of further knowledge; g = probability of a valid guess.
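The group-level means and standard deviations in Table 1 and the α̂ and β̂ shape parameters are two descriptions of the same Beta hyperdistribution; converting moments to shape parameters is a short calculation (our own helper function; agreement with the table is approximate because the reported μ̂ and σ̂ are rounded):

```python
def beta_from_moments(mu, sigma):
    """Shape parameters (alpha, beta) of a Beta distribution with mean mu
    and standard deviation sigma; requires sigma**2 < mu * (1 - mu)."""
    nu = mu * (1 - mu) / sigma**2 - 1   # nu = alpha + beta
    return mu * nu, (1 - mu) * nu

# The rounded values for r (cities, younger adults), mu = .85 and
# sigma = .11, recover shape parameters of roughly the reported magnitude.
alpha_hat, beta_hat = beta_from_moments(0.85, 0.11)
```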

environment, leading to a 2 (age group) × 2 (environment) between-subjects design.

2.2. Procedure and materials

In a recognition task, participants indicated which of the objects (either cities or infectious diseases) they recognized. Stimuli were the names of 24 U.S. cities and 24 infectious diseases. In an inference task, participants judged either which of two cities had a larger population (cities environment) or which of two diseases had a higher annual incidence rate in Germany (diseases environment). Objects were exhaustively paired in the inference task, leading to 276 trials. The order of the recognition and inference tasks was counterbalanced across participants. For both younger and older adults, the average recognition validity α was higher for the cities environment (Ms = .90 and .92, respectively) than for the diseases environment (Ms = .62 and .60).

3. Results

To implement the hierarchical r-model, we adapted the Beta-MPT model definitions by Smith and Batchelder (2010) and used the Markov chain Monte Carlo (MCMC) methodology for posterior sampling, as implemented in WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000). Four submodels, with parameters free to vary between the two age groups and two environments, were fitted to the data. Samples from the posterior distribution were drawn from three independent chains of 150,000 iterations each (we discarded the first 50% as burn-in and used a thinning rate of 10, resulting in 3 × 7500 iterations available for further analysis).
To conduct the contrasts of interest (i.e., between the two age groups and the two environments) on the S = 4 model parameters b, r, a, and g, we additionally sampled and recorded from the posterior the difference of the group-level means (μ1 − μ2), the difference of standard deviations (σ1 − σ2), and the corresponding effect sizes, Cohen's δ = (μ1 − μ2) / √{[σ1²(N1 − 1) + σ2²(N2 − 1)] / (N1 + N2 − 2)}. The R̂ measure (Gelman et al., 2004) for the parameters as well as visual inspection of the chains indicated acceptable convergence (i.e., R̂s < 1.1; further information about convergence and model fit is provided in the online supplement). To verify the assumed superiority of the r-model over a model relying on adherence rates, we implemented an "adherence model," in which individual r parameters were set to equal each participant's adherence rates (see also Hilbig, Erdfelder, et al., 2010, for a similar test). We then compared this adherence model and the r-model in terms of the deviance information criterion (DIC; e.g., Gelman et al., 2004) to evaluate their relative performance within the Bayesian framework. The DIC is suitable for selecting among hierarchical Bayesian models estimated with MCMC sampling and quantifies the balance between goodness of fit and the effective number of parameters. The DIC can be regarded as a Bayesian alternative to

Akaike's information criterion; smaller DIC indicates better model performance. Imposing the restriction in the "adherence model" led to substantial increases in DICs in both age groups and environments (ΔDICyounger, cities = 58.96; ΔDIColder, cities = 76.54; ΔDICyounger, diseases = 518.51; ΔDIColder, diseases = 406.32), indicating the superiority of the r-model. The latter provided reliably different (and smaller) estimates for the proportion of cases in which participants apply the RH.8 In the following, we first focus on the group level to assess variability and age differences in RH use and then turn to an individual-level analysis to examine how the r-model parameters were related to alternative measures of reliance on recognition and further knowledge. The group-level estimates (posterior means, μ̂, and standard deviations, σ̂, of the hierarchical Beta distributions; Smith & Batchelder, 2010) for each parameter and the corresponding credible intervals (i.e., Bayesian confidence intervals) are shown in Table 1, separately for the two environments and age groups. (For a comparison with model parameters obtained from aggregated frequencies using standard maximum likelihood estimation, see the online supplement.)

3.1. Individual variability in RH use

As can be seen in Table 1, the credible intervals for the estimated standard deviations (σ̂) do not include zero for any of the four parameters, supporting the hypothesis of heterogeneity among participants (i.e., σ > 0), particularly for the r parameter. This suggests individual differences in the use of the RH. Further comparison between the two environments (Δσr = σ̂r, diseases − σ̂r, cities) indicated that variability in RH use was considerably higher in the diseases environment (in which recognition validity was low); this held for both younger adults, P(Δσr > 0 | D) ≈ .99, and older adults, P(Δσr > 0 | D) ≈ .99 (Fig. 3a and b show the posterior distributions of these differences in variability). The comparison between age groups (i.e., σ̂r, young − σ̂r, older) did not reveal any differences, suggesting that diversity in RH use was similarly pronounced in younger and older adults. These results contrast with the theoretical expectation of increased diversity in cognitive performance with aging (e.g., Li & Lindenberger, 1999; but see Salthouse, 2011).

3.2. Are there age differences in the use of the RH?

Reflecting the impact of environmental structure on the use of the RH, the group-mean r parameter was substantially lower in the diseases environment, for both younger and older adults. However, did younger and older adults differ in terms of how they adapted their use of the RH

8 Tests on aggregated data (using maximum-likelihood estimation) also indicated consistent and significant increases in model misfit if the r parameter in a condition was fixed at the average adherence rate: Δχ2(1)younger, cities = 110.87; Δχ2(1)older, cities = 114.97; Δχ2(1)younger, diseases = 530.52; Δχ2(1)older, diseases = 402.37; all ps < .001.

S.S. Horn et al. / Acta Psychologica 154 (2015) 77–85

Fig. 3. Histograms of posterior distributions; 22,500 draws of differences were sampled for each plot. (a, b) Comparisons of interindividual variability in RH use between environments, with <1% of the mass of the posterior below zero. (c) Difference between younger and older adults in RH use in the diseases environment, with about 5% of the mass of the posterior below zero.

between environments? To address this question, we tested for a possible Age × Environment interaction by modifying the r-model such that the r parameter in the diseases environment, for both age groups, was reparameterized (i.e., rdiseases = λ · rcities, where λ is a ratio parameter, ranging between 0 and 1, that represents the proportional reduction in RH use between environments; for further details of this method, see Knapp & Batchelder, 2004). Focusing on λ, we then applied the MCMC approach (as described above) to the reparameterized model. A comparison of the ratio parameters between age groups, λyoung and λolder, allowed us to examine the Age × Environment interaction. We found a medium effect size (Cohen's d = 0.42) for the difference between younger adults' λ (μ̂λ = 0.37; σ̂λ = 0.24) and older adults' λ (μ̂λ = 0.48; σ̂λ = 0.26). Importantly, this was driven mainly by older adults' higher r parameter in the diseases environment. Further analyses of the posterior age difference in the average r parameter (Δμr = μ̂r, older − μ̂r, younger) confirmed the tendency for older adults to apply the RH more often than younger adults in the diseases environment, in which recognition validity was low, P(Δμr > 0 | D) ≈ .95 (Fig. 3c); this was not the case in the cities environment, P(Δμr > 0 | D) ≈ .49.

Finally, as could be expected, the group-mean parameter a (reflecting recognition validity) was generally lower in the diseases environment than in the cities environment. The b parameter was also lower for diseases than for cities, suggesting that participants had better additional knowledge (beyond recognition) about the cities than about the diseases. There were no differences between the age groups in recognition or knowledge validities. The guessing parameter g was around .50 across environments and age groups.
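Because the posterior comparisons in this section operate directly on MCMC draws, they amount to elementwise arithmetic on the chains. A minimal sketch with simulated stand-in draws (the Beta parameters below are arbitrary, chosen only to yield plausible-looking posteriors, not the study's actual chains):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for 22,500 posterior draws of the group-level mean r parameter
# in each age group (illustrative, hypothetical draws).
mu_r_younger = rng.beta(20, 14, size=22500)
mu_r_older = rng.beta(24, 12, size=22500)

# Posterior probability that the age difference is positive
delta = mu_r_older - mu_r_younger
p_greater = (delta > 0).mean()

# Standardized difference between the two posteriors (Cohen's d)
pooled_sd = np.sqrt((mu_r_younger.var() + mu_r_older.var()) / 2)
d_effect = delta.mean() / pooled_sd
```

The same elementwise-difference logic yields the variability comparisons P(Δσr > 0 | D) when applied to draws of the group-level standard deviations.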
In sum, a hierarchical implementation of the r-model revealed differences between older and younger adults' reliance on the RH in an environment in which its frequent use was not adaptive. This pattern of results was not detected using adherence rates in Pachur et al. (2009). (Fig. 3c shows the distribution of the age difference; conventional MPT analyses on aggregated responses indicated even larger age effects; see the online supplement.)
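The model's separation of RH use (r) from mere adherence can be made concrete. For pairs in which exactly one object is recognized, our reading of the r-model tree (Hilbig, Erdfelder, & Pohl, 2010) yields the category probabilities below; treat the exact tree structure as an assumption for illustration:

```python
def r_model_probs(r, a, b):
    """Category probabilities for pairs with exactly one recognized object.

    r: probability of applying the RH; a: recognition validity;
    b: knowledge validity. Sketch of the r-model tree (our reading of
    Hilbig, Erdfelder, & Pohl, 2010), for illustration only.
    """
    adherent_correct = a * r + a * (1 - r) * b
    adherent_false = (1 - a) * r + (1 - a) * (1 - r) * (1 - b)
    nonadherent_correct = (1 - a) * (1 - r) * b
    nonadherent_false = a * (1 - r) * (1 - b)
    return adherent_correct, adherent_false, nonadherent_correct, nonadherent_false

probs = r_model_probs(r=0.7, a=0.8, b=0.6)
adherence = probs[0] + probs[1]  # = r + (1 - r) * (a*b + (1-a)*(1-b)) = 0.868
```

Whenever knowledge sometimes points to the recognized object, the predicted adherence rate exceeds r, which is why adherence rates systematically overestimate "true" RH use.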

3.3. How do the r-model parameters relate to adherence rates and SDT indices?

We next examined individual-level estimates to explore the empirical relationship between the r-model parameters, the adherence rates, and SDT indices for recognition-based inference. The following insights emerged: First, the adherence rate (Fig. 4a) was larger than the r parameter for all individuals and may thus systematically overestimate “true”
use of the RH, as emphasized by Hilbig, Erdfelder, et al. (2010).9 Second, there was nevertheless a strong linear association between the r parameter and the adherence rate, ranging from .91 to .99. Interestingly, variability (across participants) was considerably larger for the r parameter (with SDs ranging from .10 to .28) than for the adherence rate (with SDs ranging from .04 to .18).

What was the relationship between the r-model parameters and the two SDT indices d′ (discriminability) and c (bias) for people's use of recognition and further knowledge (Pachur et al., 2009)? As Fig. 4b shows, there was a positive association between d′ and the individual b parameters for the diseases environment and, less strongly, for the cities environment (for a similar analysis, see Hilbig & Pohl, 2008). Relative to the correlations between the r parameter and the adherence rate, the association between d′ and b was more moderate (ranging from .40 to .88), suggesting that the two measures capture, to some extent, different aspects of people's use of additional knowledge beyond recognition. For the bias index (c), we observed high negative correlations with the individual r parameters (see Fig. 4c); these two measures may thus capture pure reliance on recognition in similar ways: The more liberal people are in their reliance on recognition (independent of whether it leads to correct or incorrect inference), the more likely they are to use the RH (as captured by r). The discriminability index d′ and the individual r parameters were uncorrelated (rs < .30; all ps > .20; except for older adults in the diseases environment, for which r = −.44).10

This highlights the complementary merits of the SDT and the r-model approaches: In situations in which it is desirable to have a pure measure of people's ability to evaluate the usefulness of the RH (see Pachur & Hertwig, 2006; Pachur et al., 2009; Volz et al., 2006), the d′ index may represent a useful addition to the r-model (see also Hilbig & Pohl, 2008). We note, however, that the

9 Converging with the evidence from the group-level analysis, the individual r-model estimates of RH use indicate that most younger and older participants applied the RH most of the time in the environment with high recognition validity (cities), but not in the other (diseases) environment; variability in the use of the RH was also considerably higher in the diseases environment than in the cities environment.

10 The lack of a (linear) relationship between parameters r and d′ may be partly due to the following inconsistency between the two measures: Always using the RH would lead to r = 1, implying both a hit rate and a false alarm rate of 1, and consequently, d′ = 0. In contrast, always choosing the unrecognized option (i.e., never using the RH) would lead to r = 0 and both a hit rate and a false alarm rate of 0, thus again implying d′ = 0. We thank an anonymous reviewer for pointing this out.


Fig. 4. a. Scatterplots showing the adherence rate to the recognition heuristic (RH) plotted against estimates of the individual r parameters. b. Scatterplots showing the relationship between the SDT discriminability measure d′ (used by Pachur et al., 2009, to quantify the adaptive evaluation of recognition on specific items) and estimates of the individual b parameters. Note that variability in knowledge validity b is relatively small (as compared to variability in parameter r). c. Scatterplots showing the empirical relationship between the signal-detection bias measure c (see Hilbig, 2010; Pachur et al., 2009) and estimates of the individual r parameters.

measures provided within the r-modeling approach make it possible to assess goodness of fit and to compare competing theories via parameter restrictions (e.g., it is possible to test whether people's use of the RH differs reliably from the observed adherence rate; see above). Moreover, further research is needed to examine whether these relationships generalize across various data sets and task environments.
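The SDT indices used in these analyses, and the edge case described in footnote 10, can be checked numerically. The helper below is a generic implementation of d′ and c from counts (the +0.5 correction for extreme rates is a common convention, an assumption on our part, not necessarily the exact procedure used by Pachur et al., 2009):

```python
from statistics import NormalDist

def sdt_indices(hits, misses, fas, crs):
    """d' and criterion c from counts, adding 0.5 to each cell
    (a common correction for extreme rates; an assumption here)."""
    z = NormalDist().inv_cdf
    h = (hits + 0.5) / (hits + misses + 1)   # corrected hit rate
    f = (fas + 0.5) / (fas + crs + 1)        # corrected false alarm rate
    d_prime = z(h) - z(f)                    # discriminability
    c = -(z(h) + z(f)) / 2                   # response bias
    return d_prime, c

# Footnote 10's edge case: always adhering to the RH makes the hit and
# false alarm rates identical, so d' collapses to zero.
d0, c0 = sdt_indices(hits=50, misses=0, fas=50, crs=0)
```

Here d0 is (numerically) zero even though the decision maker adheres on every trial, while the bias index c0 is strongly negative (liberal), consistent with the high negative correlations between c and r reported above.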

4. Discussion

We re-examined previous, potentially problematic, conclusions regarding age differences in the use of the RH, using a refined methodological approach (a hierarchical version of the MPT r-model). Our analyses confirmed several of the conclusions drawn by Pachur et al. (2009): Consistent with these authors' analyses (using adherence
rates and SDT indices), MPT modeling suggested that both older and younger adults show considerably reduced reliance on the RH in an environment with low recognition validity. Further, age was not associated with the validity of further knowledge or recognition, or with guessing behavior. Nevertheless, our analyses also extended the findings of the previous analyses in several respects, highlighting the potential benefits of a hierarchical MPT modeling approach.

First, the modeling revealed a stronger tendency for older adults than younger adults to rely on the RH in a domain with low recognition validity—a pattern that was not obvious using adherence rates (Pachur et al., 2009). While people may apply the RH intuitively or deliberatively (theoretical accounts, including the r-model, are largely silent about this question; but see Hilbig, Scholl, & Pohl, 2010, for a discussion), recognition knowledge can be provided relatively automatically by the cognitive system (e.g., Pachur & Hertwig, 2006). It thus appears that older adults tend to over-rely on a cognitively simple strategy (see also Mata et al., 2007, for comparable findings with the take-the-best heuristic). This interpretation dovetails with research documenting age-related reduction in cognitive resources (e.g., Rogers & Fisk, 2001; Salthouse, 1988) on the one hand, and with work showing that increased need for effort reduction under cognitive constraints promotes reliance on the RH (Hilbig, Erdfelder, & Pohl, 2012; Pachur & Hertwig, 2006; Pohl et al., 2013), on the other.

Second, the hierarchical approach allowed us to examine variability in RH use between individuals. These analyses revealed that people vary more in their use of the RH when the validity of recognition is low. Interestingly, however, variability did not differ between older and younger adults.
This finding is in line with reports that variability can be relatively constant between 20 and 90 years of age (e.g., Salthouse, 2011), but contrasts with the majority of aging studies, which have found variability in performance to increase with age (Li & Lindenberger, 1999; Nelson & Dannefer, 1992). One possible reason for this is that variability may depend on the difficulty or complexity of the task (e.g., the degree to which a task taps into working memory capacity). The comparative judgment task studied in the present investigation is relatively simple in this regard and allows people (through application of the RH) to rely on semantic recognition knowledge that is both stable across the lifespan (e.g., Baltes et al., 1999) and retrieved with little effort. Overall, our findings are thus consistent with the bulk of research indicating that age-related differences in recognition memory are substantially smaller than those in other memory tasks that tap fluid abilities and require more effortful retrieval, with little support from environmental cues (e.g., Craik, 1994). Our motivation for using a hierarchical Bayesian modeling approach was based on principled arguments and previous demonstrations of potential advantages over nonhierarchical procedures (Gelman & Hill, 2007; Lee & Wagenmakers, 2013; Lee & Webb, 2005; Rouder et al., in press). Although a systematic comparison of hierarchical and nonhierarchical parameter estimation (e.g., Scheibehenne & Pachur, 2014) was beyond the scope of this article, it is interesting to note that the group-level means obtained with the hierarchical approach were rather similar to the mean estimates obtained after aggregating the data (an exception was the r-parameter in the diseases environment; see the Supplemental material for further details). One possible reason is that our study involved a relatively high number of trials per individual (N = 276). 
Further, this finding could imply that aggregation may not necessarily be inferior to a hierarchical approach if the focus is on group-level analyses only (see Chechile, 2009, for further discussion). In conclusion, we showed that hierarchical MPT modeling can provide additional insight into age differences and individual differences in the use of the RH in probabilistic inference. This approach will contribute to a line of research on the strategic use of recognition and can help to further our understanding of the circumstances under which the RH is used.
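The observation that hierarchical and aggregated group means converge when each person contributes many trials can be illustrated with a toy Beta-Binomial example (all numbers hypothetical; the empirical-Bayes shrinkage step is a simplified stand-in for full hierarchical estimation):

```python
import numpy as np

rng = np.random.default_rng(1)

n_subj, n_trials = 40, 276
true_r = rng.beta(8, 4, size=n_subj)     # hypothetical individual RH-use rates
k = rng.binomial(n_trials, true_r)       # RH-consistent choices per person

# Aggregation: pool all trials, then estimate once
r_aggregated = k.sum() / (n_subj * n_trials)

# Shrinkage of individual estimates toward the group distribution
# (hyperparameters assumed known, for simplicity)
alpha, beta = 8.0, 4.0
r_individual = (k + alpha) / (n_trials + alpha + beta)
r_hierarchical_mean = r_individual.mean()
```

With 276 trials per person the shrinkage is slight, so the two group-level summaries nearly coincide; with few trials per person, they can diverge substantially.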

Appendix A. Supplementary materials

Supplementary materials to this article can be found online at http://dx.doi.org/10.1016/j.actpsy.2014.11.001.

References

Arnold, N. R., Bayen, U. J., Kuhlmann, B. G., & Vaterrodt, B. (2013). Hierarchical modeling of contingency-based source monitoring: A test of the probability-matching account. Psychonomic Bulletin and Review, 20, 326–333. Baltes, P. B., Staudinger, U. M., & Lindenberger, U. (1999). Lifespan psychology: Theory and application to intellectual functioning. Annual Review of Psychology, 50, 471–507. Batchelder, W. H., & Riefer, D. M. (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin and Review, 6, 57–86. Batchelder, W. H., & Riefer, D. M. (2007). Using multinomial processing tree models to measure cognitive deficits in clinical populations. In R. W. J. Neufeld (Ed.), Advances in clinical cognitive science: Formal modeling of processes and symptoms (pp. 19–50). Washington, DC: American Psychological Association. Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. London, UK: Sage Publishing. Castela, M., Kellen, D., Erdfelder, E., & Hilbig, B. E. (2014). The impact of subjective recognition experiences on recognition heuristic use: A multinomial processing tree approach. Psychonomic Bulletin and Review, http://dx.doi.org/10.3758/s13423-014-0587-4 (Advance online publication). Chechile, R. A. (2009). Pooling data versus averaging model fits for some prototypical multinomial processing tree models. Journal of Mathematical Psychology, 53, 562–576. Cohen, A. L., Sanborn, A. N., & Shiffrin, R. M. (2008). Model evaluation using grouped or individual data. Psychonomic Bulletin & Review, 15, 692–712. Craik, F. I. M. (1994). Memory changes in normal aging. Current Directions in Psychological Science, 3, 155–158. Erdfelder, E., Auer, T. -S., Hilbig, B. E., Aßfalg, A., Moshagen, M., & Nadarevic, L. (2009). Multinomial processing tree models: A review of the literature. Zeitschrift für Psychologie/Journal of Psychology, 217, 108–124. Estes, W. K. (1956). The problem of inference from curves based on grouped data.
Psychological Bulletin, 53, 134–140. Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis. Boca Raton, FL: Chapman & Hall. Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press. Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669. Gigerenzer, G., & Goldstein, D. G. (2011). The recognition heuristic: A decade of research. Judgment and Decision Making, 6, 100–121. Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109, 75–90. Hilbig, B. E. (2010). Precise models deserve precise measures: A methodological dissection. Judgment and Decision Making, 5, 272–284. Hilbig, B. E., Erdfelder, E., & Pohl, R. F. (2010). One-reason decision-making unveiled: A measurement model of the recognition heuristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 123–134. Hilbig, B. E., Erdfelder, E., & Pohl, R. F. (2012). A matter of time: Antecedents of one-reason decision making based on recognition. Acta Psychologica, 141, 9–16. Hilbig, B. E., & Pohl, R. F. (2008). Recognizing users of the recognition heuristic. Experimental Psychology, 55, 394–401. Hilbig, B. E., & Richter, T. (2011). Homo heuristicus outnumbered: Comment on Gigerenzer and Brighton (2009). Topics in Cognitive Science, 3, 187–196. Hilbig, B. E., Scholl, S. G., & Pohl, R. F. (2010). Think or blink: Is the recognition heuristic an “intuitive” strategy? Judgment and Decision Making, 5, 300–309. Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30, 513–541. Kellen, D., Singmann, H., & Klauer, K. C. (2014). Modeling source–memory overdistribution. Journal of Memory and Language, 76, 216–236. Klauer, K. C. (2006). 
Hierarchical multinomial processing tree models: A latent-class approach. Psychometrika, 71, 7–31. Klauer, K. C. (2010). Hierarchical multinomial processing tree models: A latent-trait approach. Psychometrika, 75, 70–98. Knapp, B. R., & Batchelder, W. H. (2004). Representing parametric order constraints in multi-trial applications of multinomial processing tree models. Journal of Mathematical Psychology, 48, 215–229. Kruschke, J. K. (2010). Doing Bayesian data analysis: A tutorial introduction with R and BUGS. Burlington, MA: Academic Press. Lee, M. D., & Newell, B. R. (2011). Using hierarchical Bayesian methods to examine the tools of decision-making. Judgment and Decision Making, 6, 832–842. Lee, M. D., & Wagenmakers, E. -J. (2013). Bayesian modeling for cognitive science: A practical course. Cambridge, UK: Cambridge University Press. Lee, M. D., & Webb, M. R. (2005). Modeling individual differences in cognition. Psychonomic Bulletin and Review, 12, 605–621. Li, S. -C., & Lindenberger, U. (1999). Cross-level unification: A computational exploration of the link between deterioration of neurotransmitter systems and dedifferentiation of cognitive abilities in old age. In L. -G. Nilsson, & H. Markowitsch (Eds.), Cognitive neuroscience of memory (pp. 103–146). Seattle, WA: Hogrefe & Huber. Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). Win-BUGS—A Bayesian modeling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.

Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide. Mahwah, NJ: Lawrence Erlbaum. Marewski, J. N., Gaissmaier, W., Schooler, L. J., Goldstein, D. G., & Gigerenzer, G. (2010). From recognition to decisions: Extending and testing recognition-based models for multialternative inference. Psychonomic Bulletin & Review, 17, 287–309. Mata, R., Schooler, L. J., & Rieskamp, J. (2007). The aging decision maker: Cognitive aging and the adaptive selection of decision strategies. Psychology and Aging, 22, 796–810. Matzke, D., Dolan, C. V., Batchelder, W. H., & Wagenmakers, E. -J. (2013). Bayesian estimation of multinomial processing tree models with heterogeneity in participants and items. Psychometrika, http://dx.doi.org/10.1007/s11336-013-9374-9 (Advance online publication). McCloy, R., Beaman, C. P., Frosch, C., & Goddard, K. (2010). Fast and frugal framing effects? Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1042–1052. Nelson, E. A., & Dannefer, D. (1992). Aged heterogeneity: Fact or fiction? The fate of diversity in gerontological research. Gerontologist, 32, 17–23. Pachur, T. (2011). The limited value of precise tests of the recognition heuristic. Judgment and Decision Making, 6, 413–422. Pachur, T., Bröder, A., & Marewski, J. (2008). The recognition heuristic in memory-based inference: Is recognition a non-compensatory cue? Journal of Behavioral Decision Making, 21, 183–210. Pachur, T., & Hertwig, R. (2006). On the psychology of the recognition heuristic: Retrieval primacy as a key determinant of its use. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 983–1002. Pachur, T., Mata, R., & Schooler, L. J. (2009). Cognitive aging and the adaptive use of recognition in decision making. Psychology and Aging, 24, 901–915. Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L. J., & Goldstein, D. G. (2011).
The recognition heuristic: A review of theory and tests. Frontiers in Cognitive Science, 2, 147. Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L. J., & Goldstein, D. G. (2012). When is the recognition heuristic an adaptive tool? In P. M. Todd, G. Gigerenzer, & the ABC Research Group (Eds.), Ecological rationality: Intelligence in the world (pp. 113–143). New York, NY: Oxford University Press. Pohl, R. F., Erdfelder, E., Hilbig, B. E., Liebke, L., & Stahlberg, D. (2013). Effort reduction after self-control depletion: The role of cognitive resources in use of simple heuristics. Journal of Cognitive Psychology, 25, 267–276.


Rabbitt, P. (2011). Between-individual variability and interpretation of associations between neurophysiological and behavioral measures in aging populations: Comment on Salthouse (2011). Psychological Bulletin, 137, 785–789. Rogers, W. A., & Fisk, A. D. (2001). Understanding the role of attention in cognitive aging research. In J. E. Birren, & K. W. Schaie (Eds.), Handbook of the psychology of aging (pp. 267–287). San Diego, CA: Academic Press. Rouder, J. N., Morey, R. D., & Pratte, M. S. (2014). Hierarchical Bayesian models. In W. H. Batchelder, H. Colonius, E. Dzhafarov, & J. I. Myung (Eds.), New handbook of mathematical psychology. Measurement and methodology, Vol. 1. London, UK: Cambridge University Press (in press). Salthouse, T. A. (1988). Resource-reduction interpretations of cognitive aging. Developmental Review, 8, 238–272. Salthouse, T. A. (2011). Neuroanatomical substrates of age-related cognitive decline. Psychological Bulletin, 137, 753–784. Scheibehenne, B., & Pachur, T. (2014). Using Bayesian hierarchical parameter estimation to assess the generalizability of cognitive models of choice. Psychonomic Bulletin & Review, http://dx.doi.org/10.3758/s13423-014-0684-4 (Advance online publication). Schnitzspahn, K. M., Horn, S. S., Bayen, U. J., & Kliegel, M. (2012). Age effects in emotional prospective memory: Cue valence differentially affects the prospective and retrospective component. Psychology and Aging, 27, 498–509. Siegler, R. S. (1987). The perils of averaging data over strategies: An example from children's addition. Journal of Experimental Psychology: General, 116, 250–264. Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129–138. Smith, J. B., & Batchelder, W. H. (2010). Beta-MPT: Multinomial processing tree models for addressing individual differences. Journal of Mathematical Psychology, 54, 167–183. Vandekerckhove, J., Tuerlinckx, F., & Lee, M. D. (2011). 
Hierarchical diffusion models for two-choice response times. Psychological Methods, 16, 44–62. Volz, K. G., Schooler, L. J., Schubotz, R. I., Raab, M., Gigerenzer, G., & von Cramon, D. Y. (2006). Why you think Milan is larger than Modena: Neural correlates of the recognition heuristic. Journal of Cognitive Neuroscience, 18, 1924–1936. Wagenmakers, E. -J., Lee, M. D., Lodewyckx, T., & Iverson, G. (2008). Bayesian versus frequentist inference. In H. Hoijtink, I. Klugkist, & P. A. Boelen (Eds.), Bayesian evaluation of informative hypotheses in psychology (pp. 181–207). New York, NY: Springer.