Journal of Experimental Psychology: Learning, Memory, and Cognition 2005, Vol. 31, No. 4, 768 –788
Copyright 2005 by the American Psychological Association 0278-7393/05/$12.00 DOI: 10.1037/0278-7393.31.4.768
Dual-Process Models of Associative Recognition in Young and Older Adults: Evidence From Receiver Operating Characteristics Michael R. Healy
Leah L. Light
Claremont Graduate University and Pitzer College
Pitzer College
Christie Chung Claremont Graduate University and Pitzer College In 3 experiments, young and older adults studied lists of unrelated word pairs and were given confidencerated item and associative recognition tests. Several different models of recognition were fit to the confidence-rating data using techniques described by S. Macho (2002, 2004). Concordant with previous findings, item recognition data were best fit by an unequal-variance signal detection theory model for both young and older adults. For both age groups, associative recognition performance was best explained by models incorporating both recollection and familiarity components. Examination of parameter estimates supported the conclusion that recollection is reduced in old age, but inferences about age differences in familiarity were highly model dependent. Implications for dual-process models of memory in old age are discussed. Keywords: recognition memory, associative recognition, models of recognition memory, memory and aging, dual-process theories of memory
els, recollective processes involve conscious remembering of particular aspects of a prior episode, such as perceptual details, spatial or temporal information, the source of information, or thoughts and feelings that accompanied the episode. In addition, and of particular importance for the studies that we report here, familiarity is thought to dominate in recognition of single items, whereas associative recognition, in which pairs of words studied together must be discriminated from pairs whose constituents were studied with different partners, requires recollection (Kelley & Wixted, 2001; Rotello & Heit, 1999, 2000; Rotello, Macmillan, & Van Tassel, 2000; Yonelinas, 1997). Adults over 60 years old show age-related declines in memory on recall and recognition tasks (Light, 1991; Light, Prull, La Voie, & Healy, 2000; Salthouse, Fristoe, & Rhee, 1996; Zacks, Hasher, & Li, 2000). Several lines of evidence suggest that these decrements arise from reduced efficiency in recollection, with relative preservation of familiarity-based mechanisms. We begin with a summary of findings in support of this proposition, continue with a short discussion of models of recognition, and then report three experiments that provide new evidence about recollection and familiarity in old age based on receiver operating characteristics (ROCs) for item and associative recognition memory judgments.
Dual-process models of memory posit two processes, recollection and familiarity, that subserve recall and recognition (e.g., Atkinson & Juola, 1974; Hintzman, 1986; Humphreys, Bain, & Pike, 1989; Jacoby, 1991; Mandler, 1980; Raaijmakers & Shiffrin, 1992; Yonelinas, 1994, 1997). Although the core assumptions of these models differ in some respects (see Yonelinas, 2002, for a review), they generally characterize recollection as conscious or intentional in nature, as attention demanding, and as slow in rise time, whereas familiarity is thought to be unconscious, relatively automatic, and recruited more rapidly. In most dual-process mod-
Michael R. Healy and Christie Chung, Department of Psychology, Claremont Graduate University, and Department of Psychology, Pitzer College; Leah L. Light, Department of Psychology, Pitzer College. Christie Chung is now at the Brain and Cognitive Sciences Department, Massachusetts Institute of Technology. This research was supported by National Institute on Aging Grant AG02452 to Leah L. Light. Portions of this research were presented at the International Conference on Memory, Valencia, Spain, July 2001; the Meeting of the Memory Disorders Research Society, Boston, Massachusetts, October 2001; the 42nd Annual Meeting of the Psychonomic Society, Orlando, Florida, November 2001; the Seventh Cognitive Aging Conference, Tours, France, September 2002; and the Ninth Cognitive Aging Conference, Atlanta, Georgia, April 2002. We thank Dale E. Berger, William P. Banks, Andrew P. Yonelinas, Tom Wickens, and John Angus for their advice on statistical matters and Erin Gorman, Meredith Patterson, and Megan Pulham for testing participants. Comments by John Wixted sharpened our thinking on several issues. Correspondence concerning this article should be addressed to Leah L. Light, Department of Psychology, Pitzer College, 1050 North Mills Avenue, Claremont, CA 91711, or to Michael R. Healy, who is now at OTX Research, Inc., 10567 Jefferson Boulevard, Culver City, CA 90232. E-mail:
[email protected] or
[email protected]
Impaired Recollection and Preserved Familiarity Processes in Old Age Although it is unlikely that any measures of memory are process pure (Jacoby, 1991), it is reasonable to assume that recollective processes that depend on contextual information are most heavily involved in recall, with recognition (or at least item recognition) having a larger familiarity component. This leads to the prediction that, on average, recall should show 768
MODELS OF RECOGNITION MEMORY
greater age-related deficits than item recognition, a prediction supported by the ordering of effect sizes in meta-analyses (e.g., La Voie & Light, 1994; Verhaeghen, Marcoen, & Goossens, 1993). Older adults also have poorer memory for contextual information, even when item recognition is held constant (e.g., Chalfonte & Johnson, 1996; Kliegl & Lindenberger, 1993; Schacter, Osowiecki, Kaszniak, Kihlstrom, & Valdiserri, 1994; Spencer & Raz, 1995). Similarly, associative recognition shows an age difference, even when recognition of individual items in young and older adults has been equated (e.g., Light, Patterson, Chung, & Healy, 2004; Naveh-Benjamin, 2000; NavehBenjamin, Hussain, Guez, & Bar-On, 2003). Findings from other paradigms also provide converging evidence that aging is accompanied by impairment of recollection and relative sparing of familiarity (see Light et al., 2000, for a review). For instance, after studying lists of items associated semantically or phonologically with target concepts or words, older adults have higher false-alarm (FA) rates in recognition and higher intrusion rates in recall for the nonpresented targets than do young adults—a form of memory illusion (e.g., Balota et al., 1999; Koutstaal & Schacter, 1997; Norman & Schacter, 1997; Tun, Wingfield, Rosen, & Blanchard, 1998). Jacoby (1999) reported an ironic effect of repetition on recognition memory in young and older adults. After seeing a list in which words appeared one, two, or three times and then hearing a second list of words, participants took a recognition test on which they were asked to respond “old” only to previously heard words. For young adults, exclusion of seen words improved as a function of repetition, but the opposite was true for older adults, implicating an age-related deficit in the use of recollection to oppose enhanced familiarity produced by item repetition. Young adults show effects of repetition in associative recognition like those of older adults when required to respond before a deadline short enough to prevent recruitment of recollective processes (Light et al., 2004). Similar effects of repetition and deadline have been found for the Deese/Roediger–McDermott memory illusion task (Benjamin, 2001; Kensinger & Schacter, 1999). A related effect of spaced versus massed repetition has been found by Benjamin and Craik (2001)—spacing two presentations of an item rather than massing these led young adults to correctly endorse the item more often when asked to recognize it and to reject it when instructed to exclude it. Older adults and young adults responding under time pressure did not benefit from spaced repetition when asked to exclude items, presumably because familiarity is less well opposed by recollection in older adults and by young adults under time pressure. The studies described above provide strong qualitative evidence for the dissociation of recollection and familiarity by age. In addition, two strategies have been used to obtain quantitative estimates of the contributions of recollection and familiarity to memory in young and older adults. These are the processdissociation (Jacoby, 1991) and remember/know procedures (Gardiner, 1988; Tulving, 1985). In the process-dissociation procedure, estimates of recollection (R) and familiarity (F) are obtained by contrasting inclusion tasks where recollection and familiarity contribute to performance and exclusion tasks in which successful performance requires using recollection to oppose familiarity. In the remember/know task, participants are asked to make a second decision—remember or know— based on the con-
769
scious experiences that led to the positive response. A remember response is given when elements of the original study episode are recollected. A know response is given when study-phase contextual details cannot be retrieved but the test item feels sufficiently familiar to warrant an “old” judgment. This procedure measures states of awareness rather than processes mediating these states, but proportions of remember and know responses are often treated as relatively pure measures of recollection and familiarity (Yonelinas & Jacoby, 1995). Two recent meta-analytic reviews have addressed claims that recollection and familiarity estimates obtained from processdissociation and remember/know procedures show age sensitivity and age invariance, respectively (Light et al., 2000; Yonelinas, 2002). Estimates from the process-dissociation procedure show the expected pattern, with (often sizable) age differences in R and age constancy in F being the rule (Light et al., 2000; Yonelinas, 2002). Light et al. (2000) took unweighted means from 10 studies of estimates of R and K derived from hits and FAs and found the anticipated outcome for R (an age difference favoring the young) but a slightly increased value of K for older adults. The remember/ know task assumes that R and K reflect mutually exclusive states of consciousness. Yonelinas and Jacoby (1995) recommended dividing the familiarity estimate K by (1 – R) to provide a correction for independence. Using this correction, Light et al. found age effects favoring the young for both recollection and familiarity. Yonelinas (2002) suggested that the finding of reduced familiarity estimates in the old is due to a ceiling effect. When studies were partitioned into those with estimates of R below .60 and those with estimates of R above .60, familiarity estimates were more similar across age for the first class. However, examination of the tabled values in Yonelinas’s review reveals that for remember/know studies with R estimates below .60, 1 had identical F estimates for young and old, 1 had a higher F estimate for older adults, and 4 had larger F estimates for young adults. For the 6 studies with R estimates above .60, 5 had F estimates that were larger for the young group, and 1 showed a reversed pattern. Overall then, in the 12 studies for which estimates were reported, the F estimates were lower in the older group in 9, higher in the older group in 2, and tied in 1. We believe that this pattern of results is best interpreted as showing that the remember/know paradigm yields lower estimates of both R and F in older adults.1 We do not find this conclusion altogether surprising. Know judgments, like remember judgments, assign items to the category of recently studied material. At some level, then, both may tap recollection, though know judgments represent an absence of detailed information about acquisition. By this account, know judgments would not yield pure estimates of familiarity in the absence of recollection and consequently would not necessarily show stability across age. In the three experiments reported here, we utilized a third method of obtaining quantitative estimates of familiarity and recollection. The experiments used an associative recognition task in which participants studied lists of word pairs and were then tested using three different types of word pairs. Intact pairs had appeared in a previous study list and were to be accepted as “old.” New pairs 1
Prull, Crandell, Martin, Backus, and Light (2004) reached similar conclusions after a more extended review of this literature.
770
HEALY, LIGHT, AND CHUNG
that consisted of two unstudied words and rearranged pairs that consisted of two words that had appeared in the previous study list, but with different partners, served as distractors and were to be rejected as “new.” In Experiment 3, participants were also tested using an item recognition task in which single words from studied pairs were to be accepted as “old” and unstudied words were to be rejected as “new.” For the item and associative recognition tasks, participants rated how confident they were that a word was old or a word pair was intact on a 6-point scale. The rating data were fit with several models of recognition memory, and the ability of these models to reproduce the empirical data was evaluated. We also examined parameter estimates to test hypotheses about agerelated differences in familiarity and recollection. Before moving on to the experiments, we describe the models to be examined and our approach to modeling the data.
Models of Item and Associative Recognition Memory Models of item and associative recognition judgments vary in their assumptions about component processes and the relationships among these processes. Parameters estimated from a particular model represent quantitative characteristics of the hypothesized processes (e.g., familiarity and recollection, item and associative information) seen through the lens of that model. Consequently, interpretation of age differences in item and associative recognition depends on the model selected to describe performance. There is at present considerable debate as to how best to model both item and associative recognition data. Rather than committing in advance to a single model, then, we fit a range of models with differing assumptions about the nature and contribution of component processes to recognition memory. Our primary goals were to determine whether all models that postulate a role for recollection generate estimates of the relevant parameters that favor young adults over older adults (as would be predicted from our literature review), to see whether models that incorporate familiarity processes of one kind or another yield estimates of these processes that also vary with age, and in general to evaluate the effect of model selection on conclusions drawn about age differences in component processes. The Appendix lists the models that were examined and the parameters estimated for each. At one end of the spectrum is the unequal-variance signal detection theory (UVSDT) model. In this model, the target and distractor distributions are assumed to be Gaussian in nature, and recognition judgments arise from an assessment of whether a test probe’s strength of evidence (familiarity) exceeds some criterial value. The parameters estimated by this model include two familiarity parameters, d⬘I/N and d⬘R/N, representing the mean differences between the two old (intact and rearranged) distributions and the new pair distribution (standardized to the new distribution). Note that the signature for good associative recognition in this model would be high d⬘I/N together with low d⬘R/N. In addition, the ratio of the variance of old items (both intact and rearranged pairs) to new items was estimated as I,R/N. For ROCs based on hits and FAs in single word recognition, the UVSDT model (without d⬘R/N) is widely believed to provide the best description of ROC data (e.g., Glanzer, Kim, Hilford, & Adams, 1999; Heathcote, 2003; but see below). Estimates of for single word recognition are usually in the neighborhood of 1.25; equivalently, the z-transformed slope (1/) is
about .80 (Ratcliff, Sheu, & Gronlund, 1992). The UVSDT model also provides a good fit for intact/new ROCs from associative recognition tests (e.g., Kelley & Wixted, 2001; Rotello et al., 2000). It predicts curvilinear ROCs, with asymmetry around the minor diagonal reflecting the greater variability of the target distribution. At the other end of the modeling spectrum is the two-highthreshold model. Here, positive recognition judgments arise when test-item strength exceeds some threshold value. As suggested by its name, the two-high-threshold model suggests that two recollective processes (or states) operate in recognition. The first is a recall-to-accept process (RO) that involves recollection of specific details of the prior occurrence of a test probe to determine study status; recall to accept is assumed to operate only on intact pairs on associative recognition tests. The second is a recall-to-reject process (RN) that involves recollection that lure items are new; in associative recognition, each member of a rearranged lure pair may be used as a cue for its studied pair mate. In its traditional form, the two-high-threshold model predicts linear operating characteristics with intercepts that diverge from their (0, 0) and (1, 1) intercepts. Intact/rearranged ROCs from associative recognition are often described as linear in nature (consistent with the two-highthreshold model) or at least as more linear than intact/new ROCs (e.g., Kelley & Wixted, 2001; Rotello et al., 2000; Yonelinas, 1997, 2001). However, when we used the approach taken by Yonelinas, Dobbins, Szymanski, Dhaliwal, and King (1996) to model fitting, we found that the two-high-threshold model gave a much less good account of intact/rearranged ROCs from 13 published studies when pitted against three models that predict curvilinear functions for these ROCS (the UVSDT model and two versions of Yonelinas’s dual-process model varying in number of recollection parameters).2 Interestingly, the two-high-threshold model can produce curvilinear operating characteristics nearly indistinguishable from signal-detection models if recollectionbased responses can be assigned to any response category, rather than just the strongest confidence ratings (Malmberg, 2002; see also Erdfelder & Buchner, 1998). There is considerable behavioral evidence that both recollection and familiarity have a role in recognition decisions (e.g. Arndt & Reder, 2002; Hintzman & Curran, 1994), especially when test items are pairs that can contain both item and associative information (Cleary, Curran, & Greene, 2001; Kelley & Wixted, 2001; Light et al., 2004; Rotello et al., 2000). Yonelinas and his colleagues (e.g., Yonelinas, 1994, 1997; Yonelinas et al., 1996; Yonelinas, Regehr, & Jacoby, 1995) have proposed a dual-process signal detection theory model. Within the framework of this model, associative recognition is a within-task opposition procedure; intact pairs represent to-be-included items that benefit from both recollection and familiarity, whereas rearranged pairs represent to-be-excluded items with recollection needed to oppose item familiarity. As in the two-high-threshold model, RO and RN rep-
2
A more complete description of these analyses may be obtained directly from us.
MODELS OF RECOGNITION MEMORY
resent the probabilities of recall to accept and recall to reject.3 Decisions about the status of old test items that are not recollected and new test items that cannot be recollected are based on assessment of familiarity, modeled as an equal-variance signal-detection process. This dual-process equal-variance (DP-EV) model accounts for aspects of associative recognition ROCs such as lower y-intercepts that often diverge from the (0, 0) origin and curvilinearity that is often not quite what has been predicted by the UVSDT model (see Yonelinas, 2001, for a discussion). We included both the DP-EV model and a dual-process unequalvariance (DP-UV) model in which I,R/N was estimated rather than fixed at 1. In the DP-EV and DP-UV models, only the most extreme confidence ratings (6 for intact pairs and 1 for rearranged pairs) can be assigned to recollection-based responses. We also examined a dual-process model in which recollection-based responses can be distributed across confidence judgments (4, 5, or 6 for intact pairs and 1, 2, or 3 for rearranged pairs). This hybrid model is a multinomial processing tree signal detection theory (MPTSDT) model (Macho, 2002) that includes familiarity, estimated by d⬘I,R/N, and two recollection parameters, RO and RN, as well as estimating the distribution of recollection responses across confidence levels. An important difference between this model and the DP-EV and DP-UV models is that, unlike them, it does not contain a rearranged/new distribution. Rather, intact and rearranged pairs are assumed to have equal familiarity, and discrimination of the two pair types depends solely on recollection (but see Yonelinas, Kroll, Dobbins, & Soltani, 1999). In Macho’s (2002, 2004) application of the MPTSDT model, only a single recollection parameter was estimated. We estimated both RO and RN so that the separate contributions of recall to accept and recall to reject could be evaluated as in the other dual-process models under consideration. The recollection-based models discussed thus far treat recollection as threshold in nature. Kelley and Wixted (2001), however, proposed that associative information is continuous. Their someor-none (SON) model incorporates three normal distributions representing new items, item information, and associative information. In this model, recognition decisions are based on item familiarity incremented or decremented by associative familiarity. For intact pairs, associative information (with probability RO) is added to item information to yield a single strength-of-evidence variable. For rearranged pairs, associative information is subtracted from item information (with probability RN). As a consequence of adding associative information to or subtracting it from item information, the decision variable is predicted to have ASSOC greater than 1. We examined fits of both equal- and unequalvariance SON models to our data. Macho (2004) fit a number of models to pooled associative recognition data from both Yonelinas (1997, Experiments 1–3) and Kelley and Wixted (2001, Experiments 1–3). Among the models examined by Macho for Yonelinas’s data were the DP-UV (which he referred to as the two-high-threshold signal-detection model) and the SON models. Because Yonelinas argued that recall-toreject processes are rarely used and because his first experiment was designed to reduce the usefulness of this strategy, the DP-UV model fit by Macho included only RO, with RN ⫽ 0. For Experiments 1 and 2, this model adequately fit the data, but RN was needed for Experiment 3. For the SON models, setting RN ⫽ 0 worked for Experiment 1 but not for Experiments 2 or 3. As is
771
evident, the results from these two classes of model did not yield uniform conclusions. Also, contrary to Kelley and Wixted, constraining ASSOC ⫽ 1 in the SON model proved satisfactory for all three experiments. In addition, the MPTSDT model (called the HT-3R model by Macho) also provided an adequate description of Yonelinas’s data. (Because there were no new lures in Yonelinas’s studies, d⬘I,R/N was not estimated.) As we show below, our results did not fully conform to those obtained for Yonelinas’s data.
Approach to Model Fitting ROC analysis for associative recognition data has usually been carried out separately for intact/new, intact/rearranged, and (less often) rearranged/new ROCs. The method used in the present experiments was to simultaneously fit intact/new and rearranged/ new ROCs, using the approach of Macho (2004). This technique, which models all rating data from a single recognition test in one pass, has important advantages over the piecemeal approach, especially when, as in our case, the associative recognition test contains both new and rearranged lures. First, it avoids interpretive problems arising when different models provide better fits to ROCs based on the same hit or FA rates. In preliminary analyses of our data, we found that the unequal-variance signal-detection model gave a better account of intact/new ROCs but that a recollection-based model generally described intact/rearranged ROCs more satisfactorily, especially for young adults (see Light, Healy, Patterson, & Cheung, 2005). Because the three types of test pairs involved in these ROCs (intact, rearranged, new) all appeared on the same test and because people are unlikely to shift response criteria on an item-by-item basis (Stretch & Wixted, 1998), finding that different models fit different aspects of the data better is at best theoretically awkward. Second, again because all data generated by the recognition test are fit simultaneously, certain anomalies in parameter estimation are avoided. In our preliminary analyses, fitting Yonelinas’s (1997) dual-process model to intact/rearranged and intact/new ROCs separately yielded rather different estimates of RO, despite the fact that the same hit rates were plotted in the two ROCs. Macho’s procedure estimates a single RO for the entire data set, precluding discrepant estimates of the same parameter from ROCs based on different subsets of the data from a single test. Fits of associative recognition memory ROCs have generally been carried out for data aggregated over participants (i.e., at the group level), though some studies have reported ROC fits for individuals (Kelley & Wixted, 2001, Experiment 4; Yonelinas, 1997, Experiment 3). Inferences about processes based on these two approaches are not invariably identical. That is, ROCs are 3
Two-high-threshold models of recognition postulate one recollection process (RO) that operates solely on “old” items and a different one (RN) that operates solely on “new” items (see Macmillan & Creelman, 1991; Snodgrass & Corwin, 1988). If these are constrained to be equal in magnitude, it might make sense to estimate a single R parameter, but in doing so, the theoretical distinction between these is lost, and empirically, when both are estimated, they are not always equal in magnitude (e.g., Arndt & Reder, 2002; Yonelinas, 1997). That is, estimates of RO may be larger than those for RN, perhaps because (as suggested by a reviewer) the latter is somewhat more dependent on initiation of strategic retrieval triggered by a familiarity cue, whereas the high contextual match between study and test for intact pairs triggers a fairly automatic process.
HEALY, LIGHT, AND CHUNG
772
nonlinear functions, and the mean of a set of individual nonlinear functions does not necessarily equal the same function fit to aggregated data (Wickens, 2002). In our studies, we included sufficient numbers of observations per person per condition for associative recognition (96 in Experiments 1 and 2 and 72 in Experiment 3) and for item recognition (72 in Experiment 3) to permit modeling at the individual level. Thus, our data set provides a unique opportunity to compare models for item and associative recognition for each age group as well as across age groups for large numbers of individual participants. In addition, estimating parameters for individual participants permitted us to use traditional statistical analyses for detecting group differences (i.e., t tests). Further details of the modeling procedure are discussed after we report the behavioral data for Experiments 1–3.
The Present Experiments If, as we have suggested above, older adults have impaired recollection but spared familiarity processes, parameter estimates derived from a given model should vary across age in predictable ways. We know of only one published study in which ROCs for young and older adults were systematically compared. Harkins, Chapman, and Eisdorfer (1979) compared item recognition ROCs for 8 young and 16 older women and found that these ROCs differed in terms of sensitivity but not slope. We used dataThief (Huyser & van der Laan, 1994) to capture the (x, y) coordinates for the ROCs in Harkins et al. and fit the UVSDT model to their data. The estimates we obtained for both d⬘ and were numerically smaller for older adults. When Light et al. (2000) fit the DP-EV model to Harkins et al.’s ROCs, they found that older adults’ RO and familiarity (d⬘) estimates were both numerically smaller than those of young adults. Prull et al. (2004) reported a similar outcome, namely, that values of d⬘ and RO estimated for item recognition from the DP-EV model were both smaller for older adults. A decline in d⬘ is unexpected given that process-dissociation tasks generally show small to nonexistent age differences in F, though it is consistent with a decline in proportion of know judgments in the remember/know paradigm. On the assumption that recollection is age sensitive, we predicted that models including the recollection components RO and RN would yield larger values of these parameters for young than for older adults. Whether familiarity, as assessed by d⬘, would show age differences for either item or associative ROCs was considered an open question given our review of the literature. Note, however, that if item recognition ROCs are best described by the UVSDT model (as is typically the case), one would expect to see age differences in d⬘, given that age effects in item recognition are pervasive (Burke & Light, 1981). Although we have included a rather large number of models in our analyses of associative recognition, our goal was not so much to determine which of these was the best fitting one (and indeed, we show below that more than one model does a credible job of accounting for our data) but to determine whether conclusions drawn about age differences in familiarity and recollection are dependent on the specific ways in which these constructs are embodied in different models.
Experiment 1 In Experiment 1, young and older participants studied lists of word pairs and were given a confidence-rated recognition test on
each. The test lists consisted of intact pairs that had appeared in the previous study list, rearranged pairs in which both words had been studied but with different partners, and new pairs that contained two unstudied words. Because age differences in associative recognition could result from problems in either memory for single items or memory for the association between items (e.g., NavehBenjamin, 2000), we tested subgroups of participants in each age group at different presentation rates in the hope of producing approximately equal performance on item recognition across age for at least one duration.
Method Participants. Fifty-nine young adults (43 women and 16 men) and 60 older adults (44 women and 16 men) were randomly assigned to either a short or a long study condition. The short study duration was 2 s/pair for young adults (n ⫽ 30) and 3 s/pair for older adults (n ⫽ 30). The long study duration was 3 s/pair for young adults (n ⫽ 29) and 4 s/pair for older adults (n ⫽ 30). The young adults were students of the Claremont Colleges, Claremont, California, and the older adults were residents of the local community. Demographic characteristics of the sample and performance of participants on a battery of measures collected for descriptive purposes are given in Table 1. Data from a further 2 young adults and 9 older adults were collected but not included in the analyses. A criterion for inclusion was that a participant had to have distributed confidence judgments sufficiently to permit fitting of at least 3-point intact/new and rearranged/new ROCs at the individual level. Additional inclusion criteria are discussed when we consider implementation of the models in more detail. Data from 1 young and 2 older adults in the short study condition and 2 older adults in the long study condition were removed for this reason. One young adult’s data were lost because of a power outage. Of the remaining older adults, 2 had not learned English by a criterion age of 8 years old, 2 were older than a criterion age of 80, and 1 did not complete the associative recognition task. All participants were tested individually in each of the three experiments. Materials. We randomly selected 896 words from the MRC Psycholinguistic Database (Coltheart, 1981) to be targets and 56 to be buffers. The target words were between four and nine letters long, had an average frequency of 49.54 occurrences per million (SD ⫽ 80.80, range ⫽ 1–760; Francis & Kucˇera, 1982), and had an average of 1.90 syllables (range ⫽ 1– 4). Design and procedure. At the beginning of the experiment, participants completed a demographic questionnaire and then read detailed instructions about the associative recognition task. Participants were told that they would study a list of word pairs and then be given a memory test for three types of word pairs (intact, rearranged, and new). Participants were told that they would never see one old and one new word, that words repeated at test would be on the same side of the screen as they were during study, and that words would be repeated only within the same study–test block. No suggestion was made in the instructions that performance might be improved by trying to form associations or to use these at test. After participants read the instructions, they were questioned to make sure that they understood the task. The associative recognition task consisted of seven study–test blocks, the first of which was treated as practice and not analyzed. Assignment of words to pairs and to testing conditions was randomized for each participant. Each study list consisted of 48 word pairs placed between two buffer word pairs at each end of the list. Of the 48 study pairs, 16 were tested as intact pairs. The remaining 32 study pairs were used to create 16 rearranged pairs; only the left or the right word from a given study pair was used for this purpose, with the unselected members of the study pairs not tested. Each test list also included 16 new pairs. The ordering of pair types on the test list was pseudorandomized such that a given pair type could not occur more than four times consecutively. Participants were asked to rate their
MODELS OF RECOGNITION MEMORY
773
Table 1 Characteristics of Young and Older Participants Experiment 1 Young Group
M
n Age (years) Educationa Vocabularyb Healthc CVLT immediate CVLT delayed Circling Asd Relative speede Computation spanf
20.03 13.76 15.98 8.58 6.81 4.76 20.64 1.23g 3.39
Experiment 2 Older
Young
SD
M
SD
M
2.30 1.63 3.93 1.15 1.78 1.86 5.12 0.32 1.14
60 74.10*** 16.43*** 20.17*** 7.85*** 4.58*** 2.35*** 14.75*** 1.37h** 2.51h***
4.19 3.24 4.13 1.62 1.81 1.53 3.86 0.32 1.09
20.39 14.19 15.97 8.32 6.64 4.42 21.26 1.23i 3.52j
59
Experiment 3 Older
Young
SD
M
SD
M
2.35 1.94 4.32 1.01 2.24 1.57 5.32 0.24 1.40
33 73.27*** 16.61*** 20.91*** 8.36k 5.58** 1.91*** 13.88*** 1.18l 2.22k***
4.42 3.20 3.09 1.32 1.70 1.33 3.25 0.42 1.07
21.96 15.16 16.20 8.24 6.88 3.88 19.60 1.12 3.88
31
Older SD
M
SD
3.06 2.25 3.84 1.02 1.83 2.09 5.07 0.37 1.17
36 72.50*** 16.17 20.28*** 8.15 5.58** 2.36*** 14.14*** 1.30m 2.61***
4.10 3.40 3.83 1.17 1.95 1.88 6.13 0.39 1.15
25
Note. CVLT ⫽ California Verbal Learning Test (Delis, Kramer, Kaplan, & Ober, 1987). a Years of formal education. b Nelson & Denny (1960), maximum score ⫽ 25. c Subjective ratings of health on a 10-point scale; higher values mean better health. d Participants were given several sheets of paper containing rows of random letters and were asked to find as many letter As as they could in 1 minute. e Relative speed is based on the Symbol–Digit Substitution Task (Salthouse, 1992). The values shown are the reaction time differences between digit–symbol and digit– digit divided by digit– digit. f Salthouse (1992). g n ⫽ 57. h n ⫽ 59. i n ⫽ 26. j n ⫽ 29. k n ⫽ 32. l n ⫽ 30. m n ⫽ 35. ** p ⬍ .05. *** p ⬍ .01.
confidence in the study status of a test pair using a 6-point scale (1 ⫽ Sure New to 6 ⫽ Sure Old) and were encouraged to use all six response categories. They were instructed to call intact pairs “old” by giving ratings of 4, 5, or 6, and to call rearranged and new pairs “new” using ratings of 1, 2, or 3. Confidence ratings were made vocally, without time limit, and were recorded by the research assistant. After each study–test block, participants were given a rest break, and after completing all of the study–test blocks, they were given another rest break before completing the auxiliary measures reported in Table 1. A typical testing session lasted between 1.5 and 2 hr, and participants were paid $20.
Results We report here the data for hit rates on intact pairs and FA rates on new and rearranged pairs. Unless otherwise noted, a two-tailed alpha of .05 was used for all statistical tests, and effect sizes are reported using partial 2. Although we computed a large number of contrasts for parameter estimates, a liberal significance level was deemed appropriate for these comparisons because of our interest in seeing similar trends in parameters across experiments. Statistical modeling of the rating data for all three experiments is described in the section titled Formal Modeling of Experiments 1–3. Hit and FA rates were obtained by calculating the proportions of intact, rearranged, and new pairs given “old” ratings (see Table 2) and analyzed using Age ⫻ Duration analyses of variance (ANOVA). The FA ANOVA included a within-subjects factor of FA type (rearranged, new). Hit rates showed no significant effects of duration or age (both Fs ⬍ 1), but there was a marginally significant crossover interaction between these factors, F(1, 115) ⫽ 3.10, p ⫽ .08, 2 ⫽ .03. Young adults had marginally lower hit rates in the short condition than in the long condition, t(57) ⫽ ⫺1.76, p ⫽ .08, 2 ⫽ .05, whereas older adults’ hit rates were nonsignificantly higher in the short condition (t ⬍ 1). Rearranged pairs had higher FAs than new pairs, F(1, 115) ⫽ 381.67, p ⬍ .01, 2 ⫽ .77; older adults made more FAs than young adults, F(1, 115) ⫽ 8.38, p ⬍ .01, 2 ⫽ .07; and the interaction between age and FA type was significant, F(1, 115) ⫽ 16.93, p ⬍ .01, 2 ⫽ .13. Young adults made significantly fewer FAs than older adults on rearranged pairs, t(117) ⫽ ⫺3.62, p ⬍ .01, 2 ⫽ .10, but not on
new pairs, t(117) ⫽ ⫺1.38, p ⫽ .17, 2 ⫽ .02. None of the duration effects were significant for FAs (all Fs ⬍ 1).
Discussion The behavioral results of Experiment 1 can be summarized quite simply—young and older adults did not differ significantly in hits for intact pairs or in FAs for new pairs, but false alarms for rearranged pairs were
Table 2 Mean Hit and False-Alarm (FA) Rates, With Standard Deviations, for Experiments 1, 2, and 3: Associative Recognition FAs rearranged
Hits Group
n
M
SD
M
FAs new
SD
M
SD
Experiment 1 Young 2s 3s All Older 3s 4s All
30 29 59
.65 .72 .69
.16 .16 .16
.26 .25 .25
.15 .16 .15
.09 .11 .10
.07 .10 .09
30 30 60
.71 .69 .70
.14 .13 .14
.34 .39 .37
.18 .18 .18
.11 .14 .13
.13 .13 .13
.17 .21
.09 .13
.15 .11
.16 .24
.04 .13
.06 .13
Experiment 2 Young Older
31 33
.74 .71
.14 .16
.17 .41
Experiment 3 Young Older
25 36
.75 .75
.14 .14
.23 .45
HEALY, LIGHT, AND CHUNG
774
much more common for older adults. These findings join other experimental evidence for reduced associative recognition in older adults, even when performance in item recognition appears to be constant across age (NavehBenjamin, 2000; Naveh-Benjamin et al., 2003).
Experiment 2 Experiment 1 demonstrated an age-related impairment in associative recognition in that older adults had considerable difficulty rejecting rearranged pairs. In Experiment 2, we asked whether this pattern would change if participants were encouraged to use recollection when making associative recognition decisions. Previous research by Rotello et al. (2000, Experiments 1 and 2) has shown that the use of recall to reject is under strategic control. In their experiments, people studied lists of singular and plural nouns. Estimates of the upper x-intercept were considerably lower when participants believed that using the recall-to-reject strategy was to their advantage in discriminating whether test probes were either identical to studied words or plurality-reversed lures. Item-by-item monitoring of the similarity of test probes to studied items has also been found to reduce older adults’ source misattribution errors in the false-fame paradigm (Multhaup, 1995) and false recognition of related lures in the Deese/Roediger–McDermott paradigm (Koutstaal, Schacter, Galluccio, & Stofer, 1999). Experiment 2 addressed the issue of whether older adults are impaired in associative recognition tasks because they do not spontaneously form associations during study or engage in recollection but can do so if actively encouraged.
Method Participants. Descriptive information for the 31 young adults (18 women and 13 men) and 33 older adults (26 women and 7 men) who participated in Experiment 2 is given in Table 1. Data from an additional 6 young and 7 older adults were collected but not analyzed. Of the young adults, 3 had previously participated in another associative recognition study, data from 1 were lost because of equipment failure, 1 did not follow the instructions, and 1 did not distribute confidence judgments sufficiently to permit model fitting. Of the older adults, 1 performed at chance when making intact/new decisions, data from 1 were lost because of equipment failure, 1 did not complete the associative recognition task, and 3 were over 80 years in age. In addition, data from 1 older adult who had zeroes on the computation span and CVLT measures were also removed from the analyses. Materials. The stimuli and testing apparatus were the same as in Experiment 1. Design and procedure. The experimental procedure was nearly identical to Experiment 1, save for two changes. First, study duration was fixed at a rate of 3 s/word pair for all participants. Second, and more important, participants were given instructions that explicitly encouraged them to form associations between words in a pair. The instructions were designed to encourage participants to actively form associations between word pairs during encoding and to use these associations during test. An excerpt from the instructions appears below. While you are studying the word pairs, we would like you to try and form meaningful associations between the words. Most of the time, the words in the pair will not have an obvious relationship. For example if you see “KING PEAR,” you could say to yourself, “The plump KING looks like a PEAR.” The associations you make will of course be up to you, but by forming the associations you should be able to remember the word pairs better. To determine if you’ve seen a word pair before, we want you to try and remember any associations
you might have formed during study. For example, you might see “KING PEAR” or “KING FORK.” If you think about the association you made for the word “King” (“The plump KING looks like a PEAR”), you will have an easier time differentiating between the three kinds of test pairs. Participants were reminded before each study list to form associations between pairs and before each test to use these associations to decide whether pairs were “old” or “new.”
Results The hit and FA rates (see Table 2) were analyzed as in Experiment 1 and yielded similar results. The hit rates for intact pairs did not differ across groups (t ⬍ 1). The older adults made more FAs than young adults, F(1, 62) ⫽ 12.53, p ⬍ .01, 2 ⫽ .17, and rearranged pairs attracted more FAs than new pairs, F(1, 62) ⫽ 150.77, p ⬍ .01, 2 ⫽ .71. There was a strong interaction of group and FA type, F(1, 62) ⫽ 50.26, p ⬍ .01, 2 ⫽ .45, produced by older adults having very elevated FA rates to rearranged pairs, t(62) ⫽ ⫺4.95, p ⬍ .01, 2 ⫽ .28, but not higher FA rates to new pairs, t(62) ⫽ ⫺1.05, p ⫽ .30, 2 ⫽ .02. The instructions in Experiments 1 and 2 varied only in the extent to which they stressed forming associations during study and using recollection strategies at test. Tests of instructional effects were carried out by adding an experiment factor to the hit and FA ANOVA designs reported above. These tests showed that hit rates were not affected by instructions, and once again, neither age nor any of the interactions of age with other factors was significant (all Fs ⬍ 1.68). There was, however, a significant three-way interaction observed in the FA ANOVA, F(1, 179) ⫽ 11.91, p ⬍ .01, 2 ⫽ .06. The FA rates for new pair lures were not affected by instructions for either age group (both ts ⬎ ⫺0.94). More elaborate instructions did, however, lower young adults’ FA rates to rearranged pairs so that these were smaller in Experiment 2 than in Experiment 1, t(88) ⫽ 2.47, p ⬍ .05, 2 ⫽ .06. Of primary interest, elaborated instructions did not lead to lower FAs for rearranged lures in older adults, t(99) ⫽ 0.19.
Discussion The results of Experiment 2 confirm and extend those of Experiment 1. Older adults once again incorrectly accepted rearranged pairs at a higher rate than young adults. Moreover, only young adults benefited from the augmented instructions; young adults made fewer FAs on rearranged pairs in Experiment 2 than in Experiment 1. Older adults’ FA rates on rearranged pairs were actually slightly higher in this study than in Experiment 1. Thus, the evidence suggests that both the use of a recall-to-reject mechanism and its strategic control are negatively affected by aging. Although cross-experiment comparisons must be interpreted with appropriate caution, we believe that the evidence speaks convincingly to an impairment in forming and/or utilizing associations in old age.
Experiment 3 In Experiment 3, both item and associative recognition processes were examined in young and older adults. In both types of tests, participants studied pairs of words. The associative recogni-
MODELS OF RECOGNITION MEMORY
tion test followed the same format as in the previous experiments. The item recognition test, however, consisted of single words that had originally been studied as members of word pairs, and participants rated how confident they were in the “old”/“new” status of the test word. The primary questions of interest in Experiment 3 were whether the model fits for item recognition would yield evidence for recollection and, if so, whether this evidence would be equally strong for young and older adults.
Method Participants. Twenty-five young adults (16 women and 9 men) and 36 older adults (21 women and 15 men) were sampled from the same populations as in Experiments 1 and 2. Their demographic characteristics appear in Table 1. Data from an additional 9 young adults and 2 older adults were collected but not analyzed. Two young adults performed at chance when making intact/new discriminations, 3 did not distribute confidence judgments sufficiently to permit ROC curve fitting, 2 did not complete the experiment, 1 learned English after the age of 8 years old, and data from 1 were lost because of experimenter error. Of the older adults, 1 performed at chance when making intact/new judgments, and experimenter error resulted in the loss of data from the other. Materials. The 972 targets and 112 buffers used as stimuli were randomly selected from the same corpus of words described in Experiment 1. Target words were between four and nine letters long, had an average frequency of 74.59 occurrences per million (SD ⫽ 104.74, range ⫽ 1– 897), and had an average of 1.81 syllables per word (range ⫽ 1– 4). Design and procedure. Six associative and four item recognition tests were completed in two sessions that occurred within a 2-week period. The number of item and associative recognition tests was randomized such that there were from one to three item tests and from two to four associative tests per session. Each session began with one practice item recognition test and one practice associative recognition test that were not analyzed. Each study list consisted of 24 target pairs, two primacy buffers, and two recency buffers presented at a rate of 3 s/pair. Word pairings and assignment of stimuli to testing conditions were determined randomly for each participant. After a study list was presented, participants were randomly given either an item or an associative recognition test. Participants were told that they would not know what type of test they would take until after the study list was completed. Each item recognition test consisted of a random ordering of 18 old and 18 new words displayed one at a time on a computer monitor. Half of the old items had been studied as left-side partners, and the other half had been right-side partners. The testing conditions were constrained so that no more than two left– old, two right– old, or four new words could appear consecutively. The associative recognition tests contained 12 intact, 12 rearranged, and 12 new pairs. Unlike Experiments 1 and 2, both right and left members of study pairs occurred in rearranged pairs, though not, of course, with their original partners. The instructions in Experiment 3 were similar to those in Experiment 1, but they also described the item recognition tests. Participants were instructed to rate intact pairs and studied words as “old” (4, 5, or 6 on the rating scale) and rearranged pairs, new pairs, and unstudied words as “new” (1, 2, or 3 on the rating scale). Confidence ratings were made aloud and recorded by a research assistant before a new trial was initiated. After each session’s primary task was completed, participants were given a battery of auxiliary measures as in the previous experiments. A typical testing session lasted between 1 and 2 hr, and participants were paid $20 per session.
Results and Discussion Associative recognition. The analysis strategy used in Experiment 3 followed those in the previous experiments. Hit and FA
775
rates are given in Table 2. Young and older adults both had hit rates of .75 for intact pairs. FAs were analyzed with an Age ⫻ FA type repeated-measures ANOVA. The age, F(1, 59) ⫽ 14.75, p ⬍ .01, 2 ⫽ .20, and FA type, F(1, 59) ⫽ 177.61, p ⬍ .01, 2 ⫽ .75, effects were reliable, as was their interaction, F(1, 53) ⫽ 12.62, p ⬍ .01, 2 ⫽ .18. Older adults had higher FA rates on rearranged pairs, t(59) ⫽ ⫺3.95, p ⬍ .01, 2 ⫽ .21, as well as on new pairs, t(59) ⫽ ⫺3.06, p ⬍ .01, 2 ⫽ .14. The critical result, however, was the same here as in Experiments 1 and 2, namely, a disproportionately greater age difference on rearranged pairs. Item recognition. Hit and FA rates were calculated as the proportions of old and new words judged as “old.” Hit rates were almost identical for young (M ⫽ .80, SD ⫽ .11) and older adults (M ⫽ .79, SD ⫽ .10; t ⬍ 1). However, young adults (M ⫽ .20, SD ⫽ .13) made fewer FAs than older adults (M ⫽ .30, SD ⫽ .14), t(59) ⫽ ⫺2.78, p ⬍ .01, 2 ⫽ .12. Experiment 3 provides a further demonstration of an associative deficit in older adults. Experiment 3 also showed evidence of an age-related deficit in item recognition inasmuch as older adults’ FA rates on new items exceeded those of young adults, though there was no age difference in hit rates on old items.
Formal Modeling of Experiments 1–3 Model fitting was carried out using routines developed by Macho (2002, 2004)4 that utilize Microsoft Excel’s Solver function to simultaneously fit intact/new and rearranged/new ROCs by maximum-likelihood estimation. Model goodness of fit was assessed by computing Akaike’s information criterion (AIC; Akaike, 1973) and the Bayesian information criterion (BIC; Schwarz, 1978). The AIC and BIC are goodness-of-fit measures that take into account the number of parameters estimated by a model, penalizing models with fewer degrees of freedom. For both indices, smaller values indicate better fit. Macho (2004) reported that these indices do not always yield uniform conclusions about goodness of fit, and as is shown below, this is also true for our data. Seven models were fit to the associative recognition data. The first model was the UVSDT model with parameters d⬘I/N, d⬘R/N, and I,R/N. This model does not contain recollection parameters. All of the remaining models that we considered do include recollection components. The second and third were two dual-process models with parameters RO, RN, d⬘I/N, and d⬘R/N. These dualprocess models differed only in whether the intact and rearranged (old) distributions had the same variability as the new distribution (the DP-EV model) or whether variability for the intact and rearranged distributions was estimated (the DP-UV model). As in the UVSDT model, a single I,R/N was shared between the intact and rearranged distributions (e.g., Macho, 2004; Quamme, Frederick, 4
The equations used can be found in Macho (2002, 2004), and his Excel files are available for downloading at http://www.unifr.ch/psycho/general/ german/people/macho/mar.php. Our approach differed in one important respect from that of Macho (2004). Because our goal was to see how recollection and familiarity estimates for young and older adults vary as a function of model, we fit only full versions of each model and did not attempt to systematically constrain parameters within a given model to obtain the best fitting version of that model. This permitted us to compare models that all had parallel sets of parameters. Comparisons of our results with those of Macho (2004) must thus be understood with this caveat.
776
HEALY, LIGHT, AND CHUNG
Kroll, Yonelinas, & Dobbins, 2002). The fourth model was the SON model (Kelley & Wixted, 2001) with parameters RO, RN, d⬘ITEM, d⬘ASSOC, and ASSOC. The RO and RN parameters in this model represent the probabilities that associative familiarity will be added to or subtracted from item familiarity when making judgments on intact and rearranged test pairs, respectively. We called this version of the model the SON-UV (unequal-variance) model and also examined a fifth model in which item and associative variability were fixed at 1, the SON-EV (equal-variance) model. In both SON models, ITEM ⫽ 1, following Macho (2004) and simplifying comparison of his results with ours. The sixth and seventh models were equal- and unequal-variance versions of the MPTSDT model (MPTSDT-EV and MPTSDT-UV, respectively; Macho, 2002, 2004), in which recollection responses could be distributed across confidence categories. Macho (2002, 2004) also constrained the distribution of recollection responses to old items and to lures to be symmetrical, so that, for instance, the likelihood of an old recollected item receiving a confidence judgment of 6 was equal to that of a new (to-be-rejected) recollected lure receiving a 1. To keep the number of estimated parameters on a par with those of other models, we also constrained the distributions of confidence judgments to be symmetrical. We did not fit an equalvariance signal detection theory model because in preliminary analyses of the data at the group level, this model performed dismally (worst for all fit statistics for both age groups in all three experiments). Space considerations preclude a full discussion of the analyses of pooled data, but we refer to them when it is theoretically important to do so, and a summary may be obtained directly from us. Given a 6-point rating scale and three types of test items, the maximum number of data points was 18, with 15 degrees of freedom. At the individual level, 15 independent data points were not invariably available because participants did not distribute judgments across all rating categories for each item type. Missing data can be problematic inasmuch as they may influence goodnessof-fit measures (because there are fewer data points to explain). The MPTSDT models estimate distributions of recollected responses across confidence classes and make symmetry assumptions about to-be-accepted targets and to-be-rejected lures. For these models, we could not see an easy way to deal with different patterns of missingness across confidence judgments in different participants. (Three young and 4 older adults had 3-point ROCs, and 9 young and 9 older adults had 4-point ROCs instead of 5-point ROCs.) For the MPTSDT models, then, we report analyses based solely on the subset of participants who had complete data. We considered reporting for all models only analyses of parameter estimates based on participants who fully distributed their confidence judgments but decided that the gain in power produced by retaining as many data sets as possible for each model justified the approach taken. When appropriate, we assessed the effect of missing data by comparing the results of analyses including all participants with those including only people who had complete ROC data. Both sets of analyses led to the same substantive conclusions. When fitting models at the individual level, we placed upper and lower bounds on some parameters. In choosing constraints, we sought values that could adequately describe the data while at the same time limiting the extent to which instability in the data would lead to extreme values. For all models, values of d⬘ were con-
strained to lie within theoretical limits given the number of trials in a condition (Macmillan & Creelman, 1991, p. 10). For Experiments 1 and 2, this value was ⫾5.12, and for Experiment 3, it was ⫾4.92. The same upper bounds were used for d⬘ in the SON model, but the lower bound was set to 0. For the DP-UV, DP-EV, and UVSDT models, I,R/N was constrained to fall within the interval from 1.00 to 4.00 under the assumption that target items have greater variability than new items. The lower bound on ASSOC was reduced to 0.80 for the SON models because Macho (2004) found that ASSOC did not always exceed 1.00, as predicted by Kelley and Wixted (2001); using this lowered boundary permitted further empirical examination of the range of this parameter. All recollection parameters were constrained to be within the interval from 0 to 1. The final constraint was that response criteria had to be within the interval of ⫾3.10 standard units. In some of the models that we examined, was not fixed at 1.00. When is greater than 1.00, d⬘ overestimates the mean difference between the target and lure distributions (Wickens, 2002). This can lead to erroneous conclusions about processes underlying empirical phenomena (for an example, see Verde & Rotello, 2003). We therefore also report da (Kijewski, Swensson, & Judd, 1989; Simpson & Fitter, 1973) when we estimated because it provides a better index of sensitivity in unequalvariance situations.
Model Fits: Associative Recognition Although model fitting was carried out at the individual level, Figure 1 shows ROCs for pooled data for young and older adults in each experiment so that the reader can have a feel for the appearance of these. To evaluate model fit, we examined AIC and BIC measures in three ways. First, we computed mean AIC and mean BIC for each model, separately for young and older adults in each experiment, for all participants who had complete rating data. Second, we ranked the AIC and BIC values for the seven models for these participants and computed mean ranks for each model for young and older adults in each experiment. Third, we simply counted how many times each model gave the best fit for individuals in each age group in each experiment. The first two approaches led to very similar conclusions, and because some reviewers of this article raised a concern about the appropriateness of taking means of AIC and BIC measures (because of possible nonlinearities), our discussion focuses on the ranking data. Table 3 gives mean ranks for the AIC and BIC indices as well as counts of the frequency with which the various models provided the best and second-best fits for young and older adults in each experiment. Although the ordering of models was not entirely consistent across AIC and BIC measures, important regularities are evident in Table 3. Most notably, for mean AIC ranks, the SON-EV model came in first in all three experiments for young adults as well as in Experiments 1 and 2 for older adults; the SON-UV model was in second place for young adults in the first two experiments and for older adults in Experiment 2. For mean BIC ranks, the UVSDT model was the runaway winner, coming in first in all cases except for young adults in Experiment 2, where it was second best after the SON-EV model. The SON-EV model came in second for young adults in Experiments 1 and 3 and for older adults in Experiments 1 and 2. The AIC and BIC indices clearly do not yield the same orderings of goodness of fit. Macho (2004) reported
MODELS OF RECOGNITION MEMORY
777
Figure 1. Intact/new, rearranged/new, and intact/rearranged receiver operating characteristics for young and older adults in Experiments 1, 2, and 3 (pooled data).
inconsistencies across fit statistics for the models he examined for the pooled response data from Yonelinas (1997) and Kelley and Wixted (2001). We do not, at this time, have an adequate explanation for the discrepancies in ordering of model fit statistics across measures, and we share Macho’s view that there is no simple way to decide among alternative models. These outcomes for mean ranks would lead one to expect that, at the level of individual participants, the SON-EV and the UVSDT models would garner the most top ranks for AIC and BIC measures, respectively. For the BIC measure, the model with the lowest mean rank always had the most votes for best fit except for young adults in Experiment 2, where the UVSDT model had the edge, making this model the universal winner. However, the
UVSDT model made a better showing for the AIC measure than might have been anticipated solely from examination of mean ranks. For this measure, collapsing across age and experiment, the UVSDT model had 76 top votes, and the SON-EV model came in second with 60. There was some difference across young and older adults in this regard, with 27 young and 49 older adults being favored by the UVSDT model and 34 young and 26 older adults by the SON-EV model. When both first- and second-place votes were considered for the AIC measure, the edge went to the SON-EV model with 119 votes versus 93 for the UVSDT model. As shown in Table 3, the spread in mean rank AIC values was relatively small compared with that for the BIC measure, and when we examined the AICs of individual participants, these often were
HEALY, LIGHT, AND CHUNG
778
Table 3 Associative Recognition: Best Fitting Models for Individual Data
Model UVSDT Young Older DP-UV Young Older DP-EV Young Older SON-UV Young Older SON-EV Young Older MPTSDT-UV Young Older MPTSDT-EV Young Older
Experiment 1 mean ranks
Experiment 2 mean ranks
Experiment 3 mean ranks
AIC (1, 2)
BIC (1, 2)
AIC (1, 2)
BIC (1, 2)
AIC (1, 2)
BIC (1, 2)
4.50 (13, 4) 3.52 (22, 3a)
2.16 (30, 13) 1.79 (37, 6)
4.28 (6, 3) 3.79 (10, 3)
2.20 (15, 5) 1.55 (22, 2)
3.82 (8, 1) 2.68 (17, 3)
1.59 (13, 6) 1.21 (30, 1)
4.05 (1, 8) 4.91 (1, 2)
4.64 (0, 2) 5.25 (0, 0)
4.96 (0, 0) 4.83 (0, 2)
5.32 (0, 0) 5.17 (0, 0)
4.09 (1, 5) 5.18 (0, 1)
4.91 (0, 0) 5.26 (0, 0)
4.02 (10, 9) 3.88 (8, 8a)
3.20 (8, 12) 3.15 (4, 15)
4.70 (1, 4) 3.59 (7, 6)
3.28 (0, 9) 2.79 (2, 12)
3.32 (6, 5) 2.79 (5, 12)
2.36 (6, 5) 2.50 (1, 17)
3.18 (6, 14) 3.53 (2, 12)
4.07 (2, 4) 4.17 (0, 3)
3.36 (2, 7) 3.38 (2, 16)
4.20 (0, 2) 4.21 (0, 1)
3.82 (2, 1) 4.56 (1, 1)
4.73 (0, 1) 4.79 (0, 0)
2.62 (20, 13) 2.57 (13, 17)
2.29 (15, 25) 2.17 (11, 26)
2.64 (11, 5) 2.72 (7, 7)
1.88 (10, 9) 2.31 (5, 14)
2.82 (3, 6) 2.82 (6, 11)
2.27 (3, 10) 2.50 (3, 15)
4.62 (3, 2) 5.34 (0, 6)
6.43 (0, 0) 6.64 (0, 0)
4.62 (1, 1) 5.10 (0, 3)
6.68 (0, 0) 6.79 (0, 0)
5.45 (0, 2) 5.79 (1, 1)
6.91 (0, 0) 6.94 (0, 0)
5.00 (3, 6) 4.26 (6, 6)
5.21 (1, 0) 4.83 (1, 3)
3.44 (4, 5) 4.59 (3, 2)
4.44 (0, 1) 5.17 (0, 0)
4.68 (2, 2) 4.18 (4, 5)
5.23 (0, 0) 4.79 (0, 1)
Note. Values shown are mean ranks. The lowest Akaike’s information criterion (AIC) and Bayesian information criterion (BIC) mean ranks appear in boldface roman type. The second lowest AIC and BIC mean ranks appear in boldface italic type. Numbers in parentheses indicate the frequency with which a model provided the best and second-best fits for the particular group of participants in each experiment. UVSDT ⫽ unequal-variance signal detection theory model; DP-UV ⫽ dual-process model, unequal variance; DP-EV ⫽ dual-process model, equal variance; SON-UV ⫽ some-or-none model, unequal variance; SON-EV ⫽ some-or-none model, equal variance; MPTSDT-UV ⫽ multinomial processing tree signal detection theory model, unequal variance; MPTSDT-EV ⫽ multinomial processing tree signal detection theory model, equal variance. a Indicates a tie in ranks.
very close for the top models. Thus, it is perhaps not surprising that the advantage for the SON-EV model was less lopsided for the individual fits than might be expected for the AIC measure. What is very clear from these analyses, however, is that the SON-EV and UVSDT models fared very well in terms of fit indices in our studies, whereas some of the remaining ones, especially the DP-UV model and the two MPTSDT models, fared extremely poorly. We turn now to the most important questions for us—whether the parameter estimates for these models behave in a manner that is consistent in the conclusions they permit us to draw about aging and whether those conclusions make theoretical sense. We also discuss the way that elaborated instructions affect parameter estimates for the various models (if they do) by reporting analyses that contrast Experiments 1 and 2. In the behavioral data, stressing the formation and use of associations during recognition was associated with decreased FAs on rearranged lures in young, but not older, adults; the question of interest is how this is reflected in different models at the level of component-process parameter estimates.
Parameter Estimates: Associative Recognition Parameter estimates for the various models are found in Tables 4, 5, 6, 7, 8, 9, and 10. Conclusions based on comparisons of age
differences in parameters for the various models for the full and reduced samples did not differ importantly. UVSDT model. As seen in Table 4, young adults had significantly larger I,R/N and d⬘I/N together with significantly smaller d⬘R/N and da,R/N estimates. The age difference in d⬘I/N is interesting and perhaps surprising given that age differences in hit rates were not observed and age differences in FAs to new lures were either nonsignificant or relatively small across the board. One possibility here is that d⬘ is not a good index of sensitivity in this unequalvariance situation. This conclusion is reinforced by the finding that age differences in da,I/N were not significant in any of our three experiments. The only parameters to show Age ⫻ Experiment interactions representing instructional changes between Experiments 1 and 2 were the d⬘R/N and da,R/N estimates, F(1, 179) ⫽ 6.19, p ⬍ .05, 2 ⫽ .03, and F(1, 179) ⫽ 5.58, p ⬍ .05, 2 ⫽ .03, respectively, which both yielded reduced values of these parameters in young adults when instructions stressed formation and use of associations; older adults had small increases in these values (reflecting slightly poorer performance with augmented instructions). For this model, then, what we can say about age-related differences in familiarity varies with whether we are talking about intact/new or rearranged/new ROCs and choice of measure of sensitivity. Dual-process models. Mean estimates for the DP-UV model can be found in Table 5. Young adults had larger recollection
MODELS OF RECOGNITION MEMORY
779
Table 4 Parameter Estimates: Unequal-Variance Signal Detection Theory Model I,R/N Experiment
d⬘I/N
da,I/N
d⬘R/N
da,R/N
M
SE
M
SE
M
SE
M
SE
Young Older
1.82 1.39***
0.07 0.05
2.53 2.05**
0.16 0.11
1.69 1.69
0.10 0.09
⫺0.11 0.67***
0.12 0.06
0.01 0.60***
0.07 0.05
Young Older
1.91 1.42***
0.12 0.08
2.81 2.07***
0.19 0.13
1.88 1.70
0.13 0.10
⫺0.58 0.77***
0.19 0.08
⫺0.26 0.68***
0.09 0.08
Young Older
1.65 1.31***
0.11 0.04
2.86 2.21***
0.19 0.15
2.15 1.88
0.16 0.12
0.22 1.01***
0.16 0.08
0.26 0.86***
0.11 0.06
1 2
M
SE
3
** p ⬍ .05.
*** p ⬍ .01.
estimates than older adults. Conclusions about familiarity in the DP-UV model were constant across d⬘ and da measures. Any age differences in d⬘I/N and da,I/N were small, but more substantial differences were present for d⬘R/N and da,R/N. In Experiment 1, young adults had significantly higher RO, RN, and I,R/N estimates; older adults had higher d⬘R/N and da,R/N estimates; and the differences in d⬘I/N and da,I/N estimates were not significant. In Experiment 2, RN and I,R/N were significantly larger for young adults, whereas RO was just numerically larger. The differences in d⬘I/N and da,I/N were nonsignificant. The differences in d⬘R/N and da,R/N were significant— older adults were worse at rejecting rearranged pairs than were young adults. Finally, in Experiment 3, the only significant age difference was in RN, though for most parameters, the direction of the numerical differences was the same as in the other studies. We note one discrepancy that was clearly visible in the group and individual fits for this model—the young adult group fit yielded an estimate of RN ⫽ 0 for Experiment 2, whereas all other values of this parameter were in the neighborhood of .20 for young adults at both the aggregated and individual ROC levels. This result is not a fluke; we observed a similar outcome in analyses not reported here when we modeled pooled data from young adults in Light et al. (2004, Experiment 2). We believe that this finding highlights the importance of fitting multiple data sets and examining parameter estimates at the individual level.
Parameter estimates for the DP-EV model appear in Table 6. Significant age differences favoring young adults in RO and RN appeared in every experiment except for RO in Experiment 3, where the difference also favored the young. The only significant difference in d⬘I/N was in Experiment 1. Unlike the DP-UV model in which the d⬘R/N parameter was consistently larger for older adults in both Experiments 1 and 2, here the difference was significant only for Experiment 2, though once again the direction of the difference was the same in all three experiments. In the DP-EV and DP-UV models, then, age differences in associative recognition show up most consistently in recollection parameters but can sometimes be observed in familiarity estimates, depending to some extent on whether equal- or unequal-variance models are fit. Finally, the manipulation of instructions across Experiments 1 and 2 produced an Age ⫻ Experiment interaction in d⬘R/N, F(1, 179) ⫽ 4.26, p ⬍ .05, 2 ⫽ .02, that was also marginally significant in da,R/N, F(1, 179) ⫽ 3.44, p ⬍ .07, 2 ⫽ .02, for the DP-UV model; young adults’ estimates decreased from Experiment 1 to Experiment 2, whereas older adults’ estimates increased. The DP-EV model generated only a main effect of instructions, and that was for a marginal increase in RO, F(1, 179) ⫽ 3.78, p ⬍ .06, 2 ⫽ .02. SON models. Parameter estimates for the SON-UV and SON-EV models can be found in Tables 7 and 8, respectively. In
Table 5 Parameter Estimates: Dual-Process Unequal-Variance Model RO Experiment
I,R/N
RN
d⬘I/N
da,I/N
d⬘R/N
da,R/N
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
Young Older
.35 .24***
.03 .03
.19 .07***
.02 .02
1.37 1.19***
0.05 0.03
1.44 1.47
0.11 0.09
1.18 1.35
0.08 0.08
0.52 0.88***
0.09 0.06
0.49 0.82***
0.07 0.06
Young Older
.39 .30
.04 .04
.20 .05***
.04 .02
1.40 1.21**
0.08 0.05
1.58 1.39
0.19 0.11
1.32 1.26
0.15 0.10
0.21 0.94***
0.11 0.07
0.25 0.87***
0.09 0.07
Young Older
.39 .31
.04 .03
.23 .08***
.04 .02
1.20 1.13
0.05 0.03
1.78 1.61
0.16 0.12
1.66 1.50
0.17 0.11
0.94 1.18
0.13 0.08
0.89 1.11
0.13 0.07
1 2 3
** p ⬍ .05.
*** p ⬍ .01.
HEALY, LIGHT, AND CHUNG
780
Table 6 Parameter Estimates: Dual-Process Equal-Variance Model RO Experiment
RN
d⬘I/N
d⬘R/N
M
SE
M
SE
M
SE
M
SE
Young Older
.41 .28***
.03 .02
.29 .10***
.03 .02
1.12 1.31*
0.08 0.08
0.81 0.95
0.05 0.06
Young Older
.47 .34**
.03 .04
.35 .11***
.04 .02
1.28 1.23
0.13 0.10
0.70 1.04***
0.08 0.07
Young Older
.41 .35
.05 .03
.29 .11***
.05 .02
1.66 1.45
0.17 0.11
1.12 1.20
0.10 0.08
1 2 3
Note. This model also used the parameter I,R/N for all groups and experiments mentioned in the table, with I,R/N fixed at 1.00. * p ⬍ .10. ** p ⬍ .05. *** p ⬍ .01.
the SON models, ITEM is fixed at 1 for the intact/new ROC, so for this ROC, only d⬘ITEM (and not da,ITEM) is estimated. We first describe results for the SON-UV model. Estimates of ASSOC were uniformly greater than 1.00 ( ps ⬍ .05, one-tailed) for all comparisons except Experiment 3 young adults ( p ⬍ .07, one-tailed). Interpretation of these parametric tests requires caution, however, because 12 out of 59, 14 out of 31, and 12 out of 25 of the young adults and 22 out of 60, 18 out of 33, and 16 out of 36 of the older adults had ASSOC estimates that fell between .80, the lower bound imposed on this parameter for individual data, and 1.00. By binomial test, ASSOC was reliably greater than 1.00 only for the Experiment 1 young and older adults ( ps ⬍ .052). Thus, there is some evidence at the individual level for Macho’s (2004) claim that the associative familiarity distribution may not actually have greater variability than the item distribution, which has the important implication that ASSOC values less than 1.00 make associative information more threshold in nature than the continuous process suggested by Kelley and Wixted (2001). The reason for this is that as the associative familiarity distribution becomes less dispersed, the addition or subtraction of associative information operates in a more restricted area of the confidence range. Across the board, young adults had numerically larger recollection estimates. RO was significantly larger for the young in Experiments 2 and 3, and all three RN comparisons were significant and
favored the young. Older adults had significantly larger d⬘ITEM estimates than young adults in Experiments 1 and 2, but the d⬘ASSOC and da,ASSOC estimates were larger for young adults (significantly so in Experiment 1). When Experiments 1 and 2 were compared, young adults continued to have significantly larger RO and RN estimates, F(1, 179) ⫽ 4.33, p ⬍ .05, 2 ⫽ .02, and F(1, 179) ⫽ 28.75, p ⬍ .01, 2 ⫽ .14, respectively, and there was a marginally significant interaction of age and instructions for RO, F(1, 179) ⫽ 3.11, p ⬍ .08, 2 ⫽ .02, as well as for RN, F(1, 179) ⫽ 4.46, p ⬍ .04, 2 ⫽ .02. Both parameters were larger in Experiment 2 for young adults, although the opposite was true for older adults. There was also a suggestion of da,ASSOC being significantly larger in Experiment 2, F(1, 179) ⫽ 3.30, p ⬍ .07, 2 ⫽ .02. In the analyses of individual parameter estimates from the SON-EV model, RN again was significantly larger for young adults, with RO age differences significant in Experiment 2 only, although the numerical difference for young and older adults was nearly as large in Experiment 3. As was the case for the unequalvariance model, older adults had significantly larger d⬘ITEM estimates than young adults in Experiments 1 and 2, but the d⬘ASSOC estimates were larger for young adults (significantly so in Experiment 1). Finding increased item strength and decreased associative strength in older adults in the SON models suggests that they
Table 7 Parameter Estimates: Some-or-None Unequal-Variance Model RO Experiment
ASSOC
RN
d⬘ITEM
d⬘ASSOC
da,ASSOC
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
Young Older
.53 .52
.03 .03
.30 .18***
.03 .03
1.21 1.13
0.04 0.04
0.82 0.99**
0.06 0.06
3.16 2.34***
0.15 0.16
2.83 2.18***
0.12 0.15
Young Older
.64 .50**
.04 .04
.41 .13***
.04 .04
1.15 1.14
0.06 0.06
0.74 1.01**
0.09 0.06
3.28 2.78
0.18 0.24
3.02 2.59
0.14 0.22
Young Older
.59 .47*
.05 .04
.37 .18***
.05 .03
1.08 1.07
0.05 0.04
1.19 1.30
0.10 0.08
3.09 2.57
0.23 0.24
2.92 2.51
0.20 0.23
1 2 3
* p ⬍ .10.
** p ⬍ .05.
*** p ⬍ .01.
MODELS OF RECOGNITION MEMORY
781
Table 8 Parameter Estimates: Some-or-None Equal-Variance Model RO Experiment
RN
M
d⬘ITEM
SE
M
SE
M
d⬘ASSOC SE
M
SE
1 Young Older
.52 .52
.03 .03
.33 .18***
.03 .03
0.84 0.98*
0.05 0.06
3.00 2.35***
0.13 0.16
Young Older
.63 .47***
.04 .04
.43 .21***
.04 .05
0.74 1.04***
0.07 0.07
3.00 2.52
0.18 0.24
Young Older
.61 .49
.05 .05
.34 .17***
.04 .04
1.09 1.18
0.08 0.07
3.07 2.72
0.23 0.25
2 3
Note. This model also used the parameter ASSOC for all groups and experiments mentioned in the table, with ASSOC fixed at 1.00. * p ⬍ .10. *** p ⬍ .01.
mates for the MPTSDT-UV model were greater in young adults but significantly so only in Experiment 1. The effect of the instructional manipulation in Experiments 1 and 2 was isolated to RN, which showed significant Age ⫻ Experiment interactions in both the MPTSDT-EV model, F(1, 159) ⫽ 8.27, p ⬍ .01, 2 ⫽ .05, and the MPTSDT-UV model, F(1, 159) ⫽ 9.75, p ⬍ .01, 2 ⫽ .06. For both models, RN increased for young adults but decreased for older adults. There was also a slight trend in the MPTSDT-UV estimates for RN to be larger in Experiment 2, F(1, 159) ⫽ 2.95, p ⬍ .09, 2 ⫽ .02. Summary: Associative recognition. The meaning of the terms recollection and familiarity is not identical across the models we have fit. However, in all models containing recollection parameters, regardless of how these are construed, young adults had significantly greater values for RN in every experiment and numerically larger estimates for RO that were sometimes significant (though not always for the same experiments). The MPTSDT models also include estimates of the probabilities that recollected responses will be assigned to different confidence-rating classes. In Experiment 1 and perhaps in Experiment 2, there was evidence to suggest that older adults spread their responses more across confidence bins than do young adults, but not in Experiment 3.
attend to individual items within pairs at the expense of relationship between pair members, even when study instructions encourage association formation, as in Experiment 2 (see also NavehBenjamin, 2000, Experiment 2). In terms of the effects of instructions, RN was marginally larger in Experiment 2 than in Experiment 1, F(1, 179) ⫽ 3.26, p ⬍ .08, 2 ⫽ .02, and RO produced an interaction of age and experiment, F(1, 179) ⫽ 5.09, p ⬍ .05, 2 ⫽ .03, such that young adults’ estimates increased by .11 between experiments, whereas older adults’ estimates decreased by .05. Multinomial processing tree signal detection theory models. Tables 9 and 10 give parameter estimates for the MPTSDT models. Analyses of the parameter estimates lead to identical conclusions for the two variants of the MPTSDT model for parameters common to both. RN was significantly higher for young adults in all experiments, as was RO in Experiment 3. There was some evidence for greater distribution of recollection responses across confidence bins for older adults. The age difference was significant for the two strongest response categories, but only in Experiment 1; in Experiment 2, there were somewhat more responses for the lowest confidence bin for older adults. Estimates of d⬘ in both models were significantly larger for older adults, and I,R/N esti-
Table 9 Parameter Estimates: Multinomial Processing Tree Signal Detection Theory Unequal-Variance Model RO Experiment
RN
j1
j2
I,R/N
j3
d⬘I,R/N
da,I,R/N
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
Young Older
.49 .51
.03 .02
.32 .17***
.03 .03
.80 .70***
.02 .03
.15 .26***
.02 .03
.05 .05
.01 .01
1.24 1.16*
0.03 0.03
0.85 1.00*
0.06 0.07
0.77 0.94
0.05 0.07
Young Older
.57 .49
.04 .04
.49 .12***
.05 .03
.83 .76
.03 .04
.14 .18
.03 .03
.02 .06*
.01 .02
1.21 1.18
0.06 0.05
0.84 1.00
0.10 0.06
0.79 0.93
0.10 0.07
Young Older
.58 .45*
.05 .05
.37 .21**
.05 .04
.82 .80
.04 .03
.12 .12
.03 .03
.05 .08
.02 .02
1.17 1.11
0.04 0.03
1.18 1.33
0.10 0.08
1.10 1.26
0.10 0.07
1 2 3
Note. The ji estimates may not sum to 1 because of rounding. * p ⬍ .10. ** p ⬍ .05. *** p ⬍ .01.
HEALY, LIGHT, AND CHUNG
782
Table 10 Parameter Estimates: Multinomial Processing Tree Signal Detection Theory Equal-Variance Model RO Experiment
RN
j1
j2
j3
d⬘I,R/N
M
SE
M
SE
M
SE
M
SE
M
SE
M
SE
Young Older
.49 .51
.03 .02
.34 .18***
.03 .03
.81 .70***
.02 .03
.15 .26***
.02 .03
.04 .04
.01 .01
0.86 1.01*
0.05 0.06
Young Older
.57 .48
.04 .04
.49 .13***
.05 .03
.84 .77
.03 .04
.14 .18
.03 .03
.02 .05*
.01 .01
0.87 1.01
0.08 0.06
Young Older
.58 .45*
.05 .05
.38 .22**
.05 .04
.83 .81
.04 .03
.13 .12
.03 .03
.05 .07
.02 .02
1.17 1.29
0.09 0.07
1 2 3
Note. The ji estimates may not sum to 1 because of rounding. This model also used the parameter I,R/N for all groups and experiments mentioned in the table, with I,R/N fixed at 1.00. * p ⬍ .10. ** p ⬍ .05. *** p ⬍ .01.
Conclusions regarding familiarity were considerably more model specific, depending on the ways in which familiarity was construed and on relationships among familiarity and other component processes in a model. Nonetheless, the models do converge on some common themes, which we elaborate on in the General Discussion.
Model Fits: Item Recognition Because the item recognition data yielded only a single old/new ROC, not all of the models fit to the associative recognition data were appropriate. We fit the UVSDT model with parameters O/N and d⬘O/N; the DP-UV model with parameters RO, O/N, and d⬘O/N; the DP-EV model with parameters RO and d⬘O/N; and the MPTSDT-EV model with parameters RO and d⬘O/N. The MPTSDT-UV model was not investigated because, for item recognition, it has df ⫽ 0. The SON models were also not fit because, without associative information, both reduce to equal-variance signal-detection models, which (as discussed earlier) have not received strong support in the literature for item recognition and which did not appear (by visual inspection) to be strong contenders for our data set (see Figure 2). We did not include RN in our models for item recognition because our lures were unrelated to the targets; recall to reject has been implicated as a component process in item recognition tests with lures that are very similar to target items (e.g., Arndt & Reder, 2002; Hintzman & Curran, 1994), but our item lures were randomly selected. Table 11 presents fit statistics for participants who had complete ROC data for item recognition. For all indices, the UVSDT and DP-EV models clearly excelled in fits of individual participants, with the DP-UV and MPTSDT-EV models consistently placing last, especially for the BIC measure. The parameter estimates for all models of item recognition are also given in Table 11. For analyses of these parameter estimates, we included all participants who could be fit by a particular model rather than just those who had complete data sets. The O/N, d⬘O/N, and da,O/N estimates for the UVSDT were all larger for the young adults but significantly so only for d⬘O/N and da,O/N. The RO, O/N, d⬘O/N, and da,O/N estimates in the DP-UV model tended to be larger for the young adults, with both RO and da,O/N significantly larger. Young adults’ DP-EV parameter estimates were also larger, but
contrary to expectation, a significant age difference appeared in d⬘O/N and not in RO. The MPTSDT-EV model produced numerically larger RO and d⬘O/N estimates for young adults, and young adults’ ji estimates tended to be slightly more equally distributed across confidence categories; however, none of these apparent effects proved to be statistically significant. From the perspective of the two top-fitting models for individual data, we would conclude that the only age effect in item recognition lies in familiarity. This finding may seem somewhat surprising, but similar results have been reported by Prull et al. (2004) and Harkins et al. (1979).
General Discussion Our primary goal in this research was to examine possible differences in the contributions of familiarity and recollection to item and associative recognition in young and older adults by
Figure 2. Old/new receiver operating characteristics for young and older adults in Experiment 3 (pooled data).
MODELS OF RECOGNITION MEMORY
783
Table 11 Item Recognition: Fit Indices and Parameter Estimates for Individual Data
Model UVSDT Young Older DP-UV Young Older DP-EV Young Older MPTSDT-EV Young Older
Mean rank
First, seconda
AIC
BIC
AIC
1.45 1.38
1.32 1.29
3.09 3.15
3.00 3.03
2.14 1.92
1.68 1.71
3.32 3.55
4.00 3.97
O/N
RO
BIC
M
SE
14, 6 15, 7 21, 10c 23, 9c 0, 3 0, 2
0, 0 0, 0
1.46 1.36 .22 .05 1.23 .09** .03 1.28
7, 9 7, 15 .33 9, 17c 9, 23c .28 1, 4 2, 3
0, 0 0, 0
M
.54 .44
d⬘O/N SE
M
da,O/N SE
M
j1 SE
M
j2 SE
M
j3 SE
M
d⬘I,R/N SE
M
SE
0.08 2.25 0.18 1.76 0.11 0.06 1.67*** 0.10 1.38*** 0.08 0.07 1.72 0.06 1.47
.05 1.00b .04 1.00b .05 .05
1.33 1.02**
0.14 1.50 0.10 1.26*
0.10 0.08
0.10 0.09 .72 .04 .19 .04 .09 .03 0.94 0.17 .78 .04 .16 .04 .06 .02 0.75 0.10
Note. AIC ⫽ Akaike’s information criterion; BIC ⫽ Bayesian information criterion; UVSDT ⫽ unequal-variance signal detection theory model; DP-UV ⫽ dual-process unequal-variance model; DP-EV ⫽ dual-process equal-variance model; MPTSDT-EV ⫽ multinomial processing tree signal detection theory model, equal variance. a The frequency with which a model provided the best and second-best fits for each group. b Fixed parameter. c Tie in ranks. * p ⬍ .10. ** p ⬍ .05. *** p ⬍ .01.
estimating parameters representing these constructs from several contemporary models of associative recognition. At a purely behavioral level, our results are straightforward and entirely consistent with other findings in the literature— older adults had poorer associative recognition than young adults in Experiments 1–3, manifested by higher FA rates on rearranged lures, as well as poorer item recognition in Experiment 3. The issue of interest is how these age differences have been captured in component processes by the various models that we fit to the data. Our findings suggest that prior claims—including some of our own—that recollection, but not familiarity, is impaired in older adults may be too simplistic and that a more nuanced approach is needed. We first reprise the conclusions we believe can be drawn about age differences in recollection and familiarity within the context of the seven models of associative recognition we have considered here and then turn to a broader consideration of the implications of our findings for the general viability of these models.
Recollection and Familiarity in Old Age With respect to recollection, regardless of the specific way that this construct is defined within the models of associative recognition we fit to our data, the results strongly confirm the existence of age differences in recollection and in effects of instructional variables on this parameter. Importantly, for understanding age differences in associative recognition, it does not seem to matter much if recollection is viewed as a discrete process that serves as an independent source of information about whether test pairs are old or new or if it is viewed as a continuous variable representing associative information that can be added to or subtracted from (for our data, particularly the latter) item information to yield a single strength-of-evidence variable. Interestingly, Macho (2004) found it possible to fit the two-high-threshold signal-detection model (what we have called the DP-UV model) to aggregated data from Yonelinas’s (1997) Experiments 1 and 2 (but not Experiment 3) data, constraining RN ⫽ 0. In our experiments, however, age
differences in recollection were carried primarily by this parameter, rather than by RO, despite the fact that estimates of RO were uniformly larger than those of RN (see Tables 4 –10). In contrast to recollection, there is not a single answer to the question of whether familiarity is affected by aging. What we can say about familiarity in associative recognition for young and older adults must be conditioned on selection of models as well as choice of indices of sensitivity. In the UVSDT model, there are two familiarity parameters, one derived from scaling intact pairs against new pairs and one derived from scaling rearranged pairs against new pairs. Age differences in d⬘I/N and in d⬘R/N were seen in each experiment, with higher values of the former and lower values of the latter seen for young adults, but age differences using da were present for only the rearranged/new ROC. On the basis of this model and the choice of d⬘ as our index of sensitivity, we would conclude that there are age differences in familiarity when people must differentiate between intact pairs and both dissimilar and similar lures, but on the basis of da, we would argue that older adults have problems only when potential targets and lures are very similar, something that is also seen in singular/plural discrimination (Light, Chung, Pendergrass, & Van Ocker, in press). Note that defining familiarity in terms of pitting one type of FA against another here leads to conclusions that familiarity is higher in older adults, but this is not evidence for better performance in this age group. The UVSDT model is nested in the DP-UV model, so one might expect the behavior of the familiarity parameters d⬘I/N and d⬘R/N to be similar in these models. However, this is not the case. Age differences were not found for either d⬘I/N or da,I/N in the DP-UV model, but we did see larger values for older than for young adults in both d⬘R/N and da,R/N in Experiments 1 and 2 (though neither difference was significant in Experiment 3). When I,R/N ⫽ 1 in the DP-EV model, there was no evidence for better performance in young adults in discriminating intact from new pairs, and only the Experiment 2 age difference in d⬘R/N was significant. Clearly, the
784
HEALY, LIGHT, AND CHUNG
nature and magnitude of age differences in familiarity in these models vary with the nature of the distractor set under consideration, whether I,R/N is fixed or estimated, whether recollection parameters are set to 0 or estimated, and, for at least one model, the choice of d⬘ or da. For both the SON-UV and SON-EV models, older adults had significantly higher values of d⬘ITEM for Experiments 1 and 2 together with smaller values of d⬘ASSOC; the latter differences were significant for Experiment 1 in the DP-UV model (this was true for da,ASSOC as well) and for both Experiments 1 and 2 in the DP-EV model. Larger values of d⬘ITEM in this model suggest a greater contribution of item information to strength of evidence for older than for young adults, whereas reduced values of d⬘ASSOC are in accord with reduced contributions from associative information. Finally, in the MPTSDT models, there was very little evidence for any age difference in familiarity as assessed by either sensitivity index, with the one exception of a larger value of d⬘I,R/N in Experiment 1 for older adults in the MPTSDT-EV estimates. In this model, then, age differences in performance on associative recognition are reflected principally in recollection, with some evidence for greater spread of confidence judgments in older adults. Conclusions about the contributions of recollection and familiarity in item recognition (Experiment 3) must also be contextualized. For old/new discrimination, the UVSDT model best described the ROCs of both young and older adults, but the DP-EV model also did a reasonable job. From the perspective of the UVSDT model, age differences in d⬘O/N and da,O/N were present, but not age differences in O/N. The DP-UV model generated age differences in RO, with sensitivity differences appearing only in da,O/N. When O/N ⫽ 1 in the DP-EV model, only the difference in d⬘O/N was manifest. In short, answers to questions about age differences in familiarity estimated from confidence-rated recognition—and whether such age differences, if present, signify poorer performance in older adults— cannot be given unless the respondent knows the model that the interlocutor has in mind, the role of recollection and familiarity within the model, what else is being estimated, and which measure of sensitivity is being used. The need to contextualize conclusions drawn about age differences in familiarity parameters estimated from ROC curves for associative recognition may appear to contrast with findings from the remember/know paradigm and from the process-dissociation paradigm reviewed earlier. However, it is clear that some older adults do show age differences in familiarity as estimated by Jacoby’s (1991) process-dissociation procedures. Davidson and Glisky (2002) reported that older adults who scored low on tests of medial temporal lobe function had lower estimates of familiarity than older adults who scored high on such tests; the latter did not differ in familiarity estimates from young adults. Also, SchmitterEdgecombe (1999) found numerically lower estimates of familiarity in older than in young adults in one analysis. Interestingly, Toth and Parks (2004) also found lower familiarity estimates for older adults in a process-dissociation task in which partial recollection of noncriterial information contributed to assessments of familiarity. Yonelinas (2002) has suggested that age differences in familiarity estimates can be found when recollection estimates are very high (over .60), but examination of the relationship between size of RO and RN in Experiments 1–3 reveals that these are mostly well
below .60 and that the presence or absence of age differences in d⬘I/N or d⬘ITEM is unrelated to the magnitude of the recollection parameters (see Tables 5– 8), so this is not likely to be an issue here. Our studies are agnostic on one crucial point. Although we have shown that both recollection and familiarity, especially familiarity of rearranged lures, as estimated from confidence-rating data, are reduced in old age in several of the models examined, we cannot identify the source of these differences with any certainty. That is, our studies have not addressed the issue of whether the differences we have observed arise solely from operations performed at retrieval or whether they arise, at least in part, from impoverished encoding. Li (2002) has modeled cognitive aging by reducing responsivity and increasing random variability of activation within a neural network, leading to less distinctive internal representations of external stimuli. As a result, learning novel combinations of events is slower in older adults and may not reach the same asymptote. It is possible that some age differences that appear to require postulation of recollection and familiarity in a dual-process model of retrieval can be explained simply in terms of reduced distinctiveness of memory representations in older adults (see Malmberg, Zeelenberg, & Shiffrin, 2004, for an example) or it may be that the effectiveness of retrieval processes is limited by the quality of representations produced during initial study (e.g., Norman & O’Reilly, 2003). Further empirical and theoretical work is needed to settle these issues.
Implications for Models of Associative Recognition The ideal outcome of our curve-fitting exercise for associative recognition would have been one clear winner among candidate models. Generally speaking, for the analyses of individual participant data that we have presented, the SON-EV and UVSDT models fared well with the AIC measure, and the SON-EV came in second after the UVSDT model for the BIC measure pretty consistently. Thus, it may seem somewhat surprising that we gave serious consideration to other models in discussing age differences in component processes. We did so because results of fit indices were not wholly consistent for our individual and pooled levels of analyses. Although we have argued that conclusions about models are most appropriately drawn from fitting individual ROC data, most discussions of associative recognition models in the literature have involved fits of pooled data in which models other than the SON-EV and UVSDT have had some success in accounting for the data. In comparing our results with those of other investigators, then, we felt it important to summarize our findings fully. Here, we address some discrepancies and some possible anomalies identified by such comparisons. We note first that the SON-EV model fit our individual data quite well but that in our analyses of aggregated data (not reported here), the SON-UV model outperformed the SON-EV model. Thus, some version of the SON model seems to do well across the board. We have no good reasons to offer as to why the SON-UV model is a better choice for pooled responses and the SON-EV for individual data. Kelley and Wixted (2001) predicted greater variability for the associative distribution than for the item distribution, but Macho (2004) did not find it necessary to relax the equality constraint to satisfactorily fit ROCs for their data or for that of Yonelinas (1997) for pooled responses. For our individual data,
MODELS OF RECOGNITION MEMORY
tests of the hypothesis that ASSOC ⫽ 1 did not in all cases provide evidence against Macho’s claim, and examination of estimates for individual participants in our Experiments 1–3 revealed many values ⬍ 1. We also note that Macho found that the MPTSDT model fit data quite well, and he argued that this model could emulate the SON model. Our analyses at the group level are in accord with this conclusion, but at the individual level, the MPTSDT models made quite a poor showing. Analyses of additional data sets are needed to determine if there are conditions that systematically affect the performance of these models and others not examined here. It is important to note that models preferred by particular fit indices (i.e., the UVSDT model for the BIC) are not invariably the models that best track the behavioral data when theoretically important variables are manipulated. Pure strength-based familiarity models have difficulty in accounting for the effects of repetition and deadline on associative recognition and on item recognition with similar distractors (e.g., Hintzman & Curran, 1994; Jones & Jacoby, 2001; Light et al., 2004, in press), and it is widely believed that dual-process models of some ilk are needed to account for associative recognition (Clark & Gronlund, 1996). For example, dual-process models seem necessary to explain findings of ironic effects of repetition in associative recognition, such as older adults having increased rates of FAs to rearranged lures when study pairs are repeated whereas young adults have constant or reduced rates of FAs to these lures with repetition (Light et al., 2004). The DP-EV and DP-UV models, too, do not universally yield plausible outcomes. In fitting pooled data from Kelley and Wixted (2001), Macho (2004) reported that a version of the DP-UV model that fixed RN,WEAK ⫽ RN,STRONG ⫽ RO,STRONG ⫽ 0 was adequate for their Experiments 1–3, suggesting that recall to reject played no role for either strong pairs (presented several times) or weak pairs (presented once) and that recall to accept was used only for weak pairs, a counterintuitive view (see also Quamme et al., 2002, for a similar result for source memory). In analyses of pooled data not reported here, we too observed RN ⫽ 0 for young adult data, though only in Experiment 2. Given that RN values were nontrivial when estimated from the DP-EV model for young adults’ pooled data, we suggest that I,R/N and the recollection parameters are capturing similar aspects of the data. In addition, Macho found that the mean of the familiarity distributions for weak rearranged pairs was greater than the mean for strong rearranged pairs, a finding that makes sense in the context of the SON models but that can also be interpreted within the framework of the DP-EV and DP-UV models as familiarity unopposed by recollection. The experiments reported here, though, argue for the importance of RN for pairs studied only once, especially in differentiating young from older adults at the individual level of analysis. As noted above, Macho (2004) found that setting ASSOC ⫽ 1 produced acceptable fits for the SON model, contrary to Kelley and Wixted (2001), but for our pooled data, this restriction worsened fit. In addition, Macho reported that the mean of the item, but not the associative, distribution increased with pair strength; this is a puzzling outcome, but one assumed by Kelley and Wixted as well. Finally, for strong pairs, Macho found with an MPTSDT model that participants spread their responses over rating categories more for strong than for weak intact pairs, suggesting that it is more difficult for people to decide for strong pairs whether item familiarity or associative familiarity contributes to overall strength
785
of evidence that a test pair was studied. Extrapolating from strength (pair repetition) to age, as noted above, would lead one to expect that young adults would spread their responses more evenly than older adults, but, if anything, the reverse was true. Moreover, in fitting MPTSDT models to pooled data from Light et al. (2004), we found more spreading of responses for weak than strong pairs and more signs of spreading ratings for older than young adults.5 At present, then, we believe that none of the models we have explored that invoke both recollection and familiarity can fully account for all aspects of associative recognition, though all are arguably more plausible than pure familiarity models. However, firm conclusions about model adequacy and resolution of inconsistencies in the literature await systematic exploration of the effects of theoretically important variables on component processes estimated at both individual and group levels.
5
These analyses were carried out for the present article and may be obtained directly from us.
References Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrox & F. Caski (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest, Hungary: Akademiai Kiado. Arndt, J., & Reder, L. M. (2002). Word frequency and receiver operating characteristic curves in recognition memory: Evidence for a dualprocess interpretation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 830 – 842. Atkinson, R. C., & Juola, J. F. (1974). Search and decision processes in recognition memory. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology: Vol. 1. Learning, memory, and thinking (pp. 243–293). San Francisco: Freeman. Balota, D. A., Cortese, M. J., Duchek, J. M., Adams, D., Roediger, H. L., McDermott, K. B., & Yerys, B. E. (1999). Veridical and false memories in healthy older adults and in dementia of the Alzheimer’s type. Cognitive Neuropsychology, 16, 361–384. Benjamin, A. S. (2001). On the dual effects of repetition on false recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 941–947. Benjamin, A. S., & Craik, F. I. M. (2001). Parallel effects of aging and time pressure on memory for source: Evidence from the spacing effect. Memory & Cognition, 29, 691– 697. Burke, D. M., & Light, L. L. (1981). Memory and aging: The role of retrieval processes. Psychological Bulletin, 90, 513–546. Chalfonte, B. L., & Johnson, M. K. (1996). Feature memory and binding in young and older adults. Memory & Cognition, 24, 403– 416. Clark, S. E., & Gronlund, S. D. (1996). Global matching models of recognition memory: How the models match the data. Psychonomic Bulletin & Review, 3, 37– 60. Cleary, A. M., Curran, T., & Greene, R. L. (2001). Memory for detail in item versus associative recognition. Memory & Cognition, 29, 413– 423. Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 33(A), 497–505. Davidson, P. S. R., & Glisky, E. L. (2002). Neuropsychological correlates of recollection and familiarity in normal aging. Cognitive, Affective, & Behavioral Neuroscience, 2, 174 –186. Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (1987). California Verbal Learning Test: Adult version. New York: Psychological Corporation.
786
HEALY, LIGHT, AND CHUNG
Erdfelder, E., & Buchner, A. (1998). Process-dissociation measurement models: Threshold theory or detection theory? Journal of Experimental Psychology: General, 127, 83–96. Francis, W. N., & Kucˇera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin. Gardiner, J. M. (1988). Functional aspects of recollective experience. Memory & Cognition, 16, 309 –313. Glanzer, M., Kim, K., Hilford, A., & Adams, J. K. (1999). Slope of the receiver-operating characteristic in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 500 – 513. Harkins, S. W., Chapman, R. C., & Eisdorfer, C. (1979). Memory loss and response bias in senescence. Journal of Gerontology, 34, 66 –72. Heathcote, A. (2003). Item recognition memory and the receiver operating characteristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1210 –1230. Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93, 411– 428. Hintzman, D. L., & Curran, T. (1994). Retrieval dynamics of recognition and frequency judgments: Evidence for separate processes of familiarity and recall. Journal of Memory and Language, 33, 1–18. Humphreys, M. S., Bain, J. D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic, and procedural tasks. Psychological Review, 96, 208 –233. Huyser, K., & van der Laan, J. (1994). dataThief (Version 2.0) [Computer software and manual]. Retrieved March 25, 2003, from http://www .nikhef.nl/⬃keeshu/datathief/ Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30, 513–541. Jacoby, L. L. (1999). Ironic effect of repetition: Measuring age-related differences in memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 3–22. Jones, T. C., & Jacoby, L. L. (2001). Feature and conjunction errors in recognition memory: Evidence for dual-process theory. Journal of Memory and Language, 45, 82–102. Kelley, R., & Wixted, J. T. (2001). On the nature of associative information in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 701–722. Kensinger, E. A., & Schacter, D. L. (1999). When true memories suppress false memories: Effects of ageing. Cognitive Neuropsychology, 16, 399 – 415. Kijewski, M. F., Swensson, R. G., & Judd, P. F. (1989). Analysis of rating data from multiple-alternative tasks. Journal of Mathematical Psychology, 33, 428 – 451. Kliegl, R., & Lindenberger, U. (1993). Modeling intrusions and correct recall in episodic memory: Adult age differences in encoding of list context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 617– 637. Koutstaal, W., & Schacter, D. L. (1997). Gist-based false recognition of pictures in older and younger adults. Journal of Memory and Language, 37, 555–583. Koutstaal, W., Schacter, D. L., Galluccio, L., & Stofer, K. A. (1999). Reducing gist-based false alarms in older adults: Encoding and retrieval manipulations. Psychology and Aging, 14, 220 –237. La Voie, D. J., & Light, L. L. (1994). Adult age differences in repetition priming: A meta-analysis. Psychology and Aging, 9, 539 –553. Li, S.-C. (2002). Connecting the many levels and facets of cognitive aging. Current Directions in Psychological Science, 11, 38 – 43. Light, L. L. (1991). Memory and aging: Four hypotheses in search of data. Annual Review of Psychology, 42, 333–376. Light, L. L., Chung, C., Pendergrass, R., & Van Ocker, J. C. (in press). Effects of repetition and response deadline on item recognition in young and older adults. Memory & Cognition.
Light, L. L., Healy, M. R., Patterson, M. M., & Chung, C. (2005). Dual-process models of memory in young and older adults: Evidence from associative recognition. In L. Taconnat, D. Clarys, S. Vanneste, & M. Isingrini (Eds.), Manifestations cognitives du vieillissement psychologique: Actes des VIIe`mes journe´es du vieillissement cognitif (pp. 95–116). Paris: Publibook. Light, L. L., Patterson, M. M., Chung, C., & Healy, M. R. (2004). Effects of repetition and response deadline on associative recognition in young and older adults. Memory & Cognition, 32, 1182–1193. Light, L. L., Prull, M. P., La Voie, D. J., & Healy, M. R. (2000). Dual-process theories of memory in old age. In T. J. Perfect & E. A. Maylor (Eds.), Models of cognitive aging (pp. 238 –300). New York: Oxford University Press. Macho, S. (2002). Cognitive modeling with spreadsheets. Behavior Research Methods, Instruments, & Computers, 34, 19 –36. Macho, S. (2004). Modeling associative recognition: A comparison of two-high-threshold, two-high-threshold signal detection, and mixture distribution models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 83–97. Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge, England: Cambridge University Press. Malmberg, K. J. (2002). On the form of ROCs constructed from confidence ratings. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 380 –387. Malmberg, K. J., Zeelenberg, R., & Shiffrin, R. M. (2004). Turning up the noise or turning down the volume? On the nature of the impairment of episodic recognition memory by Midazolam. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 540 –549. Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87, 252–271. Multhaup, K. S. (1995). Aging, source, and decision criteria: When false fame errors do not occur. Psychology and Aging, 10, 492– 497. Naveh-Benjamin, M. (2000). Adult age differences in memory performance: Tests of an associative deficit hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1170 – 1187. Naveh-Benjamin, M., Hussain, Z., Guez, J., & Bar-On, M. (2003). Adult age differences in episodic memory: Further support for an associativedeficit hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 826 – 837. Nelson, M. J., & Denny, E. C. (1960). The Nelson-Denny Reading Test. Boston: Houghton Mifflin. Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementarylearning-systems approach. Psychological Review, 110, 611– 646. Norman, K. A., & Schacter, D. L. (1997). False recognition in younger and older adults: Exploring the characteristics of illusory memories. Memory & Cognition, 25, 838 – 848. Prull, M. W., Crandell, L. L., Martin, A. M., III, Backus, H. F., & Light, L. L. (2004). Recollection and familiarity in recognition memory: Adult age differences and neuropsychological test correlates. Manuscript submitted for publication. Quamme, J. R., Frederick, C., Kroll, N. E. A., Yonelinas, A. P., & Dobbins, I. G. (2002). Recognition memory for source and occurrence: The importance of recollection. Memory & Cognition, 30, 893–907. Raaijmakers, J. G., & Shiffrin, R. M. (1992). Models for recall and recognition. Annual Review of Psychology, 43, 205–234. Ratcliff, R., Sheu, C.-F., & Gronlund, S. D. (1992). Testing global memory models using ROC curves. Psychological Review, 99, 518 –535. Rotello, C. M., & Heit, E. (1999). Two-process models of recognition memory: Evidence for recall-to-reject? Journal of Memory and Language, 40, 432– 453. Rotello, C. M., & Heit, E. (2000). Associative recognition: A case of recall-to-reject processing. Memory & Cognition, 28, 907–922.
MODELS OF RECOGNITION MEMORY Rotello, C. M., Macmillan, N. A., & Van Tassel, G. (2000). Recall-toreject in recognition: Evidence from ROC curves. Journal of Memory and Language, 43, 67– 88. Salthouse, T. A. (1992). Influence of processing speed on adult age differences in working memory. Acta Psychologica, 79, 155–170. Salthouse, T. A., Fristoe, N., & Rhee, S. H. (1996). How localized are age-related effects on neuropsychological measures? Neuropsychology, 10, 272–285. Schacter, D. L., Osowiecki, D., Kaszniak, A. W., Kihlstrom, J. F., & Valdiserri, M. (1994). Source memory: Extending the boundaries of age-related deficits. Psychology and Aging, 9, 81– 89. Schmitter-Edgecombe, M. (1999). Effects of divided attention and time course on automatic and controlled components of memory in older adults. Psychology and Aging, 14, 331–345. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461– 464. Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481. Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34 –50. Spencer, W. D., & Raz, N. (1995). Differential effects of aging on memory for content and context: A meta-analysis. Psychology and Aging, 10, 527–539. Stretch, V., & Wixted, J. (1998). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1379 – 1396. Toth, J. P., & Parks, C. (in press). Effects of age on estimated familiarity in the process-dissociation procedure: The role of noncriterial recollection. Memory & Cognition. Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26, 1–12. Tun, P. A., Wingfield, A., Rosen, M. J., & Blanchard, L. (1998). Response latencies for false memories: Gist-based processes in normal aging. Psychology and Aging, 13, 230 –241. Verde, M. F., & Rotello, C. M. (2003). Does familiarity change in the
787
revelation effect? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 739 –746. Verhaeghen, P., Marcoen, A., & Goossens, L. (1993). Facts and fiction about memory aging: A quantitative integration of research findings. Journal of Gerontology: Psychological Sciences, 48, P157–P171. Wickens, T. D. (2002). Elementary signal detection theory. New York: Oxford University Press. Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory: Evidence for a dual-process model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1341– 1354. Yonelinas, A. P. (1997). Recognition memory ROCs for item and associative information: The contribution of recollection and familiarity. Memory & Cognition, 25, 747–763. Yonelinas, A. P. (2001). Components of episodic memory: The contribution of recollection and familiarity. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 350, 1363– 1374. Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory and Language, 46, 441–517. Yonelinas, A. P., Dobbins, I., Szymanski, M. D., Dhaliwal, H. S., & King, L. (1996). Signal-detection, threshold, and dual-process models of recognition memory: ROCs and conscious recollection. Consciousness and Cognition, 5, 418 – 441. Yonelinas, A. P., & Jacoby, L. L. (1995). The relation between remembering and knowing as bases for recognition: Effects of size congruency. Journal of Memory and Language, 34, 622– 643. Yonelinas, A. P., Kroll, N. E. A., Dobbins, I. G., & Soltani, M. (1999). Recognition memory of faces: When familiarity supports associative recognition judgments. Psychonomic Bulletin & Review, 6, 654 – 661. Yonelinas, A. P., Regehr, G., & Jacoby, L. L. (1995). Incorporating response bias in a dual-process theory of memory. Journal of Memory and Language, 34, 821– 835. Zacks, R. T., Hasher, L., & Li, K. Z. H. (2000). Human memory. In F. I. M. Craik & T. A. Salthouse (Eds.), The handbook of aging and cognition (2nd ed., pp. 293–357). Mahwah, NJ: Erlbaum.
(Appendix follows)
HEALY, LIGHT, AND CHUNG
788
Appendix Model Parameters Model
df
Parameters
Associative recognition UVSDT EVSDT DP-UV DP-EV SON-UV SON-EV MPTSDT-UV MPTSDT-EV
7 8 5 6 5 6 4 5
I,R/N, d⬘I/N, d⬘R/N I,R/N,a d⬘I/N, d⬘R/N I,R/N, d⬘I/N, d⬘R/N, RO, RN I,R/N,a d⬘I/N, d⬘R/N, RO, RN ASSOC, d⬘ITEM, d⬘ASSOC, RO, RN ASSOC,a d⬘ITEM, d⬘ASSOC, RO, RN I,R/N, d⬘I,R/N, RO, RN, j1, j2 I,R/N,a d⬘I,R/N, RO, RN, j1, j2 Item recognition
UVSDT DP-UV DP-EV MPTSDT-EV
3 2 3 1
O/N, d⬘O/N O/N, d⬘O/N, RO O/N,a d⬘O/N, RO d⬘O/N, RO, j1, j2
Note. Up to five criteria were also estimated for each model. j3 ⫽ 1 ⫺ (j1 ⫹ j2). All parameters were estimated except those marked with a superscript (a). UVSDT ⫽ unequal-variance signal detection theory model; EVSDT ⫽ equal-variance signal detection theory model; DP-UV ⫽ dualprocess unequal-variance model; DP-EV ⫽ dual-process equal-variance model; SON-UV ⫽ some-or-none model, unequal variance; SON-EV ⫽ some-or-none model, equal variance; MPTSDT-UV ⫽ multinomial processing tree signal detection theory model, unequal variance; MPTSDTEV ⫽ multinomial processing tree signal detection theory model, equal variance. a Fixed parameter (1).
Received March 25, 2003 Revision received January 4, 2005 Accepted January 4, 2005 䡲