PPT AND RECOGNITION


Article in press: Cognitive Technology, 18(2), 4-17. http://www.cognitivetechnologyjournal.com/ArticleDetail.aspx?id=300

Suboptimal Recognition for Increasing Set Sizes is Largely Due to Inconsistency: A Potential Performance Theory Analysis of Individual Differences

Joshua Sandry 1,2, Stephen Rice 3, David Trafimow 4, Gayle Hunt 4, Lisa Busche 4, and Eduardo Rubio 5

1 Neuropsychology & Neuroscience Laboratory, Kessler Foundation Research Center, West Orange, NJ
2 Department of Physical Medicine and Rehabilitation, Rutgers University – New Jersey Medical School, Newark, NJ
3 College of Aeronautics, Florida Institute of Technology, Melbourne, FL
4 New Mexico State University, Las Cruces, NM
5 Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, Iowa

Word Count: 7036

Author Note

Correspondence should be addressed to Joshua Sandry; [email protected], Neuropsychology & Neuroscience Laboratory, Kessler Foundation Research Center, 300 Executive Dr, Suite 70, West Orange, NJ 07052

Abstract

Recognizing information occurs on a daily basis in both our professional and personal lives. In the current studies, we used Potential Performance Theory (PPT; Trafimow & Rice, 2008) to separate and identify the roles that random and non-random factors play in the recognition of briefly presented information across various set sizes. Participants completed a 2-interval forced choice recognition task using number strings (Experiment 1) or letter strings (Experiment 2) between four and nine digits/characters long. Findings indicated that as the number of items to be recognized increased, observed performance decreased. PPT analyses showed that this effect was almost entirely due to a decrease in consistency (i.e., an increase in randomness), whereas potential performance remained nearly perfect. Individual differences analyses revealed that this was not the case for all participants. The present findings provide new information about the role consistency plays in recognition tasks. Theoretical contributions and practical applications are discussed.

Keywords: recognition, memory, potential performance, systematic, consistency, random


Suboptimal Recognition for Increasing Set Sizes is Largely Due to Inconsistency: A Potential Performance Theory Analysis of Individual Differences

There exists a long history in the behavioral sciences of researchers attempting to measure and quantify people based upon their performance on some type of cognitive test. General cognitive ability is often a good indicator of performance in any job (Schmidt, 2012), and in some instances cognitive testing can be used to predict how well someone will perform in a given profession (Hunter & Hunter, 1984). General cognitive ability and aptitude tests have been used successfully as valid indicators of job performance (Hunter, 1986; Schmidt & Hunter, 2004). Employers often use various test batteries to select individuals; for example, law enforcement agencies often use assessments to select or promote employees (Jacobs, Cushenbery, & Grabarek, 2011). The Department of Defense uses the Armed Services Vocational Aptitude Battery to quantify enlisted service members' skills, and this information can be used for job assignment (Kaplan & Saccuzzo, 2005).

One component of some cognitive tests is the ability to remember, recognize, or compare briefly presented information (individual differences in memory capacity, for example, are related to general fluid intelligence; see Unsworth & Engle, 2007). The ability to recognize information is important for the selection and training of individuals. The U.S. Equal Employment Opportunity Commission identifies cognitive testing, such as tests of memory, perceptual speed, and accuracy, as an acceptable type of employment testing that can be used to select applicants ("U.S. Equal," 2010). One type of cognitive test that may be used to assess applicants' cognitive ability is a memory-recognition test in which prospective employees are first presented with information that they are asked to remember.


Next, they may be presented with a recognition task, in which they compare the information stored in memory with subsequently presented information. They would be asked to discern whether that information is old or new (in psychology, this would commonly be referred to as a recognition memory test). Similar examples can be found in classroom testing; e.g., a multiple-choice test is similar to a recognition memory test.

There are some applied settings where it is crucial to be able to recognize information presented in displays. An air traffic controller often has to view a constantly updating display containing many different bits of information. The air traffic controller must notice changes in the display, compare any new information with information from earlier (e.g., altitude, call numbers, or speed), and use this information to make decisions regarding flight control. For an additional applied example, consider a traveler who is using a global positioning system (GPS) to assist them in navigating to an unfamiliar location. The GPS user has to remember the name of the street where they will make their next turn or the building number of their destination. They then have to recognize this information when the street or building appears. Although a great deal of information can be offloaded onto technology, humans still have to recognize or compare information frequently.

An important step toward furthering our understanding of item recognition is to identify the underlying factors that contribute to performance decrements in these recognition tasks. With this knowledge, future research can continue to improve recognition performance by designing task-relevant practice and training programs. Over the past few decades, myriad theories have been proposed that deal with recognition tasks (see Wixted, 2007, for a review), as well as short-term and working memory (see Jonides et al., 2008, for a review).
Our concern is not with comparing or testing any of these specific theories, but instead with presenting and exploring a new variable that may be important


when dealing with recognition tasks, and how that variable may be related to performance testing. To date, no one has investigated the role of inconsistency (randomness) in recognizing information. A relatively new theory, termed Potential Performance Theory (PPT), allows researchers to separate the random factors that affect task performance from the non-random factors (Trafimow & Rice, 2008, 2009, 2011). Studies using PPT have found that consistency (which we elaborate on in the following section) is an important but often overlooked factor in task performance; for example, in test-taking (Rice, Geels, Trafimow, & Hackett, 2011; Rice, Trafimow, & Kraemer, 2012), math ability between cultures (Rice & Trafimow, 2012b), visual versus auditory memory (Hunt et al., 2011), automation performance (Hunt, Rice, Geels, & Trafimow, 2010; Rice, Trafimow, & Hunt, 2010), the speed-accuracy trade-off (Rice, Trafimow, Keller, Hunt, & Geels, 2011), improved task performance under time pressure (Rice & Trafimow, 2012a), perceptual discrimination tasks and signal detection modeling (Trafimow, MacDonald, & Rice, 2012), enumeration (Hunt, Rice, Trafimow, & Sandry, 2013), task difficulty of visual search (Rice et al., 2012), and attribution (Trafimow, Hunt, Rice, & Geels, 2011). PPT may have important implications for recognition tasks.

Potential Performance Theory

PPT (Trafimow & Rice, 2008, 2009, 2011) is a general theory of task performance. According to PPT, all of the factors that influence task performance can be placed into two categories: non-random (systematic) and random factors. In general, people's observed performances on tasks are a function of both systematic and random factors. The goal of PPT is to estimate what performance would be if all random factors could be eliminated, thereby


leaving the influence of only the relevant systematic factors. Estimates of random-free performance are termed potential scores. To obtain potential scores, it is necessary to have a task in which participants complete at least two blocks of matched trials. The reason for having two blocks of trials is that this allows the researcher to compute a consistency coefficient. Consistency is the correlation between the two blocks of trials, for each participant. It is important to be clear that there are as many consistency coefficients as there are participants; the consistency coefficient is a within-person correlation coefficient. If performance is completely random, the consistency coefficient will equal zero, whereas if performance is completely non-random, the consistency coefficient will equal unity. Thus, consistency is an inverse measure of randomness.

The PPT use of "consistency" should not be confused with other uses. In PPT, consistency does not mean doing the same thing on every trial. Nor does it refer to performing the same behavior in a variety of contexts, as in attribution theory (Kelley & Michela, 1980; Orvis, Cunningham, & Kelley, 1975). Our use of this word also differs from how it is used in the lens model, where consistency refers to a correlation between predicted judgments and actual judgments, or between the predicted criterion value and the actual criterion value (Brunswik, 1952). In contrast to these other theories, PPT treats consistency as a reverse measure of randomness; it is about participants performing the same way on trials that are matched across blocks. For example, if a participant responds in one way on Item 1 in Block 1, and a different way on Item 2 in Block 1, this might or might not be consistent. Consistency depends on the Block 2 responses.
If the participant responds to Item 1 in Block 2 the same way as she responded to Item 1 in Block 1, and she responds to Item 2 in Block 2 the same way as she


responded to Item 2 in Block 1, then her responses have perfect consistency. In the case of this participant, the consistency coefficient will equal unity; there is no randomness in her responses. If, on the other hand, the participant responds to Item 1 in Block 2 differently from the way she responded to Item 1 in Block 1, this would be considered randomness in her responses and would result in a lower consistency coefficient. Depending on how often she is inconsistent in her responses across the matched trials in Blocks 1 and 2, her consistency coefficient will diminish accordingly.

That PPT uses a consistency coefficient based on responses across blocks of trials is not surprising, given that its origins trace back to classical true score theory (see Lord & Novick, 1968, for a review). In fact, the correction formula for attenuation was originally proposed by Spearman (1904), who was concerned mainly with the issue of randomness. Spearman showed how random measurement error decreases correlation coefficients from what they would be in the absence of random measurement error (the classic attenuation formula), and he also showed how to estimate the size of correlations if random measurement error could be eliminated (the classic formula for correcting correlation coefficients for attenuation). PPT makes use of these contributions. We direct the reader interested in the computational sequence to Trafimow and Rice (2008, 2009, 2011).

Current Experiments

Potential Performance Theory does not allow for the direct identification of strategies; however, it does allow for the quantification of the effects of systematic and random factors independently. Our current goal is to uncover how much systematic and/or random factors (inconsistent responding) contribute to performance decrements as set size increases in a memory recognition task. It is easy to see the importance of systematic factors, but the role of


random factors in recognition performance is not readily obvious and has not yet been examined empirically. Therefore, the experiments presented here test the possibility that consistency plays an important role in explaining differences in observed performance on recognition tasks at different set sizes.

Based on the reasoning that there is a systematic process whereby increasing the number of items causes performance decrements, one plausible hypothesis is that the decrement in observed performance is completely due to non-random, systematic factors; we label this first hypothesis the Systematic Hypothesis. A recent attribution study by Trafimow et al. (2011) found evidence that changes in observed performance were completely due to changes in systematic factors. Theorizing from PPT, a plausible alternative hypothesis can also be generated: the Consistency Hypothesis. Given that performance decrements are known to be caused by a reduction in consistency (equivalent to an increase in randomness) in a variety of tasks (Hunt et al., 2010, 2011, 2013; Rice et al., 2010, 2011, 2012, 2012; Trafimow et al., 2012; Rice & Trafimow, 2012a, 2012b), it is reasonable to hypothesize that a decrease in recognition performance is due to a decrease in consistency. A third possibility, which we refer to as the Combination Hypothesis, is that both systematic processes and consistency affect recognition performance (see Trafimow & Rice, 2009, for support of mixed effects). As it is not clear in advance which of these hypotheses is correct, we tested them against each other by asking participants to perform a two-interval forced choice (2IFC) recognition memory task using both digits and letters.
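To make the consistency coefficient and the resulting potential score concrete before turning to the experiments, the following sketch computes both quantities for one simulated participant. This is an illustration only, not the authors' code: the variable names are ours, and the potential score here is obtained via the classic Spearman correction under the simplifying assumption that the correct answers are error-free. The published computational sequence (Trafimow & Rice, 2008, 2009, 2011) should be consulted for the exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency(block1, block2):
    """Within-person consistency: Pearson r between matched trials of two blocks."""
    return np.corrcoef(block1, block2)[0, 1]

def potential_score(observed_r, consistency_r):
    """Spearman's correction for attenuation. The 'truth' side is assumed
    error-free, so only the response reliability (consistency) appears under
    the square root; the result is capped at unity."""
    return min(observed_r / np.sqrt(consistency_r), 1.0)

# Simulate one participant on 50 matched 2IFC trials: truth codes which
# interval holds the match; responses follow a systematic rule on most trials
# but flip at random on the rest (the random factor).
n = 50
truth = rng.integers(0, 2, n)

def respond(p_random):
    flip = rng.random(n) < p_random
    return np.where(flip, rng.integers(0, 2, n), truth)

b1, b2 = respond(0.2), respond(0.2)
c = consistency(b1, b2)               # near 1 when responding is non-random
r_obs = np.corrcoef(b1, truth)[0, 1]  # correlation of responses with truth
print(round(c, 2), round(r_obs, 2), round(potential_score(r_obs, c), 2))
```

Note how the correction pushes the observed response-truth correlation upward as consistency falls, which is exactly the sense in which potential scores estimate random-free performance.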


Experiment 1

Method

Participants. Twenty undergraduates (10 females and 10 males) from a large Southwestern university participated for partial course credit. The mean age was 20.95 years (SD = 3.75).

Apparatus and Stimuli. The experiment was presented on a Dell computer with a 22 in. (55.88 cm) monitor via E-Prime 2.0 at 1024 x 768 pixel resolution. Stimuli consisted of number strings between four and nine digits long, displayed horizontally across the center of the screen. Each digit subtended approximately 2.05 degrees of visual angle vertically and 0.68 degrees horizontally, with four items subtending 5.45 degrees and nine items subtending 9.53 degrees. A fixation point was presented before each trial to direct the participants' visual attention toward the center of the display.

Procedure and Design. After providing consent, participants were seated in a comfortable chair directly in front of the experimental display. Viewing distance was 21 in. (53.34 cm) from the monitor for each participant and was controlled with a chinrest. Participants were then given instructions on the computer screen and had the opportunity to ask questions of the experimenter before beginning the experiment. The instructions conveyed that participants would be presented with a string of digits, which would be followed by two interval displays, and that their job would be to choose the interval that matched the first stimulus (see Figure 1). Because of methodological constraints related to PPT, a 2IFC design measuring accuracy was required; we used a design similar to that of Luck and Vogel (1997) in their investigation of visual short-term memory.
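As a concrete illustration of the 2IFC design, the stimuli for a single trial might be constructed as follows. This is our own reconstruction, not the actual E-Prime script: a target digit string is generated, a foil is made by changing one randomly chosen digit, and the side of the matching interval is randomized, as described in the Procedure.

```python
import random

def make_trial(set_size, rng=random):
    """Build one 2IFC trial: a target digit string, a foil differing in exactly
    one digit, and a random assignment of the target to interval 1 or 2."""
    target = [str(rng.randrange(10)) for _ in range(set_size)]
    # Foil: copy the target and replace one digit with a different digit.
    foil = target.copy()
    i = rng.randrange(set_size)
    foil[i] = rng.choice([d for d in "0123456789" if d != foil[i]])
    intervals = [''.join(target), ''.join(foil)]
    correct = rng.randrange(2)  # 0 -> target in interval 1 ("F"), 1 -> interval 2 ("J")
    if correct == 1:
        intervals.reverse()
    return ''.join(target), intervals, correct
```

Calling `make_trial(6)` yields a six-digit study string, the two interval strings, and the index of the matching interval, mirroring the trial structure described below.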


Each trial began with a 500 ms fixation cross appearing in the center of the screen. The next screen presented participants with a string of numbers for 1000 ms, followed by a fixation cross that lasted for 500 ms. Following the fixation, two different interval displays appeared on the screen for 1000 ms each, separated by a fixation cross lasting 500 ms. Lastly, participants were asked to indicate which of the two interval choices matched the original number string. They were asked to press the "F" key to signify that the first interval contained the matching number, or the "J" key to signify that the second interval contained the matching number.

Each of the 6 set sizes (4-, 5-, 6-, 7-, 8-, and 9-digit strings) consisted of 50 trials, and the order of presentation for all set sizes was randomly selected by the computer program. The correct answer appeared in the first interval for half of the trials and in the second interval for the other half (randomized for each participant). The incorrect answer was constructed by randomly modifying one digit within each string. A single block of trials contained all 50 stimuli from each of the 6 set sizes. Each participant was presented with the same block twice, in a newly randomized order each time, for a total of 600 trials lasting approximately 60 minutes. Both blocks contained identical number strings, as is necessary in order to perform the PPT analyses. A within-participants design was used, in which all participants were presented with all numeric strings in both blocks of trials.

Results

General PPT Analysis. A one-way within-participants ANOVA on the observed scores was significant, F(5, 95) = 46.65, p < .001, η2 = .71 (see Figure 2). A trend analysis revealed a


significant linear trend, F(1, 19) = 189.29, p < .001, η2 = .90. As Figure 2 shows, observed performance decreased dramatically as the number of recognition items increased. A one-way within-participants ANOVA on the potential scores was not significant, F(5, 95) = 0.65, p = .66, η2 = .03, indicating that potential scores remained nearly perfect regardless of the number of recognition items presented. A one-way within-participants ANOVA on consistency scores was significant, F(5, 95) = 48.63, p < .001, η2 = .72; there was a significant drop in consistency as the number of recognition items increased. A trend analysis on consistency scores revealed a significant linear trend, F(1, 19) = 143.64, p < .001, η2 = .88.

Analysis on Systematic Processes. To reiterate, PPT does not allow for the direct measurement of strategies per se. Instead, PPT allows for the measurement of the systematic and random factors that contribute to an observed score. PPT analyses also allow us to examine differences between each set size, with particular focus on how much of a change in observed performance due to an increase in the number of recognition items is a result of systematic processes (procedures for doing this are outlined by Trafimow & Rice, 2009, who also provide derivations of all theorems). The following analysis examines the overall difference between four and nine recognition items.

When increasing from four to nine recognition items, the average observed score dropped by 22%. This change was almost entirely due to a mean drop in consistency from r = .83 to r = .35; potential scores dropped by only 3%. In addition, we also considered the effect of the change in potential performance while controlling for the change in consistency, and the effect of the change in consistency while controlling for the change in potential performance.
If there had been no change in consistency when increasing from four to nine recognition items, observed performance due to the change in systematic processes would have decreased only from M = 96% to M =


93%, a difference of only 3%. If there had been no change in systematic processes, observed performance would have decreased from M = 96% to M = 78%, a much more substantial decrease of 18% that is reasonably close to the actual observed decrease of 22%. It is easy to see from this additional analysis that the change in observed performance is predominantly a function of inconsistency.

PPT Analyses on Individuals. One advantage of PPT is the ability to conduct individual-level analyses, rather than using only groups. This type of individual analysis provides the opportunity to see how systematic processes and consistency affect observed scores for each person, allowing for person-specific recommendations on how to improve performance. In this section we provide four examples of how individual analyses can reveal important differences between participants. Figure 3 presents these data.

Participant 5 maintained an almost perfect potential score from 4 to 7 recognition items, after which the potential score decreased slightly, by only .08 between 7 and 9 recognition items. In contrast, Participant 5's consistency began to plummet around a set size of 6 recognition items, dropping by .56. Overall, Participant 5's observed performance decreased by .22, and based on the PPT analysis, this seems to be largely attributable to the decrease in consistency.

Participant 8 showed a steady decrease in consistency over all set sizes but only a minimal decrease in potential scores until the final set size of 9 items. When going from 8 to 9 items, the potential score dropped substantially from .99 to .68, a decrease of .31.
Interestingly, over this same step Participant 8's consistency remained relatively stable, dropping by only .02; therefore, almost all of the decrease in observed performance from 8 to 9 recognition items was the result of a change in systematic factors.


Participant 11 followed the general trend of the group data across all set sizes: the potential score remained relatively steady while consistency dropped, which resulted in a decrease in observed performance.

Participant 18 provides an excellent example of a phenomenon first noted by Trafimow and Rice (2009), whereby changes in systematic factors and consistency can go in opposite directions, cancel each other out, and thereby result in little change in observed performance. Taken at face value, when there is little change in observed performance, the obvious conclusion is that nothing happened. In contrast, even when nothing seems to be happening at the level of observed scores, a PPT analysis can identify those individuals for whom two changes occurred in opposite directions. In the case of Participant 18, the interesting action occurred going from 8 to 9 recognition items: the potential score dropped from 1.00 to .90, but consistency increased from .39 to .56. These counteracting tendencies resulted in little change in observed scores, making salient the ability of PPT analyses to uncover two countervailing changes when there seems to be little change in observed scores.

Discussion

Taken together, these results demonstrate that participants' accuracy decreased as the number of recognition items increased. This effect was largely due to the influence of randomness. The potential scores remained approximately constant even as the number of recognition items increased. However, participants' consistencies dropped in correspondence with their observed scores, thereby providing a good account of the effect of increasing items on decreasing observed scores. The findings from Experiment 1 provide more support for the Consistency Hypothesis than for the Systematic Hypothesis or the Combination Hypothesis. Importantly, the analyses on individual differences suggest that this is not the case for all


participants. Implications regarding the individual analyses will be discussed further in the General Discussion.

Experiment 2

It is possible that the ability to recognize information works differently with different types of stimuli, and so participants may recognize numerical information better than they recognize other kinds of information. Crannell and Parrish (1957), for example, compared memory span for digits, letters, and words and found that memory span was longer for digits than for letters, and longer for letters than for words. It is also readily apparent that everyday recognition tasks involve different stimuli; pilots, for example, deal with both numeric and lexical stimuli. Consequently, for replication purposes, and in order to demonstrate that our findings in Experiment 1 were not caused by the specific characteristics of the particular stimulus set, in Experiment 2 we switched from number strings to letter strings.

Method

Participants. Twenty-three undergraduates (13 females and 10 males) from a large Southwestern university participated in this study for partial course credit. The mean age was 19.91 years (SD = 4.50).

Materials and Stimuli. Experiment 2 replicated Experiment 1, with the exception of the type of stimuli to be recognized. Rather than numeric digits, participants were presented with randomly generated strings of characters from the English alphabet. Each character subtended approximately 0.68 degrees of visual angle vertically and 0.68 degrees horizontally, with four items subtending 2.73 degrees and nine items subtending 5.45 degrees. A within-participants design was utilized, in which all participants were presented with all strings of characters in both blocks of trials.
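The visual-angle values reported in the Apparatus sections of both experiments follow from stimulus size and the 53.34 cm viewing distance via the standard formula. A small helper (our own illustration; stimulus sizes in centimeters are assumed, as the papers report only the resulting angles) shows the computation:

```python
import math

def visual_angle_deg(size_cm, distance_cm=53.34):
    """Visual angle (degrees) subtended by a stimulus of the given size
    viewed at the given distance: theta = 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A stimulus about 1.9 cm tall at 53.34 cm subtends roughly 2 degrees,
# in line with the ~2.05-degree digit height reported for Experiment 1.
print(round(visual_angle_deg(1.9), 2))
```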


Results

General PPT Analysis. A one-way within-participants ANOVA on the observed scores was significant, F(5, 110) = 47.49, p < .001, η2 = .68 (see Figure 4). A trend analysis revealed a significant linear trend, F(1, 22) = 122.94, p < .001, η2 = .85. As Figure 4 shows, observed performance decreased dramatically as the number of presented items increased. A one-way within-participants ANOVA on the potential scores was not significant, F(5, 110) = 0.45, p = .81, η2 = .02; potential scores remained approximately perfect regardless of the number of recognition items presented. In contrast, but replicating the Experiment 1 findings, the one-way within-participants ANOVA on consistency scores was significant, F(5, 110) = 54.53, p < .001, η2 = .71; there was a significant drop in consistency as the number of presented items increased. A follow-up trend analysis on the consistency scores revealed a significant linear trend, F(1, 22) = 128.79, p < .001, η2 = .85.

Analysis on Systematic Processes. As in the first experiment, we performed the PPT analysis examining the differences between set sizes while separating systematic processes from consistency (see Trafimow & Rice, 2009). The following analyses consider the overall change from four to nine recognition items. When going from four to nine recognition items, the average observed score dropped by 19%. This change was due entirely to a mean drop in consistency from r = .83 to r = .34; potential scores did not drop at all. We again considered the effect of the change in potential performance while controlling for the change in consistency, and vice versa. If there had been no change in consistency when going from four to nine recognition items, observed performance due to the change in systematic processes would have increased slightly from M = 96% to M =


97%. If there had been no change in systematic processes, observed performance would have decreased from M = 96% to M = 79%, a decrease of 17%, which is close to the observed change of 19%.

PPT Analyses on Individuals. In this section we again provide examples of how individual analyses unveil important differences between participants (see Figure 5). Participant 4 demonstrated nearly perfect potential scores over all set sizes and followed the trend of the group data fairly closely, as did Participant 5. However, Participant 5 began to exhibit lower potential scores when going from 6 to 7 recognition items, and these continued to drop through the remaining set sizes. From 6 to 9 recognition items, the drop in consistency was .38 and the drop in potential scores was .07; the larger decrement in consistency seems to have driven the decrease in observed performance.

Participant 12 is an excellent example of minimal influence from random factors: the ability to remain highly consistent over the series of set sizes, and the joint impact this has on observed performance. Participant 12's observed performance reached almost perfect levels, a result of the combination of high potential scores (hovering around 1.0) and high consistency.

Participant 22 had high potential scores throughout the task, changing only slightly from 5 to 6 recognition items, dropping a total of .04. Participant 22's consistency decreased steadily until 7 to 8 recognition items, where it improved by .06; this increase accounted for the majority of the improvement in observed performance, which increased from .53 to .64, an increase of .11. Interestingly, Participant 22 became very inconsistent when going from 8 to 9 recognition items, with consistency dropping by .26. Again, this can account for the accompanying decrease in observed performance, which dropped by .08.
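The counterfactual "controlling for" values reported in both experiments can be approximated with a back-of-envelope model. Under two simplifying assumptions of ours, namely that responses are balanced binary choices (so proportion correct P relates to the response-truth correlation r by P = (1 + r) / 2) and that consistency acts as the reliability term in Spearman's attenuation formula, observed accuracy is roughly (1 + r_potential * sqrt(consistency)) / 2. The published algorithm (Trafimow & Rice, 2009) differs in detail, but this approximation lands near the reported counterfactuals:

```python
import math

def observed_from(potential_p, consistency_r):
    """Approximate observed proportion correct from a potential score and a
    consistency coefficient, via P = (1 + r) / 2 and Spearman attenuation."""
    r_potential = 2 * potential_p - 1
    return (1 + r_potential * math.sqrt(consistency_r)) / 2

# Experiment 2 counterfactual: potential held at .96 while consistency falls
# from .83 to .34; the result is within a couple of percentage points of the
# reported 79%.
print(round(observed_from(0.96, 0.34), 2))
```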


Discussion

As in Experiment 1, the Consistency Hypothesis provides a much better account of the effect of set size on observed recognition performance than does the Systematic Hypothesis or the Combination Hypothesis. Similar implications can be drawn regarding the individual differences we present in Experiment 2. It is not surprising that there are individual differences; however, without the PPT methodology and computational algorithm, it would be impossible to see whether systematic factors or consistency is limiting observed performance. As in Experiment 1, the analyses on individual differences suggest that the group-level account may not hold for all participants. These implications are discussed further in the General Discussion.

General Discussion

Theoretical Implications

The present study was somewhat exploratory in nature, testing a number of alternative hypotheses. Based on PPT, we proposed three hypotheses to account for the expected decrease in observed recognition performance as more items were presented. According to the Systematic Hypothesis, observed performance decreases as recognition set sizes increase, and this is due to systematic factors (Trafimow et al., 2011). According to the Consistency Hypothesis, the presentation of additional items instills increasing randomness into people's recognition processes, and it is the increased randomness that decreases observed performance (Hunt et al., 2010, 2011, 2013; Rice et al., 2010, 2011, 2012, 2012; Rice & Trafimow, 2012a, 2012b; Trafimow et al., 2012). The two hypotheses are not mutually exclusive because it is possible for both to be correct simultaneously; hence the Combination Hypothesis (Trafimow & Rice, 2009).


The Experiment 1 findings provide strong support for the Consistency Hypothesis and little support for the Systematic or Combination Hypotheses. Because it was possible that the Experiment 1 findings were limited to number-string paradigms, we switched to letter strings in Experiment 2. The findings in Experiment 2 provided corroborating support for the Consistency Hypothesis and less support for the Systematic Hypothesis. The conclusion, then, is that decrements in observed recognition performance for number and letter strings are largely a result of increasing randomness as more items are presented. The findings from both experiments corroborate evidence from previous PPT studies that consistency plays a major role in task performance of various kinds (Hunt et al., 2010, 2011, 2013; Rice et al., 2010, 2011, 2012, 2012; Rice & Trafimow, 2012a, 2012b; Trafimow et al., 2011, 2012; Trafimow & Rice, 2009).

Lavie (2010) suggests that performance improves under high perceptual load (bottom-up processing) but declines under high working memory load (top-down processing). The tasks in our experiments were demanding of working memory resources, directly involving the top-down executive control system. Our findings suggest that the breakdown in performance associated with a high working memory load (e.g., larger set sizes) may be a result of inconsistency and randomness. In prior work, Rice et al. (2012) manipulated the difficulty of a visual conjunction search (involving top-down executive processing as opposed to bottom-up perceptual processing; see Connor, Egeth, & Yantis, 2004) with two set sizes. They found that decreases in performance at the larger set size were largely a function of inconsistency. The Rice et al. findings and the present findings suggest that the higher the executive cognitive load associated with a task, the more likely it is for randomness to invade the measure. When dealing with manipulations high in working
When dealing with manipulations high in working memory load, as opposed to high perceptual load, researchers should consider the impact that randomness has on performance measures.

Although the present study was designed under an applied framework to inform candidate selection and work toward developing personalized training programs, there are some interesting theoretical interpretations related to working memory capacity that are worth mentioning. We found that participants’ accuracy dropped in a systematic fashion after about four or five items. This is consistent with other researchers’ findings that working memory capacity is limited to about four chunks of information when rehearsal is unlikely (Alvarez & Cavanagh, 2004; Awh, Barton, & Vogel, 2007; Chen & Cowan, 2005, 2009; Cowan, 2000; Luck & Vogel, 1997; Pashler, 1988; Sperling, 1960; Todd & Marois, 2004; Vogel & Machizawa, 2004; Vogel, Woodman, & Luck, 2001; Xu & Chun, 2006). Different models have been proposed to account for working memory capacity (see Fukuda, Awh, & Vogel, 2010), and the current results may provide support for one model over another. In the discrete resource model, researchers theorize that working memory is composed of different storage slots or compartments, and these slots can only hold a limited number of chunks (Cowan & Rouder, 2009; Zhang & Luck, 2008). The alternative, the flexible resource model, predicts that working memory is not limited to a fixed number of items; instead, all items are allocated a proportion of a fixed pool of memory resources, and diminished performance becomes a function of those limited resources being distributed across all items (Bays & Husain, 2008; Wilken & Ma, 2004). When interpreting the present findings in terms of these two models of capacity, the results seem to favor the flexible resource model over the discrete resource model.
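The qualitative difference between the two accounts can be caricatured in a short simulation. The following toy model is our own illustration and is not drawn from any of the cited papers: the discrete-slot sketch predicts ceiling accuracy up to k items followed by a decline, whereas the flexible-resource sketch divides a fixed discriminability budget across all items, so accuracy declines smoothly even at the smallest set sizes. All parameter values are arbitrary assumptions.

```python
# Toy predictions for 2-interval forced-choice accuracy as a function
# of set size under two caricatured capacity models. Parameters
# (k, total_d) are illustrative assumptions, not fitted values.
import math

CHANCE = 0.5  # 2-interval forced-choice guessing rate

def discrete_slots(set_size, k=4):
    """Up to k items are stored perfectly; probing an unstored item is
    a pure guess, so accuracy is flat at 1.0 through k items and then
    declines as fewer probes hit a stored item."""
    if set_size <= k:
        return 1.0
    p_stored = k / set_size
    return p_stored + (1 - p_stored) * CHANCE

def flexible_resource(set_size, total_d=8.0):
    """A fixed discriminability budget is split across all items;
    2AFC accuracy follows the standard signal-detection mapping
    p = Phi(d'/sqrt(2)), so accuracy declines smoothly from the
    very first added item."""
    d_per_item = total_d / set_size
    return 0.5 * (1 + math.erf(d_per_item / 2))

for n in range(4, 10):
    print(n, round(discrete_slots(n), 3), round(flexible_resource(n), 3))
```

The diagnostic contrast is the shape at small set sizes: a plateau followed by a drop versus a monotonic decline. Comparing such predicted curves against observed and potential scores across set sizes is one hedged way the model comparison above might be pursued.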
Because potential scores remained nearly perfect even when the number strings were nine digits long, this could be interpreted as resources being allocated to a number of competing items as the set size grew larger. We would like to make it clear that the present experiments were not designed to test these two models directly; future research, along with existing evidence, certainly needs to be evaluated in order to draw solid conclusions about how resources are allocated in working memory.

Practical Implications

The present study was lab-based, but a number of practical implications can be taken from the findings. One reason to consider the role of consistency in task performance involves the selection and recruitment of individuals for careers that involve recognition tasks. Recently, van Meeuwen, Brand-Gruwel, van Merriënboer, de Bock, and Kirschner (2010) identified a number of criteria for selecting Air Traffic Control candidates who would not drop out. It may also be beneficial to consider applicants who are less likely to have random error invade their recognition processes. This idea extends to other employment domains, for example, the military and medical professions. Many military tasks require recognition of targets or of enemy strongholds. The medical profession is often faced with recognition tasks as well, for example, holding a milliliter measurement in mind, recognizing the correct measurement to be drawn into a syringe, and then administering the medication to the patient.

As we noted in the introduction, there are a number of tasks where consistency is an extremely important component of task performance. Currently, PPT methodology is one of the few ways (possibly the only way) to disentangle and measure task performance in terms of potential performance and consistency. Typically, an observed score would be calculated as a percentage-correct value. Imagine an example where a test was used in an air traffic control school to determine the selection and qualification of applicants and percentage correct functioned as the dependent measure. To test the various applicants, the instructors might use a simple recognition memory task as a measure of cognitive ability. For example, a student would first view a set of stimuli (screen shots of air traffic control displays with a variety of planes at variable altitudes), the venerable encoding task. Next, the student would be presented with both old and new air traffic control displays and asked to discriminate which displays they had seen before. If an applicant performs well and demonstrates a high percentage correct, they may be selected to progress in the program; if they perform suboptimally, they may be withdrawn. The problem is that some of the applicants may in fact have a high potential score, and it is only their consistency that needs to be adjusted. If the school used a PPT performance measure alongside the percentage-correct measure, it might find that some of the applicants with poor recognition performance could benefit from consistency training programs that bring observed performance closer to their potential score, and these people might eventually become assets to the aviation industry.

This example leads us to the additional issues of practice and training at the level of the individual. In recent work investigating time-accuracy functions in visual search tasks, interesting trends emerged when measuring PPT variables and individual differences. The individual differences analysis revealed that although many participants followed the general group trend, some participants showed uniquely different performance, with observed performance improving with increases in consistency (Rice et al., 2011). We also present a large amount of data regarding individual differences between participants in both Experiments 1 and 2.
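If a school did adopt PPT measures in a screening context like the one sketched above, the resulting scores could feed a simple triage rule. The function and cutoffs below are entirely hypothetical and intended only to illustrate how potential and consistency scores, once estimated, might route applicants toward different interventions rather than a single pass/fail decision.

```python
# Hypothetical triage of applicants based on PPT-style scores.
# The thresholds are illustrative placeholders, not validated cutoffs.

def recommend(potential, consistency, observed,
              potential_cut=0.90, consistency_cut=0.80):
    """Route an applicant using separate potential and consistency
    scores rather than observed accuracy alone."""
    if observed >= potential_cut:
        return "advance"
    if potential >= potential_cut and consistency < consistency_cut:
        # High potential masked by randomness: a training candidate.
        return "consistency training"
    if potential < potential_cut:
        # The decrement is systematic rather than random.
        return "strategy/knowledge training"
    return "re-assess"

print(recommend(potential=0.98, consistency=0.60, observed=0.75))
# prints: consistency training
```

Under a percentage-correct-only rule, this hypothetical applicant (observed score of .75) would simply be rejected; the PPT-style decomposition instead flags them as a consistency-training candidate.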
It is important to be clear about the interpretation of these individual differences with respect to the more general group-level findings. Without the individual differences analysis, the general conclusion from the present data would be that a large decrement in observed performance is due to inconsistent responding. However, a fine-grained look at the individual-level analyses reveals an interesting pattern: not all individuals’ performance decrements at larger set sizes stem from decreases in consistency. Instead, for a few participants, observed scores decreased due to changes in the favorability of systematic factors. Accounting for individual differences is important because, instead of a “one size fits all” training technique, PPT allows for personalized training that can target an operator’s consistency in order to bring observed performance closer to potential performance. No other methodology allows one to know a priori what type of training a person should receive in order to reach their potential capacity for the task. The goal of the present paper was not to outline or generate different training programs; each task would certainly need to be accommodated by different training parameters relevant to the task goal. Instead, our goal was to make other researchers aware of new factors to consider when designing training regimes.

The additional analysis of individual differences may act as a prescription for future training studies and programs. For example, Trafimow and Rice (2009) used PPT to understand how practice influences potential scores, observed performance, and consistency of task performance. They demonstrated that practicing a task may result in changes in potential scores, consistency, or both, largely depending on the individual. Further, they found that practice may cause changes in potential scores and consistency that cancel each other out (consistency may increase while potential scores decrease, resulting in no practice effect on observed performance, or potential scores may increase while consistency decreases).
So it may be that training or practice programs do in fact produce performance improvements that are not observable with traditional measures of observed performance, such as percentage correct. Instead, the changes produced by training or practice may occur at the level of potential scores or consistency. Future research should investigate different consistency training programs that can be used to improve overall performance.

In the future, pursuing individual differences together with PPT variables seems like a promising direction for researchers interested in improving recognition performance and, more generally, performance in cognitively demanding tasks. Researchers should investigate whether systematic or random factors are contributing to a performance decrement and design appropriate training programs. Additionally, it may be beneficial to manipulate particular systematic processing instructions and measure PPT variables to identify the most efficient processing strategy for the task at hand. This information could then be used to provide guidelines and suggestions regarding how operators process information. Future work in this area is essential before drawing solid conclusions regarding the assessment, selection, and training of applicants. However, the present data are promising in that they provide new information regarding the nature of individual differences in the ability to recognize information.

In future work, researchers should also consider some of the possible negative implications of using performance measures (e.g., percentage correct) that do not parse the systematic (non-random) and non-systematic (random) components of task performance. Additionally, future research should investigate how user interfaces interact with PPT variables and people’s ability to respond to, remember, and recognize information. Designers should be aware that randomness can impact the ability to compare information.
When designing a user interface that switches between or updates information (e.g., an air traffic control display), the designer should keep in mind that consistency plays a role in the user correctly recognizing information. Ergonomics designers should be informed about both the systematic and random factors that affect task performance. Hopefully this new knowledge will lead to more user-friendly designs, reducing errors and increasing safety. Although the present study was conducted in a lab, the findings are potentially applicable to a number of real-world issues, even beyond the examples mentioned here.

Limitations

Because PPT is a new theory, it is important to be clear about what it can and cannot do. At the level of potential performance and consistency, PPT provides unambiguous results and solid conclusions. At the level of particular theorized processes, PPT is limited unless accompanied by strong theorizing. To see the ambiguities, consider again that the number of items presented failed to influence potential scores to any appreciable degree. The similarity of potential scores, no matter how many items were presented, makes it tempting to conclude that people used the same chunking (Gilchrist & Cowan, 2012; Miller, 1956), rehearsal (Baddeley, 1986), or other systematic processes. Although this might be so, it is also possible that people used a range of strategies depending on the number of presented items, but that the processes used were equally effective when applied consistently. The present data do not compellingly support or disconfirm either possibility. Simply put, PPT can tell you how much of an observed score is due to non-random strategies; it cannot tell you which strategies are being employed without an accompanying theory. At the moment, research into strategies using PPT is necessarily speculative. However, our caution in interpreting the strong support for the Consistency Hypothesis with respect to specific recognition strategies should not obscure what has been accomplished. Both experiments provide strong evidence that some of the limitations in recognition abilities, of the type investigated here, are due to randomness rather than to systematic changes in recognition strategies with set size (cognitive load).

In the past, it would have been difficult to see why researchers would suspect that randomness could be so important in recognizing information. Common sense seems to indicate that there is no way randomness can explain systematic effects; but, contrary to common sense, randomness can be expressed systematically, as in PPT: increases in randomness decrease performance, whereas decreases in randomness increase performance. The present experiments demonstrate an advantage of applying basic theory to applied issues and uncovering novel findings with real-world applicability. Copious amounts of psychological research begin in the laboratory, and the findings are then generalized outside of it. The present study was conducted under controlled laboratory conditions, but there are reasons to be optimistic about the generalizability of the findings. Participants completed a number of trials on a computer while their viewing distance was controlled. This is not so different from the responsibility of an air traffic controller, who is presented with brief strings of information and must then compare old information to similar new information, e.g., comparing altitudes between various aircraft. The implications might generalize to real-world tasks such as this air traffic control example. Conclusions drawn from lab studies do not always apply to the real world; sometimes they do, however, and an important next step for future research is to conduct PPT studies outside the lab, in more applied settings.

Conclusion

The present experiments are an example of how basic research can assist applied research. The value of basic research is sometimes questioned because it is difficult to see how it contributes to application. At first blush, it may have been less than clear how an understanding of consistency could lead to an applied contribution. However, these findings identify consistency as an important component of people’s ability to recognize information. Additionally, the finding that there are individual differences in potential scores and consistency may have important implications regarding the nature of individual differences in task performance, especially for applicant screening and the development of person-specific training programs. The present work expands the relatively new literature on PPT and opens the door for future research integrating PPT with particular process theories of recognition strategies, as well as providing insight into future applicant recruitment and training programs.

Acknowledgments

The first author was supported in part by a Howard Hughes Medical Institute Scientific Teaching Fellowship, under Award No. 52006932 and a National Science Foundation GK-12 Fellowship, under Award No. DGE-0947465. The authors wish to thank Perception and Performance lab members at NMSU for their assistance in collecting data.

References

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole Publishing Company.
Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychological Science, 15, 106–111.
Awh, E., Barton, B., & Vogel, E. K. (2007). Visual working memory represents a fixed number of items regardless of complexity. Psychological Science, 18, 622–628.
Bays, P. M., & Husain, M. (2008). Dynamic shifts of limited working memory resources in human vision. Science, 321, 851–854.
Brunswik, E. (1952). The conceptual framework of psychology. Chicago: University of Chicago Press.
Chen, Z., & Cowan, N. (2005). Chunk limits and length limits in immediate recall: A reconciliation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1235–1249.
Chen, Z., & Cowan, N. (2009). Core verbal working-memory capacity: The limit in words retained without covert articulation. The Quarterly Journal of Experimental Psychology, 62, 1420–1429.
Cohen, R. J., & Swerdlik, M. E. (1999). Psychological testing and assessment: An introduction to tests and measurements (4th ed.). Mountain View, CA: Mayfield.
Connor, C. E., Egeth, H. E., & Yantis, S. (2004). Visual attention: Bottom-up vs. top-down. Current Biology, 14, R850–R852.
Cowan, N. (2000). The magical number four in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185.


Crannell, C. W., & Parrish, J. M. (1957). A comparison of immediate memory span for digits, letters, and words. Journal of Psychology, 44, 319–327.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Fukuda, K., Awh, E., & Vogel, E. K. (2010). Discrete capacity limits in working memory. Current Opinion in Neurobiology, 20, 177–182.
Gilchrist, A. L., & Cowan, N. (2012). Chunking. In V. Ramachandran (Ed.), Encyclopedia of human behavior (Vol. 1, pp. 476–483). San Diego: Academic Press.
Gulliksen, H. (1987). Theory of mental tests. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hintzman, D. L. (2011). Research strategy in the study of memory: Fads, fallacies, and the search for the “coordinates of truth.” Perspectives on Psychological Science, 6, 253–271.
Hunt, G., Rice, S., Geels, K., & Trafimow, D. (2010). Analyzing sub-optimal human-automation performance across multiple sessions. Proceedings of the 54th Annual Meeting of the Human Factors and Ergonomics Society.
Hunt, G., Rice, S., Trafimow, D., & Sandry, J. (2013). Using potential performance theory to analyze systematic and random factors in enumeration tasks. American Journal of Psychology, 26(1), 23–32.
Hunt, G., Rice, S., Trafimow, D., Schwark, J., Sandry, J., Busche, L., & Geels, K. (2011). Visual vs. auditory memory in an aviation task: A Potential Performance Theory analysis. Proceedings of the 16th Annual International Symposium of Aviation Psychology.


Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–98.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340–362.
Jacobs, R., Cushenbery, L., & Grabarek, P. (2011). Assessments for selection and promotion of police officers. In J. Kitaeff (Ed.), Handbook of Police Psychology (pp. 193-). New York, NY: Routledge.
Jonides, J., Lewis, R. L., Nee, D. E., Lustig, C. A., Berman, M. G., & Moore, K. S. (2008). The mind and brain of short-term memory. Annual Review of Psychology, 59, 193–224.
Kaplan, R. M., & Saccuzzo, D. P. (2005). Psychological testing: Principles, applications, and issues (6th ed.). Belmont, CA: Thomson Wadsworth.
Kelley, H. H., & Michela, J. L. (1980). Attribution theory and research. Annual Review of Psychology, 31, 457–501.
Lavie, N. (2010). Attention, distraction, and cognitive control under load. Current Directions in Psychological Science, 19, 143–148.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.


Orvis, B. R., Cunningham, J. D., & Kelley, H. H. (1975). A closer examination of causal inference: The roles of consensus, distinctiveness, and consistency information. Journal of Personality and Social Psychology, 32, 605–616.
Pashler, H. (1988). Familiarity and visual change detection. Perception & Psychophysics, 44, 369–378.
Reed, A. (1973). Speed-accuracy trade-off in recognition memory. Science, 181, 574–576.
Rice, S., & Trafimow, D. (2012a). Time pressure heuristics can improve performance due to increased consistency: A PPT methodology. Journal of General Psychology, 139, 273–288.
Rice, S., & Trafimow, D. (2012b). Using Potential Performance Theory to assess differences in math abilities between citizens from India and the United States. Higher Education Studies, 2, 24–29.
Rice, S., Geels, K., Hackett, H., Trafimow, D., McCarley, J. S., Schwark, J., & Hunt, G. (2012). The harder the task, the more inconsistent the performance: A PPT analysis on task difficulty. Journal of General Psychology, 139(1), 1–18.
Rice, S., Geels, K., Trafimow, D., & Hackett, H. (2011). Our students suffer from both lack of knowledge and consistency: A PPT analysis of test-taking. US-China Education Review, 1(6), 845–855.
Rice, S., Trafimow, D., & Hunt, G. (2010). Using PPT to analyze sub-optimal human-automation performance. Journal of General Psychology, 137, 310–329.
Rice, S., Trafimow, D., & Kraemer, K. (2012). Using Potential Performance Theory to assess how to increase student consistency in taking exams. Higher Education Studies, 2(4), 68–74.


Rice, S., Trafimow, D., Keller, D., Hunt, G., & Geels, K. (2011). Using PPT to correct for inconsistency in a speeded task. Journal of General Psychology, 138(1), 12–34.
Schmidt, F. L. (2012). Cognitive tests used in selection can have content validity as well as criterion validity: A broader research review and implications for practice. International Journal of Selection and Assessment, 20(1), 1–13.
Schmidt, F. L., & Hunter, J. (2004). General mental ability in the world of work: Occupational attainment and job performance. Journal of Personality and Social Psychology, 86, 162–173.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1–29.
Todd, J. J., & Marois, R. (2004). Capacity limit of visual short-term memory in human posterior parietal cortex. Nature, 428, 751–754.
Trafimow, D., & Rice, S. (2008). Potential performance theory: A general theory of task performance applied to morality. Psychological Review, 115, 447–462.
Trafimow, D., & Rice, S. (2009). Potential performance theory (PPT): Describing a methodology for analyzing task performance. Behavior Research Methods, 41, 359–371.
Trafimow, D., & Rice, S. (2011). Using a sharp instrument to parse apart strategy and consistency: An evaluation of PPT and its assumptions. Journal of General Psychology, 138, 169–184.
Trafimow, D., Hunt, G., Rice, S., & Geels, K. (2011). Using potential performance theory to test five hypotheses about meta-attribution. Journal of General Psychology, 138(2), 1–13.
U.S. Equal Employment Opportunity Commission. (2010, September 23). Retrieved from http://www.eeoc.gov/policy/docs/factemployment_procedures.html

Unsworth, N., & Engle, R. W. (2006). Simple and complex memory spans and their relation to fluid abilities: Evidence from list-length effects. Journal of Memory and Language, 54, 68–80.
Van Meeuwen, L. W., Brand-Gruwel, S., Van Merriënboer, J. J. G., De Bock, J. J. P. R., & Kirschner, P. A. (2010). Indicators for successful learning in air traffic control training. Paper presented at the 5th EARLI SIG 14 Learning and Professional Development Conference, Munich, Germany.
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in visual working memory capacity. Nature, 428, 748–751.
Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114.
Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4, 1120–1135.
Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory. Psychological Review, 114, 152–176.
Xu, Y., & Chun, M. M. (2006). Dissociable neural mechanisms supporting visual short-term memory for objects. Nature, 440, 91–95.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235.



Figure 1. Timeline of an experimental trial. A single block of trials contained all 50 stimuli from each of the 6 set sizes, presented in random order. Each participant completed the same block twice, with a new random order each time, for a total of 600 trials lasting about 60 minutes. Both blocks contained identical number (Experiment 1) or letter (Experiment 2) strings, as is necessary to perform the PPT analyses.


Figure 2. Group data from Experiment 1 (number strings). Data presented in dark gray indicate observed performance (% correct). Data in light gray indicate how much performance would increase, given perfect consistency, to reach the potential score.

Note: Due to sampling error, some potential score estimates were slightly above one (+.01 standard errors). Potential scores presented are truncated at 1.0.


Figure 3. Individual data from Experiment 1 (number strings). Data presented in dark gray indicate observed performance (% correct). Data in light gray indicate how much performance would increase, given perfect consistency, to reach the potential score.

Note: Due to sampling error, some potential score estimates were slightly above one (+.02 standard errors across the four participants). Potential scores presented are truncated at 1.0.


Figure 4. Group data from Experiment 2 (letter strings). Data presented in dark gray indicate observed performance (% correct). Data in light gray indicate how much performance would increase, given perfect consistency, to reach the potential score.

Note: Due to sampling error, some potential score estimates were slightly above one (+.006 standard errors). Potential scores presented are truncated at 1.0.



Figure 5. Individual data from Experiment 2 (letter strings). Data presented in dark gray indicate observed performance (% correct). Data in light gray indicate how much performance would increase, given perfect consistency, to reach the potential score.

Note: Due to sampling error, some potential score estimates were slightly above one (+.01 standard errors). Potential scores presented are truncated at 1.0.