Models of Substance Abuse 1

Using Computational Models to Help Explain Decision Making Processes of Substance Abusers

Jerome R. Busemeyer, Julie C. Stout, & Peter Finn
Indiana University
January 10, 2003

Chapter to appear in Cognitive and Affective Neuroscience of Psychopathology, edited by Deanna Barch, Ph.D. Oxford University Press.

Send Correspondence to:
Jerome R. Busemeyer
Department of Psychology
Indiana University
Bloomington, Indiana 47405
jbuseme [email protected]


Abstract

The purpose of this chapter is to formulate a computational model that synthesizes two separate explanations for decision making deficits in substance abusers. One is based on poor planning caused by discounting of future consequences, and the other is based on poor learning caused by insensitivity to punishments or hypersensitivity to rewards. First, we review empirical research on differences between substance abusers and non abusers with respect to discounting of future consequences. Second, we review empirical research that has revealed hypersensitivity to rewards or insensitivity to punishments by drug abusers as compared to non abusers on the Bechara-Damasio simulated gambling task. Third, we present a computational model of multistage decision making that includes planning for future consequences, and we apply this model to both the delay discounting paradigm and the simulated gambling paradigm. Finally, we make linkages between this computational model and the neurophysiological underpinnings of the model, and we discuss the implications of these linkages for understanding the development of substance abuse habits.


Substance abuse is a complex problem that needs to be approached from many different scientific directions, with each providing a necessarily limited perspective and contribution. A reasonably complete understanding of the issues requires an interdisciplinary integration of the sociological, psychological, neurological, and genetic causes underlying this personal and social dilemma (West, 2001, provides a broad overview of theories). Even within a single disciplinary point of view, such as psychology, an overwhelming number of factors must be considered, including psychopathology, personality, cognitive, and motivational processes (see Finn, 2002; Jentsch & Taylor, 1999; Tiffany, 1990, for alternative views). This chapter focuses on a specific, yet vital, topic concerning substance abuse – the decision making process (cf. Skog, 2000). The purpose of this chapter is to formulate a computational model[1] that synthesizes a number of separate lines of research directed at understanding decision making processes of substance abusers. Examining the issue from this restricted point of view allows us to delve more deeply into a few of the basic psychological mechanisms underlying substance abuse. Furthermore, we link these decision making processes to their neuro-physiological underpinnings.

1. A Prototypic Drug Abuse Decision

To begin our formal analysis, we ask, what are the basic theoretical ingredients that go into making decisions in a substance abuse setting? Figure 1 is a “decision tree”

[1] Marr (1982) proposed three levels of theorizing. At the top, computational level, theories are designed to explain the abstract goals that a cognitive system is trying to achieve (e.g., maximize expected utility). At the next, algorithmic level, theories are designed to explain the dynamics used to achieve the top level goals (e.g., learning processes). The final, implementation level, describes the neuro-physiological basis of the second level (e.g., neural synaptic modifications). Here we discuss all three levels.

diagram of a typical ‘self-control’ decision problem (based on Kanfer & Karoly, 1972), and one that is frequently faced by drug abusers. This is an oversimplification of the problem, but it is sufficient to introduce the basic theoretical concepts.[2] Imagine an undergraduate student who is currently at home on Sunday morning, and he is invited to go out that Sunday night with some wild and rowdy friends, but he also has to study for a calculus test that is scheduled for early Monday morning. At the current decision (the square node labeled D1), he must decide whether to stay alone at home and study, or go out with his friends. If he decides to stay at home and study (take the downward branch), then he will miss all the immediate fun, but he is likely to pass the exam the next day (the two branches of the circular chance node E1 represent the different possible outcomes of the exam after this choice). If he decides to go out with his friends, then he will face a second decision – that is, he may return soon after dinner and go home and study, or he may stay out too late, abusing alcohol and possibly other drugs. In the former case, he gets to enjoy being with his friends for a short while, and he still has a very good chance of passing the exam. In the latter case, he gets to have fun all night, but he will suffer the next day -- feeling ill and tired and likely to fail the exam. At the very end of each path of the tree are the terminal nodes, or final consequences, that follow each particular sequence of actions and events.

[2] More abstractly, the first stage can be viewed as a choice between experimenting with drugs or not. If the decision in the first stage is to experiment, then the second stage is a choice between stopping after a short time, or continuing and overindulging in drug use.


Figure 1: Self-Control Decision Problem

[Decision tree: at decision node D1 the student chooses between staying home to study and going out with friends. Going out leads to a second decision node D2 (return early vs. stay out late). Each course of action ends in a chance node (E1, E2, or E3) with Pass/Fail branches leading to the final consequences C1–C6.]
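The optimal plan for a tree like Figure 1 can be computed by folding the tree back from its terminal nodes, the backward induction procedure discussed below. Here is a minimal sketch in Python; the utilities (on a 0–100 scale) and pass probabilities are illustrative assumptions, not numbers from the chapter.

```python
# Backward induction on a tree shaped like Figure 1.
# All utilities (0-100 scale) and pass probabilities are hypothetical.

def expected_value(branches):
    """Value of a chance node given (probability, value) branches."""
    return sum(p * v for p, v in branches)

PASS, FAIL = 80, 0            # assumed utilities of passing / failing the exam
FUN_SHORT, FUN_LONG = 10, 45  # assumed fun from returning early / staying out late

# Chance nodes: assumed probability of passing under each course of action.
E1 = expected_value([(0.90, PASS), (0.10, FAIL)])              # stay home and study
E2 = expected_value([(0.85, PASS), (0.15, FAIL)]) + FUN_SHORT  # go out, return early
E3 = expected_value([(0.30, PASS), (0.70, FAIL)]) + FUN_LONG   # go out, stay out late

D2 = max(E2, E3)   # fold back: best continuation after going out
D1 = max(E1, D2)   # fold back: best initial choice

plan = ("go out, then return early" if D1 == E2
        else "go out and stay late" if D1 == E3
        else "stay home and study")
print(plan, D1)
```

With these assumed numbers the computation reproduces the strategy argued for in the text: go out with friends, then return early.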

This simple decision tree entails most of the basic ingredients discussed by decision theories (see, e.g., Clemen, 1996). First of all, the decision involves planning, because the student needs to consider the future decision at node D2 before choosing at the current decision node D1. Second, the decision involves risk or uncertainty about the chance events (E1, E2, or E3) that follow each action. Third, the decision involves the evaluation of the rewards and punishments produced by the consequences (C1–C6) of each decision. Decision theorists use backward induction to determine the optimal strategy for decision trees such as the one shown in Figure 1 (see Clemen, 1996). Working backwards from decision node D2, the optimal choice is to return home early, because this maximizes the chances of passing the important calculus exam. Given that the student plans to return home early, the optimal choice at node D1 is to go out with his friends, because this maximizes the enjoyment with friends while at the same time produces the

same high chance of passing the exam. Thus one could argue that the optimal strategy, producing the highest expected payoff, is to go out to dinner with friends and then return early. By doing this, the student gets to enjoy time with friends and he still has a very good chance of passing the exam. There is a danger, however, with this optimal strategy. The student may fail to follow through on the optimal plan, and change his mind midstream when he reaches decision node D2. In other words, he may become impulsive and lose control at this later stage, forgoing his important long-term academic interests for the immediate fun of staying out late with his friends. The decision to stay at home and study would be considered an act of self-control in this situation (Kanfer & Karoly, 1972). By staying home, the student prevents the opportunity and removes the temptation of carousing all night with his friends. The self-control strategy may not be optimal, but it safely guarantees that the student prepares well for the exam the next day. From this decision theory point of view, substance abuse may result either from poor planning or from failure to learn the rewards and punishments produced by various strategies. Next we review experimental evidence from laboratory research which examines the influence of each of these components on decision processes of substance abusers.

2. Plans and the evaluation of future consequences

Suppose the student facing the decision in Figure 1 makes a plan at decision node D1 to go out with his friends and then return early. Also suppose that, after he carries out the first stage of the plan and actually goes out with his friends, he changes his mind and decides to stay out late and have fun instead of returning early. This is an example of

what is called dynamic inconsistency (Busemeyer et al., 2001; Cubitt, Starmer, & Sugden, 2002; Thaler, 1981). One psychological reason why this occurs is that drug abusers discount future consequences more rapidly than non abusers (Kirby, Petry, & Bickel, 1999). Here is how discounting works. Although every action produces both immediate as well as future consequences, the latter may receive less attention or less weight than the former. Therefore, the value of a future consequence may be discounted or reduced according to its delay of occurrence. In many cases, this is a reasonable or rational process. For example, if a businessman is given a choice between $1000 immediately or $1500 in ten years, he may prefer the immediate $1000, because he can invest this money for 10 years at a 5% yearly interest rate, and his money will grow to (1.05)^10 × ($1000) = $1629 in ten years, which is larger than the alternative $1500 offer. In other words, after discounting the $1500 alternative for the ten year delay, ($1500)/(1.05)^10 = $921, it is worth less than the immediate value of $1000.

Exponential Discounting Model. Two models for discounting have been proposed to explain drug abuse decision making. One is based on economic theory (Becker & Murphy, 1988) and is called the exponential discounting model. According to this model, the value of a future reward of magnitude r that is received after a delay of d time units is given by

v(r,d) = v(r,0) / (1+α)^d    (1)
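Equation 1 can be checked numerically; this short sketch reproduces the businessman example from the text.

```python
# Equation 1: exponential discounting, v(r, d) = v(r, 0) / (1 + alpha)**d.
def exp_discount(value_now, delay, alpha=0.05):
    return value_now / (1 + alpha) ** delay

# The businessman example: $1000 invested at 5% vs. $1500 delivered in 10 years.
grows_to = 1000 * 1.05 ** 10          # ~ $1629
discounted = exp_discount(1500, 10)   # ~ $921, less than the immediate $1000
print(round(grows_to), round(discounted))  # 1629 921
```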

For example, if we set the value of an immediate gain of $1000 equal to v(1000,0) = $1000, and we set α = .05, then v($1500,10) = ($1500)/(1.05)^10 = $921. The exponential discount rate was formulated to satisfy the following temporal consistency property: if

an individual prefers (reward r1 at no delay) to (reward r2 at delay d), then that same person should prefer (reward r1 at delay h) to (reward r2 at delay d + h). For example, if an individual prefers ($1000 at no delay) to ($1500 at 10 year delay), then that person should also prefer ($1000 at 20 years delay) to ($1500 at 30 years delay). In other words, adding a common 20 year delay to both options should not change the person’s preference. This follows from the mathematical fact that

v(r1,0) = v(r1,0)/(1+α)^0 > v(r2,d) = v(r2,0)/(1+α)^d
⇒ v(r1,h) = v(r1,0)/(1+α)^h > v(r2,d+h) = v(r2,0)/(1+α)^(d+h) = [v(r2,0)/(1+α)^d]/(1+α)^h.

As can be seen from the second inequality, (1+α)^h cancels out of both sides, yielding the first inequality. However, this temporal consistency property has been shown to be systematically violated in animal and human choice experiments (Ainslie & Haslam, 1992; Kirby & Herrnstein, 1995) – under some specific conditions, when given a choice between a small immediate reward versus a large delayed reward, individuals prefer the small immediate reward; but when a common delay is added to both options, their preferences reverse.

Hyperbolic Discounting Model. Temporal inconsistency has been explained by an alternative model, called the hyperbolic discounting model, which originated in operant conditioning theory (see Ainslie & Haslam, 1992) but has recently been applied to drug abuse decision making (see Bickel & Marsch, 2001, for a review). According to the hyperbolic model, the value of a future reward of magnitude r that is received after a delay of d time units is given by

v(r,d) = v(r,0) / (1 + k·d)    (2)
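Equation 2 can likewise be checked numerically; the sketch below reproduces the preference reversal discussed in the text, where adding a common 20 year delay flips the sign of the value difference.

```python
# Equation 2: hyperbolic discounting, v(r, d) = v(r, 0) / (1 + k*d).
def hyp_discount(value_now, delay, k=1.0):
    return value_now / (1 + k * delay)

# With k = 1, a common 20 year delay reverses the preference:
diff_now   = hyp_discount(1000, 0)  - hyp_discount(1500, 10)   # ~  $864
diff_later = hyp_discount(1000, 20) - hyp_discount(1500, 30)   # ~ -$1
print(round(diff_now), round(diff_later))  # 864 -1
```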

For example, if for simplicity we set the discounting rate parameter, k, equal to 1, then v($1500,10) = $1500/(1 + 1·10) = $1500/11 = $136. This model easily accounts for violations of temporal consistency. For example, suppose once again that k = 1; then v($1000,0) − v($1500,10) = $864, but v($1000,20) − v($1500,30) = −$1. In other words, delaying both options by the same amount reverses the preferences! It is informative to analyze the self-control decision problem in Figure 1 using the hyperbolic discounting model (illustrated in Figure 2). To be concrete, suppose the student judges the immediate value of staying out late and having fun as 45 on a scale ranging from zero to 100. The lower curve in Figure 2 was produced by setting v(r,0) = 45 and generating discounted values at 12 and 24 hour delays using Equation 2 with k = 1/12. Also suppose the student judges the immediate value of returning early and probably passing the test the next day as 80 on the same scale. The upper curve in Figure 2 was produced by setting v(r,0) = 80 and again generating discounted values at 12 and 24 hour delays using Equation 2 with k = 1/12. First consider the student’s plan on Sunday morning concerning the future decision for Sunday night at decision node D2. On Sunday morning, he evaluates v(staying out late and having fun with 12 hour delay) = 45/(1+12·k) = 45/2 = 22.5 < v(returning early and passing the test 24 hours later) = 80/(1+24·k) = 80/3 = 26.67, and so he plans that morning to return early that night. (Compare the lower curve at time delay 12 to the upper curve at time delay 24 in Figure 2). Now consider the student’s situation on Sunday evening, after he has gone out to dinner with his friends, but before he has decided whether to return early or stay out late. On Sunday night, he re-evaluates v(staying out late and having fun with no delay) = 45/(1+0·k) = 45 > v(returning early and passing the test 12 hours later) = 80/(1+12·k) =

40, and now he decides to stay out late at this point. (Compare the lower curve at time delay zero to the upper curve at time delay 12 in Figure 2). In other words, the hyperbolic discounting function leads the student into making a bad plan to go out with his friends, thinking that he will return early and pass the test; but unfortunately, he later changes his mind, decides to stay out late, and risks failure.

Figure 2: Hyperbolic discounting functions used to explain the student’s decision to stay out late and abuse drugs

[Plot: discounted value (0–100) as a function of time delay in hours (0, 12, 24). The upper curve shows the value of returning early (80 at zero delay); the lower curve shows the value of staying out late (45 at zero delay).]
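The two curves in Figure 2 can be reproduced with a few lines of code; this sketch evaluates Equation 2 at the delays used in the student example.

```python
# The student example: Equation 2 with k = 1/12 and the text's immediate
# values of 45 (stay out late) and 80 (return early and pass the test).
def hyp_discount(value_now, delay_hours, k=1/12):
    return value_now / (1 + k * delay_hours)

# Sunday morning: the night out is 12 hours away, the exam outcome 24 hours away.
plan_late, plan_early = hyp_discount(45, 12), hyp_discount(80, 24)
# Sunday night at node D2: staying out is immediate, the exam outcome 12 hours away.
now_late, now_early = hyp_discount(45, 0), hyp_discount(80, 12)

print(plan_late < plan_early)  # True: he plans to return early (22.5 < 26.67)
print(now_late > now_early)    # True: he reverses and stays out late (45 > 40)
```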

The important lesson to be learned from this example is that the hyperbolic discounting rule can mislead drug abusers into making poor plans. This discounting rule can fool them into thinking that they will choose the larger delayed reward when they confront the choice in the future; but in fact, when they eventually reach that decision, they lose control and choose the smaller immediate reward.

Empirical evidence on discounting. The discounting rate for an individual can be empirically estimated by presenting the person with a time delayed outcome and asking him or her to match this with another outcome that has no delay. Formally, the person is asked to

solve the following indifference equation: given (reward r1 at delay d), find rm such that v(rm,0) = v(r1,d). For example, a cigarette smoker could be asked: what number of cigarettes would be needed immediately to match receiving a package of 60 cigarettes in 14 days? Suppose the individual judges that having 40 cigarettes immediately is equal in value to having 60 cigarettes in 14 days. Then Equation 2 is used to solve for the discounting rate parameter: v(40,0) = v(60,14) implies 40 = 60/(1 + k·14), which implies k = (60−40)/(14×40) = .0357. To get a better estimate of this discounting parameter, a series of matched values is obtained from an individual, and the discounting rate parameter, k, is estimated using a nonlinear regression analysis. Bickel and Marsch (2001) review a large number and variety of laboratory studies providing evidence for both (a) higher discounting rates produced by substance abusers, and (b) preference reversals produced by substance abusers. This includes research on heroin addicts, alcoholics, problem gamblers, and even cigarette smokers. The higher discount rates for drug abusers occurred even when participants were matched for age, IQ, and income. Evidence was obtained using various types of rewards, including hypothetical and real money, hypothetical heroin, and cigarettes. For example, the average discount rate obtained from heroin users was twice that obtained from non-user controls when real money was at stake. Furthermore, among heroin users, the discount rate for amounts of heroin was much higher than the discount rate for money. Higher discount rates for drug abusers suggest that they are also more impulsive, and as expected, discount rates are positively correlated with personality tests of impulsivity (see Bickel & Marsch, 2001, for a review).
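The indifference calculation above can be expressed directly in code; this sketch solves Equation 2 for k in the cigarette example.

```python
# Indifference condition from Equation 2: r_m = r1 / (1 + k*d), solved for k:
#   k = (r1 - r_m) / (d * r_m)
def discount_rate(r1, delay, matched):
    return (r1 - matched) / (delay * matched)

# The cigarette example: 40 now judged equal in value to 60 in 14 days.
k = discount_rate(r1=60, delay=14, matched=40)
print(round(k, 4))  # 0.0357
```

In practice, as the text notes, k is estimated by nonlinear regression over a series of such indifference points rather than from a single match.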

3. Learning from past experience with rewards and punishments

Many students may plan poorly on their first encounter with the self-control problem shown in Figure 1, because they have never experienced the negative consequences of these actions before. However, with repeated experiences, most students can learn to anticipate the future negative consequences, and thus avoid making bad plans on later occasions. Substance abusers fail to learn to make the appropriate choices even after repeated experiences, and instead they persist in choosing the actions that ultimately lead to bad consequences. This type of decision making deficit among drug abusers has recently been demonstrated in several laboratory experiments, described next.

The simulated gambling task. A number of recent studies of decision making in drug abusers have employed a simulated gambling task developed by Bechara, Damasio, and colleagues (see, e.g., Bechara, Damasio, Damasio, & Anderson, 1994). The task was originally developed to examine decision making deficits exhibited by individuals with orbital frontal cortex damage, but subsequently it has also proved very effective for studying decision making deficits in drug abusers (see Bechara & Damasio, 2002; Bechara, Dolan, & Hindes, 2002; Grant, Contoreggi, & London, 2000; Mazas, Finn, et al., 2000; Petry, Bickel, et al., 1998). The gambling task is briefly described next. The decision tree for this problem is shown in Figure 3 below. The decision maker is shown four decks of cards, labeled A, B, C, and D. On each trial, the player chooses one card from the top of one of the decks, the monetary gain and/or loss produced by choosing that card is delivered, and then the card is discarded. This procedure is repeated for one hundred or more trials, and the player’s goal is to accumulate as much money as possible across trials. Initially, the payoffs produced by

each deck are unknown, but through experience, the player gradually learns what to expect from each deck.

Figure 3: The Bechara-Damasio Gambling Task

[Decision node D1 fans out to four decks (A, B, C, D). The payoff schedules shown are: win $100 every card with a $1250 loss every 10 cards; win $100 every card with losses of $150, $200, $250, $300, $350; win $50 every card with a $50 loss every other card; win $50 every card with a $250 loss every 10 cards.]
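The advantage structure can be verified from the payoff schedules; this sketch computes each deck's net outcome per 10-card block (deck labels here follow the chapter's text, which assigns the once-per-block $250 loss to deck C and the five $50 losses to deck D).

```python
# Net outcome per 10-card block for each deck, per the chapter's description.
net_per_block = {
    "A": 10 * 100 - 1250,                            # -250: disadvantageous
    "B": 10 * 100 - (150 + 200 + 250 + 300 + 350),   # -250: disadvantageous
    "C": 10 * 50 - 250,                              # +250: advantageous
    "D": 10 * 50 - 5 * 50,                           # +250: advantageous
}
avg_per_card = {deck: net / 10 for deck, net in net_per_block.items()}
print(avg_per_card)  # A and B average -$25 per card; C and D average +$25
```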

The four decks are designed in such a way that two of the decks (labeled C and D) are "advantageous", while the other two decks (labeled A and B) are "disadvantageous" in the following sense. Each "advantageous" deck produces an average gain of $25, averaged across the 40 cards within a deck; each "disadvantageous" deck produces an average loss of $25, averaged across the 40 cards within a deck. However, the payoffs are arranged and sequenced in a manner that makes it difficult for the player to learn this fact. The "advantageous" decks always produce a small immediate win of $50, while the "disadvantageous" decks always produce a large immediate win of $100. Thus the "disadvantageous" decks appear at first sight to be superior, at least with respect to the amount to win. But this is misleading, because the "disadvantageous" decks also produce larger losses than the "advantageous" decks. The "advantageous" deck C yields a loss of $250 once every block of 10 cards within its deck, whereas the "disadvantageous" deck A yields a loss of $1250 once every block of 10 cards within its deck. The other

"advantageous" deck D yields five losses of $50 within each block of 10 cards in that deck, for a total loss of $250 within each block, whereas the other "disadvantageous" deck B yields losses of $150, $200, $250, $300, and $350 within each block of 10 cards in that deck, for a total loss of $1250 within each block. The common finding from a variety of studies using the simulated gambling task is that all participants (non abusers and abusers) start out choosing from the disadvantageous decks. Soon afterwards, the non abusers gradually learn to shift their choices away from the disadvantageous decks and develop an increasingly stronger preference for the advantageous decks across training. However, the drug abusers persist in choosing from both the disadvantageous as well as the advantageous decks throughout training, and never learn to avoid taking from the disadvantageous decks (the left versus middle or right panels in Figure 4, below, illustrate the general nature of the findings, although this figure is generated from a model described later). This finding has been reported with heroin abusers (Petry, Bickel, et al., 1998), cocaine abusers (Bartzokis et al., 2000; Grant et al., 2000), alcohol abusers (Mazas et al., 2001), and general substance abusers (Bechara et al., 2002a, 2002b). Similar findings, using a different type of gambling task, have been reported by Rogers et al. (1999) with cocaine abusers. In general, these findings have been obtained after controlling for age and IQ. Furthermore, this decision making deficit is correlated with faster discounting in a delayed discounting task (Monterosso et al., 2001), and it is also correlated with impulsiveness and antisocial personality measures (Mazas et al., 2001). The Bechara-Damasio gambling task was designed to examine the complex interplay between cognitive and motivational processes. Consequently, performance on

the task depends on multiple processes, including (a) motivational processes used to evaluate gains and losses, (b) associative processes used to learn action-outcome contingencies, and finally (c) response mechanisms controlling whether choices are made in an optimal or reckless manner. In short, the decision-making deficits exhibited by drug abusers may result from individual differences on any combination of the above three processes. Thus a theoretical problem arises when trying to identify the exact causes of these decision-making deficits. Because they are produced by complex cognitive-motivational interactions, it is difficult to sort out the contributions of each specific process to the observed behavioral deficits.

Expectancy-Valence Model. Busemeyer and Stout (2002) developed a cognitive model of decision making for the Bechara-Damasio gambling task, called the expectancy-valence model (EV model). This model yields three parameters that describe the motivational, learning/memory, and choice consistency components of the data. The mathematical details of the model are described later, in Section 4. At this point, we will only summarize the basic concepts needed to understand each of the three model parameters.

Definition of the motivational parameter, w: The first assumption of the model is that the decision maker experiences an affective reaction to the wins and losses produced by each selection, called a valence. Formally, this valence is represented by a weighted average of the losses and gains produced by the chosen deck (w is the attention weight applied to the loss, and 1−w is the weight applied to the gain). The weight parameter represents the motivational significance of the loss relative to the win experienced with each selection. For example, drug abusers may have a small weight for losses and persist

in choosing from the disadvantageous decks because they are hypersensitive to the rewards or insensitive to the large losses produced by these decks.

Definition of the learning/memory parameter, a: The second assumption of the model is that the decision maker forms expectations about the consequences delivered by each deck, based on past experience. Whenever a deck is chosen, the newly experienced valence produced by that choice modifies the expectancy for that deck. Formally, the new expectancy for a deck is a weighted average of the previous expectancy and the new valence. The weight allocated to the newly experienced valence is denoted a, which is an updating rate parameter. Large values of a indicate strong recency effects and more rapid forgetting of past outcomes. Small values of a indicate the persistence of influences over longer spans of selections. For example, disadvantageous choices by drug abusers may result from a very high learning rate, which would cause them to focus on recently experienced payoffs and forget the infrequently occurring large losses produced by these decks.

Consistency of choice behavior, θ: The third assumption of the model is that the decision maker's choice on each trial depends on the consistency with which he or she applies the learned expectancies to the choice made on each trial. This consistency is represented by a parameter denoted θ. When the consistency is very low, choices are random, reckless, impulsive, and independent of the expectancies; when consistency is very high, the deck that has the maximum expectancy will almost certainly be chosen on each trial. More formally, the probability of choosing a deck is a ratio of the strength of that deck relative to the sum of the strengths of all the decks. Furthermore, the strength of a given deck is determined by the expectancy for that deck multiplied by the consistency

parameter. For example, drug abusers may select nearly equal proportions of advantageous and disadvantageous cards because they respond inconsistently and unreliably to their expectations, and their choices are mostly random guesses, due to low values on the consistency parameter, θ. Figure 4 illustrates how different combinations of parameters from the EV model can reproduce the basic decision making deficit observed in drug abusers compared to non abuser controls. Each panel presents the probability of choosing from the advantageous decks as a function of the number of cards selected. The far left panel is a simulation using parameters designed to reproduce the non abusing controls. The middle panel is a simulation using parameters designed to reproduce drug abusers who put too little attention on losses. The far right panel is a simulation using parameters designed to reproduce drug abusers with strong recency effects and with choice consistency decreasing across training. As can be seen in the figure, fundamentally different psychological processes are able to produce similar patterns of deficits observed in experiments with the simulated gambling task. Therefore, one cannot determine the source of the decision making deficit without performing a cognitive modeling analysis.

Figure 4: Simulations from the EV model for the Simulated Gambling Task

Conclusions from cognitive modeling analyses. The parameter values of the EV model are estimated separately for each participant by solving for the parameters that, according to the model, maximize the likelihood of each participant's data (see Busemeyer & Stout, 2002, for details). Thus, for each participant, the modeling process yields three theoretically derived parameter estimates of the three hypothesized psychological processes that influence choice behavior. These parameter estimates can be used as additional individual difference measures and correlated with other measures such as impulsiveness, antisocial personality, and severity of drug use. This cognitive modeling analysis of the simulated gambling task has been applied to three different populations of individuals. The first application (reported in Busemeyer & Stout, 2002) was to data collected by Stout, Rodawalt, and Siemers (2001) on

individuals suffering from a progressive neurodegenerative disease, Huntington Disease. The second application (reported in Stout, Busemeyer, Grant, & Bonson, 2002) was to data collected by Grant et al. (2000) on cocaine drug abusers. The third application (Stout, Busemeyer, & Damasio, in progress) was to data collected by Bechara and colleagues on individuals with orbital prefrontal cortex damage. It is worth pointing out that Huntington Disease causes damage in the basal ganglia (Stout et al., 2001), and furthermore, cocaine abuse causes damage to the dopamine circuits connecting the basal ganglia to the prefrontal cortex (see Jentsch & Taylor, 1999, for a review). Thus, because of the common neural circuitry, there are some neurological bases for expecting to find a common deficit among these populations. All three populations demonstrated a similar pattern of decision making deficit on the Bechara-Damasio gambling task: that is, the normal or healthy controls chose from the advantageous decks significantly more often than the target (Huntington, drug abuse, orbital frontal) group. The crucial question is whether the common behavioral deficit observed among all three populations is produced by a common psychological process (e.g., motivational). Alternatively, it may be the case that different psychological processes are responsible for the same behavioral deficit. The results of the analyses revealed that the latter was true. The primary process contributing to the poor performance of the Huntington Disease individuals was the learning/memory parameter, whereas the primary process contributing to poor performance for the cocaine abusing and orbital frontal cortex damaged individuals was the motivational parameter. For all three data sets, the choice consistency parameter also contributed to performance.

More specifically, for the Stout et al. (2001) data, there was no significant difference between the Huntington Disease group and the control group for the motivational parameter, w; the learning/memory parameter, a, was significantly greater for the Huntington Disease group as compared to the controls, indicating stronger recency effects and more rapid forgetting for the Huntington Disease individuals; and the consistency parameter was significantly lower for the Huntington Disease group, indicating that their choices were less optimal and more random as compared to the controls. For the Grant et al. (2000) data, the motivational parameter, w, was significantly smaller for the cocaine abuse group compared to the controls, indicating that this group allocated relatively less attention to the punishments (or relatively more attention to rewards) as compared to the controls; there was no significant difference in the learning/memory parameter; and the consistency parameter was significantly lower for the cocaine abusers as compared to controls. Finally, the results for the Bechara data were similar to those found with Grant et al. (2000); that is, the motivational parameter, w, was significantly smaller for the orbital frontal cortex damaged individuals as compared to controls; there was no difference in the learning/memory parameter; and the sensitivity was lower for the orbital frontal individuals, although it was not significant in this case. In sum, the same behavioral deficit was produced by different underlying mechanisms: learning and memory processes for the Huntington Disease group, versus motivational processes for the cocaine abusing and orbital frontal cortex damaged groups. These findings underscore the importance of using cognitive modeling to help identify the cognitive bases of performance.
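The per-participant estimation step described above can be sketched as a likelihood computation plus a search over parameter values. Busemeyer and Stout use proper nonlinear optimization over continuous parameters; the coarse grid search, softmax formulation, and toy choice data here are only for illustration.

```python
import math
from itertools import product

DECKS = "ABCD"

def log_likelihood(choices, payoffs, w, a, theta):
    """Log-likelihood of an observed choice sequence under the EV model.
    choices: deck label picked on each trial; payoffs: (win, loss) received."""
    expectancy = {deck: 0.0 for deck in DECKS}
    ll = 0.0
    for deck, (win, loss) in zip(choices, payoffs):
        strengths = {d: math.exp(theta * e) for d, e in expectancy.items()}
        ll += math.log(strengths[deck] / sum(strengths.values()))
        valence = (1 - w) * win - w * loss
        expectancy[deck] += a * (valence - expectancy[deck])
    return ll

def fit(choices, payoffs):
    """Grid search for the (w, a, theta) maximizing the likelihood."""
    grid = product([0.1, 0.3, 0.5, 0.7, 0.9],   # w: attention to losses
                   [0.05, 0.2, 0.5],            # a: updating rate
                   [0.01, 0.05, 0.1])           # theta: consistency
    return max(grid, key=lambda p: log_likelihood(choices, payoffs, *p))

# Toy data: a player who samples deck A, takes its big loss, then settles on C.
choices = ["A", "A", "C", "C", "C", "C"]
payoffs = [(100, 0), (100, 1250), (50, 0), (50, 0), (50, 250), (50, 0)]
w_hat, a_hat, theta_hat = fit(choices, payoffs)
print(w_hat, a_hat, theta_hat)
```

The fitted triple then serves as the individual-difference measure that the text correlates with impulsivity and related scales.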

Cognitive Modeling of Individual Differences. More generally, cognitive modeling is becoming an important new theoretical tool for identifying the psychological processes underlying complex behaviors in cognitive tasks. This approach has in fact been used for many years, for example, in signal detection theory (Green & Swets, 1966), to estimate discriminability and bias parameters from performance on detection tasks. However, many new examples of this approach have appeared recently in a variety of application areas. Carter and Neufeld (1999) extended information-processing models developed by Townsend (1984) to identify stimulus-encoding abnormalities of schizophrenics. Riefer et al. (2002) used their processing tree model to identify sources of memory deficits in schizophrenics. Ratcliff, Thapar, and McKoon (2001) applied their diffusion model to identify the sources responsible for the slowing of responses observed in aging populations. Nosofsky and Zaki (1998) employed their exemplar model of category learning to identify the sources responsible for category learning deficits in amnesic individuals. Maddox and Filoteo (2001) used decision bound theory (Ashby & Gott, 1988) to identify the sources responsible for categorization deficits observed in individuals with Parkinson Disease. Sanfey et al. (2002) used SP/A theory (Lopes & Oden, 1999) to estimate utility parameters for prefrontal cortex damaged individuals. For all of these applications, mathematical models of a target task are developed and tested, and the parameters estimated from these cognitive models are used to describe the relative influences of task features on an individual's performance. Using this approach, individual difference measures of the relevant cognitive processes are extracted from the very same performance data that one desires to explain.

4. A general computational model for drug abuse decision making

Up to this point, we have presented two separate explanations for decision making deficits in substance abusers: one based on poor planning caused by discounting of future consequences, and another based on poor learning caused by insensitivity to punishments or hypersensitivity to rewards. The purpose of this section is to present a general model that synthesizes these two components. More specifically, we present a general computational model designed to account for multistage decision tree problems that require planning, and it incorporates both the hyperbolic temporal discounting model and the EV learning model. The principles used to form this synthesis are based on a temporal difference (TD) learning algorithm used in artificial neural network models for feedback control of dynamic systems (see Sutton & Barto, 1999).

Basic Concepts. To begin, we need to define the basic concepts and notation for the "temporal difference, expectancy valence" model (TD-EV model). Refer, once again, to the self-control problem illustrated in Figure 1. Each node in the tree is a new state of the decision maker's world, which represents the decision maker's knowledge of the past and current events up to that point. This state information serves as a cue for predicting future consequences. For example, when the student is in the state at node D2, denoted S(D2), he is out on the town with his friends; later, during the next morning, when he is in the state S(E1), he is feeling tired and ill and worried about his upcoming test; and finally, when he reaches the state S(C2), he is informed that he failed the test. The decision maker experiences some immediate consequences produced by changing to a state. For example, if the student chooses to stay out late with his friends, that is, the state changes to S(E1), then the immediate consequences are the fun that he

experiences with his friends, and the effects of the drugs he uses that night. If the student fails the calculus test the next morning, that is, the state changes to S(C2), then the immediate consequences are the negative emotional experiences of failure.

Valence. The first assumption is that the decision maker experiences an immediate affective reaction, called the valence, to the rewards and punishments when a new state, S, is reached. The reward value associated with the immediate positive consequences is represented by a positive scale value, r(S) > 0, and the punishment value associated with the immediate negative consequences is represented by a negative scale value, p(S) < 0. The valence is assumed to be a weighted average of the reward and punishment values:

v(S) = (1 − w)⋅r(S) + w⋅p(S).    (3)

If w < .5, then punishments are given less weight than rewards, which is consistent with the findings reported earlier for the cocaine abusers. Alternatively, if w > .5, then punishments are given more weight than rewards, which is consistent with the loss aversion principle of Tversky and Kahneman (1991) found with typical college students.

Expectation Updating. The second assumption is that the decision maker forms expectations about the immediate and future consequences produced by each state, conditioned on information about the state of the world at that point. The current expectation for any particular state S is defined as Ev[S]. Now suppose that some action or event occurs, which changes the world to the current state S. Then, according to a temporal difference learning rule, the expectation for the current state, Ev[S], is changed by adding the following adjustment to it:

∇Ev[S] = α ⋅ {v(S) + γ(d)⋅Ev[S′] − Ev[S]}.    (4)
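To make Equations 3 and 4 concrete, here is a minimal sketch in code. This is an illustrative restatement, not the authors' program; the function names `valence` and `td_update` are our own.

```python
def valence(r, p, w):
    # Equation 3: the valence of a state is a weighted average of its reward
    # value r(S) > 0 and punishment value p(S) < 0; w is the attention weight
    # allocated to punishments.
    return (1 - w) * r + w * p

def td_update(Ev_S, Ev_S_next, v, alpha, gamma_d):
    # Equation 4: adjust the expectation for the current state S by the
    # learning rate alpha times the error signal (target minus expectation).
    error = v + gamma_d * Ev_S_next - Ev_S
    return Ev_S + alpha * error

# With w = .5, rewards and punishments are weighted equally:
v = valence(100.0, -50.0, 0.5)              # 25.0
# A positive, unexpected valence pulls the expectation toward the target:
Ev = td_update(0.0, 0.0, v, 0.25, 0.9)      # 6.25
```

Note that setting gamma_d = 0 removes the future term entirely, reducing the update to one driven solely by immediate feedback.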

Let us clarify this updating equation one term at a time. The coefficient α > 0 is simply a learning rate parameter; v(S) is the immediately experienced valence produced by the transition to the current state; Ev[S′] is the expectation for the next future state that follows the current state;³ and Ev[S] is the expectation for the current state, before the adjustment has been applied. The parameter, γ(d), is a discount that is applied to the next future state (using either the exponential model, Equation 1, or the hyperbolic model, Equation 2). The difference, {v(S) + γ(d)⋅Ev[S′]} − Ev[S], is called the 'error signal', as it represents the discrepancy between the decision maker's prior expectation, Ev[S], and the target feedback, v(S) + γ(d)⋅Ev[S′]. It corresponds to the prediction error or surprise signal of the Rescorla and Wagner (1972) model in classical conditioning.

Choice Rule. Each branch that flows out of a decision node corresponds to an action, and each action changes the world to a unique new state. The third assumption is that each action is assigned a 'strength.' The strength of an action is an increasing (in particular, exponential) function of the expectation associated with the new state that it produces. The probability of choosing an action is determined by its strength relative to the strengths of all the other available actions (see, e.g., Sutton & Barto, 1999, p. 30). For example, at decision node D1 there are two actions: the self-control act of staying home and studying, which is the action that changes the states S(D1)→S(E3); and going out with the rowdy friends, which is the action that changes the states S(D1)→S(D2). The strength of the self-control action is defined as Q1 = exp{θ⋅Ev[S(E3)]}, and the strength of

³ If S is a decision node, then S′ is the state produced by the next action selected by the choice rule described in Equation 5; if S is a chance node, then S′ is the state produced by the next event, selected with a probability that matches the learned probabilities for the events.

the other action is defined as Q2 = exp{θ⋅Ev[S(D2)]}. The probability, P, of choosing action S(D1)→S(E3) over action S(D1)→S(D2) is the ratio of strengths:

P = Q1 / (Q1 + Q2).    (5)
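The choice rule can likewise be sketched as a softmax over expectations. This hypothetical snippet generalizes Equation 5 to any number of actions; θ (theta) is the choice consistency parameter.

```python
import math

def choice_probabilities(expectations, theta):
    # Strength of each action: exp(theta * Ev[S']) for the state it produces.
    strengths = [math.exp(theta * ev) for ev in expectations]
    total = sum(strengths)
    # Equation 5: probability = strength relative to the sum of all strengths.
    return [q / total for q in strengths]

# theta near zero -> nearly random choice between the two actions:
p_low = choice_probabilities([90.0, 10.0], theta=0.001)
# large theta -> the action with the maximum expectation dominates:
p_high = choice_probabilities([90.0, 10.0], theta=1.0)
```

With two actions this reduces exactly to the ratio Q1 / (Q1 + Q2) in Equation 5.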

The parameter, θ, in the exponential function is the choice consistency parameter. If this parameter is close to zero, then choices are almost random and independent of the expectations; if this parameter is very large, then the choice is almost always determined by the maximum expectation. In sum, the TD-EV model has four parameters: a weight parameter reflecting the amount of attention allocated to rewards versus punishments; an updating rate parameter that determines the rate of learning and recency effects; a discounting rate parameter that determines the influence of future consequences; and finally a choice consistency parameter that determines the likelihood that the chosen act is the same as the act with the maximum expectation. The main difference between the TD-EV model and the EV model is that the former is applicable to multistage decisions (a sequence of decisions), whereas the latter is limited to single-stage decisions. For example, the simulated gambling task (Figure 3) is a single-stage decision, whereas the self-control problem (Figure 1) is a two-stage decision. Multistage decisions require planning ahead for future actions, and so they must include expectations beyond the immediate rewards. The EV model only includes the immediate feedback term, v(S), in the updating equation, whereas the temporal difference model includes a discounted expectation for future actions, γ(d)⋅Ev[S′]. If the discount parameter is set to zero, γ(d) = 0, then the future expectation

term is eliminated, and the TD-EV model reduces to the EV model (this is equivalent to assuming an extremely rapid discounting rate). Figure 5 shows the results of two simulations generated by the TD-EV model applied to the self-control problem illustrated in Figure 1.⁴ Each panel shows the learning process that develops as a function of the number of life experiences with this situation (ranging from 1 to 100 experiences). The solid curve in each panel represents the probability of choosing the self-control option (stay home and study) at decision node D1. The dashed curve represents the probability of choosing the impulsive action at node D2 (stay out late with friends), given that self-control was not selected at the first stage. The panel on the left was generated by a very low hyperbolic discount parameter (k = .01/24), and the panel on the right was generated by a very high hyperbolic discount parameter (k = 10/24). First consider the left panel, which incorporates planning for future consequences (a low discount rate). Early in training, the model most frequently chooses the self-control option, because during this early phase, if the model ever reaches the second stage, it tends to make the impulsive choice. Later in training, the model learns not to take the impulsive choice at the second stage; because it no longer requires the self-control option in stage 1, it then tends to choose the optimal path (go out with friends and return early) more frequently. Next, consider the right panel, which fails to incorporate planning for future consequences (a high discount rate). In this case, self-control becomes less favored with experience; instead, the impulsive choice grows in strength at the second stage. The asymptotic probability converges to a .75 probability of choosing the impulsive act (to stay out late and fail the test) whenever the second stage is reached.

⁴ The probability of passing the test if the student took time to study was set to .90; the probability of failing if the student did not study was set to .90; the payoffs for passing and failing the test were set to 100 and 0, respectively; the immediate reward for state S(E1) was 90; the immediate reward for state S(E2) was 50; the immediate reward for state S(D2) was 10; all other payoffs were zero; α = .25; θ = .03. This program can be downloaded from http://php.indiana.edu/~jbusemey/

Figure 5: Predictions generated by the TD-EV model of self-control
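As an illustration of how simulations like those in Figure 5 could be generated, the following is a simplified, hypothetical re-implementation of the TD-EV simulation of the self-control problem, using the parameter values given in footnote 4 (low discount rate, as in the left panel). It is a sketch, not the downloadable program: chance-node feedback is folded into a single update per stage, and the delay between stages is an assumed value.

```python
import math
import random

ALPHA, THETA = 0.25, 0.03    # learning rate and choice consistency (footnote 4)
K = 0.01 / 24                # low hyperbolic discount rate (left panel)
DELAY = 12.0                 # assumed delay (in hours) to the next consequence

def gamma(d, k=K):
    # Hyperbolic discounting of a consequence delayed by d (Equation 2).
    return 1.0 / (1.0 + k * d)

def choose_first(ev_a, ev_b):
    # Equation 5: probability of action a from the ratio of strengths.
    qa, qb = math.exp(THETA * ev_a), math.exp(THETA * ev_b)
    return random.random() < qa / (qa + qb)

# Expectations and immediate rewards for the states of Figure 1 (footnote 4):
# E3 = stay home and study, D2 = go out, E1 = stay out late, E2 = return early.
Ev = {"E3": 0.0, "D2": 0.0, "E1": 0.0, "E2": 0.0}
reward = {"E3": 0.0, "D2": 10.0, "E1": 90.0, "E2": 50.0}

def life_experience():
    # Stage 1: self-control (study, E3) versus going out with friends (D2).
    if choose_first(Ev["E3"], Ev["D2"]):
        test = 100.0 if random.random() < 0.9 else 0.0   # pass with p = .9
        Ev["E3"] += ALPHA * (reward["E3"] + gamma(DELAY) * test - Ev["E3"])
        return
    # Stage 2: impulsive (stay out late, E1) versus optimal (return early, E2).
    s = "E1" if choose_first(Ev["E1"], Ev["E2"]) else "E2"
    p_pass = 0.1 if s == "E1" else 0.9
    test = 100.0 if random.random() < p_pass else 0.0
    Ev[s] += ALPHA * (reward[s] + gamma(DELAY) * test - Ev[s])
    # Back up the discounted second-stage expectation to state D2 (Equation 4).
    Ev["D2"] += ALPHA * (reward["D2"] + gamma(DELAY) * Ev[s] - Ev["D2"])

random.seed(1)
for _ in range(100):
    life_experience()
```

With a high discount rate instead (e.g., K = 10/24), the future term gamma(DELAY)⋅Ev[s] is heavily attenuated, and choices come to be dominated by the immediate rewards, as in the right panel.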

5. Neurophysiological underpinnings of the TD-EV model

During the past 10 years, considerable theoretical progress has been made toward identifying a neurophysiological basis for temporal difference learning (see Houk, Davis, & Beiser, 1995, for a general overview; and see Joel, Niv, & Ruppin, 2002, for a more recent review). Figure 6 (cf. Berke & Hyman, 2000) provides a rough guide for the basic

mechanisms and their interconnections. Many areas of the cerebral cortex, including numerous locations in the frontal cortex, have excitatory (glutamate) projections to the striatum. The striatum has inhibitory (GABA) projections to the internal pallidum, which in turn has inhibitory connections to the thalamus. Finally, the thalamus has excitatory projections to the frontal cortex, closing a cortical-basal ganglia loop.

Figure 6: Schema of neural connections between cerebral cortex and basal ganglia

[Figure 6 is a box-and-arrow diagram showing excitatory (+) and inhibitory (−) projections among the Primary Reward system, the Cerebral Cortex (including frontal areas), the dopamine neurons (VTA/SNc), the external pallidum (GPe), the subthalamic loop, the striatum, the thalamus, and the internal pallidum (GPi).]

Diffuse dopamine input is assumed to modulate the development of synaptic connections from the cerebral and frontal inputs to the striatum. These cortical-striatal connections are believed to guide a condition-action pattern recognition learning process. Later activation of the same pattern in the striatum then signals the onset of a behaviorally significant event, which is passed on to the frontal cortex via the thalamus, producing recurrent thalamo-cortical activation. The sustained activation in the frontal cortex serves a working memory function, maintaining this information for the further processing needed for planning of actions and control of behavior. Input from these

cortical command areas to the motor regions of the cortex ultimately leads to motor output. There is considerable evidence that the dopamine system (projecting from the VTA/SNc to the striatum) provides crucial neurochemical communication that modulates the changes in synaptic strengths in the striatum. It is hypothesized that dopamine is released in response to the error signal in the temporal difference learning model (see Schultz, 1997, for a review of the evidence). Initially, during the early stages of learning, the dopamine neurons discharge in response to a reward that immediately follows an appropriate action. Later, as the reward becomes anticipated by an earlier signal, the dopamine neurons discharge only during the signal, and no longer fire in response to the experience of the reward. Thus, the dopamine response gradually shifts toward earlier cues in the sequence of events leading to a reward. One of the earliest attempts to relate the temporal difference learning model to the dopamine system was formulated by Houk, Adams, and Barto (1995). This theory involves the circuits connecting the striatum to the dopamine neurons, summarized in Figure 6. According to this model, the striatum has direct, but slow acting, inhibitory projections to the dopamine neurons. It also has a fast acting, excitatory, subthalamic loop to the dopamine neurons. Initially, before the signal has been trained to predict the reward, a cue that signals the primary reward produces an eligibility trace that persists long enough to overlap with the dopamine response from the primary reward. Eventually, this establishes the ability of the cue to elicit this dopamine response itself. At this point, when the cue is presented, it produces an early excitatory signal to the dopamine system from the subthalamic loop, producing an early dopamine response. This is followed by a

later inhibitory signal to the dopamine system via the direct connection. The late inhibitory signal combines with the excitatory signal from the primary reward (e.g., an excitatory signal from the lateral hypothalamus), and the two cancel, leaving no dopamine response at the time of the primary reward. Now that the earlier cue is capable of generating a dopamine response, it inherits the capability of reinforcing even earlier cues, thus repeating the whole process and working up the chain of earlier cues. With respect to the temporal difference model, the change in expectation, ∇Ev[S], corresponds to the dopamine discharge. The term, v(S), corresponds to the input to the dopamine system from the primary reward system (e.g., the lateral hypothalamus). The prediction for the next future state, Ev[S′], corresponds to the fast signal from the subthalamic loop. The negative of the current expectation, −Ev[S], corresponds to the slower inhibitory signal from the direct striatum-dopamine neuron connection. Although this theory has already been surpassed by more detailed timing mechanisms and more anatomically accurate improvements (see, e.g., Brown, Bullock, & Grossberg, 1999), the basic ideas continue to have a large influence on current theories of temporal difference learning (see Holroyd & Coles, 2002, for a recent review).

6. Implications for substance abuse

Incentive Sensitization. Robinson and Berridge (1993, 2001) have accumulated a large amount of evidence indicating that, through repeated drug experiences, stimuli associated with drugs develop an increased capacity for arousing drug seeking activities. Repeated drug experience also increases the salience of drugs, or the amount of attention given to drugs. This is not to say that the hedonic pleasure experienced by using a drug is enhanced. On the contrary, this hedonic value may remain constant, while the

incentive value increases through experience. The incentive value of a drug refers to the 'wanting' rather than the 'liking' of drugs, and these two aspects of drug experience may be uncorrelated. Evidence indicates that sensitization results from increased responsiveness of the mesolimbic dopamine system (Robbins & Everitt, 1999; Robinson & Berridge, 2001). (Referring to Figure 6, this involves the projections from the VTA/SNc to the ventral striatum.) The ventral striatum includes the nucleus accumbens, which is responsible for instrumental learning of motivationally significant stimulus-response relations. The increased responsiveness of the mesolimbic dopamine system is believed to enhance synaptic plasticity (Berke & Hyman, 2000; Hyman & Malenka, 2001). According to the TD-EV model, state expectations guide choice behavior. Thus the incentive motivational value of striving for a particular state S is determined by the expectation, Ev[S], for that state. From this point of view, sensitization reflects a growth in the expectations for states associated with drugs, similar to that shown in the right panel of Figure 5. Note that these changes in choice probability for drug activity (motivation to seek drugs) are uncorrelated with the immediate reward value of drugs, which was assumed to be constant across experiences in this simulation. The increased salience of drugs, or increased attention to drug related stimuli by drug abusers, could also be represented in the model by the attention weight parameter, 1 − w, in Equation 3. Changes in the dopamine system produced by drugs may alter the neurophysiological mechanisms that implement the TD learning process (Hyman & Malenka, 2001). One possibility is that repeated drug use causes the excitatory inputs to the dopamine system coming from the primary reward to overcome the direct inhibition

contribution from the striatum (Berke & Hyman, 2000, p. 524). With respect to the TD-EV model, this would cause the error signal, ∇Ev[S], to become dominated by the immediate reward input signal, v(S), from the drug experience rather than by the future expectation, Ev[S′], of the negative consequences that follow drug use. Under these conditions, the TD-EV learning algorithm will be driven more by the immediate than by the long term consequences, changing the learning trajectory from the development of self control, as shown in the left panel, to the impulsive behavior shown in the right panel of Figure 5.

Impulsiveness. There is also evidence that drug abuse alters the dopamine system modulating learning in the frontal cortex, which is assumed to serve as an inhibitory system that prevents impulsive reactions to immediate rewards (Jentsch & Taylor, 1999). Damage to this system presumably releases this inhibition, allowing impulsive choice behavior to be expressed. With respect to the TD-EV model, alterations in the prefrontal cortex could block the ability to generate predictions for future states, Ev[S′]. This would again eliminate learning of long term consequences, resulting in a change in the learning trajectory from self control to impulsivity. A tendency to focus attention on the immediate reward experience of the drugs rather than on the future negative consequences of drug use can lead drug abusers to develop an increasingly strong preference for substance abuse despite the long term destructive consequences (see Finn, 2002, for a similar view). Given the limited scope of this chapter, many other important aspects of drug abuse were not discussed. In particular, the dynamic nature of drug craving controlled by homeostatic regulation systems (Koob, 1998) was not covered. To incorporate these aspects into the model, we would need to reformulate the valences as a dynamic system

rather than as fixed reward values. An initial computational model along these lines is described in Busemeyer, Townsend, and Stout (2002). Another aspect of drug abuse not covered here is the development of habitual or automatic routines in drug abuse behavior (Tiffany & Conklin, 2000).

Concluding Comments: Neurophysiology, computational models, and behavior. Computational models serve the crucial theoretical purpose of mediating between the neurophysiological mechanisms and the behavioral facts of drug abuse. It is very difficult to connect neurophysiological mechanisms directly to behavior without understanding the cognitive processes that summarize the functional significance of the neural mechanisms for behavior. In this chapter, we have shown three different types of contributions from cognitive models. First, they provide a theoretically based tool for measuring important individual differences, such as impulsivity, or the salience of rewards relative to punishments for drug abusers as compared to controls. Second, they provide the power to rigorously and precisely predict the complex interactions of relevant factors in drug abuse decision making, as well as the dynamics of drug behavior over time as a function of experience. Third, they provide a means for summarizing the functional significance of various neural circuits, making it possible to link these neural mechanisms to behavior.

References

Ainslie, G., & Haslam, N. (1992). Irrationality, impulsiveness, and selfishness as discount reversals. In G. Loewenstein & J. Elster (Eds.), Choice over time (pp. 57-92). NY: Russell Sage.

Ashby, F. G., & Gott, R. (1988). Decision rules in the perception and categorization of multi-dimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.

Bartzokis, G., Lu, P. H., Beckson, M., Rapoport, R., Grant, S., Wiseman, E. J., & London, E. (2000). Abstinence from cocaine reduces high risk responses on a gambling task. Neuropsychopharmacology, 22, 102-103.

Bechara, A., Damasio, H., Tranel, D., & Anderson, S. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50, 7-15.

Bechara, A., & Damasio, H. (2002). Decision-making and addiction (part I): Impaired activation of somatic states in substance dependent individuals when pondering decisions with negative future consequences. Neuropsychologia, 40, 1675-1689.

Bechara, A., Dolan, S., & Hindes, A. (2002). Decision-making and addiction (part II): Myopia for the future or hypersensitivity to reward. Neuropsychologia, 40(10), 1690-1705.

Becker, G. S., & Murphy, K. M. (1988). A theory of rational addiction. Journal of Political Economy, 96, 675-700.

Berke, J. D., & Hyman, S. E. (2000). Addiction, dopamine, and the molecular mechanisms of memory. Neuron, 25, 515-532.

Bickel, W. K., & Marsch, L. A. (2001). Toward a behavioral economic understanding of drug dependence: Delay discounting processes. Addiction, 96, 73-86.

Brown, J., Bullock, D., & Grossberg, S. (1999). How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. Journal of Neuroscience, 19, 10502-10511.

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253-262.

Busemeyer, J. R., Townsend, J. T., & Stout, J. C. (2002). Motivational underpinnings of utility in decision making: Decision field theory analysis of deprivation and satiation. To appear in S. Moore (Ed.), Emotional Cognition. Amsterdam: John Benjamins.

Busemeyer, J. R., Weg, E., Barkan, R., Li, X., & Ma, Z. (2000). Dynamic and consequential consistency of choices between paths of decision trees. Journal of Experimental Psychology: General, 129, 530-545.

Carter, J. R., & Neufeld, R. W. J. (1999). Cognitive processing of multidimensional stimuli in schizophrenia: Formal modeling of judgment speed and content. Journal of Abnormal Psychology, 108, 633-654.

Clemen, R. (1996). Making Hard Decisions. Duxbury Press.

Cubitt, R., Starmer, C., & Sugden, R. (2002). Dynamic decisions under uncertainty: Some recent evidence from economics and psychology.

Finn, P. (2002). Motivation, working memory, and decision making: A cognitive-motivational theory of personality vulnerability to alcoholism. Behavioral and Cognitive Neuroscience Reviews, 1, 183-205.

Grant, S., Contoreggi, C., & London, E. D. (2000). Drug abusers show impaired performance in a laboratory test of decision making. Neuropsychologia, 38(8), 1180-1187.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. NY: Wiley.

Holroyd, C. B., & Coles, M. G. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and error-related negativity. Psychological Review, 109, 679-709.

Houk, J. C., Davis, J. L., & Beiser, D. G. (1995). Models of information processing in the basal ganglia. Cambridge, MA: MIT Press.

Houk, J. C., Adams, J. L., & Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (Ch. 13, pp. 249-270). Cambridge, MA: MIT Press.

Hyman, S. E., & Malenka, R. C. (2001). Addiction and the brain: The neurobiology of compulsion and its persistence. Nature Reviews: Neuroscience, 2, 695-703.

Jentsch, J. D., & Taylor, J. R. (1999). Impulsivity resulting from frontostriatal dysfunction in drug abuse: Implications for the control of behavior by reward-related stimuli. Psychopharmacology, 146, 373-390.

Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15, 535-547.

Kanfer, F., & Karoly, P. (1972). Self-control: A behavioristic excursion into the lion's den. Behavior Therapy, 3(3), 389-416.

Kirby, K. N., & Herrnstein, R. J. (1995). Preference reversals due to myopic discounting of delayed rewards. Psychological Science, 6, 83-89.

Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug using controls. Journal of Experimental Psychology: General, 128, 78-87.

Koob, G. F. (1998). Drug abuse: Hedonic homeostatic dysregulation. Science, 278, 52-58.

Lopes, L. L., & Oden, G. C. (1999). The role of aspiration level in risky choice: A comparison of cumulative prospect theory and SP/A theory. Journal of Mathematical Psychology, 43, 286-313.

Maddox, W. T., & Filoteo, J. V. (2001). Striatal contributions to category learning: Quantitative modeling of simple linear and complex nonlinear rule learning in patients with Parkinson's disease. Journal of the International Neuropsychological Society, 7, 710-727.

Marr, D. (1982). Vision. San Francisco: Freeman.

Mazas, C. A., Finn, P. R., & Steinmetz, J. E. (2000). Decision making biases, antisocial personality, and early-onset alcoholism. Alcoholism: Clinical & Experimental Research, 24, 1036-1040.

Monterosso, J., Erhman, R., Napier, K. L., O'Brien, C. P., & Childress, A. R. (2001). Three decision making tasks in cocaine dependent patients: Do they measure the same construct? Addiction, 96, 1825-1837.

Nosofsky, R. M., & Zaki, S. R. (1998). Dissociations between categorization and recognition in amnesic and normal individuals: An exemplar-based interpretation. Psychological Science, 9, 247-255.

Petry, N. M., Bickel, W. K., & Arnett, M. (1998). Shortened time horizons and insensitivity to future consequences in heroin addicts. Addiction, 93(5), 729-738.

Ratcliff, R., Thapar, A., & McKoon, G. (2001). The effects of aging on reaction time in a signal detection task. Psychology & Aging, 16, 323-341.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical Conditioning II: Current research and theory (pp. 64-99). NY: Appleton-Century-Crofts.

Riefer, D. M., Knapp, B. R., Batchelder, W. H., Bamber, D., & Manifold, V. (2002). Cognitive psychometrics: Assessing storage and retrieval deficits in special populations with multinomial processing tree models. Psychological Assessment, 14, 184-201.

Robbins, T. W., & Everitt, B. J. (1996). Neurobehavioral mechanisms of reward and motivation. Current Opinion in Neurobiology, 6, 228-236.

Robinson, T. E., & Berridge, K. C. (1993). The neural basis of craving: An incentive-sensitization theory of addiction. Brain Research Reviews, 18, 247-291.

Robinson, T. E., & Berridge, K. C. (2000). The psychology and neurobiology of addiction: An incentive-sensitization view. Addiction, 95(Suppl. 2), S91-S119.

Rogers, R. D., Everitt, B. J., Baldacchino, A., Blackshaw, A. J., Swainson, R., Wynne, K., Baker, N. B., Hunter, J., Carthy, E., London, M., Deakin, J. F. W., Sahakian, B. J., & Robbins, T. W. (1999). Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: Evidence for monoaminergic mechanisms. Neuropsychopharmacology, 20, 322-339.

Schultz, W. (1997). Dopamine neurons and their role in reward mechanisms. Current Opinion in Neurobiology, 7, 191-197.

Skog, O. J. (2000). Addict's choice. Addiction, 95, 1309-1314.

Stout, J. C., Rodawalt, W. C., & Siemers, E. R. (2001). Risky decision making in Huntington's disease. Journal of the International Neuropsychological Society, 7, 92-101.

Stout, J. C., Busemeyer, J. R., Lin, A., Grant, S. R., & Bonson, K. R. (2002). Cognitive modeling analysis of the decision-making processes used by cocaine abusers. Submitted to Psychological Science.

Stout, J. C., Busemeyer, J. R., & Bechara, A. (2002). Cognitive modeling of performance on a gambling task: A comparison of orbital prefrontal cortex lesioned individuals, individuals with Huntington Disease, and cocaine abusers. Paper in progress.

Sutton, R. S., & Barto, A. G. (1999). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Tiffany, S. (1990). A cognitive model of drug urges and drug use behavior: Role of automatic and non-automatic behavior. Psychological Review, 97, 147-168.

Tiffany, S., & Conklin, C. A. (2000). A cognitive processing model of alcohol craving and compulsive alcohol use. Addiction, 95, S145-S153.

Thaler, R. (1981). Some empirical evidence on dynamic inconsistency. Economics Letters, 8, 201-207.

Townsend, J. T. (1984). Uncovering mental processes with factorial experiments. Journal of Mathematical Psychology, 28, 363-400.

Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. Quarterly Journal of Economics, 107, 1039-1061.

West, R. (2001). Theories of addiction. Addiction, 96, 3-13.


Author Notes

This research was supported by NIDA grant DA-R01 014119. Requests for reprints should be directed to Jerome R. Busemeyer at [email protected].
