Power Function Forgetting Curves as an Emergent Property of ...

INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1999, 34 (5/6), 460–464

Power Function Forgetting Curves as an Emergent Property of Biologically Plausible Neural Network Models

Sverker Sikström

University of Toronto, Canada

Empirical forgetting curve data have been shown to follow a power function. In contrast, many connectionist models predict either an exponential decay or flat forgetting curves. This paper simulates power function forgetting curves in a Hopfield network modified to incorporate the more biologically realistic assumptions of bounded weights and a distribution of learning rates. The modified model produces power function forgetting curves. The bounded weights introduce exponential decay for individual weights, and a power function forgetting curve arises when exponential decays with different learning rates are summed. Because these assumptions are biologically reasonable, power function forgetting curves may be an emergent property of biological networks. The results fit empirical data and indicate that forgetting curves restrict possible implementations of models of memory.

Laboratory studies of recognition and autobiographical data show a linear relationship between the logarithm of a measurement of memory (e.g. d′) and the logarithm of the time since the items were encoded, indicating a power function forgetting curve. Rubin and Wenzel (1996) gathered a database of 210 published data sets of forgetting curves, which had five or more datapoints, were smooth, and to which at least one function could be fitted with a correlation coefficient of .90. They fitted the database with 105 different functions and found that the power function accounted for more variance than the exponential function, or several other functions. However, on empirical grounds they found it difficult to distinguish between the power function and three other functions, namely the logarithm, the exponential in the square root of time, and the hyperbola in the square root of time. Accordingly, this paper makes no strong claim on empirical grounds as to whether the power function or one of the other three functions suggested by Rubin and Wenzel is the true function.

Crovitz and Schiffman (1974) suggested that a memory measurement (M) could be summarized by a power function of time (t):

$$M = b\,t^{-a} \qquad (1)$$

where a and b are positive constants. It follows from Equation (1) that the logarithm of a memory measurement [log(M)] is linearly related to the logarithm of the time passed since encoding, that is, log(M) = log(b) − a log(t). In autobiographical memory, M is the number of memories per time unit. In laboratory studies M is d′, which is a measurement proportional to the underlying trace strength. To achieve a high degree of accounted variance a large range of performance seems to be necessary. For example, the power function in autobiographical memory, where the range typically is three or four orders of magnitude, has been found to account for .95 to 1.00 of the variance. Anderson and Tweney (1997) argued that the experimental power function curves may be an artefact due to averaging over subjects. However, Wixted and Ebbesen (1997) showed that the power function also fits better than the exponential function when data from the individual subjects are fitted. This indicates that the power function is not an artefact due to averaging over subjects.

The purpose of this paper is to show that a Hopfield network modified in two aspects shows power function forgetting curves. The modifications are bounded weights and a variance in the distribution of learning rates. The bounded weights make the weights decay exponentially. Power function forgetting curves are found when the exponential decays are summed over a distribution of learning rates. Bounded weights and a distribution of learning rates are biologically reasonable assumptions. Therefore, it is argued that a power function forgetting curve may be an emergent property of a biological neural network. The model is consistent with TECO (Sikström, 1996a, b, 1998), which has been applied to a wide set of memory phenomena (a full summary of TECO is beyond the scope of the present paper).

Requests for reprints should be addressed to Sverker Sikström, PhD, Department of Psychology, University of Toronto, 100 St George Street, Toronto, Ontario, Canada M5S 3G3 (Fax: +1 416 978 4811; Tel: +1 416 978 4518; E-mail: sverker@psych.utoronto.ca; Homepage: http://www.psych.utoronto.ca/~sverker/sikström.html).

© 1999 International Union of Psychological Science

A MODEL FOR POWER FUNCTION FORGETTING CURVES

First, the Hopfield network (Hopfield, 1982, 1984) is described. Activated nodes correspond to features in the represented information. Items are represented in patterns of activation. Each pattern consists of N nodes. The activation of node i at time t ($x_i^t$) can be in one of two states. The active state is represented as +1 and the inactive state as 0. The probability that a node is active is a (0 < a < 1). Each pattern is created by randomly setting exactly aN nodes to an active state and the other nodes to an inactive state. All nodes are connected to all other nodes in the network and a weight is attached to each connection. The weight between node i and node j at time t is $w_{ij}^t$. The weight change ($\Delta w_{ij}^t$) for i = j is zero, and for i ≠ j is calculated by:

$$\Delta w_{ij}^{t} = \frac{1}{N}\,(x_i^t - a)(x_j^t - a) \qquad (2)$$
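The learning rule in Equation (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function names `make_pattern` and `weight_change`, the nested-list weight layout, and the fixed random seed are assumptions introduced here.

```python
import random

def make_pattern(n_nodes, a, rng):
    """Create a 0/1 pattern with exactly round(a * n_nodes) active nodes."""
    n_active = round(a * n_nodes)
    active = set(rng.sample(range(n_nodes), n_active))
    return [1 if i in active else 0 for i in range(n_nodes)]

def weight_change(pattern, a):
    """Equation (2): dw[i][j] = (x_i - a) * (x_j - a) / N, zero on the diagonal."""
    n = len(pattern)
    return [
        [0.0 if i == j else (pattern[i] - a) * (pattern[j] - a) / n
         for j in range(n)]
        for i in range(n)
    ]

rng = random.Random(0)          # fixed seed, used only for reproducibility
x = make_pattern(60, 0.2, rng)  # N = 60 and a = 0.2, the values used in the simulation
dw = weight_change(x, 0.2)
```

Two active nodes receive a positive change of (1 − a)²/N, two inactive nodes a smaller positive change of a²/N, and a mixed pair a negative change, so the rule stores the covariance between node activations.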

Retrieval from the network is conducted by presenting a pattern to the network. The retrieved pattern can be found by calculating the net input for each node:

$$net_i^t = \sum_{j=1}^{N} w_{ij}^t\, x_j^t \qquad (3)$$

The encoded pattern can then be retrieved by synchronously activating the nodes to +1 if the argument is positive and otherwise to 0.

A standard Hopfield network produces flat forgetting curves because the probability of retrieval is independent of when the patterns are stored. To account for forgetting curves it is suggested that the weights should be bounded and that the learning rates should be different for each weight ($\eta_{ij}$). The boundaries are created by setting the weights to the maximum boundary (b) if the weight is above b. If the weight is below the minimum boundary (−b) then the weight is set to −b:

$$w_{ij}^{t+1} = \mathrm{Max}\big[\mathrm{Min}\big[w_{ij}^t + \eta_{ij}\,\Delta w_{ij}^t,\; b\big],\; -b\big] \qquad (4)$$

where Min[ ] takes the minimum of the two arguments and Max[ ] takes the maximum of the two arguments. The slope of the forgetting curve is zero as long as the weight does not reach the boundary. When the boundary is reached the slope of the forgetting curve becomes negative because the boundary interferes with the weight changes. Forgetting due to the boundary at time (t) is equal to the probability that the weights "bump" into the boundary. This is equal to the expected value of the absolute weight change ($\eta_{ij}\,\Delta w$; see Footnote 1) divided by the distance between the low and the high boundaries (2b). The probability that the boundary is not reached ($\alpha_{ij}$) is then:

$$\alpha_{ij} = 1 - \mathrm{Min}\left[\frac{\eta_{ij}\,\Delta w}{2b},\; 1\right] = 1 - \mathrm{Min}\left[\frac{2\,\eta_{ij}\,[a(1-a)]^2}{bN},\; 1\right] \qquad (5)$$

where 0 < $\alpha_{ij}$ < 1. Let t represent the lag, or the number of items encoded between encoding and retrieval. Given a constant number of items encoded at each time period, t can be regarded as the time between encoding and retrieval. Performance over time for a single weight can be written as an exponential function of time. The performance over time (d′(t)) for the whole network can then be written as the average of the exponential functions with different decay parameters:

$$d'(t) = \frac{d'(0)}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij}^{\,t/2} = \frac{d'(0)}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} e^{\ln(\alpha_{ij})\,t/2} \qquad (6)$$

where d′(0) is the d′ at time 0 (for the factor one half in the exponent, see Footnote 2). Thus, it is predicted that bounded weights with a single learning rate yield an exponential forgetting curve. This network will show the slowest possible forgetting curve (completely flat) if the learning rate (in relation to the boundary) is so slow that the boundary is never reached. The network can also show the fastest possible forgetting curve if the learning rate is so large that only the last item is stored in the weights. Intermediate forgetting curves can be found by using learning rates that lie between the fastest and the slowest possible. It should therefore be possible to find a distribution of weight changes that shows a power function forgetting curve by combining slow and fast forgetting rates in a suitable way.

The question is what distribution of learning rates yields a power function forgetting curve for the sum of exponentials. Mathematically, this is not an easy question. By using Laplace transformations, Newell and Rosenbloom (1981) argued that a rectangular distribution of exponentials results in an aggregate power function. However, more recently Kahana (personal communication, 6 June 1998) argued that it can be shown mathematically that any smooth probability distribution of decay parameters ($\alpha_{ij}$) in exponential functions yields an aggregated power function. This is also what is found in the simulations below.

Footnote 1. The expected absolute weight change ($\Delta w$) is the absolute weight change (abs($\Delta w_i$)) times the probability for each weight change ($p_i$) summed over the four possible combinations of weight changes:

$$\Delta w = \sum_{i=1}^{4} \mathrm{abs}(\Delta w_i)\,p_i = \frac{1}{N}\Big[\mathrm{abs}((0-a)(0-a))(1-a)^2 + \mathrm{abs}((0-a)(1-a))(1-a)a + \mathrm{abs}((1-a)(0-a))a(1-a) + \mathrm{abs}((1-a)(1-a))a^2\Big] = \frac{1}{N}\,[2a(1-a)]^2 \qquad (9)$$

Footnote 2. The factor one half in the exponent is included because the weight changes are dependent. The standard deviation of the dependent mean weight changes is equal to the square root of the expected value of the independent weight changes.
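To make the prediction concrete, the following sketch evaluates Equation (5) and the ratio d′(t)/d′(0) from Equation (6) for a single constant learning rate. It is an illustration under stated assumptions: the names `retention` and `relative_d_prime` are introduced here, the average is taken over a supplied list of learning rates rather than over the full N × N weight matrix, and the settings a = 0.2, b = 0.00067, and N = 60 and the three learning rates are those reported in the Simulation section.

```python
def retention(eta, a, b, n_nodes):
    """Equation (5): probability that a weight does not hit the boundary."""
    return 1.0 - min(2.0 * eta * (a * (1.0 - a)) ** 2 / (b * n_nodes), 1.0)

def relative_d_prime(lag, etas, a, b, n_nodes):
    """Equation (6) divided by d'(0): mean of alpha ** (lag / 2) over the rates."""
    alphas = [retention(eta, a, b, n_nodes) for eta in etas]
    return sum(alpha ** (lag / 2.0) for alpha in alphas) / len(alphas)

A, B, N = 0.2, 0.00067, 60          # settings reported in the Simulation section
LAGS = (1, 2, 4, 8, 16, 32, 64)
for eta in (2.0, 0.04, 0.008):      # constant learning rates of Models 1A, 1B, 1C
    curve = [relative_d_prime(lag, [eta], A, B, N) for lag in LAGS]
    print(eta, [round(value, 3) for value in curve])
```

With one learning rate the curve is a pure exponential in the lag, so its logarithm falls on a straight line when plotted against the lag itself.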

SIMULATION

A simulation was run to study how forgetting curves depend on bounded weights and the distribution of $\eta_{ij}$. The following settings were used: the number of encoded patterns (p) 64, the number of nodes (N) 60, the activation level (a) 0.2, and the boundary (b) 0.00067. Initially all weights were set to zero. First, 64 to-be-encoded patterns and 64 lure patterns were created. Then each of the 64 patterns (p) was encoded once in a temporal order. All weights in the network were changed using the same learning rule (as specified earlier). The 1, 2, 4, 8, 16, 32, and 64 latest encoded patterns were retrieved. No learning occurred during retrieval. Each simulation was repeated 500 times. The familiarity (μ) of a retrieved pattern was calculated by the dot product between the net input ($net_i^t$) and the activation of the encoded pattern scaled so that the expected value is zero ($x_i^t - a$), summed over the N nodes:

$$\mu = \sum_{i=1}^{N} net_i^t\,(x_i^t - a) \qquad (7)$$
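The net input of Equation (3) and the familiarity of Equation (7) can be sketched as follows. The four-node network, the activation level of 0.5, and the particular target and lure patterns are hypothetical values chosen so that the arithmetic can be checked by hand; they are not taken from the paper.

```python
def net_input(weights, pattern):
    """Equation (3): net_i = sum over j of w[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, pattern)) for row in weights]

def familiarity(weights, pattern, a):
    """Equation (7): mu = sum over i of net_i * (x_i - a)."""
    net = net_input(weights, pattern)
    return sum(net_i * (x_i - a) for net_i, x_i in zip(net, pattern))

# Toy example with N = 4 nodes (hypothetical values, not from the paper).
a = 0.5
stored = [1, 1, 0, 0]
n = len(stored)
# Weights after storing one pattern with Equation (2), learning rate 1, no boundary.
w = [[0.0 if i == j else (stored[i] - a) * (stored[j] - a) / n for j in range(n)]
     for i in range(n)]
lure = [1, 0, 1, 0]
mu_target = familiarity(w, stored, a)
mu_lure = familiarity(w, lure, a)
```

In this toy case the stored pattern yields a higher familiarity than the lure, which is the difference that the d′ measure below quantifies.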

The results are presented as d′ calculated from the familiarity of the targets ($\mu_t$), the familiarity of the lures ($\mu_d$), and the standard deviation of the familiarity of the lures ($\sigma_d$):

$$d' = \frac{\mu_t - \mu_d}{\sigma_d} \qquad (8)$$
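A minimal sketch of Equation (8) is given below. The familiarity values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say whether the sample or the population estimate was used.

```python
from statistics import mean, stdev

def d_prime(target_familiarities, lure_familiarities):
    """Equation (8): (mean target - mean lure) / SD of lure familiarity."""
    mu_t = mean(target_familiarities)
    mu_d = mean(lure_familiarities)
    sigma_d = stdev(lure_familiarities)  # sample SD; an assumption made here
    return (mu_t - mu_d) / sigma_d

targets = [1.2, 0.9, 1.5, 1.1]   # hypothetical familiarity values
lures = [0.1, -0.2, 0.3, 0.0]    # hypothetical familiarity values
print(d_prime(targets, lures))
```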

The learning rate ($\eta_{ij}$) was varied as follows. Most prominent memory models use a constant learning rate, and several models have unbounded weights (e.g. CHARM, Metcalfe, 1991), whereas some are bounded (context to items association in Chappell and Humphreys' model, 1994). The effect of constant weight changes was simulated in Models 1A, 1B, and 1C. In Model 1A the weight change was set to a large value so that only the last item can be recalled from the network ($\eta_{ij}$ = 2). In Model 1C the weight change was set so slow that the boundaries cannot be reached, i.e. the weight change was practically unbounded ($\eta_{ij}$ = 0.008). In Model 1B the learning rate was set to an arbitrarily chosen intermediate level ($\eta_{ij}$ = 0.04). In Models 1D, 1E, and 1F the learning rates were different for each connection. The distribution was set to a linear distribution (Model 1D, $\eta_{ij} = \eta$), an exponential distribution [Model 1E, $\eta_{ij} = \exp(\eta)$], and a power function (Model 1F, $\eta_{ij} = \eta^{-1}$), where η is a random variable with a rectangular distribution, bounded so that $\eta_{ij}$ falls between 1 and 0.008. The learning rates were updated for each subject. However, the learning rate has to be constant during the simulation of each subject to produce appropriate forgetting curves.
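The encoding step with per-connection learning rates can be sketched as below for the rectangular case (Model 1D). Several details are assumptions made for illustration: the names `draw_learning_rates` and `encode` are introduced here, the rates for the connections i → j and j → i are drawn independently, and the seed is arbitrary. The rates are drawn once and then held fixed while all 64 patterns are encoded, matching the requirement that the learning rate stay constant within a simulated subject.

```python
import random

def draw_learning_rates(n_nodes, low, high, rng):
    """One rate per connection, rectangular on [low, high] (as in Model 1D)."""
    return [[rng.uniform(low, high) for _ in range(n_nodes)] for _ in range(n_nodes)]

def encode(weights, rates, pattern, a, b):
    """Apply Equations (2) and (4) in place for one pattern."""
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the weight change is zero for i = j
            dw = (pattern[i] - a) * (pattern[j] - a) / n
            weights[i][j] = max(min(weights[i][j] + rates[i][j] * dw, b), -b)

rng = random.Random(1)                            # arbitrary seed
N, A, B = 60, 0.2, 0.00067                        # settings from the simulation
rates = draw_learning_rates(N, 0.008, 1.0, rng)   # drawn once, then held fixed
weights = [[0.0] * N for _ in range(N)]
for _ in range(64):                               # encode 64 random patterns in order
    active = set(rng.sample(range(N), 12))        # exactly a * N = 12 active nodes
    encode(weights, rates, [1 if k in active else 0 for k in range(N)], A, B)
```

Connections with large rates hit the boundary quickly and retain mainly the most recent patterns, whereas connections with small rates change slowly and retain older patterns for longer.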

RESULTS

The results from the simulations are presented in Fig. 1a and 1b. It was predicted that the bounded weights should yield exponential forgetting curves, which is evident by a linear curve on a log-linear plot. This was also found for Simulations 1A, 1B, and 1C (Fig. 1a). Model 1B used an intermediate boundary so that several items could be stored. The explained variance on an exponential function is 1.000. The predicted slope according to Equation 6 was α = 0.94, and the simulated slope was α = 0.93, indicating a reasonably good fit between predicted and simulated slopes. The forgetting was faster than a power function. The explained variance R² on a power function was 0.78. Model 1C, where the weights were unbounded (i.e. due to a very low learning rate), shows a flat "forgetting" curve independent of the time of learning. This forgetting curve is predicted from the Matrix Model (Humphreys, Bain, & Pike, 1989) and MINERVA II (Hintzman, 1987), among others. Model 1A with maximally bounded weights shows a forgetting curve where only the last encoded item can be retrieved (notice that the logarithmic curve is cut off so that very low d′ are not displayed in the figure). Thus, the models with bounded weights and a constant learning rate show exponential forgetting curves.

In Models 1D to 1F, the learning rates were different for each weight and distributed as linear, exponential, and power functions. The results are shown in Fig. 1b. A good fit with a power function is in this graph represented by a linear curve with a negative slope. The results fit a linear relationship on the log-log scale well (Model 1D R² = 0.993, Model 1E R² = 0.997, Model 1F R² = 0.993). Thus, these models are consistent with the power function forgetting curves often found empirically in long-term memory.
To summarize, power function forgetting curves were found using bounded weights and several different distributions of learning rates. Models with unbounded weights or a constant learning rate did not show a power function forgetting curve.

[Figure 1. Panel (a): ln(d′) against # on a linear axis; series 1A Fastest, 1B Intermediate, 1C Slowest. Panel (b): ln(d′) against # on a log2 axis (1, 2, 4, 8, 16, 32, 64); series 1D linear, 1E exp, 1F power.]

FIG. 1. (a) Simulated data for a constant distribution of learning rates: The y-axis shows the log_e (ln) of d′ and the x-axis the number of items encoded before the item was retrieved on a linear scale (i.e. time). Model 1A shows the results for the fastest possible learning rate, Model 1B the results for an intermediate (i.e. $\eta_{ij}$ = 0.1) learning rate, and Model 1C the results for the slowest possible learning rate. (b) Simulated data using three different distributions of learning rates: The y-axis shows the log_e of d′ and the x-axis the number of items encoded before the item was retrieved (i.e. time) on a log2 scale. The distributions are linear (1D), exponential (1E), and power (1F).
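The comparison between power and exponential fits can be sketched with the analytical prediction of Equations (5) and (6) instead of a full network simulation. This is an illustration under assumptions: the learning rates are taken on an evenly spaced grid between 0.008 and 1 as a deterministic stand-in for a rectangular distribution, the helper names `r_squared` and `retention` are introduced here, and R² is computed as the squared Pearson correlation of a straight-line fit. The values it produces are not the paper's simulated values.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

def retention(eta, a=0.2, b=0.00067, n_nodes=60):
    """Equation (5), with the simulation settings as defaults."""
    return 1.0 - min(2.0 * eta * (a * (1.0 - a)) ** 2 / (b * n_nodes), 1.0)

lags = [1, 2, 4, 8, 16, 32, 64]
# Evenly spaced learning rates between 0.008 and 1 (an assumed, deterministic
# stand-in for the rectangular distribution).
etas = [0.008 + (1.0 - 0.008) * k / 999 for k in range(1000)]
alphas = [retention(eta) for eta in etas]
# ln of d'(t) / d'(0) from Equation (6), averaging over the learning rates.
log_d = [math.log(sum(al ** (t / 2.0) for al in alphas) / len(alphas)) for t in lags]

r2_power = r_squared([math.log(t) for t in lags], log_d)       # log-log fit
r2_exponential = r_squared([float(t) for t in lags], log_d)    # log-linear fit
print(round(r2_power, 3), round(r2_exponential, 3))
```

A higher R² for the log-log fit than for the log-linear fit indicates that the mixture of exponential decays is closer to a power function than to a single exponential over these lags.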

DISCUSSION

A modified Hopfield model was proposed to account for the power function forgetting curves by assuming bounded weights and a distribution of learning rates. The bounded weights introduce exponential decays for individual weights. Summing exponential decays with different learning rates (or decay parameters) yields a power function forgetting curve consistent with empirical data.
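As a worked illustration of why summing exponential decays can give a power-like curve, consider an idealised case that is assumed here for tractability and is not one of the simulated models: the decay parameters α are spread uniformly over (0, 1), and the double sum in Equation (6) is replaced by an integral.

```latex
\frac{d'(t)}{d'(0)} \approx \int_0^1 \alpha^{t/2}\, d\alpha
  = \left[ \frac{\alpha^{t/2+1}}{t/2+1} \right]_0^1
  = \frac{2}{t+2}

\ln \frac{d'(t)}{d'(0)} \approx \ln 2 - \ln(t+2) \approx \ln 2 - \ln t
  \qquad (t \gg 2)
```

For lags well above 2 the curve approaches a straight line with slope −1 in log-log coordinates, which corresponds to a power function with exponent a = 1 in Equation (1). Each single α, in contrast, gives a straight line only in log-linear coordinates.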

An important aspect of the model is that the assumptions are biologically plausible. Weights may be conceived as synaptic plasticity. It is likely that synaptic plasticity in biological cells is bounded. The assumption of a positive distribution of learning rates in the present model is also plausible in neurological systems. Since these assumptions are likely to be true in biological neural networks, it is also reasonable to conclude that a power function forgetting curve may be an emergent property of biological networks. However, neurological data showing unbounded synaptic plasticity, or no distribution of synaptic plasticity, may potentially falsify the present theory. The author is unaware of explicit neurological data on the distribution of time constants for synaptic plasticity, or boundaries for synaptic plasticity.

Most prominent memory models (e.g. the Matrix Model, Humphreys et al., 1989; the auto-associator in the model of Chappell & Humphreys, 1994; MINERVA, Hintzman, 1987; CHARM, Metcalfe, 1991) simply add the contribution of a newly encoded item to the memory vector and use unbounded "weights" with a constant learning rate. These models do not differentiate between the time of encoding, so that all encoded items have the same expected probability of retrieval independently of when they were encoded. These models predict a completely flat "forgetting" curve. None of the existing models referred to here have different learning rates in their current implementation. Chappell and Humphreys' (1994) model is the only model referred to in this paper that uses weight boundaries. This model has bounded weights in the connections from the representation of context to the items, whereas the item to item connections are unbounded. It may be possible to modify other models by introducing weight boundaries and a distribution of the learning rates.

Although the present paper has dealt with long-term memory, the theory may be extended to short-term memory and serial position effects. For example, the recency effect may simply be a special case of the theory, whereas the primacy effect may be modelled by changing the learning according to the novelty of the encoding context.

REFERENCES

Albert, M.S., Butters, N., & Levin, J. (1979). Temporal gradients in the retrograde amnesia of patients with alcoholic Korsakoff's disease. Archives of Neurology, 36, 211–216.

Anderson, R.B., & Tweney, R.D. (1997). Artifactual power curves in forgetting. Memory and Cognition, 25(5), 724–730.

Chappell, M., & Humphreys, M.S. (1994). An auto-associative neural network for sparse representations: Analysis and application to models of recognition and cued recall. Psychological Review, 101(1), 103–128.

Crovitz, H.F., & Schiffman, H. (1974). Frequency of episodic memories as a function of their age. Bulletin of the Psychonomic Society, 4, 517–518.

Hintzman, D.L. (1987). Recognition and recall in MINERVA 2: Analysis of the "recognition failure" paradigm. In P. Morris (Ed.), Modelling cognition (pp. 215–229). London: Wiley.

Hintzman, D.L. (1988). Judgement of frequency and recognition memory in a multiple trace memory model. Psychological Review, 95, 528–551.

Hopfield, J.J. (1982). Neural networks and physical systems with emergent computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554–2558.

Hopfield, J.J. (1984). Neurons with graded responses have collective computational abilities. Proceedings of the National Academy of Sciences, USA, 81, 3088–3092.

Humphreys, M.S., Bain, J.D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96, 208–233.

Metcalfe, J. (1991). Recognition failure and the composite memory trace in CHARM. Psychological Review, 98, 529–553.

Murdock, B.B. (1993). TODAM2: A model for the storage and retrieval of item, associative, and serial-order information. Psychological Review, 100, 183–203.

Newell, A., & Rosenbloom, P.S. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1–55). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Rubin, D.C., & Wenzel, A.E. (1996). One hundred years of forgetting: A quantitative description. Psychological Review, 103, 734–760.

Sikström, P.S. (1996a). The TECO connectionist theory of recognition failure. European Journal of Cognitive Psychology, 8, 341–380.

Sikström, P.S. (1996b). TECO: A connectionist theory of successive episodic tests. Doctoral Thesis, Umeå University.

Sikström, P.S. (1998). A connectionist model for novelty and familiarity in episodic memory. Manuscript submitted for publication.

Wixted, J.T., & Ebbesen, E.B. (1997). Genuine power curves in forgetting: A quantitative analysis of individual subject forgetting functions. Memory and Cognition, 25(5), 731–739.
