A New Entorhinal Cortex Model

Alessandro D. Gagliardi

ABSTRACT

We propose a revision to the Myers et al. (1995) model of the entorhinal cortex. We have retained the original function of the model while modifying the implementation. Both implementations perform redundancy compression, which appears necessary in a number of conditioning paradigms, including sensory preconditioning and compound preexposure. Our new implementation differs from the Myers et al. (1995) model most significantly in that it simulates a refractory period in order to differentiate between tonic (contextual) stimuli and phasic (conditional) stimuli. The refractory period dampens the excitation of the tonic stimuli so that phasic stimuli appear more salient. This allows the system to present a unique representation of a phasic stimulus configuration to other systems (such as the hippocampus and ultimately the neocortex). Future work will involve the development of a model that differentiates the predictive qualities of the compressed representation provided by the entorhinal cortex simulation.

Medicine has long known that the hippocampal region plays a critical role in learning and memory (Scoville and Milner, 1957; Mishkin, 1982). Researchers have developed numerous theoretical models for how the hippocampal region contributes to learning and memory (Gluck and Myers, 1993) as well as how various anatomical structures within the hippocampal region contribute functionality to the system as a whole (Myers et al., 1995; Myers, 1996; Lisman and Otmakhova, 2001; Hasselmo et al., 2002; for a review see Gluck and Myers, 2001). The model presented here fits into a larger theory of how the hippocampal region mediates the representation of stimuli in learning and memory (Gluck and Myers, 1993, 2001). Figure 1 shows how the distinct anatomical structures within the hippocampal region connect to one another.
Here we distinguish the hippocampal formation (which includes the dentate gyrus, CA1 through CA4, and the subiculum) from the hippocampal region (which includes the perirhinal, parahippocampal and entorhinal cortices as well as everything in the hippocampal formation). Sensory information passes from the limbic cortex through the perirhinal and parahippocampal cortices to the superficial layers of the entorhinal cortex. From the superficial layers of the entorhinal cortex, information travels through the dentate gyrus and the hippocampus proper (CA4 through CA1) to the subiculum. From the subiculum, the signal passes through the deep layers of the entorhinal cortex and back to the limbic cortex.

We believe that each structure plays a distinct functional role in the processing of information. We have identified two fundamental steps in information processing necessary to account for how the hippocampal region contributes to conditioning: redundancy compression and predictive differentiation (Gluck and Myers, 1993, 2001). Redundancy compression occurs when the system compresses or “chunks” redundant stimuli into a single representation. In other words, if A and B always occur together, redundancy compression will produce a single representation, X, that accounts for both A and B. Predictive differentiation occurs when the system differentiates a signal from noise so that it can predict another stimulus. Predictive differentiation requires “supervised” training; redundancy compression does not. Empirical studies suggest that the superficial layers (II and III) of the entorhinal cortex may suffice for redundancy compression (Shohamy et al., 1999).

Figure 1. Schematic representation of the major information flow pathways in the hippocampal region. The entorhinal cortex receives sensory input via the perirhinal and parahippocampal cortices. The perforant pathway carries information processed by layer II of the entorhinal cortex to the dentate gyrus and hippocampus (CA3 and CA1). From the hippocampus, information passes through the subiculum through the deep layers (V and VI) of the entorhinal cortex and on to the association cortex and other parts of the brain.

Model Overview

In order to isolate the functionality of the entorhinal cortex, we have constructed a model of the hippocampal region, focusing on the entorhinal cortex, wherein the hippocampal formation (the dentate gyrus, CA3, CA1, and the subiculum) is functionally static. In neural network terms, that means that the weights on the inputs to the hippocampal formation are fixed. Figure 2 shows how the inputs (shown here as nodes for stimuli A and B as well as a node representing “context”) stimulate both the entorhinal cortex and the association cortex. The entorhinal cortex builds distinct representations for each stimulus configuration through unsupervised training (see details below). The hippocampal formation translates the output of the entorhinal cortex into a training signal for the association cortex. The association cortex, trained by the hippocampal formation (we have ignored the deep layers of the entorhinal cortex here), builds a novel representation of the stimuli for further training. The association cortex, which is based on the Rescorla-Wagner (1972) model, then “learns” which stimulus representations predict an unconditional stimulus. The output of the association cortex simulates a conditioned response.

In order to simulate an animal without a functioning entorhinal cortex, we disable the training signal from the hippocampal formation to the association cortex. This way the entorhinal cortex has no effect on learning in the association cortex.

Figure 2. A functional representation of how the entorhinal cortex modulates data representation in the association cortex and other areas. To simulate an animal without a functioning entorhinal cortex, we disable the training signal from the hippocampal formation to the association cortex.

Entorhinal Cortex Model Details

For further details on the reasoning behind the design of this model, please refer to Myers et al. (1995). Here we will discuss the differences between the current model and the 1995 model. Figure 3 demonstrates the functioning of the new entorhinal network per se. The entorhinal network consists of multiple patches of nodes. In this simulation, we used five patches of 20 nodes each. Each input stimulates each entorhinal node with a weight between 0 and 1. At the beginning of the simulation, the average weight is about 0.06. In each trial, the network computes the most active node in each patch. This “winning” node represents that particular stimulus configuration and outputs 1 while the others output 0. The system strengthens the connections between the active inputs and that node and weakens the connections between the active inputs and all other nodes.

During the initialization phase, we expose the network to a stimulus configuration representing “context” (Fig. 3a). These stimuli remain on throughout the experiment. As the simulation runs, the network remembers which node was maximally activated in the previous trial. In each trial, each patch dampens the input to the extent that the previous input stimulated the “winning” node. This dampening simulates a refractory period for that node. Each patch dampens or “masks” the input separately from every other patch. In order to maintain the association between an entorhinal node and a particular stimulus configuration, that node must strengthen its connections to that stimulus configuration while that stimulus is dampened. Figure 3b demonstrates this. The amount by which the stimulus configuration representing context stimulates the entorhinal network gradually drops while the weight of that stimulation increases. In this way, the contextual stimulus continues to activate the same nodes.

In Figure 3c, we see what happens when a new stimulus configuration appears. The tonic stimulus representing the context continues to stimulate the entorhinal network, but it is dampened. The phasic stimulus representing the conditional stimulus can thus compete with the contextual stimulus. In some cases, a patch may not respond uniquely to the phasic stimulus if the weights for the tonic stimulus are particularly high and the initial (random) weights for the phasic stimulus are too low (Fig. 3c: top patch). However, some patches will respond differently (Fig. 3c: bottom patch) and gradually build a strong relationship between the phasic stimulus and the entorhinal node it activates.

Figure 3. Entorhinal Network. (a) Activity before initialization. (b) Activity at the end of initialization. (c) Activity in response to conditional stimulus.

Simulation Results

We tested this model in a number of conditioning paradigms to measure how the model would behave with or without a functioning entorhinal network. We trained it in the following paradigms: acquisition, discrimination, sensory preconditioning, and compound preexposure. Acquisition and discrimination are both simple conditioning paradigms that animals can easily perform with or without their entorhinal cortices. Sensory preconditioning and compound preexposure appear to require redundancy compression, as our model predicts. It remains to be seen whether or not animals can perform under these conditioning paradigms with a selective lesion to the entorhinal cortex.

Figure 4. Acquisition. Average percent conditioned response over ten simulations with 100 trials of context followed by 900 trials of A+. (a) Entorhinal network does not train association network. (b) Entorhinal network trains association network.

The acquisition paradigm is the simplest conditioning paradigm. In it, we train the system to give a conditioned response to a conditional stimulus when paired with an unconditional stimulus. Figure 4 shows how the model responds to the acquisition paradigm with (Fig. 4b) or without (Fig. 4a) a functioning entorhinal cortex. Figure 4a represents the behavior of the association cortex with no help from the hippocampal region whatsoever. Figure 4b represents the behavior of the association cortex with training from the entorhinal cortex (mediated by a static hippocampal formation). Both simulations perform reasonably well, though training from the entorhinal cortex does significantly accelerate the rate at which the model reaches criterion (> 80%) response [t(18) = 3.94, p < 0.01].

The discrimination paradigm requires the system to discriminate between two conditional stimuli. We reinforce one with an unconditional stimulus (A+) but not the other (B-). Discrimination occurs when the system gives a conditioned response (> 80%) to A+ and not (< 20%) to B-. Figure 5 shows the behavior of the model with (Fig. 5b) and without (Fig. 5a) a functioning entorhinal cortex under this paradigm. As with the acquisition paradigm, the model demonstrates successful discrimination with or without a functional entorhinal cortex.

Figure 5. Discrimination. Average percent conditioned response over ten simulations of discrimination task. First 100 trials: context only; last 1400 trials: A+B-. (a) Entorhinal network does not train association network. (b) Entorhinal network trains association network.

Sensory preconditioning (Thompson, 1972) and compound preexposure are more complex forms of conditioning and require building associations between the conditional stimuli, which the association cortex cannot do on its own. In both paradigms, we compare behavior under the “exposure” condition to that under the “sit” condition. In both sensory preconditioning and compound preexposure, the “exposure” condition consists of numerous (400) trials of un-reinforced exposure to the compound stimulus: AB-. “Successful” training requires building an association between A and B. In contrast, the “sit” condition consists of the same amount of time spent in context with no (phasic) stimulation.

In the sensory preconditioning paradigm, after exposure to the compound stimulus or to context alone, we train the system to respond to A+. Finally, we test the system’s response to B-. Figure 6 demonstrates the behavior of the system under this paradigm. The top figures (Figs. 6a and b) show the behavior in the exposure condition. The bottom figures (Figs. 6c and d) show behavior in the sit condition. The left figures (Figs. 6a and c) show behavior when the entorhinal cortex does not train the association cortex. The right figures (Figs. 6b and d) show the behavior when the entorhinal cortex does train the association cortex. We predict that under the exposure condition, with a functioning entorhinal cortex, the system will show a significant response to B-. Under the sit condition, the system should not respond to B-, nor should it respond to B- if we have disabled the entorhinal cortex.

We compared 30 simulations in each condition and found a significantly higher response to B- in the exposure condition with a functioning entorhinal cortex than in either the sit condition with a functioning entorhinal cortex [first response: t(58) = 5.56, p < 0.001; highest response: t(58) = 4.81, p < 0.001] or the exposure condition without a functioning entorhinal cortex [first response: t(58) = 8.12, p < 0.001; highest response: t(58) = 12.77, p < 0.001]. For comparison, the difference between the exposure condition and the sit condition without a functioning entorhinal cortex was not significant [first response: t(58) = 1.46, p > 0.1; highest response: t(58) = 1.46, p > 0.1].

Figure 6. Sensory Preconditioning. Average percent conditioned response over 30 simulations. First 400 trials, either exposure or sit. Middle 400 trials, A+ exposure. Last 400 trials, B- test. (a) Exposure condition: first 400 trials, AB- exposure; entorhinal network does not train association network. (b) Exposure condition: first 400 trials, AB- exposure; entorhinal network trains association network. (c) Sit condition: first 400 trials, context only; entorhinal network does not train association network. (d) Sit condition: first 400 trials, context only; entorhinal network trains association network.
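The degrees of freedom reported in the text (58 for 30 simulations per condition, 18 for ten per condition) are what an independent two-sample Student's t-test with pooled variance would give (n1 + n2 - 2). The text does not name the test, so this is an assumption. The sketch below shows that calculation; the function name `pooled_t` and the sample values in the usage note are illustrative only and are not data from the simulations.

```python
from math import sqrt

def pooled_t(xs, ys):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, df) where df = len(xs) + len(ys) - 2.
    Assumed test; the paper reports only t and df.
    """
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    # Sums of squared deviations within each group.
    ss1 = sum((x - m1) ** 2 for x in xs)
    ss2 = sum((y - m2) ** 2 for y in ys)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df
```

For example, `pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])` yields four degrees of freedom, and any two groups of 30 values yield 58.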


Figure 7. Compound preexposure. Average percent conditioned response over 30 simulations. First 400 trials, either exposure or sit. Last 2000 trials, alternating A+B-. (a) Exposure condition: first 400 trials, AB- exposure; entorhinal network does not train association network. (b) Exposure condition: first 400 trials, AB- exposure; entorhinal network trains association network. (c) Sit condition: first 400 trials, context only; entorhinal network does not train association network. (d) Sit condition: first 400 trials, context only; entorhinal network trains association network.

The compound preexposure paradigm involves the same “exposure” and “sit” conditions as described in the sensory preconditioning paradigm. The only difference is that instead of separate blocks for A+ and B- exposure, we interleave A+ and B- as per the discrimination paradigm. The astute reader will notice that the only difference between the discrimination paradigm and the sit condition of the compound preexposure paradigm is that in the latter, the system has a longer exposure to context alone before receiving training on A+B-. Figure 7 demonstrates the behavior of our model under this paradigm. As before, the top figures (Figs. 7a and b) correspond to the exposure condition while the bottom figures (Figs. 7c and d) correspond to the sit condition. The left figures (Figs. 7a and c) correspond to no training from the entorhinal cortex while the right figures (Figs. 7b and d) correspond to behavior with training from the entorhinal cortex.

We ran 30 simulations under each condition (120 in all). Without the entorhinal cortex, the model ceased responding to B- (< 20% CR) after 20 trials in the sit condition (Fig. 7c) and after 76 trials in the exposure condition (Fig. 7a) on average. With the entorhinal cortex, the model ceased responding to B- after 431 trials in the sit condition (Fig. 7d) on average; but in the exposure condition, the average response remained above 74% after 2000 A+B- trials.

Discussion

Gluck and Myers (1993, 2001) identified two key functions of the hippocampal region in conditioning studies: redundancy compression and predictive differentiation. Our model of the entorhinal cortex simulates redundancy compression. Thus, by way of elimination, we hypothesize that the hippocampal formation performs predictive differentiation. The next step will be to model that process. We believe we can do this by adapting Myers’ (1996) model of the dentate gyrus to respond to neuromodulators such as dopamine and acetylcholine.

In primates, Schultz et al. (1993) noted that “dopamine neurons respond phasically to alerting external stimuli with behavioral significance whose detection is crucial for learning and performing delayed response tasks.” Research has shown that dopamine facilitates plasticity in the dentate gyrus (Kusuki et al., 1997; Kulla and Manahan-Vaughan, 2000). Similarly, research has demonstrated a link between acetylcholine and memory formation such that cholinergic antagonists such as scopolamine inhibit memory encoding (but not retrieval) (Ghoneim and Mewaldt, 1975). In particular, acetylcholine also seems to play a critical role in modulating the plasticity of neurons in the dentate gyrus (and other areas of the hippocampal region) (Hasselmo, 1995).

The existing (Myers, 1996) dentate gyrus model effectively differentiates information. By modulating the learning rate of the dentate gyrus in the presence of novel (unpredicted) stimuli, the model should differentiate predictive (conditional) stimuli more than non-predictive stimuli. Should this prove to be the case, that would suggest that the two key functions previously identified, redundancy compression and predictive differentiation, can be performed entirely outside of the hippocampus proper (CA4 through CA1). It remains to be seen whether or not these predictions will hold in empirical studies. However, if they do, this would suggest a significant reinterpretation of early studies of the hippocampal region. Functions previously believed to require the hippocampus proper may not depend upon the hippocampus at all.

References

Ambros-Ingerson J, Granger R, Lynch G (1990) Simulation of paleocortex performs hierarchical clustering. Science 247:1344-1348.
Ghoneim M, Mewaldt S (1977) Studies on human memory: The interactions of diazepam, scopolamine and physostigmine. Psychopharmacology 52:1-6.
Gluck MA, Myers CE (2001) Gateway to memory. Cambridge, MA: MIT Press.
Granger R, Ambros-Ingerson J, Lynch G (1989) Derivation of encoding characteristics of layer II cerebral cortex. Journal of Cognitive Neuroscience 1:61-87.
Hasselmo ME (1995) Neuromodulation and cortical function: Modeling the physiological basis of behavior. Behavioural Brain Research 67:1-27.
Hasselmo ME, Bodelón C, Wyble BP (2002) A proposed function for hippocampal theta rhythm: separate phases of encoding and retrieval enhance reversal of prior learning. Neural Computation 14:793-817.
Kulla A, Manahan-Vaughan D (2000) Depotentiation in the dentate gyrus of freely moving rats is modulated by D1/D5 dopamine receptors. Cereb Cortex 10:614-620.
Kusuki T, Imahori Y, Ueda S, Inokuchi K (1997) Dopaminergic modulation of LTP induction in the dentate gyrus of intact brain. Neuroreport 8:2037-2040.
Lisman JE, Otmakhova NA (2001) Storage, recall, and novelty detection of sequences by the hippocampus: elaborating on the SOCRATIC model to account for normal and aberrant effects of dopamine. Hippocampus 11:551-568.
Mishkin M (1982) A memory system in the monkey. Philosophical Transactions of the Royal Society of London: Series B 298:85-92.
Myers CE (1996) Overview of dentate gyrus modeling, through Oct 1996. Unpublished.
Myers CE, Gluck MA, Granger R (1995) Dissociation of hippocampal and entorhinal function in associative learning: A computational approach. Psychobiology 23:116-138.
Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. In: Classical conditioning II: current research and theory (Black AH, Prokasy WF, eds), pp 64-99. New York: Appleton-Century-Crofts.
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900-913.
Scoville WB, Milner B (1957) Loss of recent memory after bilateral hippocampal lesions. J Neurol Neurosurg Psychiat 20:11-21.
Shohamy D, Allen M, Gluck MA (1999) Dissociating entorhinal and hippocampal function in latent inhibition of the classically conditioned eyeblink response. Abstracts of the Society for Neuroscience Annual Meeting (Miami, FL) 25, 40.14.
Thompson R (1972) Sensory preconditioning. In: Topics in Learning and Performance (Thompson R, Voss J, eds), pp 105-129. New York: Academic Press.

APPENDIX: SIMULATION DETAILS This appendix describes the computational details of the model proposed in this paper.

Stimuli

Input to the system consists of a 16-bit stimulus vector; the first 8 bits represent phasic (conditional) stimuli (CS) while the last 8 bits represent tonic (contextual) stimuli (context).

Context X (no CS):  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0
CS A (in X):        1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0  1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0
CS B (in X):        0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0  1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0
CS AB (in X):       1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0  1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0
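The four input vectors listed above can be built from two halves, which makes it explicit that the tonic context bits are identical in every configuration. This is a minimal sketch, not the author's OCaml source; the names `CONTEXT_BITS`, `CS_BITS` and `stimulus` are hypothetical.

```python
# Tonic (context) half: the last 8 bits, identical for every stimulus.
CONTEXT_BITS = [1.0, 0.0] * 4

# Phasic (CS) half: the first 8 bits, copied from the listing above.
CS_BITS = {
    "none": [0.0] * 8,
    "A": [1.0] * 4 + [0.0] * 4,
    "B": [0.0] * 4 + [1.0] * 4,
    "AB": [1.0] * 8,
}

def stimulus(cs="none"):
    """Return the 16-element input vector: 8 phasic bits, then 8 tonic bits."""
    return CS_BITS[cs] + CONTEXT_BITS
```

Under this layout the compound AB is the element-wise sum of the phasic halves of A and B, while the context half never changes.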

Training occurs in blocks of 20 trials. Each trial is paired with a bit representing the presence or absence of an unconditional stimulus (US). At the start of each experiment, we initialize the system with 200 blocks of context-only trials. Thereafter, each block consists of 10 trials of context, followed by one trial with the CS, followed by 9 more trials of context. In these experiments, we always paired context with an inactive US. The CS may be paired with an active US (e.g., A+) or an inactive US (e.g., B-) depending on the training paradigm. Note that the tonic stimuli representing context are always present, even when a CS appears.

For the acquisition paradigm, we exposed the system to 700 blocks with CS A after the 200 initialization blocks. We paired the occurrence of CS A with a US of 1.0 and measured how many blocks it took before the model reached criterion performance (80% or higher).

For the discrimination paradigm, we exposed the system to 1400 blocks of alternating CS A and CS B (700 blocks each). We paired CS A with a US of 1.0 and CS B with a US of 0.0.

For the sensory preconditioning paradigm, we had two conditions. In the exposure condition, we exposed the system to 400 blocks of CS AB with no US (0.0), followed by 400 blocks of CS A with a US of 1.0, followed by 400 blocks of CS B with no US. In the sit condition, we exposed the system to 400 blocks of context only with no US, followed by 400 blocks of CS A with a US of 1.0 and 400 blocks of CS B with no US. We compared these by measuring the peak response to CS B in the third stage of the experiment.

For the compound preexposure paradigm, we had the same two conditions: exposure and sit. The exposure condition consisted of the same 400 blocks of CS AB with no US followed by the same 1400 blocks of alternating CS A and B as in the discrimination paradigm. The sit condition consisted of 400 blocks of context only followed by the same 1400 blocks of alternating CS A and B. (Note: the only difference between the discrimination paradigm and the sit condition of the compound preexposure paradigm is that in the former we initialize the model with 200 context-only blocks and in the latter we expose the model to 600 (200 + 400) context-only blocks before exposing it to any CS.)

For the source code, please visit http://alessandro.gagliardi.name/HPC/Stim.ml.html
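The block structure described above can be sketched as a schedule of (stimulus, US) pairs. This is an illustration, not the author's code: the names `block`, `acquisition` and `discrimination` are hypothetical, and strict block-by-block alternation of A and B is an assumption, since the text says only that the blocks alternate.

```python
def block(cs=None, us=0.0):
    """One 20-trial block as (stimulus_name, us) pairs.

    A CS block is 10 context trials, one CS trial, then 9 context trials.
    Context trials are always paired with an inactive US (0.0).
    """
    context = ("none", 0.0)
    if cs is None:
        return [context] * 20
    return [context] * 10 + [(cs, us)] + [context] * 9

def acquisition(init_blocks=200, train_blocks=700):
    """200 context-only blocks, then 700 blocks of A+."""
    trials = []
    for _ in range(init_blocks):
        trials += block()
    for _ in range(train_blocks):
        trials += block("A", 1.0)
    return trials

def discrimination(init_blocks=200, blocks_each=700):
    """200 context-only blocks, then alternating A+ and B- blocks."""
    trials = []
    for _ in range(init_blocks):
        trials += block()
    for _ in range(blocks_each):
        trials += block("A", 1.0)  # assumed order: A first, then B
        trials += block("B", 0.0)
    return trials
```

The other paradigms follow the same pattern by concatenating blocks in the order the text gives.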

Entorhinal Network

We adapted the entorhinal network from the piriform clustering model (Ambros-Ingerson et al., 1990; Granger et al., 1989) and the previous entorhinal network model (Myers et al., 1995). The network consists of five nonoverlapping patches of 20 nodes (100 nodes in all). The patches operate in parallel. Each bit from the input (see above) stimulates each node in each patch. (16 input nodes x 100 entorhinal nodes = 1600 connections in all.) We initialize the weight of each connection to a random decimal between 0 and 2/[the number of inputs], that is, in (0.0, 0.125), so that the combined weight of all connections to a single entorhinal node will add up to about 1.0.

Each patch also has a “mask” vector that it applies to the input vector. This mask dampens the activation of the input vector. We initialize each value in each mask to 0.0. Subsequently, the system updates the mask to equal the weight vector for the connections to the most activated entorhinal node in that patch. Each node calculates its activation $y_n$ as:

$y_n = \sum_i w_{in} (I_i - m_i)$

where $I_i$ is the activation of the ith element of the input vector, $m_i$ is the mask for the ith element of the input vector, and $w_{in}$ is the weight from that element to node n. After the system determines the activation of each entorhinal node in a patch, it selects the “winner” as the node with the maximum activation $y_n$. The system sets the output of the winning node to 1.0 and sets all other nodes to output 0.0. The system then saves the weights of the winning node as the mask for the next trial. The winning node also updates its weights as:

$\Delta w_{in} = f(w_{in}) \, \alpha_+ \, I_i \, (1.0 - y_n)$

and all the other nodes update their weights as:

$\Delta w_{in} = f(w_{in}) \, \alpha_- \, I_i \, (0.0 - y_n)$

where $f(w) = w (1.0 - w)$, $\alpha_+ = 0.05$ and $\alpha_- = 0.005$.

The important differences between this and the prior implementation by Myers et al. (1995) consist of: 1) initializing all weights to values between 0 and 1 (more specifically, between 0.0 and 0.125), 2) masking inputs to simulate a refractory period, and 3) scaling the weight modification function by f so that the weight will never exceed 1 nor fall below 0. (As w approaches 1 or 0, f(w) approaches 0.)

For the source code, please visit http://alessandro.gagliardi.name/HPC/EC.ml.html
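One trial of a single patch, as described above, might look like the following sketch. It is not the author's OCaml implementation: the class name `Patch` and its method `step` are hypothetical, the mask is copied before the weight update because that is the order in which the text lists the steps (an assumption), and the scaling function f(w) = w(1 - w) is applied to both the winner and loser updates (also an assumption, based on the stated definition of f).

```python
import random

ALPHA_PLUS = 0.05    # learning rate for the winning node
ALPHA_MINUS = 0.005  # learning rate for all other nodes

def f(w):
    """Scaling term that vanishes as w approaches 0 or 1."""
    return w * (1.0 - w)

class Patch:
    def __init__(self, n_inputs=16, n_nodes=20, rng=None):
        rng = rng or random.Random(0)
        high = 2.0 / n_inputs  # 0.125 for 16 inputs
        self.w = [[rng.uniform(0.0, high) for _ in range(n_inputs)]
                  for _ in range(n_nodes)]
        self.mask = [0.0] * n_inputs  # no dampening on the first trial

    def step(self, inputs):
        """Run one trial and return the index of the winning node."""
        # Activation of each node on the masked (dampened) input.
        y = [sum(w * (x - m) for w, x, m in zip(row, inputs, self.mask))
             for row in self.w]
        winner = max(range(len(y)), key=y.__getitem__)
        # Save the winner's current weights as the mask for the next trial
        # (assumed to happen before the weight update).
        self.mask = list(self.w[winner])
        for n, row in enumerate(self.w):
            if n == winner:
                target, alpha = 1.0, ALPHA_PLUS
            else:
                target, alpha = 0.0, ALPHA_MINUS
            for i, x in enumerate(inputs):
                row[i] += f(row[i]) * alpha * x * (target - y[n])
        return winner
```

A full network under this sketch would hold five independent `Patch` objects and present the same 16-element input to each on every trial.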

Hippocampal Network

As discussed in the text, the hippocampal network in this model functions only to translate the output of the entorhinal cortex into a training signal for the association cortex. It does not do any processing. For this network, we use five nodes. Each node in the entorhinal network stimulates each node in the hippocampal network. (100 entorhinal nodes x 5 hippocampal nodes = 500 connections in all.) We initialize the weight of each connection to a random value between -0.3 and +0.3. Each hippocampal node also has one bias term that acts as a constant stimulus. The bias term also has a weight initialized to a random value between -0.3 and +0.3. The system calculates the output of the hippocampal network as:

$o_n = b_n + \sum_i w_{in} I_i$

where $b_n$ represents the weight of the bias term, $w_{in}$ the weight of the connection between entorhinal node i and hippocampal node n, and $I_i$ the output of entorhinal node i. No weight changes occur in this network.

This network trains the internal layer of nodes in the association cortex, of which there are 10. Two random nodes in the hippocampal network train each internal node in the association network. (Two different hippocampal nodes for each association node: 20 connections in all.) We initialize the weight of one of these connections to a random value between -0.3 and +0.3. We initialize the weight of the other connection to 1 minus the weight of the first connection, yielding a value between 0.7 and 1.3. The total weight of these two connections to the association node always comes to 1. These weights do not change either. The output of the hippocampal network to the association network acts as a “training signal” in the manner of a Rescorla-Wagner (1972) model as described below.

For the source code, please visit http://alessandro.gagliardi.name/HPC/Conn.ml.html
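The fixed hippocampal mapping and its connections to the association network could be sketched as follows. All function names are hypothetical, and combining the two hippocampal outputs as a weighted sum to form each training signal is an assumption; the text specifies only the two weights, which sum to 1.

```python
import random

def make_hippocampus(n_ec=100, n_hpc=5, rng=None):
    """Fixed random weights and biases, each in [-0.3, +0.3]."""
    rng = rng or random.Random(0)
    weights = [[rng.uniform(-0.3, 0.3) for _ in range(n_ec)]
               for _ in range(n_hpc)]
    biases = [rng.uniform(-0.3, 0.3) for _ in range(n_hpc)]
    return weights, biases

def hippocampal_output(weights, biases, ec_out):
    """o_n = b_n + sum_i w_in * I_i for each hippocampal node n."""
    return [b + sum(w * x for w, x in zip(row, ec_out))
            for row, b in zip(weights, biases)]

def make_training_links(n_assoc=10, n_hpc=5, rng=None):
    """For each internal association node, pick two different hippocampal
    nodes; the first weight is random, the second is 1 minus the first."""
    rng = rng or random.Random(0)
    links = []
    for _ in range(n_assoc):
        a, b = rng.sample(range(n_hpc), 2)
        w = rng.uniform(-0.3, 0.3)
        links.append(((a, w), (b, 1.0 - w)))
    return links

def training_signal(links, hpc_out):
    """Assumed combination rule: weighted sum of the two linked outputs."""
    return [wa * hpc_out[a] + wb * hpc_out[b] for (a, wa), (b, wb) in links]
```

Because none of these weights change, the whole stage is a fixed linear read-out of the entorhinal winner pattern.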

Association Network

We adapted the association network from the cortical network in the Gluck and Myers (1993) model. We used a two-layer network with full connectivity between the 16 inputs (the same 16 inputs that stimulate the entorhinal network) and 10 internal nodes and between the internal nodes and 1 output node. As with the hippocampal network, we initialize the weight of each connection to a random value between -0.3 and +0.3. The system calculates the output of each node as:

$o_n = \sum_i w_{in} I_i$

There is no bias term. The system updates weights as:

$\Delta w_{in} = \beta (\lambda_n - o_n) I_i$

where $\lambda$ is the training signal and $\beta = 0.5$ when $\lambda = 1.0$ and $\beta = 0.05$ otherwise. For the internal nodes we determine $\lambda$ based on the activity of the hippocampal network (see above). For the output node, $\lambda = 1.0$ when the US is present and $\lambda = 0.0$ when it is not.

For the source code, please visit http://alessandro.gagliardi.name/HPC/Cer.ml.html

For details on how we ran the experiments, you can view the source code at http://alessandro.gagliardi.name/HPC/main.ml.html. You can find links to the source code for each module used in this simulation at http://alessandro.gagliardi.name/HPC/Makefile.html
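The weight update above is an error-correcting rule with a learning rate that depends on the training signal. A minimal sketch for a single node follows; the names `node_output` and `delta_update` are hypothetical, and the code is an illustration rather than the author's implementation.

```python
def node_output(weights, inputs):
    """Linear output with no bias term: o = sum_i w_i * I_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def delta_update(weights, inputs, target):
    """Return updated weights: w_i + beta * (lambda - o) * I_i.

    beta is 0.5 when the training signal (target) is exactly 1.0,
    and 0.05 otherwise, as stated in the text.
    """
    beta = 0.5 if target == 1.0 else 0.05
    out = node_output(weights, inputs)
    return [w + beta * (target - out) * x for w, x in zip(weights, inputs)]
```

For instance, with weights `[0.0, 0.0]`, inputs `[1.0, 0.0]` and a training signal of 1.0, only the weight on the active input changes, moving from 0.0 to 0.5.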
