Alternative circuits and interleaved learning in the hippocampus: A computational model

University of Amsterdam Bachelorproject

Author: Ábel Ságodi (10325948)
Supervisor: Prof. J.M.J. Murre

June 26, 2015
Abstract

The hippocampus plays a critical role in systems memory consolidation. New techniques provide new experiments and data that can be tested in a computational framework. The trisynaptic architecture of the hippocampus is generally considered to provide the main contribution to learning and memory, with the CA1 as main output. However, new proposals state that other circuits can make substantial contributions. In this paper, a new computational model of the involvement of the hippocampus in episodic memory consolidation is presented. In this model a more biologically plausible anatomical architecture of the hippocampus is employed. With this model, the effect of interleaved learning of previously encountered objects is addressed. Alternative circuits and their impact on the ability to consolidate are also considered. Furthermore, the importance of replay for memory consolidation is analyzed. The results indicate that hippocampus-dependent learning is improved by interleaved learning and replay of hippocampal representations. Additionally, considering alternative output regions of the hippocampus seems promising, based on the results, for enhancing the learning capability of the neocortex. The model was also able to reproduce the Ribot gradient and memory decay as a function of cell loss in the parahippocampal gyrus and dentate gyrus. Finally, the model could be used to investigate the effect of lesions of the hippocampus on mental imagery.
Keywords: systems memory consolidation, trisynaptic hippocampal circuitry
Acknowledgements

I thank prof. dr. J.M.J. Murre for his time and valuable suggestions. I would also like to thank Leander de Kraker for his time and discussions.
Contents

1 Introduction
  1.1 Episodic memory
  1.2 Consolidation
  1.3 The Hippocampus
      1.3.1 Hippocampal circuitry
      1.3.2 Alternative circuit
      1.3.3 Recurrence, replay and interleaved learning
      1.3.4 Alzheimer
      1.3.5 Computational models of consolidation

2 Methods
  2.1 The Restricted Boltzmann Machine
  2.2 The model
      2.2.1 Network Properties
      2.2.2 Training the network
  2.3 Simulation Methodology
      2.3.1 Learned stimuli and sampling
      2.3.2 Consolidation
      2.3.3 Different learning rates
      2.3.4 Semantic consolidation
      2.3.5 Replay and catastrophic forgetting
      2.3.6 Pattern separation
      2.3.7 Lesion studies
      2.3.8 Imagining

3 Results
  3.1 Helping learning - Consolidation
  3.2 Learning rates
      3.2.1 Semantic consolidation
  3.3 Replay
  3.4 Orthogonality
  3.5 Effects of lesions
      3.5.1 Retrograde Amnesia
      3.5.2 Anterograde Amnesia
  3.6 Mental imagery

4 Discussion
  4.1 Limitations
      4.1.1 Technical Complications
      4.1.2 Biological Plausibility
      4.1.3 Computational issues
  4.2 Underlying biological mechanisms
  4.3 Future Directions

5 Reference list

Appendix A Technical Specifications
  A.1 Learning Rules
  A.2 Multiple Perceptrons

Appendix B Additional Results
1 Introduction

1.1 Episodic memory

The main focus of this article is the formation of episodic memories, which enable us to recollect specific events and the spatiotemporal context in which they were formed. By stating that the anatomical substrate of episodic memory is the hippocampus, and by referring to the neural activities that relate to the causal mechanisms by which the psychological states and processes can be explained, Werning and Cheng (2014) conclude that episodic memory is a natural kind and can therefore be regarded as real. Episodic memory in humans can be observed through verbal recall, but Werning and Cheng (2014) present results from which they conclude that rats also demonstrate behaviour that can be interpreted as reflecting episodic memories. Often a distinction is made between familiarity and recollection (Greve et al. 2010; Yonelinas et al. 2010; Diana et al. 2007). Recollection is the process of recognizing an item on the basis of the retrieval of specific contextual details, whereas familiarity is the process of recognizing an item without retrieval of any specific details about the encountered episode (Diana et al. 2007). When information about an object is recollected, mainly the hippocampus and posterior parahippocampal gyrus are involved, whereas activity in the anterior parahippocampal gyrus is correlated with familiarity (Diana et al. 2007).
1.2 Consolidation

Memory consolidation refers either to synaptic consolidation, a gene-expression-dependent transformation of information, or to systems consolidation, a slower post-encoding reorganization of long-term memory over different brain circuits (Dudai 2012; Kandel et al. 2014). The engagement of the cortico-hippocampal system in the formation of declarative memory is often studied in computational models (Káli and Dayan 2000; Norman and O'Reilly 2003; Becker 2005; Norman 2010; Fiebig and Lansner 2014). Takehara-Nishiuchi and McNaughton (2008) showed that acquired associations initiate subsequent gradual processes that result in lasting changes in the medial prefrontal cortex without continued training. Complementary learning systems theory was originally described in McClelland et al. (1995), who proposed that new episodic memories are initially represented in the hippocampus, and that recalling these memories from the hippocampus helps to form semantic memories. Complementary learning systems theory assumes that the hippocampus supports the recollection of episodic associations, whereas the surrounding medial temporal lobe supports familiarity-based recognition (Gluck et al. 2003; Norman and O'Reilly 2003; O'Reilly et al. 2011). The neocortex slowly forms representations
that do not incorporate all the details of a single episode, but instead extract regularities over longer time intervals (Gluck et al. 2003). System-level memory consolidation is interpreted in different ways in the complementary learning systems and transformation hypothesis frameworks. Standard consolidation theory states that the hippocampus has only a limited, time-restricted role in the formation of long-term memories; the findings regarding temporally graded retrograde amnesia are in line with this position (Squire 1992; Nadel and Moscovitch 1997; Winocur et al. 2010; Nadel et al. 2012). Consolidated long-term memories should not be susceptible to lesions or amnesic agents that disrupt hippocampal functioning (Winocur et al. 2010; Winocur and Moscovitch 2011). Both standard consolidation theory and the transformation hypothesis state that neocortical structures store the representations of memories that initially form in the hippocampus, but the two approaches disagree about what kind of representations are stored (Winocur et al. 2010). Standard consolidation theory holds that neocortical memories are identical to those originally expressed in the hippocampus during learning, whereas the transformation hypothesis argues that they are schematic versions of the original memories with different characteristics (Winocur et al. 2010). Experimental results of Copara et al. (2014) are consistent with the transformation hypothesis.
1.3 The Hippocampus

The medial temporal lobe plays a critical role in memory and is the anatomical substrate of episodic memory (Eichenbaum et al. 2012; Werning and Cheng 2014). Medial temporal lobe structures that are critical for long-term memory include the hippocampal system, along with the surrounding hippocampal region consisting of the parahippocampal gyrus and neocortical regions. The hippocampal system refers to a system of interrelated brain regions that play a special role in learning and memory. The most important part of the hippocampal region is the hippocampus itself, which consists of the dentate gyrus, the CA1-3 fields of Ammon's horn and the subicular complex (see Figure 1).
1.3.1 Hippocampal circuitry

There is anatomical and neurophysiological evidence for a big-loop recurrency in how the hippocampus processes information received from the entorhinal cortex (van Strien et al. 2009; Binicewicz et al. 2015). The most studied loop linking the hippocampus with neocortical areas is the trisynaptic circuit. This pathway relays information from the entorhinal cortex via the perforant path to the dentate gyrus, from the dentate gyrus to CA3, from CA3 to CA1, and
eventually from CA1 back to the entorhinal cortex. The hippocampus is thought to function in two modes: retrieval or encoding (Meeter et al. 2004). The hippocampus is proposed to switch between memory encoding and retrieval by continually computing the overlap between what is expected and what is encountered (Duncan et al. 2012). Becker (2005) states that different pathways in the trisynaptic loop are used during the two different modes.
Figure 1: Information streams in the hippocampal system (Avila et al. 2015).
Input

The primate hippocampus receives information from the entorhinal cortex (mostly layer II), the parahippocampal gyrus, the perirhinal cortex and from processing streams in the cerebral association cortex, prefrontal cortex and parietal cortex (Rolls 2013). The entorhinal cortex receives most of its input from the parahippocampal gyrus (which is used as the starting point in the computational models of Káli and Dayan (2000, 2004)). The parahippocampal gyrus and perirhinal cortex are believed to be involved in representing different components of the presented episode. Kesner and Rolls (2015) suggest that a possible function of the perirhinal cortex may be to provide a visual short-term memory, while the representations in the parahippocampal gyrus contain information about the spatial content of an episode.
Dentate Gyrus

A projection from the entorhinal cortex to the contralateral dentate gyrus of the hippocampus has been demonstrated in rats (Goldowitz et al. 1975). Furthermore, electrophysiological analysis showed that stimulation of the contralateral entorhinal cortex resulted in discharges of granule cells in the dentate gyrus (White et al. 1976). Pattern separation is operationally defined as the deviation from a linear transformation of the input (Δ input = Δ output; Yassa and Stark 2011). Yassa and Stark (2011) conclude from electrophysiological data in rodents that pattern separation occurs in the dentate gyrus. Bakker et al. (2008) demonstrated this feature of the dentate gyrus with high-resolution fMRI. A principal reason to assume that pattern separation occurs in the dentate gyrus is its size relative to the other substructures of the hippocampus. Amaral et al. (2007) state in their review that the dentate gyrus has not been subject to substantial phylogenetic modification.
Finally, neurogenesis in the dentate gyrus is believed to have a role in pattern separation (Aimone et al. 2011).
CA3

The CA3 is important for acquisition and memory consolidation, as shown in mice (Florian and Roullet 2004). The projection from the dentate gyrus presents new pattern-separated representations to CA3 neurons to reduce interference and support new learning (Lörincz and Buzsáki 2000; Yassa and Stark 2011). Leutgeb et al. (2007) support the principle of pattern separation in the dentate gyrus and CA3 with experimental results in rats. There are neurons in layer II of the entorhinal cortex whose collaterals directly reach CA3, bypassing the dentate gyrus and creating divergent information streams from the entorhinal cortex to CA3 (Lörincz and Buzsáki 2000). The direct perforant path from layer II EC neurons can provide a cue for recall (Yassa and Stark 2011; Rolls 2013).
CA1

Besides the input from the CA3, CA1 neurons are also excited by direct, though rather weak, inputs from the entorhinal cortex. Duncan et al. (2012) showed with high-resolution fMRI that the CA1 area functions as a match/mismatch detector. Using high-resolution fMRI in combination with neural pattern similarity analysis, this specialized encoding mechanism was localized to human CA1 (Schlichting et al. 2014). The CA1 is believed to be the main output gate of the hippocampus (Leutgeb et al. 2004).
Output

The entorhinal cortex compares the difference between neocortical representations (the primary input) and feedback information conveyed by the hippocampus (the "reconstructed input") (Lörincz and Buzsáki 2000; Mizumori 2013). As Lörincz and Buzsáki (2000) proposed and Szirtes et al. (2005) verified, temporal integration of hippocampal output occurs in layer II of the entorhinal cortex, where the difference between the representation in the entorhinal cortex and the representation from the hippocampus is calculated. Furthermore, it is believed that the hippocampus communicates with the parahippocampal gyrus and the perirhinal cortex through the entorhinal cortex.
1.3.2 Alternative circuit

The CA2 area is often neglected in (computational) models. However, Chevaleyre and Siegelbaum (2010) and Kohara et al. (2014) provide experimental results indicating that CA2 receives input from the entorhinal cortex (layer II) and the dentate gyrus, and can strongly excite CA1 neurons in a different layer than CA3 cells do. Anatomical findings also indicate alternative output projections apart from the CA1: sparse basal dendrites appear to provide feedback from the dentate gyrus (Freund and Buzsáki 1996). Lörincz and Buzsáki (2000) propose that multiple hippocampal outputs with minimized mutual information can train synapses in the entorhinal cortex and neocortical circuits. Finally, Scharfman (2007) described an often neglected backprojection mechanism from the CA3 to the dentate gyrus.
1.3.3 Recurrence, replay and interleaved learning

Kumaran and McClelland (2012) proposed that recurrence in the hippocampus and between the hippocampus and neocortex could support efficient learning. Moreover, recurrent collaterals in the CA3 can hold more items simultaneously, which could be the basis for temporal order memory (Rolls 2013). Sutherland et al. (2010) propose two different processes that would enhance the neocortical representation: a small amount of direct projections, and a series of replays (for some hours after each episode). Memory performance is positively influenced by the number of reactivation events in tasks such as spatial learning (Fiebig and Lansner 2014). Finally, off-line replay maintains declarative memories and avoids fast decay of memory traces in a computational model (Káli and Dayan 2004). In interleaved learning, new information is repeatedly presented interleaved with known information (McClelland et al. 1995; McClelland 2013). Interleaving promotes gradual assimilation of new information into the connections among the network's neuron-like units with a minimum of interference (McClelland 2013). Tse et al. (2007, 2011) provide experimental results supporting the hypothesis that semantic representations can be built up over time in networks outside the hippocampus, and that once such a schema has been acquired, new knowledge can be added to it quickly.
1.3.4 Alzheimer

When memories are consolidated, generalization is preserved but recognition performance is impaired (Lörincz and Buzsáki 2000; Winocur et al. 2010). New findings in preclinical Alzheimer's disease patients and in mouse models of the disease suggest that the lateral entorhinal cortex is susceptible to tau pathology early in Alzheimer's disease (Yassa 2014). Furthermore, it is more probable that the CA2 pathway, rather than the CA3 circuit, is impaired in the development of Alzheimer's disease (Avila et al. 2015). Assuming that interleaved learning is indeed a component of the learning schema in consolidation predicts that older memories should be relatively spared by hippocampal damage, because they will have had time to be consolidated through interleaved replay into the distributed neocortical system (Zola-Morgan et al. 1986; McClelland et al. 1995; O'Reilly et al. 2011; McClelland 2013). CA3 size predicts the precision of memory recall (Chadwick et al. 2014), which may play an important role in how memory degrades in aging and pathological
disorders. Chadwick et al. (2014) also found that recalling highly similar episodes leads to overlapping representations in the CA3, which results in interference. Finally, Wilke et al. (2014) showed that disruption of the mossy fiber synapses between the dentate gyrus and the CA3 impairs functioning in familial Alzheimer's disease.
1.3.5 Computational models of consolidation

Investigation of the functions of the hippocampus itself and of its substructures is problematic in living animals, and therefore computational studies are popular. Artificial neural networks were designed to approximate functions; later they were used to model the memory storage function of biological neural networks. In computational models both roles of the hippocampus are used: the role in helping complete partial input patterns before consolidation is complete, and the role in training the cortex to perform appropriate completion by itself (Káli and Dayan 2000; Káli and Dayan 2004; Becker 2005). Eichenbaum et al. (2012) conclude from behavioral, lesion and recording approaches in animals that different components of the medial temporal lobe make distinct contributions to memory capacities. Existing computational models lack one or more of the above-mentioned findings of experimental and anatomical investigation: for example, interleaved learning is not included (Káli and Dayan 2004; Gluck et al. 2003; Kesner and Rolls 2015), different circuits are not considered (Káli and Dayan 2004; Fiebig and Lansner 2014), or a loop structure of the hippocampus is lacking (Fiebig and Lansner 2014), the architecture being rather hierarchical (Káli and Dayan 2004; Kumaran and McClelland 2012).

The aim of this paper is to investigate new proposals about the functioning of the hippocampal apparatus. To investigate the proposed recurrency in the trisynaptic hippocampal circuit, rather than the usual unidirectional flow (Kumaran and McClelland 2012), a new computational model is presented.
While prior work suggests that the hippocampus is involved in linking memories experienced at different times, the involvement of specific subfields in this process remains unknown. To verify the hypothesized functioning of the substructures of the hippocampus, additional simulation experiments will be conducted. These experiments concern the orthogonality of representations in the dentate gyrus, CA3 and CA1 as described in Yassa and Stark (2011). The large discrepancy in time scales between the hippocampus and the neocortex will be addressed (Fiebig and Lansner 2014). Furthermore, the effect of lesioning different substructures of the network will be investigated. With these lesion studies, the progress of Alzheimer's disease can be addressed in a gradually degrading hippocampal system. Finally, the necessity of the hippocampus for mental imagery will be assessed.
2 Methods

2.1 The Restricted Boltzmann Machine

The algorithm for the network is based on a widely accepted learning algorithm, the Restricted Boltzmann Machine (RBM). Boltzmann Machines were introduced as bidirectionally connected networks with probabilistic graphical units (Ackley et al. 1985). With a restriction on the network topology, Smolensky (1986) introduced the RBM with a simplified learning algorithm. Hinton and colleagues invented a faster learning algorithm for the RBM (Hinton et al. 1995; Brown 2000). Both single and stacked RBMs can be interpreted as stochastic neural networks, and after learning as feed-forward neural networks. Hinton et al. (2006) use such a stack of RBMs, i.e. a Deep Belief Network, to extract higher-order representations and improve the learning of data distributions. After the introduction of the RBM as an efficient learning machine, it was quickly used to model memory (Káli and Dayan 2000).

The RBM is composed of distinct layers, which are categorized as either visible or hidden. The visible layer consists of "neurons", or input nodes, which actually receive or "see" the input vector; in the case of the cortex, only a tiny fraction of the neurons receive direct inputs from the sensory organs. The training of an RBM consists of two phases (Della Sala et al. 2010; Hinton et al. 1995). These phases can be interpreted as an "awake" phase, while actually perceiving the object (vector), and a "sleeping" phase, while the system makes representations of the previously observed object (Della Sala et al. 2010). During the "awake" phase the input is propagated to the second layer, with the activations calculated as in Equation (1), and the strength of the connections is increased in proportion to the product of the activations on both sides. During the "sleeping" phase the representations in the second layer are propagated back to the first layer, the object is reconstructed from the representation in the second layer, and with this reconstruction the representations in the second layer are calculated again. At the end of the "sleeping" phase the connections are decreased in proportion to the product of the new activations on both sides. These proportions represent the learning rate of each update of the connections. The "sleeping" phase can also be interpreted as the replays during sharp wave ripples, as described in Dudai (2012). These networks thus show a consolidation phase following initial learning, during which the internal representations develop further (Hinton et al. 1995). The "sleeping" phase can be extended to include more cycles in which the activations in the first layer are reconstructed (Hinton 2010); nevertheless, a single cycle is shown to be sufficiently efficient in practice. No connections exist between neurons in the same layer, but layers are fully connected with each other, meaning that each neuron in one layer is connected to all neurons in the other.
The classical RBM is hierarchical, meaning that each layer is connected to at most two other layers. The connections between neurons are symmetrical, implying that if x_ij is the weight for the projection of neuron i onto neuron j, then x_ij = x_ji. The activation function of a neuron in a stochastic network such as the RBM, i.e. the probability that it turns on, is described by the sigmoid function:
    σ(x) = 1 / (1 + e^(−x))    (1)
with total input x.
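The thesis does not state its implementation language; as a minimal illustration, Equation (1) and the stochastic firing of a layer can be sketched in Python/NumPy (all names here are illustrative, not from the thesis):

```python
import numpy as np

def sigmoid(x):
    # Equation (1): the probability that a stochastic unit turns on,
    # given its total input x.
    return 1.0 / (1.0 + np.exp(-x))

def sample_layer(source, weights, rng):
    # The total input to each unit is the weighted sum over the source
    # layer; each unit then fires with probability sigmoid(input).
    p = sigmoid(source @ weights)
    return (rng.random(p.shape) < p).astype(float), p
```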
Figure 2: Visualization of the connections between the two layers in a Restricted Boltzmann Machine with 3 visible units and 4 hidden units.

The RBM algorithm operates with unsupervised learning of hierarchical representations, making it biologically more plausible. The algorithm requires symmetric connections between the input and hidden layer, represented as a single value for both bottom-up recognition and top-down generation. However, there is no evidence that in the brain the strengths of real bottom-up and top-down connections are symmetric. Davelaar (2011) and Carpenter et al. (2003) believe that the brain is able to find higher-level codings of essential features that represent lower-level activations. This kind of correlational learning is how an RBM adjusts its weights to be able to represent the input vectors. A more detailed description of Restricted Boltzmann Machines can be found in Fischer and Igel (2012).
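The "awake" and "sleeping" phases described above amount to one step of contrastive divergence (CD-1), the fast RBM learning algorithm attributed to Hinton and colleagues. A minimal sketch, reusing the sigmoid helper above (a simplified reading of the procedure, not the thesis code):

```python
def cd1_update(v0, W, lr, rng):
    # "Awake" (positive) phase: propagate the input upward and strengthen
    # connections in proportion to the product of activations on both sides.
    h0 = sigmoid(v0 @ W)
    positive = np.outer(v0, h0)

    # "Sleeping" (negative) phase: reconstruct the input from the hidden
    # representation, recompute the hidden activations, and weaken
    # connections in proportion to the new products. A single cycle is
    # used, which the text notes suffices in practice.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T)
    h1 = sigmoid(v1 @ W)
    negative = np.outer(v1, h1)

    return W + lr * (positive - negative)
```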
2.2 The model

Meeter et al. (2007) and Ashby and Helie (2012) state that an ideal computational cognitive network model should not make any assumptions that are known to contradict the current neuroscience literature, while at the same time providing good accounts of behavior and at least some neuroscience data. A neural model of the hippocampus should therefore be based on, or at least not contradict, the general view presented in Section 1. The model presented here is based on a combination of evidence from amnesia and functional imaging in humans, as well as lesion and single-neuron recording studies in animals.
Figure 3: Network architecture with the trisynaptic loop as described in Gluck et al. (2003) and Becker (2005). Part of the network can be interpreted as a hierarchical Deep Belief Network consisting of stacked RBMs. The hippocampus consists of the dentate gyrus, CA3 and CA1. The arrows show in which direction information flows. The solid arrows represent RBMs, while the dashed arrow symbolizes the Multiple Perceptron projecting from the CA1 to the entorhinal cortex.
2.2.1 Network Properties

The ratios between the structure sizes are adopted from anatomical studies in mice; Becker (2005) used the sizes in Table 1.

Table 1: The default properties of the network

Structure                 Number of neurons   Learning rate
Parahippocampal gyrus     200                 0.005
Dentate gyrus             1000                0.1
CA3                       400                 0.1
CA1                       300                 0.1
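Under the sizes and learning rates of Table 1, and the 784-unit MNIST input described below, the weight matrices of the stacked RBMs could be initialized as follows. This is one plausible reading of Figure 3 (the exact fan-in of each structure is not fully specified in the text), with the CA3 receiving concatenated input from the parahippocampal gyrus and the dentate gyrus:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes and learning rates from Table 1; the 784-unit input layer
# corresponds to the 28x28 MNIST digits used as neocortical input.
sizes = {"neocortex": 784, "phg": 200, "dg": 1000, "ca3": 400, "ca1": 300}
lr = {"neocortex-phg": 0.005, "phg-dg": 0.1, "phg+dg-ca3": 0.1, "ca3-ca1": 0.1}

def init_weights(n_in, n_out):
    # Small random initial weights; each matrix is one RBM of the stack.
    return rng.normal(0.0, 0.01, size=(n_in, n_out))

weights = {
    "neocortex-phg": init_weights(sizes["neocortex"], sizes["phg"]),
    "phg-dg": init_weights(sizes["phg"], sizes["dg"]),
    # The CA3 receives concatenated input from the parahippocampal gyrus
    # (standing in for the entorhinal cortex) and the dentate gyrus.
    "phg+dg-ca3": init_weights(sizes["phg"] + sizes["dg"], sizes["ca3"]),
    "ca3-ca1": init_weights(sizes["ca3"], sizes["ca1"]),
}
```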
A visualization of the network with a hippocampal structure can be seen in Figure 3. Part of the network is actually a deep belief network with five RBM layers and an alternative learning rule. For the first hidden layer, which is analogous to the entorhinal cortex in the network, handwritten digits from the MNIST database were used as input. These digits are composed of 28 by 28 floats and were normalized to allow the RBM to learn their distribution. The entries of the image matrix can be interpreted as the different dimensions in which we perceive the world, representing a single episode in this case. Since the entries are continuous, they could also be interpreted as, for example, the distance to an object in the perceived episode. For a great part of the simulations a single example of each digit is used; when more are included, the same digits are taken to represent episodes that are similar to each other.
2.2.2 Training the network

During training, the representations of one layer are used as input for the next one. To be able to recreate the continuous nature of the input, the output of the network is the probability vector of Equation (1) rather than sampled activations. The described hippocampal network differs from a classic hierarchical stacking of RBMs in that the layer playing the role of the CA3 receives input from two preceding layers, namely the entorhinal cortex and the dentate gyrus (Figure 3). Montavon and Müller (2012) introduce a new architecture for the Boltzmann Machine, the locally connected Deep Boltzmann Machine, which consists of two separate RBMs connected to each other in a deep hidden layer. Cleaving and subsequently concatenating representations was shown to improve the encoding of information (Montavon and Müller 2012). Interleaved learning in the model is addressed by propagating all previously perceived stimuli to the CA3 and then letting the RBM learn with the obtained matrix, which includes all stimulus vectors.

Another deviation from a classic deep belief network is the linking of the highest layer, the CA1, with an intermediary layer, the entorhinal cortex. The connections between the CA1 and the entorhinal cortex are learned with a different learning rule: because these connections are believed to be very flexible, they are updated separately. For each neuron in the parahippocampal gyrus a Perceptron is used, and all the connections from the CA1 to the parahippocampal gyrus together are referred to as the Multiple Perceptron. One Perceptron receives the activations of a single neuron from all representations of the stimuli in the CA1 and is clamped to the activations belonging to that neuron in the entorhinal cortex. Combining the predictions of all the Perceptrons together gives the activations in the parahippocampal gyrus. The algorithm to train the Multiple Perceptron is given in Appendix A.

Simultaneously, the representations in the dentate gyrus are propagated back to the parahippocampal gyrus. Replay initiated in the hippocampus is modeled by projecting the representations from either the CA1, the dentate gyrus, or a combination of both onto the parahippocampal gyrus. The way in which the activations of the CA1 and the dentate gyrus are combined in the parahippocampal gyrus corresponds to the error prediction signal discussed in Lörincz and Buzsáki (2000) and Mizumori (2013). The merging of the two signals relies on the assumption that the main output of the hippocampus is through the CA1: activations from the dentate gyrus that are also activated by the CA1 are favored. If the final representation of the hippocampus is noticeably different from the activations in the parahippocampal gyrus, additional learning will occur compared to a representation that is similar. The connections between the parahippocampal gyrus and the neocortex are trained with the RBM algorithm. As the two-layer network replays all stimuli individually, the learning rate is corrected by dividing it by the number of stimuli.

To make the deep-belief-network training algorithm more biologically plausible, an alternative approach was used. Instead of training the Restricted Boltzmann Machines layerwise, every layer is updated after a stimulus is shown. This means that a stimulus actually has the possibility to change all the weights in the network in a single propagation. In summary, the network is trained step by step as follows (a sketch of this schedule follows the list):

1. Incremental learning: all connections are trained with individual inputs. This part of the algorithm corresponds to actually perceiving the objects.
2. Propagation: all stimuli are propagated as a matrix to the CA3, in order to allow interleaved learning.
3. Interleaved learning: extra training of the CA3-CA1 connections. With the representations of all the learned objects, correlations are learned in an interleaved manner, as proposed by Kumaran and McClelland (2012).
4. Training the CA1-parahippocampal gyrus connections: the Multiple Perceptron is trained.
5. Replay: hippocampal replay is calculated from the activations by the dentate gyrus and the CA1; finally, the connections to the neocortex receive extra training.
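A high-level sketch of this training schedule, using cd1_update and the weight dictionary from the earlier sketches. Steps 4 and 5 (the Multiple Perceptron and the replay signal) are only indicated in comments, since their details live in Appendix A; the structure shown is an interpretation of the list above, not the thesis code:

```python
def propagate(v, weights):
    # Deterministic upward pass; probabilities rather than samples are
    # propagated, preserving the continuous nature of the input.
    phg = sigmoid(v @ weights["neocortex-phg"])
    dg = sigmoid(phg @ weights["phg-dg"])
    ca3 = sigmoid(np.concatenate([phg, dg]) @ weights["phg+dg-ca3"])
    ca1 = sigmoid(ca3 @ weights["ca3-ca1"])
    return phg, dg, ca3, ca1

def train(stimuli, weights, lr, rng, presentations=50):
    for _ in range(presentations):
        # Step 1, incremental learning: every RBM in the stack is updated
        # after each individual stimulus, rather than layerwise.
        for v in stimuli:
            phg, dg, ca3, _ = propagate(v, weights)
            weights["neocortex-phg"] = cd1_update(
                v, weights["neocortex-phg"], lr["neocortex-phg"], rng)
            weights["phg-dg"] = cd1_update(
                phg, weights["phg-dg"], lr["phg-dg"], rng)
            weights["phg+dg-ca3"] = cd1_update(
                np.concatenate([phg, dg]), weights["phg+dg-ca3"],
                lr["phg+dg-ca3"], rng)
            weights["ca3-ca1"] = cd1_update(
                ca3, weights["ca3-ca1"], lr["ca3-ca1"], rng)

        # Steps 2-3, interleaved learning: propagate all stimuli to the
        # CA3 and give the CA3-CA1 RBM extra training on the whole set.
        for ca3 in [propagate(v, weights)[2] for v in stimuli]:
            weights["ca3-ca1"] = cd1_update(
                ca3, weights["ca3-ca1"], lr["ca3-ca1"], rng)

        # Steps 4-5 (not shown): the Multiple Perceptron from the CA1 to
        # the parahippocampal gyrus is trained (Appendix A), and the
        # combined DG/CA1 replay signal gives the neocortex connections
        # extra training, with the learning rate divided by the number
        # of stimuli.
    return weights
```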
2.3 Simulation Methodology

2.3.1 Learned stimuli and sampling

Each time the dataset is shown, the parahippocampal gyrus is presented with a 784-element input pattern, specifying whether each cell in the neocortex is firing or silent, and produces a 1000-element output pattern based on the activation function of Equation (1). Considering that recurrency should have a positive effect, just as in attractor networks, input should be allowed to circulate through the network. Indeed, it proved favorable to let the input circulate through the network, for both the two-layer and the hippocampal network, up to a certain number of times. To ensure that all networks output a sequence that is as close as possible to the input (calculated with Equation (2)), the circulation stops when the distance is larger than in the previous cycle. All stimuli were shown to the networks 50 times.

To measure the network's ability for pattern completion, the learned stimuli were partially masked by setting a portion of the image matrix to zero. This could be interpreted either as a partially perceived object or as a lesion in the neocortex. The learned objects were masked for 50% (unless indicated otherwise). Because the non-zero components of the input were centered in the middle of the vector, random masking of the images could bias the performance measurements; taking this into consideration, the first half of the components of the input were set to zero when masking. While the main output of the hippocampus is through the projections of the CA1 to the entorhinal cortex (Becker 2005; Kesner and Rolls 2015), alternative output streams were tested. Learning performance was measured for output streams of the hippocampus to the parahippocampal gyrus through:
1. the projection of the CA1;
2. the projection of the dentate gyrus;
3. a combination of the projections of the CA1 and dentate gyrus.

Ideally, direct backprojections of the CA3 to the parahippocampal gyrus should be used instead of those of the dentate gyrus, but technical difficulties did not allow this. For biological plausibility, it is necessary for a learning rule to be local, meaning that the updating of a weight or connection depends only on the information (activations) on either side of the connection (Davelaar 2011). In addition, new patterns should be learned without re-using the patterns that were previously used for training (Gluck et al. 2003; Davelaar 2011). Because learning should be incremental (Davelaar 2011), the stimuli are all presented individually. Nevertheless, McClelland et al. (1995), Kumaran and McClelland (2012) and McClelland (2013) show the importance of interleaving new stimuli with codes already stored in the brain, especially in the hippocampus. In the model this is achieved by propagating all learned stimuli up to the CA3 and then presenting all the CA3 representations as a matrix to train the weights between the CA3 and the CA1. The difference between solely incremental and hippocampal interleaved learning will be addressed. A sketch of the masking and circulation procedures described above follows.
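In this sketch, reconstruct and distance stand for whichever network's recall procedure and the Equation (2) distance are used; both are assumed interfaces, not names from the thesis:

```python
def mask_first_half(v):
    # Learned stimuli are masked by zeroing the first half of their
    # components; random masking would bias the measurement because the
    # non-zero components are centered in the vector (Section 2.3.1).
    masked = v.copy()
    masked[: len(masked) // 2] = 0.0
    return masked

def circulate(v, reconstruct, distance, max_cycles=10):
    # Let the input circulate through the network, as in an attractor
    # network: stop as soon as the distance to the original input grows
    # relative to the previous cycle, and keep the best output so far.
    best, prev = v, float("inf")
    for _ in range(max_cycles):
        out = reconstruct(best)
        d = distance(v, out)
        if d > prev:
            break
        best, prev = out, d
    return best
```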
2.3.2 Consolidation

To show that the network indeed helps the connections between the neocortex and the parahippocampal gyrus to learn the stimuli better, both networks were tested for different numbers of stimuli shown and for different learning rates. Whether the hippocampal structure improves learning is addressed in two different ways: first, as described above, RBMs have a natural tendency to replay representations which they experienced during learning; and second, hippocampal representations are replayed between presentations of actual stimuli.
2.3.3 Different learning rates

Roxin and Fusi (2013) found that the inclusion of a larger number of memory regions results in longer memory lifetimes, and that the initial memory strength of a partitioned model of memory can be orders of magnitude larger than that of non-partitioned models. Plastic synapses are good at storing new memories but bad at retrieving old ones, while slowly adapting synapses can preserve old memories but cannot store new ones in detail (Roxin and Fusi 2013). Fiebig and Lansner (2014) describe that ideally the ratio of time scales of the cortex to the hippocampus is at least 1:20 in simulation. In this paper, it is assumed that differences in time scale can be represented by changing the learning rates of the RBMs. To test this result, different learning rates will be tested for the connections between the parahippocampal gyrus and the neocortex.

First of all, the ability of the hippocampus to help consolidate memories within the connections between the parahippocampal gyrus and the neocortex was measured. To test how well the stimuli are learned in different networks, the learned stimuli were shown to the networks as input, and the average distances between the input stimuli and the output representations of the networks, as described in Greve et al. (2010) (Equation (2)), were compared over all stimuli. To verify whether a pattern is learned properly, Greve et al. (2010) calculate a distance based on the normalized dot product between the input vector v_in and output vector v_out:

    d(v_in, v_out) = (1/2) · (1 − (v_in · v_out) / (|v_in| |v_out|))    (2)

As the stimuli have only positive components, it can be deduced from Equation (2) that the distance between two totally unrelated vectors is about 0.25. This means that for a network which is fully lesioned, or which has not learned yet, the average distance between input and output will be around 0.25. It was observed to be desirable to let an input circulate repeatedly through the network, which can be interpreted as letting an attractor network with recurrent connections process the input: the input is propagated to the hippocampus, and the projection of the hippocampus onto the parahippocampal gyrus is taken as new input for the hippocampus. Hippocampal networks (with the three different learning programs described above) were tested against two-layer networks (with just one weight matrix) with the same layer sizes as the first two layers of the hippocampal networks, and against a hierarchical network (as in Káli and Dayan (2000, 2004), without the feedback mechanism) with the same layer sizes. For the two-layer network, two different learning programs were developed for control: one two-layer network was trained as a plain RBM, while another was trained with additional training of the weights using representations of the stimuli in the parahippocampal gyrus.
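Equation (2) translates directly into code; for non-negative vectors the value is 0 for identical directions and about 0.25 for unrelated ones (a minimal sketch):

```python
def recall_distance(v_in, v_out):
    # Equation (2): half of one minus the normalized dot product.
    # For vectors with only positive components, two unrelated vectors
    # give a distance of about 0.25.
    cos = np.dot(v_in, v_out) / (np.linalg.norm(v_in) * np.linalg.norm(v_out))
    return 0.5 * (1.0 - cos)
```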
2.3.4 Semantic consolidation

In addition, to assess the degree of semantic consolidation, new (non-masked) stimuli which the networks had not seen before were presented, and the ability to reproduce them was measured. The ability to deduce semantic information from the perceived stimuli is measured for the hippocampal and two-layer networks for different sizes of the learning dataset. Furthermore, the degree to which a network can reproduce stimuli it has not seen previously is adopted as a measure of familiarity. These new stimuli are very similar to the learned stimuli; both are handwritten digits. After learning different numbers of examples of each digit, the networks had to reproduce the new digits. Because the anterior parahippocampal gyrus is associated with familiarity (Diana et al. 2007), the non-learned stimuli only propagate up to the parahippocampal gyrus, the second layer of the network. The networks are tested with the same number of new stimuli as they were trained with.
2.3.5 Replay and catastrophic forgetting

As the representations from the hippocampus are replayed, the hippocampal network could be interpreted as a backpropagation network; however, it does not include the loss function gradient. Catastrophic forgetting is the phenomenon by which a neural network completely loses the ability to recall previously learned objects when exposed to a new dataset, and it plays a role in backpropagation networks. Whether the problem of catastrophic forgetting, as posed by Fiebig and Lansner (2014), was tackled was addressed by presenting the hippocampal and two-layer networks with random vectors as stimuli to learn. Showing random stimuli can be interpreted as the elapsing of time, in which new stimuli are perceived whose representations also have to be stored. Káli and Dayan (2004) showed in an RBM-based model that replay can preserve memory traces, and McClelland (2013) argues that including interleaving in particular makes it possible to deal with catastrophic forgetting. Since Fiebig and Lansner (2014) argue that a computational memory model should include an intrinsic mechanism that can facilitate autonomous replay and drive consolidation, the effect of replaying representations of previously encountered objects was examined.
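The forgetting experiment can be sketched as follows; train_step and reconstruct are assumed interfaces to whichever network is tested (not names from the thesis), and learned is a NumPy array with one stimulus per row:

```python
def forgetting_curve(learned, train_step, reconstruct, rng, steps=20):
    # Elapsing time is modeled by training on random vectors; after every
    # step, recall of the originally learned stimuli is re-measured with
    # the Equation (2) distance.
    curve = []
    for _ in range(steps):
        train_step(rng.random(learned.shape[1]))  # learn a random "episode"
        curve.append(np.mean([recall_distance(v, reconstruct(v))
                              for v in learned]))
    return curve
```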
2.3.6 Pattern separation

To verify the ability to orthogonalize the representations of the different stimuli in the dentate gyrus and the CA3, the Hamming distance was used as described in Myers and Scharfman (2011). The Hamming distance is defined as the number of mismatching elements in two activation patterns I_x and I_y. Pattern separation was measured as the change in the average
normalized Hamming distance across all pairs of patterns:

    Normalized Hamming distance = ( Σ_{x≠y} |I_x − I_y| ) / ( S · N(N−1)/2 ) · 100    (3)
with S the size of the structure in which orthogonality is measured and N the number of stimuli (Myers and Scharfman 2011). The normalized Hamming distance in the CA3 is generally the same as in the dentate gyrus, while in both structures it is significantly higher than in the input (Myers and Scharfman 2011). In Yassa and Stark (2011), detailed expectations regarding orthogonality in the dentate gyrus, CA3 and CA1 are discussed: the representations in the dentate gyrus show great differences for both distant and close stimuli; representations in the CA3 are believed to be close for nearby stimuli but orthogonal for distant stimuli; and the representations in the CA1 are believed to follow a linear input-output transformation. The difference in Hamming distance will be calculated between all pairs of stimuli and their representations in the dentate gyrus, CA1 and CA3. According to Yassa and Stark (2011), the representations in the dentate gyrus should show the highest Hamming distances.
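Equation (3) can be computed directly from the stored activation patterns (a sketch assuming binary patterns in a NumPy array of shape (N, S)):

```python
def normalized_hamming(patterns):
    # Equation (3): summed Hamming distance over all stimulus pairs,
    # normalized by the structure size S (pattern length) and the number
    # of pairs N(N-1)/2, as a percentage (Myers and Scharfman 2011).
    patterns = np.asarray(patterns)        # shape (N, S), binary entries
    n, s = patterns.shape
    total = sum(np.sum(np.abs(patterns[x] - patterns[y]))
                for x in range(n) for y in range(x + 1, n))
    return total / (s * (n * (n - 1) / 2)) * 100.0
```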
2.3.7 Lesion studies

The effects of lesions for different forms of amnesia were tested as described in Fiebig and Lansner (2014). First of all, the effect of lesioning the whole hippocampus was evaluated: the performance of the lesioned and the non-lesioned network were compared to see whether consolidation was successful. In the computational model of Fiebig and Lansner (2014), the temporal gradient of retrograde amnesia was measured as the resulting change in recall rates, whereas the effects of anterograde amnesia were assessed by lesioning before learning and comparing the achieved performance of the damaged system against the unlesioned control. The effect of progressive degrees of hippocampal damage was estimated by randomly disabling an increasing ratio of hippocampal units (Fiebig and Lansner 2014). Lesions were performed in the substructures of the hippocampus before learning and (with a gradual character) after learning.

Second, the hippocampal and two-layer networks were tested on reproducing the Ribot gradient. This was done by gradually lesioning connections between the input and the parahippocampal gyrus, and between the parahippocampal gyrus and the hippocampus. The gradient of memory decay is measured at a single lesion size. To distinguish new from old memories, first ten stimuli were presented to the network for fifty presentations; these became the old memories. For the new memories, another ten stimuli were presented to the network the same number of times, while representations of the previously learned stimuli were also replayed between presentations of the new stimuli. Lesioning of any substructure should decrease the ability to reproduce learned stimuli to some degree. With lesions in the parahippocampal gyrus, the effects of Alzheimer's disease can be investigated (Yassa 2014; Wilke et al. 2014). In addition, different masking percentages of the learned inputs will be investigated, which can be interpreted as lesioning the neocortex. A sketch of the unit-level lesioning procedure follows.
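Following the procedure of Fiebig and Lansner (2014) as described above, a random fraction of a structure's units can be disabled by zeroing all of their incoming and outgoing connections (helper names are illustrative):

```python
def lesion_units(w_in, w_out, fraction, rng):
    # Disable a random fraction of a structure's units by zeroing all of
    # their incoming (columns of w_in) and outgoing (rows of w_out)
    # connections, after Fiebig and Lansner (2014).
    n_units = w_in.shape[1]
    dead = rng.choice(n_units, size=int(fraction * n_units), replace=False)
    w_in, w_out = w_in.copy(), w_out.copy()
    w_in[:, dead] = 0.0
    w_out[dead, :] = 0.0
    return w_in, w_out
```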
2.3.8 Imagining

Besides its important role in memory consolidation, other functions of the hippocampus should not be overlooked (Maguire and Mullally 2013). Especially in humans, the hippocampus provides the possibility to form representations of the future based on previously encountered events (Byrne et al. 2007). To investigate the importance of the hippocampus in this type of mental imagery, random stimuli were shown to the network to let it develop inner representations. Allowing the network to react to random stimuli can be interpreted as reflecting imagination. After learning, the hippocampal network is tested on forming representations of the learned stimuli, both unlesioned and with a lesioned hippocampus. Presenting input that does not resemble the learned stimuli is different from recall: even when several stimuli are learned sufficiently, presenting random input does not guarantee that the network will output one of the learned stimuli. A network that is able to output a previously learned stimulus without a cue can be seen as a model for mental imagery.
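The imagery probe can be sketched as follows, reusing the recall_distance helper; reconstruct is again an assumed interface. A low minimum distance indicates that the uncued random input settled onto a learned stimulus:

```python
def imagery_scores(learned, reconstruct, rng, trials=10):
    # Present uncued random input and record, per trial, the distance to
    # the closest learned stimulus; a low value suggests the network
    # "imagined" a learned object without a cue.
    dim = learned.shape[1]
    scores = []
    for _ in range(trials):
        out = reconstruct(rng.random(dim))
        scores.append(min(recall_distance(v, out) for v in learned))
    return scores
```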
3 Results

3.1 Helping learning - Consolidation

Table 2: The effect of the inclusion of a hippocampal region with a trisynaptic structure in system-level memory consolidation. Average distances between learned data and the output of the networks (for both original vectors and 50% masked input). Each network was sampled over ten different training simulations; standard deviations are shown in parentheses.

Network                        Masked data error   Learned data error
Hippocampal, combined          0.020 (3)           0.017 (1)
Hippocampal, CA1               0.061 (4)           0.044 (3)
Hippocampal, dentate gyrus     0.075 (3)           0.054 (1)
Hippocampal, non-interleaved   0.041 (2)           0.027 (2)
Two-layer                      0.055 (3)           0.037 (3)
Two-layer with replay          0.042 (3)           0.027 (2)
Hierarchical                   0.051 (4)           0.035 (3)
Table 2 shows the results of the four different networks with a hippocampal system (1. with replay of the combined representations of the dentate gyrus and the CA1; 2. with replay of the representations of the CA1; 3. with replay of the representations of the dentate gyrus; 4. without interleaved learning), the two two-layer networks (with and without replay), and the hierarchical network. The smaller errors for the hippocampal network with combined learning, compared to the two-layer and the hierarchical networks, indicate that the hippocampal structure in this model helps to complete masked stimuli. Replay based on the combined activation of the parahippocampal gyrus from the dentate gyrus and the CA1 is significantly better than replay based on the activation from either the dentate gyrus or the CA1 alone (Table 2). Also, in Figure 4 a greater learning gradient can be seen for the hippocampal network compared to the two-layer network with replay. The hierarchical architecture performs in the same range as the normal two-layer network (Table 2). However, learning is also improved in the two-layer network when representations in the parahippocampal gyrus are replayed (Table 2). Finally, learning is faster in the hippocampal combined-replay network than in the two-layer network (Table 2). Surprisingly, average distances for the networks with replay rules based on either the CA1 or the dentate gyrus alone are significantly higher than for the two-layer and the hierarchical networks. However, when
interleaved learning is omitted in the hippocampal model, no significant difference in performance compared with the two-layer network was found. In the following, when referring to the hippocampal network, the network in which the combined representations of the CA1 and dentate gyrus were used for replaying hippocampal representations is meant.
Figure 4: An example of how the hippocampal and the two-layer network learn the stimuli. The hippocampal network shows a greater negative gradient than the two-layer network.
3.2 Learning rates

The learning ability of the hippocampal network improves as the ratio between the learning rate of the connections between the neocortex and the parahippocampal gyrus and that of the connections in the hippocampus increases (Figure 5). This correlation seems to continue up to a ratio of approximately twenty; after that, improvement is limited (Figure 5). As there is no significant difference in learning for learning rate ratios below 1:5, this indicates that the hippocampal system is more flexible, or in other words acts on a different time scale, than the neocortex.
3.2.1 Semantic consolidation

Both the hippocampal and the two-layer network consolidate general information about the encountered stimuli (Figure 6). With this general information they can reproduce even non-learned stimuli. Interleaved learning, replay and a hippocampal system promote semantic consolidation (Figure 6).
Figure 5: Dependence of consolidation on the ratio of the learning rates of the neocortex and the hippocampus. Average distances between the learned vectors and the outputs of the hippocampal and two-layer networks after learning, for different learning rate ratios of the hippocampal system and the neocortex.
3.3 Replay

Replay considerably improves the ability of the connections between the neocortex and the parahippocampal gyrus to complete masked objects; see Figures 7 and 13. Furthermore, there seems to be a positive correlation between the number of replays and performance, up to five replays (Figure 7). However, this trend is inverted for higher replay numbers (Figure 7). The positive effects are also present when considering memory decay as time elapses (Figure 8): memory traces are less likely to decay in the hippocampal network and are almost completely preserved when hippocampal representations are replayed (Figure 8).
Figure 6: Semantic consolidation as the result of perceiving more examples of the same kind of stimulus. The average distance between input and output for both learned and non-perceived stimuli. The ability to reproduce non-learned stimuli improves with the number of examples shown for each digit. The ability to reproduce the learned stimuli in detail decreases when more stimuli have to be learned in the same time frame.
3.4 Orthogonality

According to Yassa and Stark (2011), a region can be considered to pattern-separate the input if the representations in the region are further apart than the input representations (distances calculated with Equation (3)). From Figure 9 it would follow that this holds for the hippocampal substructures, as they are on the upper half of the linear transformation. Surprisingly, the dentate gyrus seems the least pattern-separating structure (Figure 9b). The normalized Hamming distance was 26.2(5) in the CA3 and 39.7(6) in the dentate gyrus, while it was 17.9 for the original stimuli.
Figure 7: Dependence of consolidation on the frequency of replayed representations in the hippocampus. Average distances over all stimuli between input and output after learning for the hippocampal and two-layer networks.
3.5 Effects of lesions

3.5.1 Retrograde Amnesia

First of all, when the hippocampus was removed as a whole, the performance of the hippocampal network remained the same, meaning that the representations of the stimuli are consolidated in the neocortex. When comparing the initial pattern-completion performance of the hippocampal network with its performance when 50% of the connections from the neocortex are lesioned or damaged, the Ribot gradient can be recognized (see Figure 10). The trend described above for lesions of the parahippocampal gyrus seems also to apply to lesions of the dentate gyrus (Figure 11).
Figure 8: Decline of memory traces over time in a hippocampal network with and without replay and in a two-layer network. After learning the actual stimuli, the networks learned random stimuli with the same algorithm. At each time step the average distance over all learned stimuli was calculated. For one of the hippocampal networks, the representations of the learned stimuli were replayed at each time step.
Figure 9: Input/output transfer in hippocampal subfields CA1, CA3 and dentate gyrus (DG). Distance between each pair of representations: (a) when 10 stimuli were learned; (b) when 80 stimuli were learned.
Figure 10: Decline of memory traces due to lesion of the parahippocampal gyrus. After removal of each neuron the average distance was calculated over all learned stimuli.

Figure 11: Decline of memory traces due to cell loss in the dentate gyrus. The ability to recall (pattern complete) objects declines faster for new than for old memories when cells in the dentate gyrus are gradually lost.

3.5.2 Anterograde Amnesia

The ability to learn stimuli is preserved surprisingly long with a shrinking dentate gyrus (Figure 12). The size of the dentate gyrus had to be reduced to a fifth of its original size before a significant difference in performance appeared. Even at one tenth of the original size, the hippocampal network performed better than the two-layer network. This would mean that the ability to learn is not harmed by cell loss in the dentate gyrus, as occurs for example in Alzheimer's disease. Surprisingly, the ability to reproduce the input remains the same for the hippocampal network even if the input is masked for 50% (Figure 13), while this ability is damaged in the two-layer network even for small masking percentages. This is again a confirmation of the contribution of the hippocampal structure to learning.

Figure 12: The ability to learn with smaller dentate gyri. The red line indicates the average performance of ten two-layer networks of normal sizes. The dentate gyrus has to be lesioned for 75% to impair the learning ability significantly, and for 95% to decrease the learning ability to the level of the two-layer network.

Figure 13: Preservation of the ability to recollect episodes when input from the neocortex is reduced. The average distance between input and output is determined for different lesion sizes.
3.6 Mental imagery
Figure 14: Randomly generated output of the network. (a) Randomly generated image with the inclusion of the hippocampal system; (b) randomly generated image with a lesioned hippocampal system.

The hippocampal network produced output as in Figure 14a in response to random input. However, when the hippocampus was removed, the ability to recall learned objects from imagery was lost (Figure 14b). Considering that the ability to complete masked input remained after hippocampal lesion, we can identify this feature of the network as different from recall. The two-layer network produced output as in Figure 14b in response to random stimuli,
which supports the view that this is indeed an attribute that is special to the hippocampal network.
4 Discussion

First of all, replay of hippocampal representations was shown to improve consolidation (see Table 2 and Figure 4) and to protect episodic memories from extinguishing (see Figure 8). Interleaved learning and recurrency were shown to make an important contribution to successful consolidation; however, this was only found for a hippocampal circuit including different output streams from the hippocampus, which is not in line with the generally accepted unidirectional flow (see Table 2). The network was also able to reproduce results of lesion studies: memory decay, such as the effect of progressive neural damage in the dentate gyrus in Alzheimer's disease, could be reproduced. Nevertheless, the network was not able to reproduce the orthogonality proposed for the dentate gyrus and CA3 (see Figure 9 and Section 3.4 for further results). In addition, the network showed the importance of the hippocampus in mental imagery. Finally, the results concerning semantic consolidation point toward the transformation theory (Figure 6).
4.1 Limitations

4.1.1 Technical Complications

When considering the backprojection of the hippocampus to the parahippocampal gyrus, updating the connections with the Restricted Boltzmann Machine algorithm was not possible, as the representations of the two layers had to be linked to each other. Because Hebbian learning¹ is quite useful for encoding applications, this algorithm was considered first. However, when it was implemented, it did not allow the network to recall learned objects as expected (data not shown). To make use of the whole sequence when updating individual connections, a learning algorithm that correlates two distributions of vectors was considered next. The Neural Network Vector Predictor enables activations resulting from the projection of the CA1 to be linked to the parahippocampal gyrus (Mihalik and Labovsky 2000); the algorithm is explained further in Section A. Finally, the approach of multiple Perceptrons was used, as described in Section 2.2.2. It is possible that this solution, in which the Perceptrons update the connections between the CA1 and the parahippocampal gyrus, caused the hippocampal network with the CA1 flow to perform worse than the two-layer network. When trying to implement the CA3 backprojection to the dentate gyrus (Scharfman 2007), another complication arose: the RBM associated with the connections to the CA3 was updated with concatenated activations of both the parahippocampal gyrus and the dentate gyrus.
¹ Increasing connections for which the neurons on either side are firing, and decreasing connections between neurons whose activations do not correspond (Hebb 1949).
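In update form, the rule in this footnote amounts to the following minimal sketch (a hypothetical outer-product version on ±1-coded activations; not necessarily the exact variant that was tested):

import numpy as np

def hebbian_update(weights, pre, post, lr=0.01):
    # Activations coded as +1 (firing) / -1 (silent): agreeing pairs get a
    # positive update, disagreeing pairs a negative one (Hebb 1949).
    return weights + lr * np.outer(pre, post)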
However, when attempting to train the parahippocampal gyrus–neocortex connections, updating the connections with these representations was disadvantageous: when the activations were considered, the backprojections from the corresponding RBM had to be split, which resulted in the loss of valuable correlations. Backprojecting representations from the dentate gyrus to the parahippocampal gyrus did not cause this problem of backprojecting concatenated data, so for technical reasons this solution was chosen. Lesioning the CA3 and CA1 was shown to have less effect than expected; see Figures 15 and 16 in Section B. It seems possible from these lesion data that in the hippocampal network with combined-representation replay, the role of the CA3 and CA1 was smaller than assumed. This could indicate either a smaller contribution from these structures than assumed, or that other structures are impaired in Alzheimer's disease, as described in Avila et al. (2015). Yassa (2014) describes the entorhinal cortex as very susceptible in Alzheimer's disease, and Figure 10 shows that impairing this region leads to memory decay. Finally, when replaying representations of the hippocampus in the neocortex, the same issues arose when attempting to learn with either Hebbian learning or Neural Network Vector Prediction. Fiebig and Lansner (2014) describe that replay originating from the hippocampus reaches cortical areas to induce long-term potentiation, for which Hebbian learning would be the natural choice. However, with this method learning was not successful and interleaved learning was problematic.
4.1.2 Biological Plausibility

First of all, although there are many similarities between the anatomical structure of the rodent, primate and human hippocampus (Squire 1992), there are differences that are not considered in the model. Besides, Table 1 actually concerns the size of the entorhinal cortex rather than the parahippocampal gyrus, compared to the dentate gyrus, CA3 and CA1. However, the size that was used was sufficient to allow learning and consolidation. The direct backprojection of the dentate gyrus could be criticized on anatomical grounds (Amaral et al. 2007). Furthermore, Restricted Boltzmann Machines have a natural tendency to project back to the layer from which they receive activations. This means that the basic architecture is in disagreement with the generally accepted unidirectional flow in the hippocampal circuitry. However, tentative suggestions have been made of alternative output areas (Freund and Buzsáki 1996) and non-unidirectional flow (Scharfman 2007). Another essential deviation from the anatomical structure of the medial temporal lobe is the omission of the projection of the entorhinal cortex (or parahippocampal gyrus) to the CA1 (Kohara et al. 2014). However, this projection is believed to be weaker than the input from the CA3 (Avila et al. 2015).
The orthogonality, measured with the Hamming distance, was not equal in the CA3 and the dentate gyrus, as described in Myers and Scharfman (2011). This could be caused by the fact that all layers are fully connected to each other, which differs from the connection ratio of 4% (between the dentate gyrus and CA3) in Myers and Scharfman (2011). Furthermore, the assumed functions for the linear transformation of the CA3, CA1 and dentate gyrus showed rather different results; the dentate gyrus showed the least orthogonalized representations. Yassa and Stark (2011) point out the limitation of fMRI in making specific statements: this technique is still unable to distinguish between the CA3 and the dentate gyrus, which would make the findings of Myers and Scharfman (2011) less plausible.

Interpretation of the exact analogy between the structures in the model and anatomical structures must be approached with caution. Here, the different RBMs are interpreted as corresponding to real structures of the hippocampal network. However, the real structures include neurons of different types with different characteristics (Aimone et al. 2011; Amaral et al. 2007). Nevertheless, the backprojection of the dentate gyrus could be interpreted as another circuit in general. There are still no biological findings supporting RBM-type correlational learning (Davelaar 2011). Maguire (2014) describes findings which suggest that the recall of autobiographical memories does not necessarily involve the re-instantiation of the same neurons that were involved when learning took place. This would imply that in this case different learning principles apply than Hebb proposed (Hebb 1949).

In Figure 6, the decline of the ability to recollect learned stimuli and the increase of the ability to reproduce non-learned stimuli can be interpreted as the transition of episodic to semantic memory as a greater number of stimuli is perceived. These results are in line with the transformation hypothesis and the findings of Copara et al. (2014). This theory holds that general or semantic information is stored in the structures surrounding the hippocampus, such as the parahippocampal gyrus (Winocur et al. 2010). Lastly, the ability to complete partial input could also be interpreted as the ability to compensate in neurodevelopmental disorders with declarative memory (Ullman and Pullman 2015).
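The orthogonality measure used above can be sketched as follows (a minimal version assuming binary NumPy codes per stimulus; dg_codes and ca3_codes are hypothetical names for the stored layer representations):

import numpy as np

def mean_pairwise_hamming(codes):
    # Average Hamming distance over all pairs of binary representation
    # vectors; higher values indicate more orthogonalized (separated) codes.
    n = len(codes)
    distances = [np.sum(codes[i] != codes[j])
                 for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(distances))

# Comparing, e.g., dentate gyrus codes against CA3 codes for the same stimuli:
# separation_dg = mean_pairwise_hamming(dg_codes)
# separation_ca3 = mean_pairwise_hamming(ca3_codes)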
4.1.3 Computational Issues

According to Fiebig and Lansner (2014), a good model of memory should tackle the following four challenges: (i) replay, (ii) working memory, (iii) a temporal scope and (iv) catastrophic forgetting. Replay, different time scales, and catastrophic forgetting for the hippocampus and the neocortex were addressed successfully; see Figures 7, 5 and 8, respectively. However, in the model it is assumed for simplicity that representations in the hippocampus do not change; the same representations are used over the whole course of learning in Figures 7 and 8.
The replay mechanism of the hippocampus as in Káli and Dayan (2004) could have been included in the hierarchical hippocampal system. Including the replay mechanism in the hierarchical network would allow the exact role of the trisynaptic circuit in the model to be investigated. Rennó-Costa et al. (2014) describe that the CA3 has some Attractor Network features, which is in line with the recurrence within the CA3 region described in Kumaran and McClelland (2012) and McClelland (2013). In the model it was not possible to allow recurrence within one layer; however, the hippocampus as a whole showed this kind of Attractor Network dynamics. Winocur et al. (2010) note that the hippocampus shows a decrease in activity and structural growth as cortical regions begin to form long-term memories. This actually happens during learning in the model: the learning curve becomes less steep as the stimuli are learned, see Figure 4.
4.2 Underlying Biological Mechanisms

In an unknown environment, no stored hippocampal representation matches the activation pattern in the entorhinal cortex. This mismatch is believed to lead to decreased activation in the CA3 and CA1, which in turn leads to a decrease in acetylcholine: the hippocampus goes into learning mode (Meeter et al. 2004). After all stimuli have been perceived, as would happen during the day, the second part of the learning algorithm includes the reactivation of hippocampal representations, which is believed to happen during sleep, especially during sharp-wave ripples (Dudai 2012). Sullivan et al. (2011) present results in which, during a sharp-wave ripple, a fraction of the neurons in the CA3, CA1 and entorhinal cortex discharge synchronously. Error-driven learning is associated with theta-phase dynamics according to Norman and O'Reilly (2003) and Norman (2010). In their computational framework, the system is believed to constantly attempt to recall information relevant to the current situation (Norman 2010).
4.3 Future Directions

Finally, some suggestions for further research will be considered. First of all, as the hippocampus, and especially the CA1, has a central role in the brain (Mišić et al. 2014; Binicewicz et al. 2015), modelling the role of the hippocampus in learning should involve even more areas. As mentioned above, Fiebig and Lansner (2014) propose that a working memory should be included in a model that addresses memory consolidation. This structure would function on a different time scale than the hippocampus and the neocortex (Fiebig and Lansner 2014).
Becker (2005) discusses the different roles of the substructures of the hippocampus in retrieval and encoding. In the model, no distinction is made that would allow examining the effects of different learning rates in the different substructures of the hippocampus. On the other hand, the connections from the CA1 to the parahippocampal gyrus are updated too fast. The effect of different conditions and circumstances could have an important role in memory consolidation and learning in general. A more complete model of memory and consolidation should also consider association-dependent memory traces (Preston and Eichenbaum 2013). This kind of storage should also include schema-dependent memory acquisition and the following of rules (Preston and Eichenbaum 2013). Moreover, the duration of consolidation depends on the species and the form of memory (Preston and Eichenbaum 2013). Diana et al. (2007) describe the effects of fluency manipulations such as subliminal masked priming, to which familiarity is shown to be sensitive. Besides, the role of the social setting in memory consolidation could be considered (Hitti and Siegelbaum 2014). Likewise, McGaugh (2015) discusses the role of emotions in consolidation. Igarashi et al. (2014) suggest a mechanism in which CA1 cells alternate between a state in which they are primarily dependent on direct input from the entorhinal cortex and a state in which they are controlled by the integrated input from the CA3. This mechanism could be implemented in future models. Maguire and Mullally (2013) and Maguire (2014) discuss new approaches to memory consolidation. With the techniques considered, the role of the backprojections of the dentate gyrus to the parahippocampal gyrus and of the CA3 to the dentate gyrus could be investigated.
5 Reference list

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.
Aimone, J. B., Deng, W., and Gage, F. H. (2011). Resolving new memories: A critical look at the dentate gyrus, adult neurogenesis, and pattern separation. Neuron, 70(4):589–596.
Amaral, D. G., Scharfman, H. E., and Lavenex, P. (2007). The dentate gyrus: Fundamental neuroanatomical organization (dentate gyrus for dummies). Progress in Brain Research, 163:3–22.
Ashby, F. G. and Helie, S. (2012). The neurodynamics of cognition: A tutorial on computational cognitive neuroscience. Journal of Mathematical Psychology, 55(4):273–289.
Avila, J., Perry, G., Strange, B. A., and Hernandez, F. (2015). Alternative neural circuitry that might be impaired in the development of Alzheimer disease. Frontiers in Neuroscience, 9.
Bakker, A., Kirwan, C. B., Miller, M., and Stark, C. E. L. (2008). Pattern separation in the human hippocampal CA3 and dentate gyrus. Science, 319(5870):1640–1642.
Becker, S. (2005). A computational principle for hippocampal learning and neurogenesis. Hippocampus, 15(6):722–738.
Binicewicz, F. Z. M., van Strien, N. M., Wadman, W. J., van den Heuvel, M. P., and Cappaert, N. L. M. (2015). Graph analysis of the anatomical network organization of the hippocampal formation and parahippocampal region in the rat. Brain Structure and Function, pages 1–15.
Brown, A. D. (2000). Spiking Boltzmann machines. In Advances in Neural Information Processing Systems 12: Proceedings of the 1999 Conference, volume 12, page 122. MIT Press.
Byrne, P., Becker, S., and Burgess, N. (2007). Remembering the past and imagining the future: A neural model of spatial memory and imagery. Psychological Review, 114(2):340.
Carpenter, G. A., Grossberg, S., and Arbib, A. M. (2003). Handbook of Brain Theory and Neural Networks. The MIT Press.
Chadwick, M. J., Bonnici, H. M., and Maguire, E. A. (2014). CA3 size predicts the precision of memory recall. Proceedings of the National Academy of Sciences, 111(29):10720–10725.
Chevaleyre, V. and Siegelbaum, S. A. (2010). Strong CA2 pyramidal neuron synapses define a powerful disynaptic cortico-hippocampal loop. Neuron, 66(4):560–572.
Copara, M. S., Hassan, A. S., Kyle, C. T., Libby, L. A., Ranganath, C., and Ekstrom, A. D. (2014). Complementary roles of human hippocampal subregions during retrieval of spatiotemporal context. The Journal of Neuroscience, 34(20):6834–6842.
Davelaar, E. J. (2011). Connectionist Models of Neurocognition and Emergent Behavior: From Theory to Applications (Progress in Neural Processing). World Scientific Publishing Company.
Della Sala, S. et al. (2010). Forgetting. Psychology Press.
Diana, R. A., Yonelinas, A. P., and Ranganath, C. (2007). Imaging recollection and familiarity in the medial temporal lobe: A three-component model. Trends in Cognitive Sciences, 11(9):379–386.
Dudai, Y. (2012). The restless engram: Consolidation never ends. Annual Review of Neuroscience, 35:227–247.
Duncan, K., Ketz, N., Inati, S. J., and Davachi, L. (2012). Evidence for area CA1 as a match/mismatch detector: A high-resolution fMRI study of the human hippocampus. Hippocampus, 22(3):389–398.
Eichenbaum, H., Sauvage, M., Fortin, N., Komorowski, R., and Lipton, P. (2012). Towards a functional organization of episodic memory in the medial temporal lobe. Neuroscience & Biobehavioral Reviews, 36(7):1597–1608.
Fiebig, F. and Lansner, A. (2014). Memory consolidation from seconds to weeks: A three-stage neural network model with autonomous reinstatement dynamics. Frontiers in Computational Neuroscience, 8.
Fischer, A. and Igel, C. (2012). An introduction to restricted Boltzmann machines. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 14–36. Springer.
Florian, C. and Roullet, P. (2004). Hippocampal CA3-region is crucial for acquisition and consolidation in Morris water maze task in mice. Behavioural Brain Research, 154(2):265–374.
Freund, T. F. and Buzsáki, G. (1996). Interneurons of the hippocampus. Hippocampus, 6(4):347–470.
Gluck, M. A., Meeter, M., and Myers, C. E. (2003). Computational models of the hippocampal region: Linking incremental learning and episodic memory. Trends in Cognitive Sciences, 7(6):269–276.
Goldowitz, D., White, W. F., Steward, O., Lynch, G., and Cotman, C. (1975). Anatomical evidence for a projection from the entorhinal cortex to the contralateral dentate gyrus of the rat. Experimental Neurology, 47(3):433–441.
Greve, A., Donaldson, D. I., and van Rossum, M. C. W. (2010). A single-trace dual-process model of episodic memory: A novel computational account of familiarity and recollection. Hippocampus, 20(2):235–251.
Hebb, D. O. (1949). The Organization of Behavior. Wiley & Sons, New York.
Hinton, G. (2010). A practical guide to training restricted Boltzmann machines. Momentum, 9(1):926.
Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5241):1158–1161.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554.
Hitti, F. L. and Siegelbaum, S. A. (2014). The hippocampal CA2 region is essential for social memory. Nature, 508(7494):88–92.
Igarashi, K. M., Ito, H. T., Moser, E. I., and Moser, M.-B. (2014). Functional diversity along the transverse axis of hippocampal area CA1. FEBS Letters, 588(15):2470–2476.
Káli, S. and Dayan, P. (2000). Hippocampally-dependent consolidation in a hierarchical model of neocortex. In Proceedings of Neural Information Processing Systems 2000, pages 24–30.
Káli, S. and Dayan, P. (2004). Off-line replay maintains declarative memories in a model of hippocampal-neocortical interactions. Nature Neuroscience, 7(3):286–294.
Kandel, E. R., Dudai, Y., and Mayford, M. R. (2014). The molecular and systems biology of memory. Cell, 157(1):163–186.
Kesner, R. P. and Rolls, E. T. (2015). A computational theory of hippocampal function, and tests of the theory: New developments. Neuroscience & Biobehavioral Reviews, 48:92–147.
Kohara, K., Pignatelli, M., Rivest, A. J., Jung, H.-Y., Kitamura, T., Suh, J., Frank, D., Kajikawa, K., Mise, N., Obata, Y., et al. (2014). Cell type-specific genetic and optogenetic tools reveal hippocampal CA2 circuits. Nature Neuroscience, 17(2):269–279.
Kumaran, D. and McClelland, J. L. (2012). Generalization through the recurrent interaction of episodic memories: A model of the hippocampal system. Psychological Review, 119(3):573–616.
Leutgeb, J. K., Leutgeb, S., Moser, M.-B., and Moser, E. I. (2007). Pattern separation in the dentate gyrus and CA3 of the hippocampus. Science, 315(5814):961–966.
Leutgeb, S., Leutgeb, J. K., Treves, A., Moser, M.-B., and Moser, E. I. (2004). Distinct ensemble codes in hippocampal areas CA3 and CA1. Science, 305(5688):1295–1298.
Lörincz, A. and Buzsáki, G. (2000). Two-phase computational model training long-term memories in the entorhinal-hippocampal region. Annals of the New York Academy of Sciences, 911:83–111.
Maguire, E. A. (2014). Memory consolidation in humans: New evidence and opportunities. Experimental Physiology, 99(3):471–486.
Maguire, E. A. and Mullally, S. L. (2013). The hippocampus: A manifesto for change. Journal of Experimental Psychology: General, 142(4):1180–1189.
McClelland, J. L. (2013). Incorporating rapid neocortical learning of new schema-consistent information into complementary learning systems theory. Journal of Experimental Psychology: General, 142(4):1190.
McClelland, J. L., McNaughton, B. L., and O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457.
McGaugh, J. L. (2015). Consolidating memories. Annual Review of Psychology, 66:1–24.
Meeter, M., Jehee, J., and Murre, J. M. J. (2007). Neural models that convince: Model hierarchies and other strategies to bridge the gap between behavior and the brain. Philosophical Psychology, 20(6):749–772.
Meeter, M., Murre, J. M. J., and Talamini, L. M. (2004). Mode shifting between storage and recall based on novelty detection in oscillating hippocampal circuits. Hippocampus, pages 722–741.
Mihalik, J. and Labovsky, R. (2000). Neural network approaches for predictive vector quantization of an image. Neural Network World, 11(1):33–48.
Mišić, B., Goñi, J., Betzel, R. F., Sporns, O., and McIntosh, A. R. (2014). A network convergence zone in the hippocampus. PLoS Computational Biology, 10(12):1–10.
Mizumori, S. J. Y. (2013). Context prediction analysis and episodic memory. Frontiers in Behavioral Neuroscience, 7:1–10.
Montavon, G. and Müller, K.-R. (2012). Deep Boltzmann machines and the centering trick. In Neural Networks: Tricks of the Trade, pages 621–637. Springer.
Myers, C. E. and Scharfman, H. E. (2011). Pattern separation in the dentate gyrus: A role for the CA3 backprojection. Hippocampus, 21(11):1190–1215.
Nadel, L., Hupbach, A., Gomez, R., and Newman-Smith, K. (2012). Memory formation, consolidation and transformation. Neuroscience & Biobehavioral Reviews, 36(7):1640–1645.
Nadel, L. and Moscovitch, M. (1997). Memory consolidation, retrograde amnesia and the hippocampal complex. Current Opinion in Neurobiology, 7(2):217–227.
Norman, K. A. (2010). How hippocampus and cortex contribute to recognition memory: Revisiting the complementary learning systems model. Hippocampus, 20(11):1217–1227.
Norman, K. A. and O'Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary-learning-systems approach. Psychological Review, 110(4):611.
O'Reilly, R. C., Bhattacharyya, R., Howard, M. D., and Ketz, N. (2011). Complementary learning systems. Cognitive Science, 38(6):1229–1248.
Preston, A. and Eichenbaum, H. (2013). Interplay of hippocampus and prefrontal cortex in memory. Current Biology, 23(17):R764–R773.
Rennó-Costa, C., Lisman, J. E., and Verschure, P. F. (2014). A signature of attractor dynamics in the CA3 region of the hippocampus. PLoS Computational Biology, 10(5):e1003641.
Rolls, E. T. (2013). A quantitative theory of the functions of the hippocampal CA3 network in memory. Frontiers in Cellular Neuroscience, 7.
Roxin, A. and Fusi, S. (2013). Efficient partitioning of memory systems and its importance for memory consolidation. PLoS Computational Biology, 9(7):1–13.
Scharfman, H. E. (2007). The CA3 "backprojection" to the dentate gyrus. Progress in Brain Research, 163:627–637.
Schlichting, M. L., Zeithamova, D., and Preston, A. R. (2014). CA1 subfield contributions to memory integration and inference. Hippocampus, 24(10):1248–1260.
Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, chapter 6. The MIT Press.
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99(2):195.
Sullivan, D., Csicsvari, J., Mizuseki, K., Montgomery, S., Diba, K., and Buzsáki, G. (2011). Relationships between hippocampal sharp waves, ripples, and fast gamma oscillation: Influence of dentate and entorhinal cortical activity. The Journal of Neuroscience, 31(23):8605–8616.
Sutherland, R. J., Sparks, F. T., and Lehmann, H. (2010). Hippocampus and retrograde amnesia in the rat model: A modest proposal for the situation of systems consolidation. Neuropsychologia, 48(8):2357–2369.
Szirtes, G., Póczos, B., and Lörincz, A. (2005). Neural Kalman filter. Neurocomputing, 65:349–355.
Takehara-Nishiuchi, K. and McNaughton, B. L. (2008). Spontaneous changes of neocortical code for associative memory during consolidation. Science, 322(5903):960–963.
Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., and Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821):76–82.
Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., and Morris, R. G. (2011). Schema-dependent gene activation and memory encoding in neocortex. Science, 333(6044):891–895.
Ullman, M. T. and Pullman, M. Y. (2015). A compensatory role for declarative memory in neurodevelopmental disorders. Neuroscience & Biobehavioral Reviews, 51:205–222.
van Strien, N., Cappaert, N., and Witter, M. (2009). The anatomy of memory: An interactive overview of the parahippocampal-hippocampal network. Nature Reviews Neuroscience, 10:272–282.
Werning, M. and Cheng, S. (2014). Is episodic memory a natural kind? A defense of sequence analysis. In 36th Annual Meeting of the Cognitive Science Society, Quebec City, Canada, pages 2326.
White, W. F., Goldowitz, D., Lynch, G., and Cotman, C. W. (1976). Electrophysiological analysis of the projection from the contralateral entorhinal cortex to the dentate gyrus in normal rats. Brain Research, 114(2):201–209.
Wilke, S. A., Raam, T., Antonios, J. K., Bushong, E. A., Koo, E. H., Ellisman, M. H., and Ghosh, A. (2014). Specific disruption of hippocampal mossy fiber synapses in a mouse model of familial Alzheimer's disease. PLoS ONE, 9(1):e84349.
Winocur, G. and Moscovitch, M. (2011). Memory transformation and systems consolidation. Journal of the International Neuropsychological Society, 17(5):766–780.
Winocur, G., Moscovitch, M., and Bontempi, B. (2010). Memory formation and long-term retention in humans and animals: Convergence towards a transformation account of hippocampal-neocortical interactions. Neuropsychologia, 48(8):2339–2356.
Yassa, M. A. (2014). Ground zero in Alzheimer's disease. Nature Neuroscience, 17(2):146–147.
Yassa, M. A. and Stark, C. E. L. (2011). Pattern separation in the hippocampus. Trends in Neurosciences, 34(10):515–525.
Yonelinas, A. P., Aly, M., Wang, W.-C., and Koen, J. D. (2010). Recollection and familiarity: Examining controversial assumptions and new directions. Hippocampus, 20(11):1178–1194.
Zola-Morgan, S., Squire, L. R., and Amaral, D. (1986). Human amnesia and the medial temporal region: Enduring memory impairment following a bilateral lesion limited to field CA1 of the hippocampus. The Journal of Neuroscience, 6(10):2950–2967.
A Technical Specifications

A.1 Learning Rules

A Neural Network Vector Predictor consists of three layers: input, hidden and expected (output). The Neural Network Vector Prediction learning rules require that a vector x be linked to an expected vector x̂. Just as for the RBM, the total input to one neuron n_j is

\[ n_j = \sum_i x_i w_{ij} \tag{4} \]
with weight matrix w and input vector x. The activation function is as in Equation (1). These are the activations in the hidden layer; the activations in the output layer are calculated in the same manner with Equation (4), using the activations in the hidden layer. The updates of the connections are calculated once a vector has been propagated from the input layer to the output layer. After a vector with prediction vector x̂ has been shown t times to the network, the learning rules for step t + 1 become

\[ v_{jk}^{t+1} = v_{jk}^{t} + \eta\, h_j (x_k - \hat{x}_k)\, \hat{x}_k (1 - \hat{x}_k) + \mu \left( v_{jk}^{t} - v_{jk}^{t-1} \right) \tag{5} \]

\[ w_{ij}^{t+1} = w_{ij}^{t} + \eta\, x_i h_j (1 - h_j) \sum_k (x_k - \hat{x}_k)\, \hat{x}_k (1 - \hat{x}_k) + \mu \left( w_{ij}^{t} - w_{ij}^{t-1} \right) \tag{6} \]
where v and w are the weight matrices for the hidden–output and input–hidden connections respectively, h the hidden-layer activations, x_k the actual (target) activations in the output layer, η the learning rate, and µ the momentum parameter.
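Read as code, Equations (4)–(6) give the following sketch. The logistic activation of Equation (1) is assumed; note that x denotes the input in Equation (4) but the target output activations in Equations (5) and (6), so they are separated here as x and x_target. Function and variable names are illustrative, not taken from the actual implementation:

import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def nnvp_step(x, x_target, w, v, w_prev, v_prev, eta=0.1, mu=0.5):
    # One Neural Network Vector Predictor update (Equations (4)-(6)).
    # w_prev and v_prev hold the previous weights for the momentum terms.
    h = sigmoid(x @ w)       # hidden activations, Equations (4) and (1)
    x_hat = sigmoid(h @ v)   # predicted (expected) output vector
    delta = (x_target - x_hat) * x_hat * (1.0 - x_hat)
    v_new = v + eta * np.outer(h, delta) + mu * (v - v_prev)                        # Eq. (5)
    w_new = w + eta * np.outer(x, h * (1.0 - h)) * delta.sum() + mu * (w - w_prev)  # Eq. (6)
    return w_new, v_new, x_hat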
A.2 Multiple Perceptrons

def learn(self, dataset, outputset):
    # Train one binary Perceptron per output unit on the whole dataset.
    for i, classifier in enumerate(self.Classifier_list):
        output_i = list(outputset[:, i])
        try:
            classifier.fit(dataset, output_i)
        except ValueError:
            # fit() raises ValueError when output_i contains only one class;
            # store the constant label so it can be returned directly.
            self.Classifier_list[i] = [output_i[0]] * (dataset.shape[0] - 1)
The except branch is used when all the labels are identical (all 0 or all 1); in that case the Perceptron should always return that constant label.
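For context, a hypothetical setup in which this method could be used, assuming scikit-learn's Perceptron; the class and array names are illustrative and not taken from the actual implementation:

import numpy as np
from sklearn.linear_model import Perceptron

class MultiPerceptron:
    # One binary classifier per output unit of the target layer.
    def __init__(self, n_outputs):
        self.Classifier_list = [Perceptron() for _ in range(n_outputs)]

    # learn(self, dataset, outputset) as defined above.

# ca1_activations: (n_samples, n_ca1) array of CA1 activations;
# phg_targets: (n_samples, n_phg) binary array of parahippocampal targets.
# model = MultiPerceptron(phg_targets.shape[1])
# model.learn(ca1_activations, phg_targets)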
B Additional Results
Figure 15: Anterograde amnesia as a result of gradual impairment of the CA1.

Figure 16: Anterograde amnesia as a result of gradual impairment of the CA3.