Evolving an Optimal De/Convolution Function for the Neural Net Modules of ATR's Artificial Brain Project

Hugo de Garis 1, Norberto Eiji Nawa 1, Michael Hough 2 and Michael Korkin 3

1 Evolutionary Systems Dept., ATR - Human Information Processing Research Laboratories, 2-2 Hikari-dai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
  [email protected], http://www.hip.atr.co.jp/{xnawa,degaris}
2 Computer Science Department, Stanford University, Stanford, 94305, CA, USA
  [email protected], http://www.stanford.edu/mhough
3 Genobyte, Inc., 1503 Spruce Street, Suite 3, Boulder CO 80302, USA
  [email protected], http://www.genobyte.com
Abstract. This paper reports on efforts to evolve an optimum de/convolution function to be used to convert analog to binary signals (spike trains) and vice versa for the binary input/output signals of the neural net circuit modules evolved at electronic speeds by the so-called "CAM-Brain Machine (CBM)" of ATR's Artificial Brain Project [1, 2, 3]. The CBM is an FPGA based piece of hardware which will be used to evolve tens of thousands of cellular automata based neural network circuits or modules at electronic speeds, in about a second each, which are then downloaded into humanly architected artificial brains in a large RAM space [2, 3]. Since state-of-the-art programmable FPGAs constrained us to use 1 bit binary signaling in our neural model (the "CoDi-1Bit" model [4]), an efficient de/convolution technique is needed to convert digital signals to analog and vice versa, so that the "evolutionary engineers" (EEs) who evolve the many modules can think in terms of analog signals when they need to, rather than in terms of the abstract, incomprehensible spike trains (where the information in the signal is contained in the spacing between spikes). An earlier convolution function, digitized from a text book figure [7] (shown in Fig. 1), gave only moderate accuracy between actual and desired output results. By applying a genetic algorithm to the evolution of the de/convolution function we were able to make it nearly twice as accurate. Accuracy is important so as to reduce cumulative errors when the output of one neural net module becomes the input of another, in long sequential chains of modules in artificial brain architectures consisting of 10,000s of modules. The CBM can handle up to 32000 modules (of maximum about 1000 artificial neurons each) and will be delivered to ATR by the spring of 1999.
1 Introduction

This paper reports on our efforts to evolve an optimum de/convolution function which is used to convert analog to binary signals (spike trains) and vice versa for the neural net modules evolved by ATR's CAM-Brain Machine (CBM) [5]. The CBM is an FPGA based piece of specialist hardware which grows and evolves cellular automata (CA) based neural network circuits (modules) at electronic speeds. It updates the CA at a rate of 150 billion cell updates a second and can evolve a neural net module (i.e. a complete run of a genetic algorithm (GA), with tens of thousands of circuit growths and fitness evaluations) in about a second. However, the constraints imposed by the FPGA chips used in the CBM are such that only 1 bit neural signaling is possible. Hence an interpretation problem arose. What do the 1 bit signals entering and exiting the modules mean? We tried various interpretations (representations) [6] and eventually chose a convolution approach, where the spike train (a binary string) is convoluted with a "convolution function" that we digitized from a figure in a text book [7]. (See Fig. 1, where the horizontal axis shows the number of clock cycles, and the vertical axis shows the values of the convolution function.) The resulting convoluted analog waveforms were only moderately accurate (i.e. the actual waveform convoluted from a spike train evolved with a simulation of the CBM differed from the target waveform by roughly 10%), which was thought not to be particularly good. Hence we were motivated to see if a more accurate de/convolution function could be evolved. The reason it is important to get greater accuracy is that if each module has a margin of error of about 10%, then when the time comes to assemble tens of thousands of modules into artificial brains, in hierarchical modular structures, the errors will accumulate as they pass sequentially from one module to another, to the point of rendering the artificial brain inoperative. We therefore set out to see by how much we could improve the accuracy between the actual outputs and the target outputs. The success of the CAM-Brain Project (ATR's Artificial Brain Project) may depend upon attaining a greater accuracy.

Note that there are two sources of error when evolving CA based neural nets using the CBM. One is due to the limitations of the de/convolution functions when users supply analog values as inputs and analog target outputs to the CBM. (The CBM itself uses purely digital inputs and outputs, so if the user wants to think in terms of analog waveforms, which is much easier to do than abstract spiketrains, then deconvolution and convolution transformations will be necessary.) The other source of error is due to the quality of the evolved neural net module itself. In this paper we do not address ourselves to this second source of error. There have already been several papers published on the quality of evolved modules [2, 3].

In an attempt to reduce the first kind of error, we took the following approach. An initial arbitrary sinusoidal analog wave form (curve) of the form shown in Fig. 2 (the smooth curve) was input into a deconvolution function (as explained in section 3 below), which resulted in a spike train (bit string) output (Fig. 3). The deconvolution (spiking) algorithm was based on a convolution function which did the reverse, namely converted the spike train back to an analog output (as explained in section 2). In theory, the final convolution output should be the same as the original analog input wave form. This two step approach is shown in Fig. 4. A genetic algorithm was used to evolve the digitized values of the convolution function (an analog waveform) which gave the best matching to the target waveform (i.e. the original input analog waveform).
Fig. 1. The original convolution function, 48 cycles wide. (Plot: convolution value, about -30 to 70, against clock cycle, 0 to 50.)
Fig. 2. An arbitrary sinusoidal analog test wave form (smooth curve). (Plot: signal value, 0 to 450, against clock cycle, 0 to 140.)
111100010001101111110100010111110110100010101110100100010011010100 100010101010100101001010110001101010011001101011010101011101110101101
Fig. 3. The spiketrain (bitstring) resulting from deconvolution of Fig. 2.
Analog Curve --> Deconvoluted to a Spiketrain --> Convoluted back to the Original Analog Curve
Fig. 4. The deconvolution/convolution sequence.

The remainder of this paper consists of the following sections. Section 2 explains the convolution approach we use (called "SIIC" = "Spike Interval Information Coding"), which converts a spiketrain (a bitstring) into an analog waveform. Section 3 explains the reverse process, i.e. deconvolution (or spiking), which converts an analog waveform into a spiketrain. Our deconvolution algorithm (called "HSA" = "Hough Spiker Algorithm" [3]) uses the digitized values of the SIIC convolution function. Section 4 explains how we used these two operations to optimize the digitized values plus the "window width" of the convolution function, which, when incorporated into the deconvolution function, gave the closest fit between the original analog input wave form and the final output analog wave form. Section 4 also reports on the experimental results. Having obtained what we thought was a reasonably optimal de/convolution function, we tested it in section 5 to its limits with other, more "demanding" waveforms, e.g. having higher and higher frequencies, until the de/convolution function could no longer cope. Section 6 summarizes and offers some final remarks.
2 Convolution with the "SIIC" (Spike Interval Information Coding) Representation

The constraints imposed by state-of-the-art programmable (evolvable) FPGAs in 1998 are such that the CA based model (the CoDi model) had to be very simple in order to be implementable within those constraints. Consequently, the neural signaling states in the model were made to contain only 1 bit of information (as happens in nature's "binary" spike trains). The problem then arose as to interpretation. How were we to assign meaning to the binary pulse streams (i.e. the clocked sequences of 0s and 1s which are a neural net module's inputs and outputs)? We tried various ideas, such as a frequency based interpretation, i.e. count the number of pulses (i.e. 1s) in a given time window (of N clock cycles). But this was thought to be too slow. In an artificial brain with tens of thousands of modules which may be vertically nested to a depth of 20 or more (i.e. the outputs of a module in layer "n" get fed into a module in layer "n+1", where "n" may be as large as 20 or 30), the cumulative delays may result in the total response time of the robot kitten (that the CAM-Brain is to control) being too slow (e.g. if you wave your finger in front of its eye, it might react many seconds later). We wanted a representation that would deliver an integer or real valued number at each clock tick, i.e. the ultimate in speed. The first such representation we looked at we called "unary", i.e. if N neurons on an output surface are firing at a given clock tick, then the firing pattern represents the integer N, independently of where the outputs are coming from. We found this representation to be too stochastic, too jerky. Ultimately we chose a representation which convolves the binary pulse string with the convolution function shown in Fig. 1. We call this representation "SIIC" (Spike Interval Information Coding); it was inspired by [7]. This representation delivers a real valued output at each clock tick, thus converting a binary pulse string into an analog time dependent signal. Our team has already published several papers on the results of this convolution representation work [6]. The CBM will implement this representation in the FPGAs when measuring fitness values at electronic speeds.
2.1 Details

The convolution algorithm we use takes the output spiketrain (a bit string of 0s and 1s) from a CBM module, and runs the pulses (i.e. the 1s) by the convolution function shown in the simplified example below. The output at any given time t is defined as the sum of those samples of the convolution filter that have a 1 in the corresponding spiketrain positions. The example below should clarify what is meant by this.
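In symbols (our notation, not taken from the CBM documentation): if $s_t \in \{0,1\}$ denotes the spiketrain bit at clock tick $t$ and $c_0, \dots, c_{W-1}$ are the $W$ digitized values of the convolution filter, then the convolved output at tick $t$ is

\[
    a(t) \;=\; \sum_{j=0}^{W-1} c_j \, s_{t-j} ,
\]

i.e. every spike adds a copy of the filter, shifted to its firing time, into the output.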
Simplified Example. Convolve the spiketrain 1101001 (where the left most bit is the earliest, the right most bit the latest) using the convolution filter values {1, 4, 9, 5, -2}. The spiketrain in this diagram moves from left to right across the convolution filter. Alternatively, one can view the convolution filter (window) moving across the spiketrain. The number to the right of the colon (:) shows the value of the convolution sum at each time t.

    time-shifted spike train : 1 0 0 1 0 1 1   ---> (moves left to right)
    convolution filter       : 1 4 9 5 -2

At each time step, every filter tap that currently sees a 1 contributes its value to the sum:

    t = -1 :  (no overlap yet)                    :  0
    t =  0 :  1*1                                 :  1
    t =  1 :  1*1 + 1*4                           :  5
    t =  2 :  0*1 + 1*4 + 1*9                     : 13
    t =  3 :  1*1 + 0*4 + 1*9 + 1*5               : 15
    t =  4 :  0*1 + 1*4 + 0*9 + 1*5 + 1*(-2)      :  7
    t =  5 :  0*1 + 0*4 + 1*9 + 0*5 + 1*(-2)      :  7
    t =  6 :  1*1 + 0*4 + 0*9 + 1*5 + 0*(-2)      :  6
    t =  7 :        1*4 + 0*9 + 0*5 + 1*(-2)      :  2
    t =  8 :              1*9 + 0*5 + 0*(-2)      :  9
    t =  9 :                    1*5 + 0*(-2)      :  5
    t = 10 :                          1*(-2)      : -2

Hence, the time-dependent output of the convolution filter takes the values (0, 1, 5, 13, 15, 7, 7, 6, 2, 9, 5, -2). This is a time varying analog signal, which is the desired result.
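A concrete (and purely illustrative) rendering of this procedure in C is given below; the function and variable names are our own. It convolves the example spiketrain with the example filter and prints the output values from t = 0 onwards (the leading 0 at t = -1 is omitted).

    #include <stdio.h>

    #define N_SPIKES 7                    /* length of the example spiketrain  */
    #define N_CONV   5                    /* number of convolution filter taps */

    /* Convolve a binary spiketrain with a convolution filter.  The output
       array must have room for n_spikes + n_conv - 1 values. */
    void siic_convolve(const int *spikes, int n_spikes,
                       const int *conv_fn, int n_conv, int *out)
    {
        int t, j;
        for (t = 0; t < n_spikes + n_conv - 1; t++) {
            out[t] = 0;
            for (j = 0; j < n_conv; j++) {
                int s = t - j;                 /* spike that fired j ticks ago */
                if (s >= 0 && s < n_spikes && spikes[s])
                    out[t] += conv_fn[j];      /* that spike contributes tap j */
            }
        }
    }

    int main(void)
    {
        int spikes[N_SPIKES] = {1, 1, 0, 1, 0, 0, 1};  /* 1101001, leftmost earliest */
        int conv_fn[N_CONV]  = {1, 4, 9, 5, -2};
        int out[N_SPIKES + N_CONV - 1];
        int t;

        siic_convolve(spikes, N_SPIKES, conv_fn, N_CONV, out);
        for (t = 0; t < N_SPIKES + N_CONV - 1; t++)
            printf("%d ", out[t]);             /* prints: 1 5 13 15 7 7 6 2 9 5 -2 */
        printf("\n");
        return 0;
    }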
3 Deconvolution with the "HSA" (Hough Spiker Algorithm)

Section 2 above explained the use of the SIIC (Spike Interval Information Coding) representation, which provides an efficient transformation of a spike train (i.e. string of bits) into a (clocked) time varying "analog" signal. We need this interpretation in order to interpret the spike train output from the CoDi modules to evaluate their fitness values (e.g. by comparing the actual converted analog output waveforms with user specified target waveforms). However, we also need the inverse process, i.e. an algorithm which takes as input a clocked (digitised, i.e. integer numbered) time varying "analog" signal, and outputs a spike train. This conversion is needed as an interface between the motors/sensors of the robot bodies (e.g. a kitten robot) that the artificial brain controls, and the brain's CoDi modules. However, it is also very useful to users, i.e. EEs (evolutionary engineers), to be able to think entirely in terms of analog signals (at both the inputs and outputs) rather than in abstract, visually unintelligible spiketrains. This will make their task of evolving many CoDi modules much easier. We therefore present next an algorithm which is the opposite of the SIIC, namely one which takes as input a time varying analog signal, and outputs a spike train, which, if later convoluted with the SIIC convolution filter, should result in the original analog signal.

A brief description of the algorithm used to generate a spiketrain from a time varying analog signal is now presented. It is called the "Hough Spiker Algorithm" (HSA) [3] and can be viewed as the inverse of the convolution algorithm described above in section 2. To give an intuitive feel for this deconvolution algorithm, consider a spiketrain consisting of a single pulse (i.e. all 0s with one 1). When this pulse passes through the convolution function window, it adds each value of the convolution function to the output in turn. A single pulse (100000..., i.e. a single 1 followed by 0s) will be convoluted with the convolution function expressed as a function of time. At t = 0 its value will be the first value of the convolution filter, at t = 1 its value will be the second value of the convolution filter, etc. Just as a particular spiketrain is a series of spikes with time delays between them, so too will the convolved spiketrain be the sum of the convolution filters, with (possibly) time delays between them. At each clock tick when there is a spike, add the convolution filter to the output. If there is no spike, just shift the time offset and repeat. For the same example:

    spike train        : 1 1 0 1 0 0 1
    convolution filter : 1 4 9 5 -2

    t ->    0  1  2  3  4  5  6  7  8  9 10
    out:    1  4  9  5 -2                        (spike at t = 0)
               1  4  9  5 -2                     (spike at t = 1)
                  0  0  0  0  0                  (no spike at t = 2)
                     1  4  9  5 -2               (spike at t = 3)
                        0  0  0  0  0            (no spike at t = 4)
                           0  0  0  0  0         (no spike at t = 5)
                              1  4  9  5 -2      (spike at t = 6)
          ---------------------------------
            1  5 13 15  7  7  6  2  9  5 -2
In the HSA deconvolution algorithm, we take advantage of this summation, and in effect do the reverse, i.e. a kind of progressive subtraction of the convolution function. If at a given clock tick the values of the convolution function are less than the analog values at the corresponding positions, then subtract the convolution function values from the analog values. The justification for this is that for the analog values to be greater than the convolution values implies that, to generate the analog signal values at that clock tick, the CoDi module must have fired at that moment, and this firing contributed the set of convolution values to the analog output. Once one has determined that at that clock tick there should be a spike, one subtracts the convolution function's values, so that a similar process can be undertaken at the next clock tick. For example, to deconvolve the convolved output of the simple example of the previous section, using the same values of the convolution function:

    convolved signal :  1  5 13 15  7  7  6  2  9  5 -2

    t = 0 : compare (1 4 9 5 -2) with ( 1  5 13 15  7) -> covered, SPIKE, subtract
            remainder:  0  1  4 10  9  7  6  2  9  5 -2
    t = 1 : compare (1 4 9 5 -2) with ( 1  4 10  9  7) -> covered, SPIKE, subtract
            remainder:  0  0  0  1  4  9  6  2  9  5 -2
    t = 2 : compare (1 4 9 5 -2) with ( 0  1  4  9  6) -> filter exceeds signal, no spike
    t = 3 : compare (1 4 9 5 -2) with ( 1  4  9  6  2) -> covered, SPIKE, subtract
            remainder:  0  0  0  0  0  0  1  4  9  5 -2
    t = 4 : compare (1 4 9 5 -2) with ( 0  0  1  4  9) -> no spike
    t = 5 : compare (1 4 9 5 -2) with ( 0  1  4  9  5) -> no spike
    t = 6 : compare (1 4 9 5 -2) with ( 1  4  9  5 -2) -> covered, SPIKE, subtract
            remainder:  0  0  0  0  0  0  0  0  0  0  0
    t = 7 ... 10 : nothing left to cover, no spikes

The recovered spiketrain is therefore 1101001, i.e. the spiketrain of the previous example. In C, the core of the HSA is as follows (the surrounding declarations are reconstructed; DUR is the index of the last clock tick, N_CONV the number of digitized convolution values, and thresh the error tolerance below which a spike is emitted, all defined elsewhere in the program):

    void hsa(int *wave, int *conv_fn, int *bits)
    {
        int i, j, error;

        for (i = 0; i <= DUR; i++) {
            /* compare conv. vals with wave */
            error = 0;
            for (j = 0; j < N_CONV; j++) {
                if (i+j > DUR) break;
                if (wave[i+j] < conv_fn[j]) error += conv_fn[j] - wave[i+j];
            }
            /* if error is okay, SPIKE! */
            if (error < thresh) {
                bits[i] = 1;
                for (j = 0; j < N_CONV; j++) {
                    if (i+j > DUR) break;
                    wave[i+j] -= conv_fn[j];
                }
            } else bits[i] = 0;
        }
    }
Figs. 2 and 3 show an example of the HSA in action. The original input analog signal is the smooth curve in Fig. 2. The spiketrain output generated by the HSA deconvolutor (the HSA "spiker") is shown in Fig. 3. The spiketrain resulting from each analog input is sent into the SIIC convolutor (whose convolution function is shown in Fig. 1). The resulting analog output should be very close to the original analog input curve. The jerky curve in Fig. 2 is the resulting output curve. The HSA works well when the values of the waveforms are large (e.g. greater than 200, and less than 1000) and do not change too quickly relative to the time width of the convolution filter window. One can simply add a constant value to incoming analog signals before spiking them, and ensure that the analog signals do not change too rapidly (i.e. one can filter out the high frequency components of signals with a low-pass filter), as illustrated in the sketch below.
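The following is a minimal sketch of such preprocessing (our own illustration, not part of the CBM software): the incoming analog signal is shifted by a constant and smoothed with a simple moving-average low-pass filter before being handed to the HSA spiker. The constants are assumptions chosen only for the example.

    #define DUR    136      /* number of clock ticks in the signal        */
    #define OFFSET 300      /* constant added to keep values large enough */
    #define SMOOTH   5      /* moving-average window (smoothing strength) */

    /* Offset and low-pass filter an incoming analog signal in place,
       so that it is suitable for the HSA spiker. */
    void preprocess(double *wave)
    {
        double tmp[DUR];
        int t, k, n;

        for (t = 0; t < DUR; t++) {
            double sum = 0.0;
            n = 0;
            for (k = -SMOOTH/2; k <= SMOOTH/2; k++) {     /* moving average      */
                if (t + k < 0 || t + k >= DUR) continue;
                sum += wave[t + k];
                n++;
            }
            tmp[t] = OFFSET + sum / n;                    /* add constant offset */
        }
        for (t = 0; t < DUR; t++) wave[t] = tmp[t];
    }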
4 Evolving the Optimal De/Convolution Function, Experimental Results

How could a more accurate de/convolution function be evolved? The initial approach we tried was to evolve the integer values of the digitized convolution function of the type shown in Fig. 1, for fixed window widths (e.g. 48 in Fig. 1). We initialised a population of integer strings (actually, each integer was initialised to a constant value of 70) which were the integer values of the digitized convolution function, and used the average percentage error between the actual analog output and the target analog output as the genetic algorithm fitness function for each member of the GA population. These fitness scores served as the basis for comparison, to see if we could improve upon these scores by evolving the convolution function for progressively smaller sizes of its window width. To do so, we arbitrarily constructed two sinusoidal type wave forms, namely:

\[
F_1(t) = 300 + 100\sin\!\Big(\frac{2\pi t}{100}\Big) + 50\sin\!\Big(\frac{2\pi t}{80}\Big) + 30\sin\!\Big(\frac{2\pi t}{70}\Big) - 20\sin\!\Big(\frac{2\pi t}{60}\Big) \tag{1}
\]

\[
F_2(t) = 300 - 100\cos\!\Big(\frac{2\pi t}{80}\Big) + 50\sin\!\Big(\frac{2\pi t}{70}\Big) + 30\sin\!\Big(\frac{2\pi t}{60}\Big) - 20\sin\!\Big(\frac{2\pi t}{40}\Big) \tag{2}
\]
and evolved the values of the convolution function so that the average error (between actual and target wave form values) over these two wave forms was least. Since the computational load per window width case was fairly light, we were able to run through 3000 generations of the GA in less than a minute on an IBM Thinkpad 133 MHz machine, so this seemed a practical idea. The GA chromosome representation was a list of positive integers (the HSA deconvolution algorithm does not work with negative values, hence we simply added a large constant to keep everything positive), one per digitized value of the convolution function. We chose the following GA parameter values and genetic operators: population size of 17, no crossover, and mutation operators that were integer incrementors and decrementors of the integer convolution values. A random position along the convolution values was generated and, with a 50/50 probability, the convolution value at that position was incremented or decremented by 1. The integer convolution values list was used in both the convolution function and the deconvolution function to perform the experiment. The fitness definition was the inverse of the sum of the absolute differences between the actual output values and the target values (the values of the original wave function). A skeleton of this GA loop is sketched, for illustration, after Table 1. Table 1 shows the percentage errors for various window lengths averaged over the two curves. The third column shows a truncated percent error, meaning that the first dozen or so output values of the 136 clock cycles used were ignored; the output curve generated by the convolution function takes time to climb to the target curve, generating excessive errors early on. See the jerky curve of Fig. 2 for an example of this climbing phenomenon.
Table 1. Percentage Errors for Various Window Widths

    Window Width   Percent error   Percent error (truncated)
    48              5.61            2.25
    44              5.32            2.03
    40              5.30            2.12
    36              5.12            2.07
    32              5.08            1.66
    28              5.13            1.78
    24              5.52            2.36
    20              4.95            1.46
    16              5.63            2.82
    12              5.66            3.35
     8              9.04            7.37
     4             14.44           13.86
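For concreteness, the GA loop described above can be sketched in C as follows. This is an illustrative skeleton only, not the code actually used: the routine and constant names are ours, the spiking threshold THRESH and the elitist keep-the-best replacement scheme are assumptions, and for brevity the fitness is computed against the single target waveform F1(t) of equation (1) rather than averaged over F1 and F2.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define PI      3.14159265358979
    #define DUR     136      /* clock ticks per test waveform            */
    #define WIDTH    20      /* window width of the convolution function */
    #define POP      17      /* GA population size                       */
    #define GENS   3000      /* number of generations                    */
    #define THRESH   30      /* HSA spiking threshold (assumed value)    */

    static double target[DUR];                /* test waveform F1(t), eq. (1) */

    static void make_target(void)
    {
        int t;
        for (t = 0; t < DUR; t++)
            target[t] = 300.0 + 100.0*sin(2*PI*t/100.0) + 50.0*sin(2*PI*t/80.0)
                      + 30.0*sin(2*PI*t/70.0) - 20.0*sin(2*PI*t/60.0);
    }

    /* Deconvolve the target with the HSA, reconvolve the resulting spiketrain,
       and return the sum of absolute differences from the target (the GA
       minimises this sum, i.e. maximises the inverse-of-error fitness). */
    static double round_trip_error(const int *conv_fn)
    {
        double wave[DUR], out[DUR], err, sum;
        int spikes[DUR], i, j, t;

        for (t = 0; t < DUR; t++) { wave[t] = target[t]; out[t] = 0.0; }

        for (i = 0; i < DUR; i++) {                      /* HSA deconvolution */
            err = 0.0;
            for (j = 0; j < WIDTH && i + j < DUR; j++)
                if (wave[i+j] < conv_fn[j]) err += conv_fn[j] - wave[i+j];
            spikes[i] = (err < THRESH);
            if (spikes[i])
                for (j = 0; j < WIDTH && i + j < DUR; j++)
                    wave[i+j] -= conv_fn[j];
        }
        for (i = 0; i < DUR; i++)                        /* SIIC convolution  */
            if (spikes[i])
                for (j = 0; j < WIDTH && i + j < DUR; j++)
                    out[i+j] += conv_fn[j];

        for (sum = 0.0, t = 0; t < DUR; t++) sum += fabs(out[t] - target[t]);
        return sum;
    }

    int main(void)
    {
        int pop[POP][WIDTH], p, j, g, best, pos;
        double err[POP];

        srand(1);
        make_target();
        for (p = 0; p < POP; p++)                        /* constant initial values */
            for (j = 0; j < WIDTH; j++) pop[p][j] = 70;

        for (g = 0; g < GENS; g++) {
            best = 0;
            for (p = 0; p < POP; p++) {
                err[p] = round_trip_error(pop[p]);
                if (err[p] < err[best]) best = p;
            }
            for (p = 0; p < POP; p++) {                  /* keep the best, mutate the rest */
                if (p == best) continue;
                for (j = 0; j < WIDTH; j++) pop[p][j] = pop[best][j];
                pos = rand() % WIDTH;                    /* +/-1 at a random position      */
                pop[p][pos] += (rand() & 1) ? 1 : -1;
                if (pop[p][pos] < 0) pop[p][pos] = 0;    /* HSA needs non-negative values  */
            }
        }
        printf("best summed error after %d generations: %g\n", GENS, err[best]);
        return 0;
    }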
We chose the convolution function with window width 20; its truncated percentage error was only about 1.5%. Its digitized values were (8, 16, 26, 35, 44, 52, 59, 64, 65, 64, 61, 57, 52, 45, 37, 29, 21, 13, 7, 4), depicted graphically in Fig. 5.
Fig. 5. The evolved convolution function, 20 cycles wide. (Plot: convolution value, 0 to about 65, against clock cycle, 0 to 20.)

Now armed with this evolved convolution function, we then tested it against the original 48-width convolution function on a pair of new, arbitrary test wave forms, as shown below.

\[
F_3(t) = 300 + 100\sin\!\Big(\frac{2\pi t}{100}\Big) - 45\sin\!\Big(\frac{2\pi t}{80}\Big) + 35\sin\!\Big(\frac{2\pi t}{70}\Big) - 20\cos\!\Big(\frac{2\pi t}{60}\Big) \tag{3}
\]

\[
F_4(t) = 300 + 100\cos\!\Big(\frac{2\pi t}{80}\Big) + 50\sin\!\Big(\frac{2\pi t}{70}\Big) + 30\sin\!\Big(\frac{2\pi t}{65}\Big) + 20\sin\!\Big(\frac{2\pi t}{40}\Big) \tag{4}
\]
Table 2 shows the results of the comparison between the two cases, the original 48-width convolution function taken from the text book, and the evolved 20-width convolution function. Accuracy improved with the evolved convolution function by a factor of nearly two, thus justifying the effort. From now on, we will be using the evolved convolution function. Fig. 2 above shows the target and actual output for the waveform F3, using the evolved convolution function.
Table 2. Percentage Error Comparison between Textbook and Evolved Cases

    Window Width                          Percent error   Percent error (truncated)
    48 (Textbook Convolution Function)    8.42            4.36
    20 (Evolved Convolution Function)     6.25            2.27
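For reference, the evolved 20-cycle convolution function can be written down directly as a C array (the identifier below is our own); it can be substituted for the example filter values in the convolution and spiking sketches given earlier.

    /* The evolved convolution function of window width 20 (see Fig. 5). */
    static const int evolved_conv_fn[20] = {
         8, 16, 26, 35, 44, 52, 59, 64, 65, 64,
        61, 57, 52, 45, 37, 29, 21, 13,  7,  4
    };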
5 Further Tests

Having obtained what seemed like a reasonably optimal convolution function of window width 20, this function was then applied to various sinusoidal wave forms of increasing frequency, to see how it coped. Figs. 6 to 9 show how well the 20 window width convolution function coped with tracing a sinusoid of the form $Y(t) = 300 + 100\sin(2\pi t / X)$, where X took the values 100, 60, 20, 10. The figures show that the resulting output wave forms approximated the initial wave forms quite closely. The truncated percentage errors (i.e. the percentage difference between the actual and the target waveforms averaged over the number of clock cycles, 136) were in the 4 cases (3.13%, 3.26%, 2.33%, 29.44%). As expected, the case of X = 10 was bad, since the wavelength of the waveform (10 cycles) was shorter than the window width of the convolution function (20 cycles). However, one can see that the other cases are quite good, showing that the convolution function seems to give quite good results up to frequencies whose wavelengths are comparable to the window width of the convolution function. Presumably, one could trade off the quality of tracing against a lower window width, so as to cope with higher frequency wave forms. We felt that a window width of only 20 clock cycles is adequate for our purposes in the CAM-Brain Project.
6 Summary and Final Remarks

State-of-the-art programmable (and hence evolvable) hardware constrained us to use a 1 bit neural signaling model (the CoDi-1Bit model [4]) in the neural net modules we will be evolving at electronic speeds in our CAM-Brain Machine (CBM) [5] (starting February of 1999). Immediately an interpretation problem arose. What do the bitstring "spiketrains" which enter and leave the CBM neural net modules mean? For the user (the "evolutionary engineer") spiketrains are often rather useless and uninterpretable. It would be very useful to have a means of converting spiketrains to time dependent analog waveforms and vice versa, so that the user has the option to think in terms of visualisable analog waveforms or constants. This paper shows how the dual conversion is possible using a convolution process we call SIIC (Spike Interval Information Coding [6]), and its inverse, the HSA (Hough Spiker Algorithm [3]). Once these two transformations were discovered and shown to work, the next question which arose
was which convolution function to use. At first we simply digitized a convolution function taken from a text book [7]. We began with a window width of 48 clock cycles and got quite good results, with percentage errors between actual and target waveforms of around 4%. By applying a genetic algorithm to the evolution of a convolution function of various window widths, we managed to reduce the width from 48 to 20, and nearly doubled the accuracy. When the time comes to assemble large numbers of evolved neural net modules (maximum 32000 in the CBM) into artificial brains, the accuracy of each module's output is important, so as to reduce cumulative errors in sequential chains of modules, where the output of one module becomes the input of another. In practice, once a module is evolved, its spiketrain output will be fed directly to become the spiketrain input of another module, without any conversion. But, to evolve the module in the first place, it is often the case that the evolutionary engineer will use analog input and output waveforms. Hence the spiketrains that correspond to the analog waveforms should correlate closely. Hence the optimization work undertaken in this paper is important for the CAM-Brain Project.

Fig. 6. Target and actual sinusoid of wavelength 100 cycles (error 3.13%). (Plot: signal value, 0 to 400, against clock cycle, 0 to 140.)
Fig. 7. Target and actual sinusoid of wavelength 60 cycles (error 3.26%). (Plot: signal value, 0 to 400, against clock cycle, 0 to 140.)

Fig. 8. Target and actual sinusoid of wavelength 20 cycles (error 2.33%). (Plot: signal value, 0 to 400, against clock cycle, 0 to 140.)

Fig. 9. Target and actual sinusoid of wavelength 10 cycles (error 29.44%). (Plot: signal value, 0 to 400, against clock cycle, 0 to 140.)

References

1. Hugo de Garis. An artificial brain: ATR's CAM-Brain Project aims to build/evolve an artificial brain with a million neural net modules inside a trillion cell Cellular Automata Machine. New Generation Computing Journal, 12(2), July 1994.
2. Hugo de Garis, Felix Gers, Michael Korkin, Arvin Agah, and Norberto Eiji Nawa. Building an artificial brain using an FPGA based 'CAM-Brain Machine'. Artificial Life and Robotics Journal, 1999. To appear.
3. Hugo de Garis, Michael Korkin, Felix Gers, Norberto Eiji Nawa, and Michael Hough. Building an artificial brain using an FPGA based 'CAM-Brain Machine'. Applied Mathematics and Computation Journal, Special Issue on Artificial Life and Robotics, Artificial Brain, Brain Computing and Brainware, 1999. To appear.
4. Felix Gers, Hugo de Garis, and Michael Korkin. CoDi-1Bit: A simplified cellular automata based neuron model. In Proceedings of AE97, Artificial Evolution Conference, October 1997.
5. Michael Korkin, Hugo de Garis, Felix Gers, and Hitoshi Hemmi. CBM (CAM-Brain Machine): A hardware tool which evolves a neural net module in a fraction of a second and runs a million neuron artificial brain in real time. In John R. Koza, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max Garzon, Hitoshi Iba, and Rick L. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, July 1997.
6. Michael Korkin, Norberto Eiji Nawa, and Hugo de Garis. A 'Spike Interval Information Coding' representation for ATR's CAM-Brain Machine (CBM). In Proceedings of the Second International Conference on Evolvable Systems: From Biology to Hardware (ICES'98). Springer-Verlag, September 1998.
7. Fred Rieke, David Warland, Rob de Ruyter van Steveninck, and William Bialek. Spikes: Exploring the Neural Code. MIT Press/Bradford Books, Cambridge, MA, 1997.

This article was processed using the LaTeX macro package with LLNCS style.